Malware Data Science

Attack Detection and Attribution
by Joshua Saxe with Hillary Sanders
September 2018, 272 pp.

Download Chapter 6: Understanding Machine Learning-Based Malware Detectors
Download the code for the book from its dedicated website

Security has become a "big data" problem. The growth rate of malware has accelerated to tens of millions of new files per year, while our networks generate an ever-larger flood of security-relevant data each day. To defend against today's advanced attacks, you'll need to think like a data scientist.

In Malware Data Science, security data scientist Joshua Saxe introduces machine learning, statistics, social network analysis, and data visualization, and shows you how to apply these methods to malware detection and analysis.

You'll learn how to:

  • Analyze malware using static analysis
  • Observe malware behavior using dynamic analysis
  • Identify adversary groups through shared code analysis
  • Catch previously unseen (0-day) malware by building your own machine learning detector
  • Measure malware detector accuracy
  • Identify malware campaigns, trends, and relationships through data visualization

Whether you're a malware analyst looking to add skills to your existing arsenal or a data scientist interested in attack detection and threat intelligence, Malware Data Science will help you stay ahead of the curve.

Author Bio 

Joshua Saxe is Chief Data Scientist at the major security vendor Sophos, where he leads a security data science research team. He's also a principal inventor of Sophos' neural network-based malware detector, which protects tens of millions of Sophos customers from malware infections. Before joining Sophos, Joshua spent five years leading DARPA-funded security data research projects for the US government.

Hillary Sanders leads the infrastructure data science team at Sophos, which develops the frameworks used to build Sophos' deep learning models. Before joining Sophos, Hillary created a recipe web app and spent three years as a data scientist at Premise Data Corporation.

Table of contents 

Chapter 1: Basic Static Malware Analysis
Chapter 2: Beyond Basic Static Analysis: x86 Disassembly
Chapter 3: A Brief Introduction to Dynamic Analysis
Chapter 4: Identifying Attack Campaigns Using Malware Networks
Chapter 5: Shared Code Analysis
Chapter 6: Understanding Machine Learning-Based Malware Detectors
Chapter 7: Evaluating Malware Detection Systems
Chapter 8: Building Machine Learning Detectors
Chapter 9: Visualizing Malware Trends
Chapter 10: Deep Learning Basics
Chapter 11: Building a Neural Network Malware Detector with Keras
Chapter 12: Becoming a Data Scientist
Appendix: An Overview of Datasets and Tools


"For those looking to become a security data scientist, or just wanting to get a comprehensive understanding of how to use data science to deal with malicious software, Malware Data Science: Attack Detection and Attribution is a superb reference to help you get there."
—Ben Rothke, RSA Conference

"If you are new to data science or machine learning, this book provides an excellent introduction to these topics."
DMFR Security

“This is a book every information security professional should consider reading due to the rapid growth and variation of malware and the increasing reliance upon data science to defend information systems.”
The Ethical Hacker

Extra Stuff 

Check out Joshua Saxe's interview about AI on the Lock and Code podcast.