The Shape of Data

Geometry-Based Machine Learning and Data Analysis in R
by Colleen M. Farrelly and Yaé Ulrich Gaba
July 2023, 264 pp.

Download Chapter 4: NETWORK FILTRATION

Whether you’re a mathematician, seasoned data scientist, or marketing professional, you’ll find The Shape of Data to be the perfect introduction to the critical interplay between the geometry of data structures and machine learning.

This book’s extensive collection of case studies (drawn from medicine, education, sociology, linguistics, and more) and gentle explanations of the math behind dozens of algorithms provide a comprehensive yet accessible look at how geometry shapes the algorithms that drive data analysis.

In addition to gaining a deeper understanding of how to implement geometry-based algorithms with code, you’ll explore:

  • Supervised and unsupervised learning algorithms and their application to network data analysis
  • The way distance metrics and dimensionality reduction impact machine learning
  • How to visualize, embed, and analyze survey and text data with topology-based algorithms
  • New approaches to computational solutions, including distributed computing and quantum algorithms

Author Bio

Colleen M. Farrelly is a senior data scientist whose academic and industry research has focused on topological data analysis, quantum machine learning, geometry-based machine learning, network science, hierarchical modeling, and natural language processing. Since graduating from the University of Miami with an MS in Biostatistics, Colleen has worked as a data scientist in a variety of industries, including health care, consumer packaged goods, biotech, nuclear engineering, marketing, and education. Colleen often speaks at tech conferences, including PyData, SAS Global, WiDS, Data Science Africa, and DataScience SALON. When not working, Colleen can be found writing haibun/haiga or doing any sort of water sport.

Yaé Ulrich Gaba completed his doctoral studies at the University of Cape Town (UCT, South Africa) with a specialization in topology and is presently a research associate at Quantum Leap Africa (QLA, Rwanda). His research interests are computational geometry, applied algebraic topology (topological data analysis), and geometric machine learning (graph and point-cloud representation learning). His current focus lies in geometric methods in data analysis, and his work seeks to develop effective and theoretically justified algorithms for data and shape analysis using geometric and topological ideas and methods.

Table of Contents

Chapter 1: The Geometric Structure of Data
Chapter 2: The Geometric Structure of Networks
Chapter 3: Network Analysis
Chapter 4: Network Filtration
Chapter 5: Geometry in Data Science
Chapter 6: Newer Applications of Geometry in Machine Learning
Chapter 7: Tools for Topological Data Analysis
Chapter 8: Homotopy Algorithms
Chapter 9: Final Project: Analyzing Text Data
Chapter 10: Multicore and Quantum Computing

"The title says it all. Data is bound by many complex relationships not easily shown in our two-dimensional, spreadsheet-filled world. The Shape of Data walks you through this richer view and illustrates how to put it into practice."
—Stephanie Thompson, Data Scientist and Speaker

"The Shape of Data is a novel perspective and phenomenal achievement in the application of geometry to the field of machine learning. It is expansive in scope and contains loads of concrete examples and coding tips for practical implementations, as well as extremely lucid, concise writing to unpack the concepts. Even as a more veteran data scientist who has been in the industry for years now, having read this book I've come away with a deeper connection to and new understanding of my field."
—Kurt Schuepfer, Ph.D., McDonald's Corporation

“A great source for the application of topology and geometry in data science. Topology and geometry advance the field of machine learning on unstructured data, and The Shape of Data does a great job introducing new readers to the subject.”
—Uchenna “Ike” Chukwu, Senior Quantum Developer

"See how data looks not just as lists of numbers but as plots and graphs. The Shape of Data shows the reader how to visualize data sets and discover relations hidden in the numbers and sets. . . . In this age of large data sets and deep learning, data graphics are essential to scientists and engineers—just like this book."
—David S. Mazel, Principal/Manager Systems Engineer, Regulus-Group

"Everyone who works at the border of geometry and Data Science will find the book an invaluable resource and source of inspiration. It is considerate that the R code used in the book has readily accessible Python code."
—Geoffrey Mboya, DPhil (Oxon), Director at Mfano Africa

"Comprehensive and exceptionally well written, The Shape of Data: Geometry-Based Machine Learning and Data Analysis in R is impressively 'reader friendly' in organization and presentation, making it an ideal instructional resource for anyone with an interest in topology, computer hacking, or mathematical/statistical computer software."
—Midwest Book Review

Extra Stuff

Download the Python code files here, the R code files here, and the data here.