Clustering

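Clustering is an unsupervised learning approach that groups data points so that points in the same cluster are more similar to one another than to points in other clusters. How similarity is measured, and how the groups are formed, depends on the clustering technique. There are many clustering techniques, such as hierarchical and k-means clustering.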

Common Applications

Common Industries

  • Agriscience

  • Healthcare

  • Marketing

  • Tech/Social Media

Common Problem Types

  • Anomaly Identification

  • Market Segmentation

  • Genetic/Biological Analysis

  • Recommender Systems

Code Examples

All of the code examples are written in Python, unless otherwise noted.
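To make the idea of clustering concrete before opening the notebooks, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. It is an illustration, not the code from the container: the function names (`squared_distance`, `mean_point`, `k_means`), the parameters `max_iter` and `seed`, and the toy data are all assumptions made for this example. In practice you would more likely reach for a library implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Group `points` into `k` clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical toy data: two visibly separate groups of 2-D points.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
        (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = k_means(data, k=2)
print(labels)     # one cluster index per point
print(centroids)  # one centroid per cluster
```

The first three points should share one label and the last three the other. Which group gets label 0 depends on the random starting centroids, so only the grouping is meaningful, not the label values.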

Containers

These are code examples in the form of Jupyter notebooks that run in a container, which comes with all the data, libraries, and code you’ll need to run them. Click here to learn why you should be using containers, along with how to do so.

# Pull the container (only needs to be run once)
docker pull ghcr.io/thedatamine/starter-guides:k-means-clustering

# Run the container
docker run -p 8888:8888 -it ghcr.io/thedatamine/starter-guides:k-means-clustering

Need help implementing any of this code? Feel free to reach out to datamine-help@purdue.edu and we can help!