In this post we're going to look at applying a class of simple unsupervised machine learning algorithms called clustering. Specifically, we're going to look at two clustering approaches, Hierarchical Agglomerative Clustering and K-Means Clustering, using the scikit-learn library in Python. The dataset we're going to analyze contains demand data for a large number of part numbers belonging to several product families. The intent is to see how we can use the nature of the demand as a feature to help us create product families.
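The post's dataset isn't reproduced here, so what follows is a minimal sketch on synthetic data of how both approaches can be applied with scikit-learn. The demand matrix layout (one row per part number, one column per week), the two demand shapes, the row-wise scaling and the choice of two clusters are all illustrative assumptions, not the post's exact preprocessing.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Hypothetical demand matrix: one row per part number, one column per week.
# Synthetic data stands in for the real dataset, which is not reproduced here.
rng = np.random.default_rng(seed=0)
weeks = np.arange(52)
seasonal = 100 + 40 * np.sin(2 * np.pi * weeks / 52)  # seasonal demand shape
trending = 50 + 2.0 * weeks                           # steadily growing demand shape
profiles = [seasonal] * 10 + [trending] * 10
# Each part gets its own volume multiplier plus some weekly noise.
demand = np.array(
    [p * rng.uniform(0.5, 2.0) + rng.normal(0, 5, size=52) for p in profiles]
)

# Assumption: z-score each row so parts are grouped by the shape of their
# demand over time rather than by their absolute volume.
row_mean = demand.mean(axis=1, keepdims=True)
row_std = demand.std(axis=1, keepdims=True)
features = (demand - row_mean) / row_std

# Hierarchical Agglomerative Clustering with Ward linkage (Euclidean distance).
hac_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(features)

# K-Means with several random initialisations; the best run is kept.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

print("HAC:    ", hac_labels)
print("K-Means:", kmeans_labels)
```

Scaling each row rather than each column is a deliberate choice in this sketch: it removes differences in volume, so two parts with the same seasonal pattern but very different sales levels can end up in the same family.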

What's Next?

We've reached the end of this blog post, having explored two clustering approaches to a supply chain business case. In the process, we performed some data wrangling and preprocessing to prepare the data for clustering, and briefly looked at different ways of improving the clustering results. This analysis is by no means complete, and you could explore different evaluation methods to measure how effective the clusters we've created are. The clusters are, in effect, our product families, and the next step in the analysis pipeline might be forecasting demand for these families. A natural question, then, is how the choice of clustering algorithm affects the 'forecastability' of your product families. Let me know what you think!
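One evaluation method worth trying is the silhouette coefficient, which scores each point by how much closer it is to its own cluster than to the nearest other cluster (range -1 to 1, higher is better). The sketch below, again on hypothetical synthetic demand shapes rather than the post's data, computes scikit-learn's `silhouette_score` for both algorithms over a range of cluster counts.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data: 30 parts x 52 weeks drawn from three demand shapes.
rng = np.random.default_rng(seed=1)
weeks = np.arange(52)
shapes = [
    np.sin(2 * np.pi * weeks / 52),  # seasonal
    weeks / 51.0,                    # ramp-up over the year
    (weeks >= 26).astype(float),     # step change halfway through the year
]
demand = np.array(
    [100 + 50 * s + rng.normal(0, 3, size=52) for s in shapes for _ in range(10)]
)

# Row-wise z-scores so clustering compares demand shape, not volume (an assumption).
features = (demand - demand.mean(axis=1, keepdims=True)) / demand.std(axis=1, keepdims=True)

# Silhouette coefficient for each cluster count and algorithm; higher is better.
scores = {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    hac = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(features)
    scores[k] = (silhouette_score(features, km), silhouette_score(features, hac))
    print(f"k={k}: K-Means {scores[k][0]:.3f}, HAC {scores[k][1]:.3f}")

# Pick the cluster count with the highest K-Means silhouette score.
best_k = max(scores, key=lambda k: scores[k][0])
print("Best k for K-Means by silhouette:", best_k)
```

Silhouette is only one internal measure of cluster quality; it says nothing about whether the resulting families are easier to forecast, which remains the open question.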

