Improving Fast Density Peak Clustering Using Mass Distance: m-FDPC

Authors

  • Mouloud Merbouche

    ISEP – Paris Institute of Digital Technologies, 75006 Paris, France

  • Célia Tso

    ISEP – Paris Institute of Digital Technologies, 75006 Paris, France

  • Jérémie Sublime

    ISEP – Paris Institute of Digital Technologies, 75006 Paris, France

DOI:

https://doi.org/10.30564/jeis.v7i2.11528
Received: 20 June 2025 | Revised: 13 August 2025 | Accepted: 23 August 2025 | Published Online: 30 August 2025

Abstract

In this work, we introduce m-FDPC, a mass-based variant of the Fast Density Peak Clustering (FDPC) algorithm, aimed at improving both performance and ease of use in unsupervised learning tasks. Traditional FDPC relies on Euclidean distance and requires careful parameter tuning and data normalization, both of which can significantly affect clustering outcomes, especially for heterogeneous or high-dimensional datasets. To address these challenges, m-FDPC replaces the conventional Euclidean metric with a mass-based distance measure derived from isolation forests, a method originally designed for anomaly detection. This substitution allows the algorithm to capture local data density and structure more naturally, while eliminating the need for normalization and simplifying the choice of key parameters such as the cutoff distance and density thresholds. Comprehensive experiments on synthetic and real-world datasets demonstrate that m-FDPC not only matches or surpasses the performance of well-established clustering techniques such as DBSCAN, K-means, and Euclidean FDPC, but also offers greater robustness, scalability, and interpretability, particularly in high-dimensional or unevenly distributed data scenarios. Results evaluated with metrics such as the Matching Score and Silhouette Score confirm the algorithm's ability to detect meaningful cluster structures with minimal user intervention. Overall, m-FDPC provides a more efficient, adaptive, and user-friendly framework for density-based clustering, making it a promising tool for diverse applications in data mining, anomaly detection, and exploratory data analysis.
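To make the core idea concrete, the following is a minimal sketch of a mass-based dissimilarity in the spirit of Ting et al. (2016): random isolation trees partition the data, and the dissimilarity between two points is estimated as the average fraction of data ("mass") in the deepest tree node containing both points. This is an illustrative toy implementation, not the authors' m-FDPC code; all function names, the tree depth, and the number of trees are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_itree(X, idx, depth, max_depth):
    # Recursively split on a random feature at a random cut point,
    # as in an isolation tree; leaves keep the indices they contain.
    if depth >= max_depth or len(idx) <= 1:
        return {"idx": idx}
    f = rng.integers(X.shape[1])
    lo, hi = X[idx, f].min(), X[idx, f].max()
    if lo == hi:
        return {"idx": idx}
    cut = rng.uniform(lo, hi)
    return {"idx": idx, "f": f, "cut": cut,
            "L": build_itree(X, idx[X[idx, f] < cut], depth + 1, max_depth),
            "R": build_itree(X, idx[X[idx, f] >= cut], depth + 1, max_depth)}

def shared_node_mass(tree, x, y, n):
    # Descend while both points fall on the same side of each split;
    # the mass of the deepest shared node approximates the smallest
    # region of the partition that contains both x and y.
    node = tree
    while "cut" in node:
        sx = x[node["f"]] < node["cut"]
        sy = y[node["f"]] < node["cut"]
        if sx != sy:
            break
        node = node["L"] if sx else node["R"]
    return len(node["idx"]) / n

def mass_dissimilarity(X, x, y, trees):
    # Average the shared-node mass over the ensemble of trees.
    return np.mean([shared_node_mass(t, x, y, len(X)) for t in trees])

X = rng.normal(size=(200, 2))
trees = [build_itree(X, np.arange(len(X)), 0, max_depth=8) for _ in range(25)]
d_same = mass_dissimilarity(X, X[0], X[0], trees)                      # identical points
d_far = mass_dissimilarity(X, np.full(2, -10.0), np.full(2, 10.0), trees)  # distant points
```

Note that this measure is data-dependent: two points in a dense region share only small nodes and thus get a small dissimilarity, while points straddling sparse regions separate early in each tree and get a large one. This is the property that lets m-FDPC dispense with normalization, since the measure adapts to the local mass of the data rather than to raw coordinate scales.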

References

[1] MacQueen, J.B., 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1967; pp. 281–297.

[2] Ester, M., Kriegel, H.-P., Sander, J., et al., 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases With Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.

[3] Rodriguez, A., Laio, A., 2014. Clustering by Fast Search and Find of Density Peaks. Science. 344(6191), 1492–1496. DOI: https://doi.org/10.1126/science.1242072

[4] Wang, S., Wang, D., Li, C., et al., 2016. Clustering by Fast Search and Find of Density Peaks With Data Field. Chinese Journal of Electronics. 25(3), 397–402.

[5] Liu, R., Wang, H., Yu, X., 2018. Shared-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks. Information Sciences. 450, 200–226. DOI: https://doi.org/10.1016/j.ins.2018.03.031

[6] Liu, F.T., Ting, K.M., Zhou, Z.-H., 2008. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.

[7] Ting, K.M., Zhu, Y., Carman, M., et al., 2016. Overcoming Key Weaknesses of Distance-Based Neighbourhood Methods Using a Data Dependent Dissimilarity Measure. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1205–1214.

[8] Ahlawat, N., Awekar, A., 2022. Scaling Up Mass-Based Clustering. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 3781–3785.

[9] Floros, D., Liu, T., Pitsianis, N., et al., 2018. Sparse Dual of the Density Peaks Algorithm for Cluster Analysis of High-Dimensional Data. In Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, 25–27 September 2018; pp. 1–14.

[10] Sieranoja, S., Fränti, P., 2019. Fast and General Density Peaks Clustering. Pattern Recognition Letters. 128, 551–558. DOI: https://doi.org/10.1016/j.patrec.2019.10.019

[11] Fukunaga, K., Hostetler, L., 1975. The Estimation of the Gradient of a Density Function, With Applications in Pattern Recognition. IEEE Transactions on Information Theory. 21(1), 32–40. DOI: https://doi.org/10.1109/TIT.1975.1055330

[12] Ankerst, M., Breunig, M.M., Kriegel, H., et al., 1999. OPTICS: Ordering Points to Identify the Clustering Structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, USA, 1–3 June 1999; pp. 49–60.

[13] Wang, Y., Qian, J., Hassan, M., et al., 2023. Density Peak Clustering Algorithms: A Review on the Decade 2014–2023. Expert Systems with Applications. 238(7), 121860. DOI: https://doi.org/10.1016/j.eswa.2023.121860

[14] Du, M., Ding, S., Jia, H., 2016. Study on Density Peaks Clustering Based on k-Nearest Neighbors and Principal Component Analysis. Knowledge-Based Systems. 99, 135–145.

[15] Ahlawat, N., 2024. Isolation Forest Based Efficient Unsupervised Machine Learning Algorithms [PhD thesis]. Indian Institute of Technology Guwahati: Guwahati, India. pp. 1–150.

[16] Ling, D., Xiao, X., 2018. Mass-Based Density Peaks Clustering Algorithm. In Proceedings of the International Conference on Intelligent Information Processing, Nanning, China, 19–22 October 2018; pp. 40–48.

[17] Chen, L., 2009. Curse of Dimensionality. In: Liu, L., Özsu, M.T. (Eds.). Encyclopedia of Database Systems. Springer: Boston, MA, USA. pp. 545–546.

[18] Wang, J., Ji, C., Liu, F., et al., 2025. A Band Selection Approach Based on a Mass-Based Metric and Shared Nearest-Neighbours for Hyperspectral Images. IET Image Processing. 19(1), e70165. DOI: https://doi.org/10.1049/ipr2.70165

[19] Ting, K.M., Washio, T., Zhu, Y., et al., 2021. Breaking the Curse of Dimensionality With Isolation Kernel. arXiv preprint. arXiv:2109.14198. DOI: https://doi.org/10.48550/arXiv.2109.14198

[20] Bhattacharjee, P., 2024. Density-Based Mining Algorithms for Dynamic Data: An Incremental Approach [PhD thesis]. Indian Institute of Technology Guwahati: Guwahati, India. pp. 1–200.

[21] Bentley, J.L., 1975. Multidimensional Binary Search Trees Used for Associative Searching. Communications of the ACM. 18(9), 509–517. DOI: https://doi.org/10.1145/361002.361007

[22] Brown, R.A., 2015. Building a Balanced k-d Tree in O(kn log n) Time. Journal of Computer Graphics Techniques. 4(1), 50–68.

[23] Rousseeuw, P.J., 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics. 20, 53–65. DOI: https://doi.org/10.1016/0377-0427(87)90125-7

[24] Davies, D.L., Bouldin, D.W., 1979. A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1(2), 224–227. DOI: https://doi.org/10.1109/TPAMI.1979.4766909

[25] Vendramin, L., Campello, R.J.G.B., Hruschka, E.R., 2010. Relative Clustering Validity Criteria: A Comparative Overview. Statistical Analysis and Data Mining: The ASA Data Science Journal. 3(4), 209–235. DOI: https://doi.org/10.1002/sam.10080

[26] Ikotun, A.M., Habyarimana, F., Ezugwu, A.E., 2025. Cluster Validity Indices for Automatic Clustering: A Comprehensive Review. Heliyon. 11(2), e41953. DOI: https://doi.org/10.1016/j.heliyon.2025.e41953

[27] Akiba, T., Sano, S., Yanase, T., et al., 2019. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.


How to Cite

Merbouche, M., Tso, C., & Sublime, J. (2025). Improving Fast Density Peak Clustering Using Mass Distance: m-FDPC. Journal of Electronic & Information Systems, 7(2), 51–65. https://doi.org/10.30564/jeis.v7i2.11528