Research and Application on Spark Clustering Algorithm in Campus Big Data Analysis

Qing Hou; Guangjian Wang; Xiaozheng Wang; Jiaxi Xu; Yang Xin

doi:10.30564/jcsr.v2i1.1808

Authors

Qing Hou Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China
Guangjian Wang Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China
Xiaozheng Wang Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China
Jiaxi Xu Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China
Yang Xin Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China

Abstract

Big data analysis has penetrated all fields of society and brought about profound changes. However, relatively little research has applied the big data held by colleges and universities to student management. Taking student card records as the research sample and scholarship evaluation as an example, this paper analyzes the data using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students’ daily behavior from multiple dimensions, and it can help prevent unreasonable scholarship decisions caused by unfair factors such as plagiarism or the votes of teachers and students. It can also predict students’ absenteeism, physical health, and psychological status in advance, making student management more proactive, accurate, and effective.
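
As a rough illustration of the clustering step named in the abstract, the sketch below runs plain K-Means (Lloyd's algorithm) on a handful of made-up student card records. It is not the authors' implementation: the paper uses Spark MLlib, whereas this is dependency-free Python meant only to show the assign-and-update loop. The feature names, the sample values, the function name `kmeans`, and the choice to seed the centers with the first k points are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-Means; returns (centers, labels)."""
    # Assumption: seed the centers with the first k points so the run is deterministic.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


# Hypothetical per-student features: (canteen meals per week, library visits per week)
students = [(20.0, 6.0), (3.0, 0.0), (19.0, 5.0), (4.0, 1.0), (21.0, 7.0), (2.0, 0.0)]
centers, labels = kmeans(students, k=2)
print(labels)
print(centers)
```

On real card data the features would normally be scaled before clustering, since K-Means relies on Euclidean distance and a feature with a large numeric range would otherwise dominate the result.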

Keywords:

Spark; Clustering algorithm; Big data; Data analysis; MLlib
