Apache Hadoop Architecture, Applications, and Hadoop Distributed File System


  • Pratit Raj Giri, Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal
  • Gajendra Sharma, Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal




Data volumes and internet usage are growing rapidly, which makes big data difficult to manage. Many software frameworks have been developed to address these problems by improving the performance of distributed systems and making large-scale data storage available. Hadoop is one of the most beneficial frameworks for working with data in distributed systems. This paper introduces the Apache Hadoop architecture and its components, and discusses their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System (HDFS) enables enormous amounts of data to be stored across a distributed network of machines. The framework maintains FsImage and edits files, which support the availability and integrity of data. The paper also presents use cases of Hadoop, such as weather monitoring and bioinformatics processing.
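The role of the FsImage and edits files can be illustrated with a small model. In HDFS, the NameNode keeps a checkpoint of the file-system namespace in the FsImage file and records every later change in the edits log. On restart it loads the FsImage and replays the edits, and a checkpoint merges the two into a fresh FsImage. The sketch below models this in Python under deliberately simplified assumptions. The namespace is a hypothetical dictionary from path to file size, and the operation names `create`, `delete`, and `rename` are illustrative only. They are not the real EditLog opcodes or on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    # Hypothetical, simplified namespace: file path -> file size in bytes.
    files: dict[str, int] = field(default_factory=dict)


def apply_edit(ns: Namespace, op: tuple) -> None:
    """Apply one logged operation (illustrative names, not real HDFS opcodes)."""
    kind, *args = op
    if kind == "create":
        path, size = args
        ns.files[path] = size
    elif kind == "delete":
        (path,) = args
        ns.files.pop(path, None)  # deleting a missing path is a no-op here
    elif kind == "rename":
        src, dst = args
        ns.files[dst] = ns.files.pop(src)
    else:
        raise ValueError(f"unknown edit: {kind}")


def load_namespace(fsimage: dict[str, int], edits: list[tuple]) -> Namespace:
    """Start from a copy of the checkpoint, then replay the edits in order."""
    ns = Namespace(files=dict(fsimage))
    for op in edits:
        apply_edit(ns, op)
    return ns


def checkpoint(fsimage: dict[str, int], edits: list[tuple]) -> tuple[dict[str, int], list[tuple]]:
    """Merge the edits into a new image and return it with an empty edit log."""
    return dict(load_namespace(fsimage, edits).files), []


# Example: one file in the image, three changes recorded since the checkpoint.
fsimage = {"/data/a.csv": 128}
edits = [
    ("create", "/data/b.csv", 64),
    ("rename", "/data/a.csv", "/data/c.csv"),
    ("delete", "/data/b.csv"),
]
restored = load_namespace(fsimage, edits)
print(restored.files)  # {'/data/c.csv': 128}
```

Appending small log records is cheaper than rewriting the whole image on every change. The cost is a longer replay at restart if the edits log grows large, and periodic checkpoints keep that replay short.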


Hadoop, FsImage, HDFS, Apache, MapReduce




How to Cite

Giri, P. R., & Sharma, G. (2022). Apache Hadoop Architecture, Applications, and Hadoop Distributed File System. Semiconductor Science and Information Devices, 4(1), 14–20. https://doi.org/10.30564/ssid.v4i1.4619

