AgamottoEye: Recovering Request Flow for Cloud Systems via Log Analysis
DOI:
https://doi.org/10.30564/jcsr.v1i2.1239
Abstract
Cloud applications are implemented on top of multiple distributed systems to provide online services. A service request is decomposed into multiple sub-tasks, which are dispatched to different distributed system components. For cloud providers, monitoring the execution of service requests is crucial for promptly finding problems that may compromise cloud availability. In this paper, we present AgamottoEye, an approach that automatically constructs request flows from existing logs. AgamottoEye addresses the challenge of analyzing interleaved log instances and can extract request flows that span multiple distributed systems. Our experiments with Hadoop2/YARN show that AgamottoEye can analyze 25,050 log instances in 57.4 s, and that the extracted request flow information helps with error detection and diagnosis.
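The abstract does not say how AgamottoEye separates interleaved log instances, so the sketch below illustrates only the shape of the problem using a much simpler, swapped-in heuristic. It is not AgamottoEye's method. It groups log lines by YARN-style identifiers (application_…, appattempt_…, container_…) and orders each group by timestamp. The identifier regex, the assumed `<date> <time> ...` line layout, the function name `group_by_request`, and the sample lines are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern modeled on YARN-style IDs such as
# application_<clusterTimestamp>_<seq>, appattempt_<clusterTimestamp>_<seq>_<n>,
# and container_<clusterTimestamp>_<seq>_<attempt>_<n>.
# Groups 1 and 2 capture the cluster timestamp and the application sequence number.
ID_RE = re.compile(r"\b(?:application|appattempt|container)_(\d+)_(\d+)")


def group_by_request(lines):
    """Group interleaved log lines into per-application flows (simple heuristic).

    Assumes each line starts with '<date> <time> ...' in a zero-padded,
    lexicographically sortable format. Lines without a recognizable ID are
    dropped; a line mentioning several IDs is assigned to the first one.
    """
    flows = defaultdict(list)
    for line in lines:
        match = ID_RE.search(line)
        if match is None:
            continue  # no request identifier: this line cannot be attributed
        app_key = f"application_{match.group(1)}_{match.group(2)}"
        flows[app_key].append(line)
    for app_lines in flows.values():
        # Order each flow by its leading '<date> <time>' fields (stable sort).
        app_lines.sort(key=lambda entry: entry.split(" ", 2)[:2])
    return dict(flows)


if __name__ == "__main__":
    # Synthetic, hypothetical log lines (not real Hadoop output).
    sample = [
        "2016-05-12 10:15:31,002 INFO rm: submitted application_1463040000000_0002",
        "2016-05-12 10:15:32,500 INFO nm: launching container_1463040000000_0001_01_000001",
        "2016-05-12 10:15:30,100 INFO rm: submitted application_1463040000000_0001",
        "2016-05-12 10:15:33,000 INFO nm: heartbeat sent",
    ]
    for app, app_lines in group_by_request(sample).items():
        print(app)
        for entry in app_lines:
            print("   ", entry)
```

This heuristic silently drops lines that carry no recognizable identifier and attributes a line that mentions several applications to the first one only. It is meant to show what a per-request flow looks like, not to handle those cases.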
Keywords:
Cloud applications; Log analysis; Request flow