Evidence-Constrained Incident Visualization Cards for Distributed Cloud Logs: A UI/UX Framework for Turning Hadoop, OpenStack, and ZooKeeper Logs into Actionable SRE Design Interfaces

Jiayi Nie; Ge Liu; Chengliang Li; Tracey Zou

doi:10.51903/ijgd.v4i1.3703

Authors

Jiayi Nie, Operations Research, Columbia University, NY, USA
Ge Liu, Computer Science, USC, CA, USA
Chengliang Li, Information Studies, Trine University, VA, USA
Tracey Zou, Computer Science, UCB, CA, USA

DOI:

https://doi.org/10.51903/ijgd.v4i1.3703

Keywords:

AIOps, incident visualization, UI/UX, anomaly detection, root cause analysis, Loghub-2.0, SRE, large language models

Abstract

Site reliability engineers do not need another opaque anomaly detector as much as they need interfaces that turn noisy distributed-system logs into fast, defensible incident decisions. This paper presents a reproducible UI/UX framework for LLM-assisted incident visualization cards. The framework converts raw and structured Loghub-2.0 benchmark logs from Hadoop, OpenStack, and ZooKeeper into evidence-grounded cards, each containing an incident headline, the affected service and node, an anomaly timeline, log-evidence badges, a suspected root cause, a confidence value, a recommended next action, a severity hierarchy, and operator decision buttons. The empirical evaluation used the complete 2,000-line benchmark slice for each selected system, for a total of 6,000 log lines, 207 unique templates, and 120 fixed 50-line analysis windows. We implemented deterministic parsing, TF-IDF clustering, template-frequency z-score scoring, Isolation Forest, Local Outlier Factor, TF-IDF KMeans distance scoring, and a constrained LLM-style card generator. Against the Loghub EventId reference, Drain-lite normalization achieved a purity of 0.999 on Hadoop, 1.000 on OpenStack, and 1.000 on ZooKeeper. The hybrid risk ensemble achieved a Precision@5 of 0.800 on Hadoop, 0.800 on OpenStack, and 0.600 on ZooKeeper; on the ZooKeeper logs, the template-frequency z-score was the strongest single risk model, with an AUPRC of 0.934. The generated card interface raised the deterministic decision-readiness score from 0.133 for a raw log list and 0.486 for a plain-text summary to 0.944 for the visual incident card. These results indicate that evidence-preserving visual organization can improve the operational usefulness of log analytics outputs; the framework is meant to support, not replace, human incident ownership.
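The abstract reports Drain-lite normalization purity against the Loghub EventId reference. The sketch below shows one way such a measurement could be computed, under stated assumptions: `normalize` is a hypothetical regex-masking normalizer (not Drain's fixed-depth parse tree, and not necessarily the authors' Drain-lite rules), the patterns in `MASKS` are illustrative guesses at variable fields in Hadoop, OpenStack, and ZooKeeper lines, and `purity` uses the standard clustering definition, which we assume matches the paper's metric.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules (an assumption, not the paper's Drain-lite rules).
# Order matters: composite identifiers are masked before bare numbers.
MASKS = [
    (re.compile(r"\b(?:attempt|job|task|container|application)_\w+"), "<ID>"),
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE), "<UUID>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def normalize(message: str) -> str:
    """Mask variable tokens so that lines from the same event share one template key."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return " ".join(message.split())  # collapse runs of whitespace


def purity(predicted: list[str], reference: list[str]) -> float:
    """Cluster purity: credit each predicted template with its most frequent reference EventId."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("expected two non-empty label lists of equal length")
    by_template: dict[str, Counter] = defaultdict(Counter)
    for template, event_id in zip(predicted, reference):
        by_template[template][event_id] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_template.values())
    return majority / len(predicted)


# Usage: a ZooKeeper-style line collapses to "Received connection request /<IP>".
example = normalize("Received connection request /10.10.34.11:45307")
```

Because purity only checks that no predicted template mixes reference EventIds, it cannot penalize over-splitting: a normalizer that assigns every line its own template also scores 1.0. Reporting the number of predicted templates alongside purity helps guard against that failure mode.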
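The template-frequency z-score and Precision@5 results can be illustrated with a small sketch. It takes per-line template keys as input, splits them into the fixed 50-line windows described in the abstract, and scores each window by the largest z-score of any template's count relative to that template's mean and population standard deviation across all windows of the same system. The max-over-templates aggregation, the helper names `window_counts`, `zscore_risk`, and `precision_at_k`, and the binary per-window labels are assumptions for illustration; the paper's exact scoring and labeling procedure is not described here.

```python
from collections import Counter
from statistics import mean, pstdev

WINDOW = 50  # fixed window length in log lines, as stated in the abstract


def window_counts(templates: list[str], size: int = WINDOW) -> list[Counter]:
    """Split a per-line template sequence into consecutive, non-overlapping windows of counts."""
    return [Counter(templates[i:i + size]) for i in range(0, len(templates), size)]


def zscore_risk(counts: list[Counter]) -> list[float]:
    """Risk per window = largest template-frequency z-score (assumed aggregation rule)."""
    vocabulary = set().union(*counts)
    stats = {}
    for template in vocabulary:
        series = [c[template] for c in counts]  # Counter returns 0 for absent templates
        stats[template] = (mean(series), pstdev(series))
    scores = []
    for c in counts:
        # Templates with constant frequency (zero deviation) carry no signal and are skipped.
        z = [(c[t] - mu) / sd for t, (mu, sd) in stats.items() if sd > 0]
        scores.append(max(z, default=0.0))
    return scores


def precision_at_k(scores: list[float], labels: list[int], k: int = 5) -> float:
    """Share of the k highest-risk windows whose binary label marks them anomalous."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k
```

With a 2,000-line slice, `window_counts` yields 40 windows per system, consistent with the 120 windows reported across the three systems. Ties in `precision_at_k` are broken by window order, because Python's sort remains stable even with `reverse=True`.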

References

Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 93-104. https://doi.org/10.1145/342009.335388

Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), Article 15. https://doi.org/10.1145/1541880.1541882

Chen, Y., & Li, M. (2025). From hand-drawn sketches to interactive web prototypes: A reproducible vision-language approach with structural and visual consistency evaluation. Journal of Technology Informatics and Engineering, 4(2), 364-384. https://doi.org/10.51903/jtie.v4i2.490

Chen, Y., Xie, H., Ma, M., Kang, Y., Gao, X., Shi, L., Cao, Y., Gao, X.-C., Fan, H., Wen, M., Zeng, J., Ghosh, S., Zhang, X., Lin, Q., Rajmohan, S., & Zhang, D. (2024). Automatic root cause analysis via large language models for cloud incidents. Proceedings of the Nineteenth European Conference on Computer Systems, 674-688. https://doi.org/10.1145/3627703.3629553

Cui, T., Ma, S., Chen, Z., Xiao, T., Tao, S., Liu, Y., Zhang, S., Lin, D., Liu, C., Cai, Y., Meng, W., & Pei, D. (2024). LogEval: A comprehensive benchmark suite for large language models in log analysis. arXiv. https://arxiv.org/abs/2407.01896

Du, M., Li, F., Zheng, G., & Srikumar, V. (2017). DeepLog: Anomaly detection and diagnosis from system logs through deep learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 1285-1298. https://doi.org/10.1145/3133956.3134015

Fariha, A., Gharavian, V., Makrehchi, M., Rahnamayan, S., Alwidian, S., & Azim, A. (2024). Log anomaly detection by leveraging LLM-based parsing and embedding with attention mechanism. 2024 IEEE Canadian Conference on Electrical and Computer Engineering, 859-863. https://doi.org/10.1109/CCECE59415.2024.10667308

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX: Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 139-183). North-Holland. https://doi.org/10.1016/S0166-4115(08)62386-9

He, M., Tong, J., Duan, C., Cai, H., Li, Y., & Huang, G. (2024). LLMeLog: An approach for anomaly detection based on LLM-enriched log events. 2024 IEEE 35th International Symposium on Software Reliability Engineering, 132-143. https://doi.org/10.1109/ISSRE62328.2024.00023

He, P., Zhu, J., He, S., Li, J., & Lyu, M. R. (2016). An evaluation study on log parsing and its use in log mining. 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 654-661. https://doi.org/10.1109/DSN.2016.66

He, P., Zhu, J., Zheng, Z., & Lyu, M. R. (2017). Drain: An online log parsing approach with fixed depth tree. 2017 IEEE International Conference on Web Services, 33-40. https://doi.org/10.1109/ICWS.2017.13

Jiang, Z., Liu, J., Huang, J., Li, Y., Huo, Y., Gu, J., Chen, Z., Zhu, J., & Lyu, M. R. (2024). A large-scale evaluation for log parsing techniques: How far are we? Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3650212.3652123

Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 413-422. https://doi.org/10.1109/ICDM.2008.17

Makanju, A., Zincir-Heywood, A. N., & Milios, E. E. (2009). Clustering event logs using iterative partitioning. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1255-1264. https://doi.org/10.1145/1557019.1557154

Meng, W., Liu, Y., Zhu, Y., Zhang, S., Pei, D., Liu, Y., Chen, Y., Zhang, R., Tao, S., Sun, P., & Zhou, R. (2019). LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 4739-4745. https://doi.org/10.24963/ijcai.2019/658

Munzner, T. (2014). Visualization analysis and design. CRC Press.

Nielsen, J. (1994). Usability inspection methods. John Wiley & Sons.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you? Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. https://doi.org/10.1145/2939672.2939778

Samanta, S., Chatterjee, O., Mohapatra, P., de Magalhaes, A., Gupta, H., & Rahane, A. (2024). Efficient incident summarization in ITOps: Leveraging entity-based grouping. Proceedings of the 2024 International Workshop on Software Engineering for AI in the Enterprise. IBM Research.

Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of the IEEE Symposium on Visual Languages, 336-343. https://doi.org/10.1109/VL.1996.545307

Xu, W., Huang, L., Fox, A., Patterson, D., & Jordan, M. I. (2009). Detecting large-scale system problems by mining console logs. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, 117-132. https://doi.org/10.1145/1629575.1629587

Chen, Y., & Chan, E. (2023). Multimodal UI representation learning: Ablation of screenshot, wireframe, and view-hierarchy proxies on an uploaded 168-screen dataset. Journal of Advanced Computing Systems, 3(1), 1-15. https://doi.org/10.69987/JACS.2023.30101

Zhu, J., He, S., He, P., Liu, J., & Lyu, M. R. (2023). Loghub: A large collection of system log datasets for AI-driven log analytics. 2023 IEEE 34th International Symposium on Software Reliability Engineering.

Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., & Lyu, M. R. (2019). Tools and benchmarks for automated log parsing. 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings, 121-124. https://doi.org/10.1109/ICSE-Companion.2019.00051
