Privacy and Data-Integrity Risk Cards for LLM Agents: A UI/UX Design Framework for Secure Human Oversight under Prompt-Injection Attacks
DOI: https://doi.org/10.51903/ijgd.v4i1.3699
Keywords: LLM agents, prompt injection, UI/UX design, risk communication, AgentDojo, tool-use security
Abstract
Large language model (LLM) agents increasingly combine natural-language reasoning with external tools that can send messages, update records, book services, or transfer money. This capability changes prompt injection from a text-output problem into an interface problem: before an agent acts, the user must be able to determine whether the proposed action matches the original task, moves sensitive data, or modifies consequential state. This paper presents Privacy and Data-Integrity Risk Cards, a visual confirmation pattern. The card translates security-relevant runtime facts into source-trust labels, data-sensitivity chips, permission chips, data-flow arrows, consequence previews, a calibrated risk badge, and safer-alternative controls. The evaluation uses AgentDojo v0.1.11 source definitions and saved run traces from 18 agent-pipeline runs. The dataset contains 629 injected security cases and 97 benign user-task traces across Workspace, Slack, Travel, and Banking, yielding 13,068 trace-level UI samples. The evaluation is a deterministic oversight-decision proxy that measures how much risk-relevant information each interface condition exposes rather than how real users behave. Compared with a Plain Agent Log and a Text-only Security Warning, the Risk Card produced the highest proxy attack-recognition rate, the lowest benign false-positive rate, and the clearest score separation between injected and benign traces. Among traces where the injection goal was executed, the proxy approval rate was 3.8% for the Risk Card, compared with 13.0% for the Plain Agent Log and 5.9% for the Text-only Warning. These findings support the Risk Card as a UI risk-communication framework and a candidate interface for human-subject validation rather than a validated deployed defense.
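The sample count quoted in the abstract is consistent with a simple reading in which each of the 726 traces (629 injected plus 97 benign) contributes one UI sample per agent-pipeline run. The short Python sketch below checks that arithmetic. The function name and the one-sample-per-trace-per-run relation are assumptions made for illustration, not the authors' stated procedure.

```python
# Counts taken from the abstract.
INJECTED_CASES = 629   # injected security cases
BENIGN_TRACES = 97     # benign user-task traces
PIPELINE_RUNS = 18     # agent-pipeline runs


def trace_level_samples(injected: int, benign: int, runs: int) -> int:
    """Assumed relation (hypothetical helper): each trace yields one
    UI sample per agent-pipeline run."""
    return (injected + benign) * runs


total = trace_level_samples(INJECTED_CASES, BENIGN_TRACES, PIPELINE_RUNS)
print(total)  # 13068, matching the 13,068 samples reported
```

Under this reading the reported total follows directly from the three counts. Other decompositions could produce the same number, so the match alone does not confirm how the samples were generated.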
License
Copyright (c) 2026 Wenhao Su, Hengning Rao, Emma Ma

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.









