Privacy and Data-Integrity Risk Cards for LLM Agents: A UI/UX Design Framework for Secure Human Oversight under Prompt-Injection Attacks

Wenhao Su; Hengning Rao; Emma Ma

doi:10.51903/ijgd.v4i1.3699

Authors

Wenhao Su, Computer Science, UCSD, CA, USA
Hengning Rao, Electrical and Computer Engineering, UIUC, IL, USA
Emma Ma, Interaction Design, Northeastern University, MA, USA

DOI:

https://doi.org/10.51903/ijgd.v4i1.3699

Keywords:

LLM agents, prompt injection, UI/UX design, risk communication, AgentDojo, tool-use security

Abstract

Large language model (LLM) agents increasingly combine natural-language reasoning with external tools that can send messages, update records, book services, or transfer money. This capability changes prompt injection from a text-output problem into an interface problem: before an agent acts, the user must be able to determine whether the proposed action matches the original task, moves sensitive data, or modifies consequential state. This paper presents Privacy and Data-Integrity Risk Cards, a visual confirmation pattern. The card translates security-relevant runtime facts into source-trust labels, data-sensitivity chips, permission chips, data-flow arrows, consequence previews, a calibrated risk badge, and safer-alternative controls. The evaluation uses AgentDojo v0.1.11 source definitions and saved run traces from 18 agent-pipeline runs. The dataset contains 629 injected security cases and 97 benign user-task traces across Workspace, Slack, Travel, and Banking, yielding 13,068 trace-level UI samples. The evaluation is a deterministic oversight-decision proxy that measures how much risk-relevant information each interface condition exposes rather than how real users behave. Compared with a Plain Agent Log and a Text-only Security Warning, the Risk Card produced the highest proxy attack-recognition rate, the lowest benign false-positive rate, and the clearest score separation between injected and benign traces. Among traces where the injection goal was executed, the proxy approval rate was 3.8% for the Risk Card, compared with 13.0% for the Plain Agent Log and 5.9% for the Text-only Warning. These findings support the Risk Card as a UI risk-communication framework and a candidate interface for human-subject validation rather than a validated deployed defense.
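As an illustration of the card elements listed in the abstract, the following is a minimal sketch of how one Risk Card could be represented and rendered as plain text. It is not the authors' implementation: the class names, field names, badge levels, and the example tool call are all hypothetical, chosen only to mirror the elements named above (source-trust label, data-sensitivity chips, permission chips, data-flow arrows, consequence preview, risk badge, and safer-alternative controls).

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceTrust(Enum):
    """Hypothetical source-trust labels for the instruction behind an action."""
    USER = "user instruction"
    TOOL_OUTPUT = "untrusted tool output"


@dataclass
class DataFlow:
    """One data-flow arrow: which data moves, and where it goes."""
    data_label: str
    destination: str


@dataclass
class RiskCard:
    """Hypothetical container for the card elements named in the abstract."""
    proposed_action: str
    source_trust: SourceTrust
    sensitivity_chips: list[str] = field(default_factory=list)
    permission_chips: list[str] = field(default_factory=list)
    data_flows: list[DataFlow] = field(default_factory=list)
    consequence_preview: str = ""
    risk_badge: str = "low"  # assumed levels: "low", "medium", "high"
    safer_alternatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the card as plain text, one element per line."""
        lines = [
            f"[{self.risk_badge.upper()} RISK] {self.proposed_action}",
            f"Source: {self.source_trust.value}",
            "Sensitivity: " + ", ".join(self.sensitivity_chips),
            "Permissions: " + ", ".join(self.permission_chips),
        ]
        lines += [
            f"Flow: {flow.data_label} -> {flow.destination}"
            for flow in self.data_flows
        ]
        lines.append(f"Consequence: {self.consequence_preview}")
        lines.append("Safer alternatives: " + ", ".join(self.safer_alternatives))
        return "\n".join(lines)


# Hypothetical example: a money transfer requested by text found in tool output.
example_card = RiskCard(
    proposed_action="send_money(recipient='unknown-account', amount=100)",
    source_trust=SourceTrust.TOOL_OUTPUT,
    sensitivity_chips=["financial"],
    permission_chips=["write", "transfer"],
    data_flows=[DataFlow("account funds", "external recipient")],
    consequence_preview="Transfers 100 from the user's account",
    risk_badge="high",
    safer_alternatives=["Reject", "Ask the agent to re-plan"],
)

print(example_card.render())
```

Keeping each element as a separate field, rather than a single free-text warning, is one way to make the contrast with a Text-only Security Warning concrete: every fact the reviewer needs has a fixed place on the card.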

References

Acquisti, A., Brandimarte, L., & Loewenstein, G. (2015). Privacy and human behavior in the age of information. Science, 347(6221), 509-514. https://doi.org/10.1126/science.aaa1465

Amershi, S., Weld, D., Vorbrock, J., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P. N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for human-AI interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-13. https://doi.org/10.1145/3290605.3300233

Bravo-Lillo, C., Cranor, L. F., Downs, J. S., & Komanduri, S. (2013). Bridging the gap in computer security warnings: A mental model approach. IEEE Security & Privacy, 11(2), 18-26. https://doi.org/10.1109/MSP.2013.50

Chen, Y., & Li, M. (2025). From hand-drawn sketches to interactive web prototypes: A reproducible vision-language approach with structural and visual consistency evaluation. Journal of Technology Informatics and Engineering, 4(2), 364-384. https://doi.org/10.51903/jtie.v4i2.490

Cranor, L. F. (2008). A framework for reasoning about the human in the loop. Proceedings of the 1st Conference on Usability, Psychology, and Security.

Debenedetti, E., Zhang, J., Balunovic, M., Beurer-Kellner, L., Fischer, M., & Tramèr, F. (2024). AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. Advances in Neural Information Processing Systems, 37, Datasets and Benchmarks Track. https://openreview.net/forum?id=m1YYAQjO3w

Egelman, S., Cranor, L. F., & Hong, J. (2008). You've been warned: An empirical study of the effectiveness of web browser phishing warnings. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1065-1074. https://doi.org/10.1145/1357054.1357219

Felt, A. P., Ha, E., Egelman, S., Haney, A., Chin, E., & Wagner, D. (2012). Android permissions: User attention, comprehension, and behavior. Proceedings of the Eighth Symposium on Usable Privacy and Security. https://doi.org/10.1145/2335356.2335360

Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). More than you've asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv. https://arxiv.org/abs/2302.12173

Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2024). Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks. arXiv. https://arxiv.org/abs/2302.05733

Lin, J., Amini, S., Hong, J. I., Sadeh, N., Lindqvist, J., & Zhang, J. (2012). Expectation and purpose: Understanding users' mental models of mobile app privacy through crowdsourcing. Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 501-510. https://doi.org/10.1145/2370216.2370290

Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang, C., Shen, S., Zhang, T., Su, Y., Sun, H., Huang, M., Dong, Y., & Tang, J. (2023). AgentBench: Evaluating LLMs as agents. arXiv. https://arxiv.org/abs/2308.03688

Norman, D. A. (2013). The design of everyday things: Revised and expanded edition. Basic Books.

Patil, S. G., Zhang, T., Wang, X., & Gonzalez, J. E. (2023). Gorilla: Large language model connected with massive APIs. arXiv. https://arxiv.org/abs/2305.15334

Perez, F., & Ribeiro, I. (2022). Ignore previous prompt: Attack techniques for language models. arXiv. https://arxiv.org/abs/2211.09527

Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., Zhao, S., Tian, Y., Xie, R., Zhou, J., Gerstein, M., Li, D., Liu, Z., & Sun, M. (2023). ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv. https://arxiv.org/abs/2307.16789

Ruan, Y., Dong, H., Wang, A., Pitis, S., Zhou, Y., Ba, J., Dubois, Y., Maddison, C. J., & Hashimoto, T. (2024). Identifying the risks of LM agents with an LM-emulated sandbox. arXiv. https://arxiv.org/abs/2309.15817

Schick, T., Dwivedi-Yu, J., Dessi, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36.

Shneiderman, B. (2020). Human-centered artificial intelligence: Reliable, safe & trustworthy. International Journal of Human-Computer Interaction, 36(6), 495-504. https://doi.org/10.1080/10447318.2020.1741118

Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., & Cranor, L. F. (2009). Crying Wolf: An empirical study of SSL warning effectiveness. Proceedings of the 18th USENIX Security Symposium, 399-416.

Willison, S. (2023). Prompt injection attacks against applications powered by large language models. Simon Willison's Weblog.

Wogalter, M. S. (Ed.). (2006). Handbook of warnings. Lawrence Erlbaum Associates.

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations.

Chen, Y., & Chan, E. (2023). Multimodal UI representation learning: Ablation of screenshot, wireframe, and view-hierarchy proxies on an uploaded 168-screen dataset. Journal of Advanced Computing Systems, 3(1), 1-15. https://doi.org/10.69987/JACS.2023.30101

Zhan, Q., Liang, Z., Ying, Z., & Kang, D. (2024). InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv. https://arxiv.org/abs/2403.02691
