Visualizing the Right Counseling Support: Evidence-Linked Recommendation Cards for Explainable Mental Health Intake Interfaces

Yifan Zhang; Hailey Zhang

doi:10.51903/ijgd.v3i1.3722

Authors

Yifan Zhang, Department of Counseling and Clinical Psychology, Teachers College, Columbia University
Hailey Zhang, Department of Electrical and Computer Engineering, Carnegie Mellon University, PA, USA

DOI:

https://doi.org/10.51903/ijgd.v3i1.3722

Keywords:

Clinical Text Classification, Counseling Decision Support, Evidence-Linked Cards, Explainable AI, Mental Health Intake

Abstract

This study presents and evaluates a reproducible interface pipeline for turning mental-health counseling case material into structured, evidence-linked decision-support cards. The aim is not to replace therapists or to claim clinical effectiveness, but to make computational suggestions visible, reviewable, and interruptible in a clinician-facing intake-support interface. To clarify scope, the evaluated engine is a local text-classification and extractive-evidence pipeline rather than a free-form proprietary LLM generator. Six datasets were analyzed: CounselingBench, Graph2Counsel, CounselBench-Eval, a CBT distortion test set, a staged CBT response-quality dataset, and AnnoMI. The primary task classified CounselingBench counselor questions and answer options into five counseling-support competencies. The best probability-producing model, TF-IDF word features with stochastic-gradient log-loss classification, achieved 0.677 accuracy and 0.532 macro-F1 on 405 held-out cases. Evidence extraction generated three evidence chips per case and reached 0.993 evidence/full-prediction agreement, indicating fidelity to the implemented classifier rather than clinical sufficiency. At a 0.70 confidence threshold combined with risk-term routing, the interface released 7.2% of cards for routine review, achieved 0.897 accuracy on released cards, and routed 97.7% of model errors to human review. External checks showed that Graph2Counsel strategy prediction achieved 0.610 micro-F1, CBT response acceptability reached 0.807 accuracy, and AnnoMI therapist-behavior classification reached 0.693 macro-F1. The findings support the card as a cautious information-architecture prototype: it can expose recommendation category, confidence, model evidence, risk flag, next-step question, and human-review action, while leaving final interpretation and clinical appropriateness to the therapist.
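The release rule described in the abstract (a 0.70 confidence threshold combined with risk-term routing) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the risk-term list, the names `Card`, `route_card`, and `summarize`, the label strings, and the toy cases are all hypothetical. Only the 0.70 threshold and the three reported quantities (release rate, accuracy on released cards, share of errors routed to human review) come from the abstract.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # threshold reported in the abstract
# Hypothetical risk terms; the study's actual term list is not given here.
RISK_TERMS = ("suicide", "self-harm", "overdose")


@dataclass
class Card:
    text: str          # case material shown on the card
    predicted: str     # predicted competency label
    confidence: float  # classifier probability for the predicted label
    true_label: str    # gold label, used only for evaluation


def route_card(card, threshold=CONFIDENCE_THRESHOLD, risk_terms=RISK_TERMS):
    """Release a card only if it is confident AND contains no risk term."""
    text = card.text.lower()
    has_risk_term = any(term in text for term in risk_terms)
    if has_risk_term or card.confidence < threshold:
        return "human_review"
    return "routine_review"


def summarize(cards):
    """Compute the three routing metrics named in the abstract."""
    released = [c for c in cards if route_card(c) == "routine_review"]
    errors = [c for c in cards if c.predicted != c.true_label]
    routed_errors = [c for c in errors if route_card(c) == "human_review"]
    released_correct = sum(c.predicted == c.true_label for c in released)
    return {
        "release_rate": len(released) / len(cards),
        "released_accuracy": released_correct / len(released) if released else None,
        "errors_routed": len(routed_errors) / len(errors) if errors else None,
    }


# Toy cases for illustration only (not data from the study).
CARDS = [
    Card("Client reports trouble sleeping", "assessment", 0.91, "assessment"),
    Card("Client mentions thoughts of suicide", "treatment", 0.95, "treatment"),
    Card("Client is unsure about therapy goals", "ethics", 0.55, "treatment"),
    Card("Client asks about confidentiality", "ethics", 0.64, "ethics"),
]

if __name__ == "__main__":
    for card in CARDS:
        print(route_card(card), "|", card.text)
    print(summarize(CARDS))
```

In this sketch a risk term overrides confidence, so a high-confidence card that mentions a risk term still goes to human review. That design choice mirrors the abstract's framing of the card as interruptible: the rule favors routing errors to a human over releasing more cards.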

References

Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P. N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290605.3300233

Beck, J. S. (2011). Cognitive behavior therapy: Basics and beyond (2nd ed.). Guilford Press.

De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. arXiv. https://doi.org/10.48550/arXiv.2311.14693

Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv. https://doi.org/10.48550/arXiv.1702.08608

Horvitz, E. (1999). Principles of mixed-initiative user interfaces. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 159-166. https://doi.org/10.1145/302979.303030

Kuhn, J., Chen, Y., & Chan, E. (2024). AI-driven mobile UI pattern recognition and design topic mining on RICO: Semantic clustering and screenshot-based topic classification. Journal of Advanced Computing Systems, 4(5), 67-83. https://doi.org/10.69987/JACS.2024.40506

Li, Y., Yao, J., Bunyi, J. B. S., Frank, A. C., Hwang, A., & Liu, R. (2025). CounselBench: A large-scale expert evaluation and adversarial benchmark of large language models in mental health counseling. arXiv. https://arxiv.org/abs/2506.08584

Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Mandal, A., Chatterjee, A., Klakow, D., & Lauscher, A. (2026). Graph2Counsel: Clinically grounded synthetic counseling session generation. arXiv. https://arxiv.org/abs/2604.20382

Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1-38. https://doi.org/10.1016/j.artint.2018.07.007

Molnar, C. (2022). Interpretable machine learning (2nd ed.). Leanpub.

Nguyen, V. C., Chen, Y., & collaborators. (2025). Do large language models align with core mental health counseling competencies? Findings of the Association for Computational Linguistics: NAACL 2025.

Norman, D. A. (2013). The design of everyday things (Revised and expanded ed.). Basic Books.

Panigutti, C., Beretta, A., Fadda, D., Giannotti, F., Pedreschi, D., Perotti, A., & Rinzivillo, S. (2023). Co-design of human-centered, explainable AI for clinical decision support. ACM Transactions on Interactive Intelligent Systems, 13(4), Article 21. https://doi.org/10.1145/3587271

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. https://doi.org/10.1145/2939672.2939778

Schoonderwoerd, T. A. J., Jorritsma, W., Neerincx, M. A., & van den Bosch, K. (2021). Human-centered XAI: Developing design patterns for explanations of clinical decision support systems. International Journal of Human-Computer Studies, 154, Article 102684. https://doi.org/10.1016/j.ijhcs.2021.102684

Shneiderman, B. (2020). Human-centered artificial intelligence: Reliable, safe and trustworthy. International Journal of Human-Computer Interaction, 36(6), 495-504. https://doi.org/10.1080/10447318.2020.1741118

Stade, E. C., Stirman, S. W., Ungar, L. H., Boland, C., Schwartz, H. A., Yaden, D. B., Sedoc, J., DeRubeis, R. J., Willer, R., & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: A proposal for responsible development and evaluation. npj Mental Health Research, 3, Article 12. https://doi.org/10.1038/s44184-024-00056-z

Wampold, B. E., & Imel, Z. E. (2015). The great psychotherapy debate: The evidence for what makes therapy work (2nd ed.). Routledge.

Wu, Z., Balloccu, S., Kumar, V., Helaoui, R., Reiter, E., & Reforgiato Recupero, D. (2023). Anno-MI: A dataset of expert-annotated counselling dialogues. Future Internet, 15(3), 110. https://doi.org/10.3390/fi15030110

Chen, Y., & Chan, E. (2023). Multimodal UI representation learning: Ablation of screenshot, wireframe, and view-hierarchy proxies on an uploaded 168-screen dataset. Journal of Advanced Computing Systems, 3(1), 1-15. https://doi.org/10.69987/JACS.2023.30101
