Visualizing the Right Counseling Support: Evidence-Linked Recommendation Cards for Explainable Mental Health Intake Interfaces
DOI:
https://doi.org/10.51903/ijgd.v3i1.3722
Keywords:
Clinical Text Classification, Counseling Decision Support, Evidence-Linked Cards, Explainable AI, Mental Health Intake
Abstract
This study presents and evaluates a reproducible interface pipeline for turning mental-health counseling case material into structured, evidence-linked decision-support cards. The aim is not to replace therapists or to claim clinical effectiveness, but to make computational suggestions visible, reviewable, and interruptible in a clinician-facing intake-support interface. To clarify scope, the evaluated engine is a local text-classification and extractive-evidence pipeline rather than a free-form proprietary LLM generator. Six datasets were analyzed: CounselingBench, Graph2Counsel, CounselBench-Eval, a CBT distortion test set, a staged CBT response-quality dataset, and AnnoMI. The primary task classified CounselingBench counselor questions and answer options into five counseling-support competencies. The best probability-producing model, TF-IDF word features with stochastic-gradient log-loss classification, achieved 0.677 accuracy and 0.532 macro-F1 on 405 held-out cases. Evidence extraction generated three evidence chips per case and reached 0.993 evidence/full-prediction agreement, indicating fidelity to the implemented classifier rather than clinical sufficiency. At a 0.70 confidence threshold combined with risk-term routing, the interface released 7.2% of cards for routine review, achieved 0.897 accuracy on released cards, and routed 97.7% of model errors to human review. External checks showed that Graph2Counsel strategy prediction achieved 0.610 micro-F1, CBT response acceptability reached 0.807 accuracy, and AnnoMI therapist-behavior classification reached 0.693 macro-F1. The findings support the card as a cautious information-architecture prototype: it can expose recommendation category, confidence, model evidence, risk flag, next-step question, and human-review action, while leaving final interpretation and clinical appropriateness to the therapist.
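The release rule described in the abstract (a 0.70 confidence threshold combined with risk-term routing) can be illustrated with a minimal sketch. This is not the authors' implementation: the risk-term list, the competency labels, the `Card` fields, and the toy cases are all hypothetical, and the classifier that would supply the confidence values is not shown. Only the 0.70 threshold and the three reported quantities (release rate, accuracy on released cards, share of errors routed to human review) come from the text above.

```python
from dataclasses import dataclass

# Hypothetical risk vocabulary; the actual term list is not given in the abstract.
RISK_TERMS = ("suicide", "self-harm", "overdose")
CONFIDENCE_THRESHOLD = 0.70  # threshold reported in the abstract


@dataclass
class Card:
    text: str          # case text seen by the classifier
    predicted: str     # predicted competency label (hypothetical names below)
    confidence: float  # top-class probability from the classifier
    actual: str        # gold label, used only for evaluation


def route(card: Card) -> str:
    """Return 'routine' if the card may be released, else 'human_review'."""
    has_risk_term = any(term in card.text.lower() for term in RISK_TERMS)
    if has_risk_term or card.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return "routine"


def summarize(cards: list[Card]) -> dict[str, float]:
    """Compute release rate, accuracy on released cards, and share of errors routed."""
    released = [c for c in cards if route(c) == "routine"]
    errors = [c for c in cards if c.predicted != c.actual]
    routed_errors = [c for c in errors if route(c) == "human_review"]
    correct_released = sum(c.predicted == c.actual for c in released)
    return {
        "release_rate": len(released) / len(cards),
        "released_accuracy": correct_released / len(released) if released else 0.0,
        "errors_routed": len(routed_errors) / len(errors) if errors else 1.0,
    }


# Toy cases, invented for illustration only.
cards = [
    Card("Client asks about confidentiality limits.", "ethics", 0.91, "ethics"),
    Card("Client reports thoughts of suicide.", "assessment", 0.95, "assessment"),
    Card("Which intervention fits this case?", "treatment", 0.55, "assessment"),
    Card("Counselor plans the intake interview.", "intake", 0.62, "intake"),
]
summary = summarize(cards)
print(summary)
```

In this sketch a risk term overrides confidence, so a confidently classified card that mentions a risk term still goes to human review. That ordering is an assumption about how the two conditions combine.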
License
Copyright (c) 2025 Yifan Zhang, Hailey Zhang

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.









