LLM-Style Evidence Cards for Scientific Search Interfaces: A UI/UX Design Framework for Retrieval Transparency, Ranking Trust, and Visual Evidence Hierarchy

Jiaying Jin

doi:10.51903/ijgd.v3i2.3698

Authors

Jiaying Jin, Applied Analytics, Columbia University, NY, USA

DOI:

https://doi.org/10.51903/ijgd.v3i2.3698

Keywords:

evidence cards, scientific search, UI/UX design, visual communication, retrieval transparency, SciFact, ranking trust, evidence hierarchy, explainable information retrieval

Abstract

Scientific search systems increasingly provide ranked documents, passages, and automated summaries, yet users must still determine whether retrieved evidence supports a claim, whether the presented information is sufficient for inspection, and why a result is highly ranked. This paper proposes an evidence-card UI/UX framework for scientific search that transforms retrieved articles into structured evidence cards containing a claim anchor, extractive evidence summary, support/refute/insufficient badge, confidence cue, citation cue, ranking rationale, and expandable source text. The framework is designed as a visual communication layer for retrieval transparency and evidence inspection rather than as a new claim-verification model or a live LLM system. Evaluation was conducted using the BEIR SciFact retrieval benchmark, the original SciFact train/dev datasets, and a SciFact-Open candidate-pool stress test. On the 300-query BEIR SciFact test set, the BM25-dominant hybrid baseline achieved nDCG@10 = 0.6667 and Recall@10 = 0.7858, while the proposed evidence-card pipeline achieved nDCG@10 = 0.6621 and Recall@10 = 0.7763. On the SciFact dev set, gold evidence appeared within the top three evidence-card candidates for 84.6% of evidence-bearing claims, and selected rationale sentences matched gold rationale annotations for 44.0% of gold evidence-document pairs. Interface-level analysis showed that the proposed card design increased the evidence visibility index from 0.2643 to 0.5920 and reduced the estimated first-pass scan-burden proxy from 115.50 to 84.08 seconds. These results suggest that evidence cards improve transparency by making relevance, uncertainty, confidence, and ranking rationale visible while preserving access to source evidence.
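As a rough illustration of the card structure listed in the abstract, the seven components could be modeled as a small record with a collapsed and an expanded view. This is a minimal sketch, not the paper's implementation: the names `EvidenceCard`, `Badge`, and `render`, the 0 to 1 confidence scale, and the text layout are all assumptions made for illustration, and the example values are placeholders rather than data from the study.

```python
from dataclasses import dataclass
from enum import Enum


class Badge(Enum):
    """The three badge states named in the abstract."""
    SUPPORT = "support"
    REFUTE = "refute"
    INSUFFICIENT = "insufficient"


@dataclass
class EvidenceCard:
    """Hypothetical container for the seven card components."""
    claim_anchor: str        # the claim the card is anchored to
    evidence_summary: str    # extractive summary taken from the source
    badge: Badge             # support / refute / insufficient
    confidence: float        # assumed to lie in [0, 1]; scale is hypothetical
    citation: str            # citation cue for the source article
    ranking_rationale: str   # short explanation of why the result ranks here
    source_text: str         # full text shown only when the card is expanded

    def render(self, expanded: bool = False) -> str:
        """Return a plain-text view; source text appears only when expanded."""
        lines = [
            f"[{self.badge.value.upper()}] confidence {self.confidence:.2f}",
            f"Claim: {self.claim_anchor}",
            f"Evidence: {self.evidence_summary}",
            f"Cited as: {self.citation}",
            f"Why ranked here: {self.ranking_rationale}",
        ]
        if expanded:
            lines.append(f"Source: {self.source_text}")
        return "\n".join(lines)


# Placeholder values only; none of these strings come from the paper's data.
card = EvidenceCard(
    claim_anchor="Example claim text",
    evidence_summary="Example sentence extracted from the abstract.",
    badge=Badge.SUPPORT,
    confidence=0.8,
    citation="Example Author (Year)",
    ranking_rationale="High lexical overlap with the claim terms.",
    source_text="Full example abstract text.",
)
print(card.render())
print(card.render(expanded=True))
```

Keeping the source text behind an `expanded` flag mirrors the abstract's point that the card shortens the first-pass scan while preserving access to the underlying evidence.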

References

Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133-143.

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (Eds.). (1999). Readings in information visualization: Using vision to think. Morgan Kaufmann.

Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. S. (2020). SPECTER: Document-level representation learning using citation-informed transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2270-2282.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171-4186.

Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv.

Hearst, M. A. (2009). Search user interfaces. Cambridge University Press.

Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422-446.

Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 6769-6781.

Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016). When(ish) is my bus? User-centered visualizations of uncertainty in everyday, mobile predictive systems. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 5092-5103.

Kuhlthau, C. C. (1991). Inside the search process: Information seeking from the user's perspective. Journal of the American Society for Information Science, 42(5), 361-371.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kuettler, H., Lewis, M., Yih, W.-t., Rocktaeschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

Liao, Q. V., Gruen, D., & Miller, S. (2020). Questioning the AI: Informing design practices for explainable AI user experiences. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1-15.

Marchionini, G. (2006). Exploratory search: From finding to understanding. Communications of the ACM, 49(4), 41-46.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Muennighoff, N., Tazi, N., Magne, L., & Reimers, N. (2023). MTEB: Massive text embedding benchmark. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2014-2037.

Nielsen, J. (1994). Usability engineering. Morgan Kaufmann.

Nogueira, R., & Cho, K. (2019). Passage re-ranking with BERT. arXiv.

Norman, D. A. (2013). The design of everyday things: Revised and expanded edition. Basic Books.

Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643-675.

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 3982-3992.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you? Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144.

Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the web. Journal of the American Society for Information Science and Technology, 53(2), 145-161.

Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389.

Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of the IEEE Symposium on Visual Languages, 336-343.

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285.

Thakur, N., Reimers, N., Rueckle, A., Srivastava, A., & Gurevych, I. (2021). BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 1-17.

Wadden, D., Lin, S., Lo, K., Wang, L. L., van Zuylen, M., Cohan, A., & Hajishirzi, H. (2020). Fact or fiction: Verifying scientific claims. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 7534-7550.

Wadden, D., Lo, K., Wang, L. L., Cohan, A., & Hajishirzi, H. (2022). SciFact-Open: Towards open-domain scientific claim verification. Findings of the Association for Computational Linguistics: EMNLP 2022, 4719-4734.

Ware, C. (2012). Information visualization: Perception for design (3rd ed.). Morgan Kaufmann.
