Cureus. 2025 Sep;17(9): e92590
Objective While Large Language Models (LLMs) show great promise for various medical applications, their black-box nature and the difficulty of reproducing results have been noted as significant challenges. In contrast, conventional text mining is a well-established methodology, yet its mastery remains time-consuming. This study aimed to determine if an LLM could achieve literature analysis outcomes comparable to those from traditional text mining, thereby clarifying both its utility and inherent limitations. Methods We analyzed the abstracts of 5,112 medical papers retrieved from PubMed using the single keyword "text mining." We used Google Gemini 2.5 (Google Inc., Mountain View, CA, USA) and instructed it to extract distinctive words, concepts, trends, and co-occurrence network concepts. These results were then qualitatively compared with those obtained from conventional text mining tools, VOSviewer and KH Coder. Results Google Gemini appeared to conceptually aggregate individual words and identify research trends. The concepts for co-occurrence networks also showed visual similarity to the networks generated by the traditional tools. However, the LLM's analytical output was based on its own unique interpretation and could not be directly compared with the statistically derived co-occurrence patterns. Furthermore, since this study relied on a visual comparison of network diagrams rather than rigorous quantitative analysis, the conclusions remain qualitative. Conclusion Google Gemini indicated an ability to extract keywords, concepts, and trends. A co-occurrence network visually similar to those generated by conventional text mining tools was created. While it showed particular strengths in conceptual summarization and trend detection, its limitations - including its black-box nature, reproducibility challenges, and subjective interpretations - became apparent. With a proper understanding of these constraints, LLMs may serve as a valuable complementary tool, with the potential to accelerate literature analysis in medical research.
Keywords: co-occurrence network; large language model; medical literature analysis; pubmed database; text mining