Abstract


Large Language Models (LLMs) have shown tremendous potential in intent classification tasks, yet their practical deployment in low-resource language environments remains underexplored. This study presents an informatics-based evaluation framework to compare three LLM architectures—GPT-Neo (fine-tuned), Mistral, and Phi-2.0 (zero-shot inference)—on Indonesian intent classification. The methodology integrates classic informatics approaches such as stratified sampling, label encoding, model evaluation using Scikit-learn, and a REST API-based local inference pipeline via the Ollama framework. The study also benchmarks computational efficiency by profiling execution times on consumer-grade hardware. GPT-Neo achieved 100% accuracy after fine-tuning, while Mistral and Phi-2.0 scored approximately 55% and 18%, respectively, in zero-shot settings. The hybrid architecture designed in this work demonstrates how LLMs can be systematically evaluated and deployed using lightweight, modular informatics workflows. Results suggest that fine-tuned lightweight models are viable for high-accuracy deployment, while zero-shot models enable rapid prototyping under constrained resources.
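The evaluation workflow described above — stratified sampling, label encoding, and accuracy scoring — can be illustrated with a minimal, dependency-free sketch. The paper performs these steps with scikit-learn utilities; the stdlib implementations below mirror that behavior, and the sample Indonesian utterances and intent labels are invented for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split (text, label) pairs so every label keeps roughly the same
    train/test proportion (mirrors scikit-learn's stratify= option)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        cut = max(1, int(len(items) * test_ratio))
        test.extend(items[:cut])
        train.extend(items[cut:])
    return train, test

def encode_labels(labels):
    """Map label strings to integer ids (like scikit-learn's LabelEncoder)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[lbl] for lbl in labels], classes

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented toy data: Indonesian banking utterances with intent labels.
data = [("cek saldo", "balance"), ("transfer dana", "transfer"),
        ("cek saldo rekening", "balance"), ("kirim uang", "transfer"),
        ("saldo saya berapa", "balance"), ("transfer ke teman", "transfer")]

train, test = stratified_split(data)
ids, classes = encode_labels([lbl for _, lbl in data])
```

With a 0.2 test ratio and three examples per label, each intent contributes one example to the test set, so zero-shot and fine-tuned models are compared on the same class balance.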
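The REST API-based local inference pipeline can be sketched against Ollama's default `/api/generate` endpoint on `localhost:11434`. The prompt wording and intent list below are hypothetical — the paper's actual prompts are not reproduced here — and `classify` requires a locally running `ollama serve` instance.

```python
import json
import urllib.request

# Ollama's default local REST endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model, utterance, intents):
    """Assemble a zero-shot intent-classification request body
    (hypothetical prompt wording)."""
    prompt = (
        "Classify the intent of the following Indonesian sentence.\n"
        f"Possible intents: {', '.join(intents)}.\n"
        f"Sentence: {utterance}\n"
        "Intent:"
    )
    # stream=False asks Ollama for one complete JSON response.
    return {"model": model, "prompt": prompt, "stream": False}

def classify(model, utterance, intents):
    """POST the payload to the local Ollama server and return the
    model's raw text response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, utterance, intents)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

payload = build_payload("mistral", "saya ingin transfer dana",
                        ["balance", "transfer"])
```

Because the server runs locally, the same request loop can be timed on consumer-grade hardware to profile per-query latency, as in the paper's efficiency benchmark.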


Keywords

Informatics Framework; Intent Classification; GPT-Neo; Fine-Tuning; Indonesian NLP; Ollama; REST API; Large Language Models (LLMs); Lightweight AI Deployment

Article Details

How to Cite
Karjadi, M., & Santoso, H. (2025). Comparative Evaluation of Large Language Models for Intent Classification in Indonesian Text. TEPIAN, 6(2), 97–105. https://doi.org/10.51967/tepian.v6i2.3355
