Combining Retrieval-Augmented Generation and Fine-tuning of Large Language Models to Enhance Port Industry Question-Answering Systems

Authors

  • Xinqiang Hu

    College of Computing and Information Technologies, National University, Manila 1008, Philippines

  • Mideth Abisado

    College of Computing and Information Technologies, National University, Manila 1008, Philippines

DOI:

https://doi.org/10.30564/fls.v7i6.9143
Received: 18 March 2025 | Revised: 7 April 2025 | Accepted: 21 May 2025 | Published Online: 6 June 2025

Abstract

In this research, we develop a new hybrid architecture that combines Retrieval-Augmented Generation (RAG) with fine-tuned Large Language Models (LLMs) to address gaps in domain-specific question-answering systems for the maritime port industry. Our approach mitigates the limitations of generic LLMs on domain-specific queries by combining industry-specific knowledge retrieval with adaptive fine-tuning of model parameters. The evaluation protocol combined quantitative metrics with qualitative expert judgement and showed marked gains over stand-alone approaches across several dimensions: factual correctness, accurate use of maritime terminology, and compliance with relevant policies. The hybrid system achieved a 23% improvement in nDCG@5 scores and over 90% accuracy in maritime terminology use while maintaining sub-second response times under typical operational loads. The domain experts consulted in the study were particularly impressed by the balance the system struck between factual precision and contextual understanding of complex operational scenarios. These improvements allow decision-makers in critical operational environments to place substantially greater trust in the system. The research demonstrates a practical methodology for balancing domain adaptation with computational efficiency in specialised professional domains that demand high factual precision while allowing for contextual interpretation.
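To make the reported retrieval metric concrete, the short Python sketch below shows how nDCG@5 is computed from graded relevance judgements of retrieved passages, and how retrieved maritime context is typically prepended to a question before it reaches the fine-tuned generator. This is purely illustrative and not the authors' implementation: the relevance grades, the example passages, and the build_prompt helper are hypothetical placeholders.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k: DCG of the top-k results normalised by the ideal (sorted) DCG."""
    top_k = ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top_k) / ideal_dcg if ideal_dcg > 0 else 0.0

def build_prompt(question, retrieved_passages):
    """Prepend retrieved maritime-domain passages to the question, as a RAG
    pipeline typically does before calling the (fine-tuned) generator."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))
    return (
        "Answer using only the maritime port context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # Hypothetical graded relevance (0-3) of retrieved passages for one test
    # query, in the order each retriever returned them.
    baseline_ranking = [1, 0, 3, 0, 2, 1]
    hybrid_ranking = [3, 2, 1, 1, 0, 0]
    print(f"baseline nDCG@5: {ndcg_at_k(baseline_ranking):.3f}")
    print(f"hybrid   nDCG@5: {ndcg_at_k(hybrid_ranking):.3f}")

    # Hypothetical domain passages feeding the generator prompt.
    passages = [
        "Demurrage charges apply after the agreed laytime at the berth expires.",
        "Berth allocation at the container terminal is scheduled by the port authority.",
    ]
    print(build_prompt("When do demurrage charges start to apply?", passages))
```

In a system of this kind, the ranked relevance grades would come from expert judgements on domain test queries, and the reported 23% figure would correspond to the relative gain of the hybrid ranking over a stand-alone baseline under this metric.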

Keywords:

Retrieval-Augmented Generation (RAG); Large Language Models (LLMs); Maritime Port Industry; Domain-Specific Knowledge; Hybrid Architecture

How to Cite

Hu, X., & Abisado, M. (2025). Combining Retrieval-Augmented Generation and Fine-tuning of Large Language Models to Enhance Port Industry Question-Answering Systems. Forum for Linguistic Studies, 7(6), 531–553. https://doi.org/10.30564/fls.v7i6.9143

Article Type

Article