Beyond Perception: A Comprehensive Investigation into the Advancements, Challenges & Ethical Dimensions of AI and Computer Vision
DOI:
https://doi.org/10.30564/rwas.v1i1.9577
Abstract
This study presents a structured investigation into recent advancements and practical applications of Artificial Intelligence (AI), Deep Learning (DL), Machine Learning (ML), and Computer Vision (CV), with a specific focus on their integration in domains such as healthcare, autonomous transportation, and intelligent surveillance. Through a comprehensive review of the available knowledge and a thematic analysis of expert interviews, the research identifies significant progress in core areas including image classification, object detection, and autonomous navigation. The study critically examines the performance and applicability of state-of-the-art models such as Vision Transformers, YOLO, and diffusion-based architectures, particularly those developed using transfer learning and ensemble learning techniques. Experimental observations are supported by empirical data and comparative analyses, demonstrating the effectiveness of these models across varied deployment environments. However, challenges persist in data quality, model interpretability, and ethics, including algorithmic bias and lack of transparency. The findings underscore the importance of ethical AI governance and robust data stewardship practices. Practical implications are discussed for AI developers, with emphasis on deploying efficient models on edge devices and in AR/VR systems. From a policy perspective, the study advocates the development of regulatory frameworks that ensure responsible and equitable AI adoption. Future research directions include improving model generalizability, integrating multimodal data, and designing human-centric AI systems. This work aims to contribute to a more holistic understanding of AI-driven computer vision and offers a foundation for both scholarly inquiry and industrial implementation.
Keywords:
Artificial Intelligence (AI); Augmented Reality (AR); Computer Vision; Deep Learning (DL); Image Processing; Machine Learning (ML); Robotics; Virtual Reality (VR)
License
Copyright © 2025 Zarif Bin Akhtar

This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.