A Novel Dataset For Intelligent Indoor Object Detection Systems


  • Mouna Afif, Faculty of Sciences of Monastir
  • Riadh Ayachi, University of Monastir
  • Yahia Said, University of Monastir
  • Edwige Pissaloux, University of Rouen Normandy
  • Mohamed Atri, University of Monastir




Indoor scene understanding and indoor object detection are complex, high-level tasks for automated systems operating in natural environments. Such tasks require large collections of annotated indoor images to train and test intelligent computer vision applications. One challenging question is how to adopt and enhance technologies that assist visually impaired people (VIP) with indoor navigation and thus improve their daily quality of life. This paper presents a new labeled indoor object dataset designed for indoor object detection, which is useful for indoor localization and navigation tasks. The dataset consists of 8,000 indoor images covering 16 classes of indoor landmark objects. The annotations are original in that they take two new aspects into account: (1) the spatial relationships between objects present in the scene and (2) the actions that can be applied to those objects (relationships between a VIP and an object). The dataset includes images captured under varied lighting conditions and with complex backgrounds, which is intended to make object detectors trained and tested on it more robust. The proposed dataset is ready for use and provides 16 vital indoor object classes to support assisted indoor navigation for VIP.
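The abstract does not specify the annotation file format. As a purely illustrative sketch, assuming one bounding-box record per object, the Python snippet below shows one way an object annotation could carry the two extra aspects described above: the actions applicable to an object, and a spatial relationship derived from two boxes. The class `BoxAnnotation`, the function `horizontal_relation`, the relation names, and the example labels and actions are all hypothetical and are not taken from the dataset itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoxAnnotation:
    """One labeled object with a box in pixel coordinates (hypothetical schema)."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    # Hypothetical field: actions a VIP could apply to this object.
    actions: List[str] = field(default_factory=list)


def horizontal_relation(a: BoxAnnotation, b: BoxAnnotation) -> str:
    """Coarse left/right relation of box `a` with respect to box `b`.

    A hypothetical rule: `a` is left of `b` if it ends before `b` starts,
    right of `b` if it starts after `b` ends; otherwise the two boxes
    overlap along the x-axis.
    """
    if a.xmax <= b.xmin:
        return "left_of"
    if a.xmin >= b.xmax:
        return "right_of"
    return "overlaps_horizontally"


if __name__ == "__main__":
    # Placeholder labels and actions, chosen only for illustration.
    door = BoxAnnotation("door", 10, 20, 110, 300, actions=["open", "push"])
    sign = BoxAnnotation("sign", 150, 40, 220, 90)
    print(horizontal_relation(door, sign))  # left_of
    print(horizontal_relation(sign, door))  # right_of
```

A richer scheme would add vertical and depth relations, but a rule based only on box extents keeps the idea visible: relations are computed from the geometry that an ordinary bounding-box annotation already provides.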


Indoor object detection and recognition; Indoor image dataset; Visually Impaired People (VIP); Indoor navigation




How to Cite

Afif, M., Ayachi, R., Said, Y., Pissaloux, E., & Atri, M. (2019). A Novel Dataset For Intelligent Indoor Object Detection Systems. Artificial Intelligence Advances, 1(1), 52–58. https://doi.org/10.30564/aia.v1i1.925

