Transformer-Based 3D Object Detection

Abstract

This paper studies Transformer-based object detection methods. The Transformer, originally developed for natural language processing, is now widely used in computer vision tasks such as image classification and object detection. The paper introduces an object detection method based on a scale point cloud Transformer, which offers a new research direction for future work on object detection.

Author Biographies
  1. Jiayin Li, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  2. Yixin Ma, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  3. Jiagu Pan, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  4. Xing Xu, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

Published
2025-03-25
Section
Research Articles
License

Copyright (c) 2025 Jiayin Li, Yixin Ma, Jiagu Pan, Xing Xu

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyrights for articles published in IJIER journals are retained by the authors, with first publication rights granted to the journal. The journal/publisher is not responsible for subsequent uses of the work. It is the author's responsibility to bring an infringement action if so desired. For more information, visit Copyright & License.

How to Cite

Li, J., Ma, Y., Pan, J., & Xu, X. (2025). Transformer-Based 3D Object Detection. International Journal for Innovation Education and Research, 12(4), 1-5. https://doi.org/10.31686/ijier.vol12.iss4.4220