52mW Full HD 160-Degree Object Viewpoint Recognition SoC with Visual Vocabulary Processor for Wearable Vision Applications
Yu-Chi Su, Keng-Yen Huang, Tse-Wei Chen, Yi-Min Tsai, Shao-Yi Chien, Liang-Gee Chen
National Taiwan University
2011 Symposium on VLSI Circuits (SOVC)
Abstract
A wearable 1920x1080 160-degree object viewpoint recognition SoC is realized on a 6.38mm² die in 65nm CMOS technology. The system focuses on enhancing wide-viewpoint and long-distance recognition while reducing the computation of the feature matching process. For a traffic light 50m away, the recognition accuracy improves from 29% under VGA resolution (640x480) to 94% under full HD resolution. Object viewpoint prediction (OVP) supports 160-degree object viewpoint differences. The proposed visual vocabulary processor (VVP) reduces power consumption by 85% and memory bandwidth by 75%. The SoC achieves 52mW power consumption with 25.9GOPS/mm² area efficiency.
Wearable Visual Recognition SoC
Recently, mobile vision technologies such as augmented reality, robot vision, and electronic aids for the visually impaired have been developed to assist people and make our lives more convenient. However, in many circumstances, especially when these devices are worn outdoors, state-of-the-art techniques show limited performance. We attribute this to three major causes: (1) difficulty in detecting long-distance or small-sized objects, (2) poor recognition accuracy under large object viewpoint variation and dramatic camera ego-motion, and (3) high power consumption due to complex computation and frequent memory access.
In this paper, we propose a full HD 160-degree (80° on each side) object viewpoint recognition SoC, as shown in Fig. 1. To overcome the above shortcomings, our system introduces three prominent characteristics. Firstly, to recognize objects that are far away or small, the proposed vision recognition system is designed for full HD resolution at 30fps. Higher resolution leads to better performance in recognizing an object that occupies a small portion of an image, as illustrated in Fig. 1. Secondly, OVP is proposed to tolerate 160-degree variation in object appearance. By synthesizing predicted pose candidates of an object, tolerance to viewpoint variation is significantly enhanced without feeding extra images into the database. Lastly, VVP is designed to simplify the complicated computation. Existing object recognition systems [1-3] perform recognition in the feature matching stage and require frequent memory accesses. More memory access leads to higher power consumption, which is critical in wearable applications. In this work, we advance the matching process from the feature level to the object level via VVP. It uses the concepts of Bag-of-Words (BoW) object representation and the vocabulary tree [4] to characterize an object as a histogram vector. Instead of matching individual features, which results in thousands of memory fetches, VVP compares only the histogram vector, with a single memory access, to recognize an object. Combining these three characteristics, the proposed recognition SoC achieves both high accuracy and power efficiency for wearable vision applications.
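The object-level matching idea can be illustrated with a minimal software sketch. It is not the VVP hardware design: the toy two-level tree, the 2-D descriptors, the object names, and the L1 histogram distance are all assumptions chosen for illustration. Each descriptor is quantized by descending a vocabulary tree to a leaf "visual word", the word counts form a normalized histogram, and recognition compares one histogram per database object instead of matching individual features.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    """One node of a (hypothetical) vocabulary tree."""
    center: Sequence[float]
    children: List["Node"] = field(default_factory=list)
    word_id: Optional[int] = None  # set on leaf nodes only


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(root: Node, descriptor: Sequence[float]) -> int:
    """Descend the tree, picking the nearest child at each level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(c.center, descriptor))
    return node.word_id


def bow_histogram(root: Node, descriptors, num_words: int) -> List[float]:
    """Count visual words and normalize so the histogram sums to 1."""
    hist = [0.0] * num_words
    for d in descriptors:
        hist[quantize(root, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(query_hist: Sequence[float], database: Dict[str, List[float]]) -> str:
    """Return the database object whose histogram is closest to the query."""
    return min(database, key=lambda name: l1_distance(query_hist, database[name]))


# Toy tree: branching factor 2, depth 2, four visual words (assumed values).
root = Node(center=(0.0, 0.0), children=[
    Node(center=(-1.0, 0.0), children=[
        Node(center=(-1.0, -1.0), word_id=0),
        Node(center=(-1.0, 1.0), word_id=1),
    ]),
    Node(center=(1.0, 0.0), children=[
        Node(center=(1.0, -1.0), word_id=2),
        Node(center=(1.0, 1.0), word_id=3),
    ]),
])

# One histogram per known object (descriptors are made-up illustration data).
database = {
    "traffic_light": bow_histogram(root, [(-0.9, -1.1), (-1.2, 0.8), (-1.0, -0.9)], 4),
    "stop_sign": bow_histogram(root, [(1.1, 0.9), (0.8, -1.2), (1.0, 1.0)], 4),
}

query_hist = bow_histogram(root, [(-1.1, -0.8), (-0.7, -1.0), (-0.9, 1.2), (0.9, 1.1)], 4)
best_match = recognize(query_hist, database)
print(best_match)
```

In this sketch the query costs one histogram comparison per database object, regardless of how many features the image contains, which is the property the text attributes to object-level matching.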
System Flow
Implementation Results
Comparison
References
[1] J. Oh et al., "1.2 mW On-Line Learning Mixed Mode Intelligent Inference Engine for Robust Object Recognition", in IEEE Symposium on VLSI Circuits, pp.17-18, 2010.
[2] J.-Y. Kim et al., "A 201.4GOPS 496mW Real-Time Multi-Object Recognition Processor with Bio-Inspired Neural Perception Engine", in IEEE ISSCC, pp.150-151, 2009.
[3] S. Lee et al., "A 345mW Heterogeneous Many-Core Processor with an Intelligent Inference Engine for Robust Object Recognition", in IEEE ISSCC, pp.332-333, 2010.
[4] D. Nister and H. Stewenius, "Scalable Recognition with a Vocabulary Tree", in IEEE CVPR, pp.2161-2168, 2006.
[5] D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, vol.60, no.2, pp.91-110, 2004.
[6] O. Barinova et al., "On Detection of Multiple Object Instances Using Hough Transforms", in IEEE CVPR, pp.2233-2240, 2010.