Acta Geodaetica et Cartographica Sinica, 2018, Vol. 47, Issue (6): 882-891. doi: 10.11947/j.AGCS.2018.20180040

• Digital Photogrammetry and Deep Learning Methods •

Machine Vision Special Issue: Building Match Graph Using Deep Convolution Feature for Structure from Motion

WAN Jie¹,², Alper YILMAZ²

  1. Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing Key Laboratory of Spatial Information Integration and 3S Engineering Application, Beijing 100871, China;
  2. Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Ohio 43210, USA
  • Received: 2018-01-23 Revised: 2018-04-17 Online: 2018-06-20 Published: 2018-06-21
  • Corresponding author: Alper YILMAZ E-mail: alpery74@gmail.com
  • About the author: WAN Jie (1990-), male, PhD candidate; research interests: photogrammetry and computer vision.
  • Supported by:
    The National Natural Science Foundation of China (No. 41571432)

Abstract: In structure from motion (SfM), matching the images of an unordered dataset is very time-consuming, partly because feature matching itself is expensive and partly because every image pair has to be compared, which has a computational complexity of O(n²) in the number of images. To reduce the number of matches, this paper proposes a method that builds the image match graph from deep convolution features (DCF). First, the convolutional feature maps of an image are extracted with a VGG-16 convolutional neural network pretrained on ImageNet. Then sum pooling is applied to the feature maps. Finally, the resulting vector is normalized and used as the feature of the image. The similarity between each image in the dataset and every other image is computed as the dot product of their feature vectors, and the 10 most similar images are selected as the potential matching pairs of that image, from which the match graph is built. The experimental results show that the proposed DCF creates the match graph effectively and finds the potential matching image pairs. On the Urban and South Building datasets, the SfM reconstructions obtained by matching with the DCF-based match graph are almost the same as those obtained by exhaustive matching, while the number of matches is reduced by 97.4% and 92.1%, respectively. In addition, the match graph created by the proposed DCF is better than the one created by DBoW3 in the mainstream ORB-SLAM2 system.

Key words: deep convolution feature, match graph, structure from motion, transfer learning

CLC number:
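
The pipeline described in the abstract (sum pooling, normalization, dot-product similarity, top-10 selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: small random arrays stand in for the VGG-16 convolutional feature maps, L2 normalization is assumed because the abstract does not name the norm, and the function names `sum_pool`, `l2_normalize`, and `build_match_graph` are hypothetical. Plain Python lists are used to keep the sketch dependency-free; a real pipeline would more likely use an array library.

```python
import math
import random


def sum_pool(feature_maps):
    """Collapse C feature maps (each H x W) into a C-dimensional vector by summing each map."""
    return [sum(sum(row) for row in fmap) for fmap in feature_maps]


def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_match_graph(descriptors, k=10):
    """Link each image to its k most similar images; return unordered pairs (i, j) with i < j."""
    n = len(descriptors)
    pairs = set()
    for i in range(n):
        # Similarity of image i to every other image, highest first.
        sims = [(dot(descriptors[i], descriptors[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)
        for _, j in sims[:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Assumption: random values stand in for the conv feature maps of each image.
random.seed(0)
NUM_IMAGES, CHANNELS, HEIGHT, WIDTH = 30, 8, 4, 4
feature_maps = [
    [[[random.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
     for _ in range(CHANNELS)]
    for _ in range(NUM_IMAGES)
]

descriptors = [l2_normalize(sum_pool(fm)) for fm in feature_maps]
match_pairs = build_match_graph(descriptors, k=10)
exhaustive_pairs = NUM_IMAGES * (NUM_IMAGES - 1) // 2
print(f"{len(match_pairs)} candidate pairs instead of {exhaustive_pairs}")
```

With n images and k neighbours per image, the match graph holds at most n·k pairs, whereas exhaustive matching needs n(n-1)/2, so the saving grows with the size of the dataset.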