Data-driven Visual Similarity for Cross-domain Image Matching
Abhinav Shrivastava (Carnegie Mellon University)   Tomasz Malisiewicz (MIT)   Abhinav Gupta (Carnegie Mellon University)   Alexei A. Efros (Carnegie Mellon University)
Abstract
The goal of this work is to find visually similar images even if they appear quite different at the raw pixel level. This task is particularly important for matching images across visual domains, such as photos taken over different seasons or lighting conditions, paintings, hand-drawn sketches, etc. We propose a surprisingly simple method that estimates the relative importance of different features in a query image based on the notion of "data-driven uniqueness". We employ standard tools from discriminative object detection in a novel way, yielding a generic approach that does not depend on a particular image representation or a specific visual domain. Our approach shows good performance on a number of difficult cross-domain visual tasks, e.g., matching paintings or sketches to real photographs. The method also allows us to demonstrate novel applications such as Internet re-photography and painting2gps. While at present the technique is too computationally intensive to be practical for interactive image retrieval, we hope that some of the ideas will eventually become applicable to that domain as well.
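To make the high-level idea above concrete, here is a minimal sketch of how "data-driven uniqueness" weighting could be realized with off-the-shelf discriminative tools. The feature vectors, the negative pool, the LinearSVC classifier, and the class weights are all assumptions chosen for illustration, not a description of the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def uniqueness_weights(query_feat, negative_feats, C=1.0):
    """Hypothetical sketch: per-dimension importance for one query image.

    A linear classifier is trained to separate the query (the lone
    positive) from features of many random "world" images (negatives).
    Dimensions that help distinguish the query from the visual world
    receive large weights; common, uninformative dimensions do not.
    """
    X = np.vstack([query_feat[None, :], negative_feats])
    y = np.r_[1, np.zeros(len(negative_feats), dtype=int)]
    # Up-weight the single positive so it is not swamped by negatives
    # (the specific class weights here are illustrative guesses).
    clf = LinearSVC(C=C, class_weight={1: 50.0, 0: 0.01})
    clf.fit(X, y)
    return clf.coef_.ravel()

def similarity(weights, candidate_feat):
    # Match score: candidate features re-weighted by query uniqueness.
    return float(weights @ candidate_feat)
```

Because only a linear weighting of features is learned, the same sketch applies unchanged to any feature representation, which is the sense in which such an approach is generic across image representations and visual domains.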
CR Categories: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Learning; I.4.10 [Image Processing and Computer Vision]: Image Representation—Statistical;
Keywords: image matching, visual similarity, saliency, image retrieval, paintings, sketches, re-photography, visual memex
and computational photography. Unlike traditional methods, which employ parametric models to capture visual phenomena, data-driven approaches use visual data directly, without an explicit intermediate representation. These approaches have shown promising results on a wide range of challenging computer graphics problems, including super-resolution and de-noising [Freeman et al. 2002; Buades et al. 2005; HaCohen et al. 2010], texture and video synthesis [Efros and Freeman 2001; Schödl et al. 2000], image analogies [Hertzmann et al. 2001], automatic colorization [Torralba et al. 2008], scene and video completion [Wexler et al.; Hays and Efros 2007; Whyte et al. 2009], photo restoration [Dale et al. 2009], assembling photo-realistic virtual spaces [Kaneva et al. 2010; Chen et al. 2009], and even making CG imagery more realistic [Johnson et al. 2010], to give but a few examples.
The central element common to all the above approaches is searching a large dataset to find visually similar matches to a given query – be it an image patch, a full image, or a spatio-temporal block. However, defining a good visual similarity metric for matching can often be surprisingly difficult. Granted, in many situations where the data is reasonably homogeneous (e.g., different patches within the same texture image [Efros and Freeman 2001], or different frames within the same video [Schödl et al. 2000]), simple pixel-wise sum-of-squared-differences (L2) matching works quite well. But what about cases where the visual content is similar only at the higher scene level, yet quite dissimilar at the pixel level? For instance, methods that use scene matching, e.g., [Hays and Efros 2007; Dale et al. 2009], often need to match images across different illuminations, different seasons, different cameras, etc. Likewise, retexturing an image in the style of a painting [Hertzmann et al. 2001; Efros and Freeman 2001] requires establishing visual correspondence between two very different domains – photos and paintings. Cross-domain matching is even more critical for applications such as Sketch2Photo [Chen et al. 2009] and CG2Real [Johnson et al. 2010], which aim to bring domains as different as sketches and CG renderings into correspondence with natural photographs. In all of these cases, pixel-wise matching fares quite poorly, because small perceptual differences can result in arbitrarily large pixel-wise differences. What is needed is a visual metric that captures the important visual structures that make two images appear similar, yet remains robust to small, unimportant visual details. This is precisely what makes the problem so difficult – the visual similarity algorithm somehow needs to know which visual structures are important to a human observer and which are not.
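To make the contrast concrete, here is a small numpy sketch of the pixel-wise L2 (sum-of-squared-differences) metric discussed above, alongside a weighted variant; the weight vector is a hypothetical stand-in for the per-feature importance estimate the text argues is needed.

```python
import numpy as np

def ssd(a, b):
    """Plain pixel-wise sum-of-squared-differences (L2) matching."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d ** 2))

def weighted_distance(a, b, w):
    """Down-weight unimportant dimensions, emphasize important ones.

    w encodes which structures matter (a hypothetical input here); a
    photo and a painting of the same scene can then match closely even
    though their plain SSD is arbitrarily large.
    """
    d = a.ravel().astype(np.float64) - b.ravel().astype(np.float64)
    return float(np.sum(w * d ** 2))
```

The open problem the paper addresses is, of course, where such a weight vector comes from: it must be derived automatically, per query, rather than supplied by hand.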