Data-driven Visual Similarity for Cross-domain Image Matching
Abhinav Shrivastava (Carnegie Mellon University)   Tomasz Malisiewicz (MIT)   Abhinav Gupta (Carnegie Mellon University)   Alexei A. Efros (Carnegie Mellon University)
Abstract
The goal of this work is to find visually similar images even if they appear quite different at the raw pixel level. This task is particularly important for matching images across visual domains, such as photos taken over different seasons or lighting conditions, paintings, hand-drawn sketches, etc. We propose a surprisingly simple method that estimates the relative importance of different features in a query image based on the notion of "data-driven uniqueness". We employ standard tools from discriminative object detection in a novel way, yielding a generic approach that does not depend on a particular image representation or a specific visual domain. Our approach shows good performance on a number of difficult cross-domain visual tasks, e.g., matching paintings or sketches to real photographs. The method also allows us to demonstrate novel applications such as Internet re-photography and painting2gps. While at present the technique is too computationally intensive to be practical for interactive image retrieval, we hope that some of the ideas will eventually become applicable to that domain as well.
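To make the high-level idea above concrete, here is a minimal sketch of how "data-driven uniqueness" weighting could be realized with off-the-shelf discriminative tools. The feature vectors, the negative pool, the LinearSVC classifier, and the class weights are all assumptions chosen for illustration, not a description of the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def uniqueness_weights(query_feat, negative_feats, C=1.0):
    """Hypothetical sketch: per-dimension importance for one query image.

    A linear classifier is trained to separate the query (the lone
    positive) from features of many random "world" images (negatives).
    Dimensions that help distinguish the query from the visual world
    receive large weights; common, uninformative dimensions do not.
    """
    X = np.vstack([query_feat[None, :], negative_feats])
    y = np.r_[1, np.zeros(len(negative_feats), dtype=int)]
    # Up-weight the single positive so it is not swamped by negatives
    # (the specific class weights here are illustrative guesses).
    clf = LinearSVC(C=C, class_weight={1: 50.0, 0: 0.01})
    clf.fit(X, y)
    return clf.coef_.ravel()

def similarity(weights, candidate_feat):
    # Match score: candidate features re-weighted by query uniqueness.
    return float(weights @ candidate_feat)
```

Because only a linear weighting of features is learned, the same sketch applies unchanged to any feature representation, which is the sense in which such an approach is generic across image representations and visual domains.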
CR Categories: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Learning; I.4.10 [Image Processing and Computer Vision]: Image Representation—Statistical;
Keywords: image matching, visual similarity, saliency, image retrieval, paintings, sketches, re-photography, visual memex
and computational photography. Unlike traditional methods, which employ parametric models to capture visual phenomena, data-driven approaches use visual data directly, without an explicit intermediate representation. These approaches have shown promising results on a wide range of challenging computer graphics problems, including super-resolution and de-noising [Freeman et al. 2002; Buades et al. 2005; HaCohen et al. 2010], texture and video synthesis [Efros and Freeman 2001; Schödl et al. 2000], image analogies [Hertzmann et al. 2001], automatic colorization [Torralba et al. 2008], scene and video completion [Wexler et al.; Hays and Efros 2007; Whyte et al. 2009], photo restoration [Dale et al. 2009], assembling photo-realistic virtual spaces [Kaneva et al. 2010; Chen et al. 2009], and even making CG imagery more realistic [Johnson et al. 2010], to give but a few examples.
The central element common to all the above approaches is searching a large dataset to find visually similar matches to a given query – be it an image patch, a full image, or a spatio-temporal block. However, defining a good visual similarity metric for matching can often be surprisingly difficult. Granted, in many situations where the data is reasonably homogeneous (e.g., different patches within the same texture image [Efros and Freeman 2001], or different frames within the same video [Schödl et al. 2000]), simple pixel-wise sum-of-squared-differences (L2) matching works quite well. But what about cases where the visual content is similar only at the higher scene level, yet quite dissimilar at the pixel level? For instance, methods that use scene matching, e.g., [Hays and Efros 2007; Dale et al. 2009], often need to match images across different illuminations, different seasons, different cameras, etc. Likewise, retexturing an image in the style of a painting [Hertzmann et al. 2001; Efros and Freeman 2001] requires establishing visual correspondence between two very different domains – photos and paintings. Cross-domain matching is even more critical for applications such as Sketch2Photo [Chen et al. 2009] and CG2Real [Johnson et al. 2010], which aim to bring domains as different as sketches and CG renderings into correspondence with natural photographs. In all of these cases, pixel-wise matching fares quite poorly, because small perceptual differences can result in arbitrarily large pixel-wise differences. What is needed is a visual metric that captures the important visual structures that make two images appear similar, yet remains robust to small, unimportant visual details. This is precisely what makes the problem so difficult – the visual similarity algorithm somehow needs to know which visual structures are important to a human observer and which are not.
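To make the contrast concrete, here is a small numpy sketch of the pixel-wise L2 (sum-of-squared-differences) metric discussed above, alongside a weighted variant; the weight vector is a hypothetical stand-in for the per-feature importance estimate the text argues is needed.

```python
import numpy as np

def ssd(a, b):
    """Plain pixel-wise sum-of-squared-differences (L2) matching."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d ** 2))

def weighted_distance(a, b, w):
    """Down-weight unimportant dimensions, emphasize important ones.

    w encodes which structures matter (a hypothetical input here); a
    photo and a painting of the same scene can then match closely even
    though their plain SSD is arbitrarily large.
    """
    d = a.ravel().astype(np.float64) - b.ravel().astype(np.float64)
    return float(np.sum(w * d ** 2))
```

The open problem the paper addresses is, of course, where such a weight vector comes from: it must be derived automatically, per query, rather than supplied by hand.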