Automatic retrieval of similar content using search engine query interface

Dasdan, Ali; D'Alberto, Paolo; Kolay, Santanu; Drome, Chris

We consider the coverage testing problem: given a document and a corpus with a limited query interface, determine whether the corpus contains a near-duplicate of the document. This problem has applications in search engines for competitive coverage testing. To solve it, we propose approaches that work in three main steps: generate a query signature from the document; query the corpus using the query signature and scrape the returned results; and validate the similarity between the input document and the returned results.
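
The three steps can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how signatures are built or how similarity is validated, so the sketch assumes a simple longest-terms heuristic for the signature and Jaccard similarity over word shingles for validation. All function names, the 0.8 threshold, and the in-memory `toy_search` stand-in for a real search engine interface are hypothetical.

```python
import re
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_signature(document: str, k: int = 5) -> list[str]:
    """Step 1 (assumed heuristic): use the k longest distinct terms,
    with ties broken alphabetically, as the query signature."""
    terms = set(tokenize(document))
    return sorted(terms, key=lambda t: (-len(t), t))[:k]


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of w-word shingles (contiguous word windows)."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + w]) for i in range(max(len(tokens) - w + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def toy_search(corpus: list[str]) -> Callable[[list[str]], list[str]]:
    """Hypothetical stand-in for a limited query interface: returns the
    corpus documents that contain every query term."""
    def search(query_terms: list[str]) -> list[str]:
        return [doc for doc in corpus
                if all(t in set(tokenize(doc)) for t in query_terms)]
    return search


def has_near_duplicate(document: str,
                       search: Callable[[list[str]], list[str]],
                       threshold: float = 0.8,
                       k: int = 5) -> bool:
    """Run the three steps: build a signature, query, validate results."""
    signature = query_signature(document, k)   # step 1: generate signature
    candidates = search(signature)             # step 2: query the corpus
    target = shingles(document)
    # Step 3: accept if any returned result is similar enough (assumed threshold).
    return any(jaccard(target, shingles(c)) >= threshold for c in candidates)
```

In this sketch, `search` is passed in as a function so that the toy corpus could be swapped for a wrapper around an actual query interface without changing the other two steps.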

MashMaker: mashups for the masses

Ennals, RJ; Garofalakis, MN

Rob Ennals, Intel Research Berkeley
Minos Garofalakis, Yahoo Research

Categories and Subject Descriptors: H.4.3 [Information Systems Applications]: Information Browsers
General Terms: Management, Design, Human Factors, Languages
Keywords: Mashup, web, end-users

[Figure: MashMaker interface. Labels: Info Toggle, Search Box, Search Results, Selected Widget, Properties View.]

MashMaker is an interactive tool for editing, querying, manipulating, and visualizing "live" semi-structured data.
