Defining Mapping Mashups with BioXMash

Hunt, E; Jakubowska, J; Bösinger, C; Norrie, MC
Journal of Integrative Bioinformatics
We present a novel approach to XML data integration which allows a biologist to select data
from a large XML file repository, add it to a genome map, and produce a mapping mashup
showing integrated data in map context. This approach can be used to produce contextual
views of arbitrary XML data which relates to objects shown on a map. A biologist using
BioXMash searches in XML tags and is guided by XML path data availability, shown as
the number of values reachable via a path, in both global (genome-wide) and local
(per-gene) context. She then examines sample values in an area of interest on the map.
If required, the resulting data is dumped to files for subsequent analysis.
This is a lightweight integration approach, and differs significantly from other known methods.
It assumes that data integration can be performed on a lab computer with limited
memory, with no database installation or programming knowledge. It differs from
BioMarts, which predefine possible data selections, in that arbitrary data sources related
to map items can be used. BioXMash offers full textual search in XML paths, shows path
statistics, and supports visual verification of data values. Repeated scanning of all XML
files at query time is avoided by the use of a high-level indexing technique. Our prototype
demonstrates this new approach. It efficiently supports data browsing on 2 GB of data
from GeneCards with an index size of 40 MB.
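
The idea of path statistics, that is, the number of values reachable via an XML path in genome-wide and per-gene context, can be illustrated with a minimal sketch. This is not the BioXMash index, whose structure the abstract does not describe. The function names, the sample documents, and the choice to count only non-empty element text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_value_counts(xml_text):
    """Count non-empty text values reachable via each root-to-element tag path."""
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        # Assumption: a "value" is non-whitespace element text; attributes are ignored.
        if elem.text and elem.text.strip():
            counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def build_index(docs_by_gene):
    """Return (global, local) statistics: genome-wide counts and per-gene counts."""
    local = {gene: path_value_counts(xml) for gene, xml in docs_by_gene.items()}
    global_counts = Counter()
    for counts in local.values():
        global_counts.update(counts)
    return global_counts, local


def search_paths(global_counts, term):
    """Case-insensitive textual search in path names; returns sorted (path, count) pairs."""
    term = term.lower()
    return sorted((p, n) for p, n in global_counts.items() if term in p.lower())


# Hypothetical sample documents, one small XML record per gene.
DOCS = {
    "BRCA1": (
        "<gene><name>BRCA1</name><disorders>"
        "<disorder>breast cancer</disorder>"
        "<disorder>ovarian cancer</disorder>"
        "</disorders></gene>"
    ),
    "TP53": (
        "<gene><name>TP53</name><disorders>"
        "<disorder>Li-Fraumeni syndrome</disorder>"
        "</disorders></gene>"
    ),
}

GLOBAL_COUNTS, LOCAL_COUNTS = build_index(DOCS)
```

Once such counts are computed in a single pass and stored, a path search only consults the small count tables and does not rescan the source files, which is the general motivation for indexing stated in the abstract.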

