Crosswalk (metadata)
A crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another schema.

For example, this is a metadata crosswalk from MARC to Dublin Core:

MARC field                                        Dublin Core element
260$c (Date of publication, distribution, etc.)   Date.Created
522 (Geographic Coverage Note)                    Coverage.Spatial
300$a (Physical Description)                      Format.Extent



Crosswalks show people where to put the data from one scheme into a different scheme. They are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC, Dublin Core, TEI, and other metadata schemes. For example, suppose an archive has a MARC record in its catalog describing a manuscript. If the archive makes a digital copy of that manuscript and wants to display it on the web along with the information from the catalog, it will have to translate the data from the MARC catalog record into a different format, such as MODS, that can be displayed on a web page. Because MARC has different fields than MODS, decisions must be made about where to put the data in MODS. This type of "translating" from one format to another is often called "metadata mapping" or "field mapping," and is related to "data mapping" and "semantic mapping."
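In software, a field mapping of this kind is often held as a simple lookup table. The following Python sketch is illustrative only: it reuses the three MARC to Dublin Core rows from the first table in this article (the article gives no MODS mapping), and the function name, the record layout, and the sample record (including the local field 999$x) are assumptions made for the example.

```python
# Illustrative crosswalk: MARC field/subfield -> qualified Dublin Core element.
# Only the three rows from the article's first table; a real crosswalk has many more.
MARC_TO_DC = {
    "260$c": "Date.Created",
    "522": "Coverage.Spatial",
    "300$a": "Format.Extent",
}


def apply_crosswalk(record, crosswalk):
    """Return (translated, unmapped) for a flat source record.

    `record` is a list of (field, value) pairs so that repeated fields survive.
    """
    translated = {}
    unmapped = []
    for field, value in record:
        target = crosswalk.get(field)
        if target is None:
            # No equivalent element: set it aside instead of silently dropping it.
            unmapped.append((field, value))
        else:
            translated.setdefault(target, []).append(value)
    return translated, unmapped


# Hypothetical, heavily simplified MARC record for a manuscript.
marc_record = [
    ("260$c", "1850"),
    ("300$a", "12 leaves"),
    ("522", "Boston (Mass.)"),
    ("999$x", "local note"),  # assumed local field with no Dublin Core equivalent
]

dc_record, leftovers = apply_crosswalk(marc_record, MARC_TO_DC)
print(dc_record)
print(leftovers)
```

Returning the unmapped fields separately makes the "decisions must be made" step visible: someone still has to decide what happens to data that has no home in the target scheme.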

Crosswalks also serve several technical purposes. They help databases that use different metadata schemes share information, help metadata harvesters create union catalogs, and enable search engines to search multiple databases simultaneously with a single query.
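As a rough illustration of the single-query case, the Python sketch below routes one title search to three toy databases, each stored under a different scheme. The field labels, the in-memory "databases", and the function name are simplified assumptions for the example, not official element identifiers or a real search API.

```python
# Hypothetical labels for the "title" concept in three schemes.
TITLE_FIELD = {
    "marc": "245",
    "dublin_core": "Title",
    "mods": "titleInfo",
}

# Three toy databases, each storing records under its own scheme.
DATABASES = {
    "marc": [{"245": "Moby Dick"}, {"245": "Walden"}],
    "dublin_core": [{"Title": "Moby Dick; or, The Whale"}],
    "mods": [{"titleInfo": "Leaves of Grass"}],
}


def search_title(term):
    """Run one title query against every database using its own field name."""
    hits = []
    for scheme, records in DATABASES.items():
        field = TITLE_FIELD[scheme]  # the crosswalk lookup
        for rec in records:
            if term.lower() in rec.get(field, "").lower():
                hits.append((scheme, rec[field]))
    return hits


print(search_title("moby"))
```

The caller never needs to know which scheme a database uses; the crosswalk supplies the right field name for each one.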

Challenges for crosswalks

One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme, or it may have a field that is split into two different fields in another scheme; this is why you often lose data when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, you lose the distinction between types of titles:

MARC field              Dublin Core element
210 Abbreviated Title   Title
222 Key Title           Title
240 Uniform Title       Title
242 Translated Title    Title
245 Title Statement     Title
246 Variant Title       Title



Simple Dublin Core has only a single "Title" element, so all of the different types of MARC titles get lumped together without any further distinction. This is called "many-to-one" mapping. It is also why, once you've translated these titles into Simple Dublin Core, you can't translate them back into MARC: the MARC information about what types of titles they are has been lost, so when you map from Simple Dublin Core back to MARC, all the data in the "Title" element maps to the basic MARC 245 Title Statement field.

Dublin Core element   MARC field
Title                 245 Title Statement
Title                 245 Title Statement
Title                 245 Title Statement
Title                 245 Title Statement
Title                 245 Title Statement
Title                 245 Title Statement



This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A.
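The loss can be checked with a small round trip. The Python sketch below is illustrative only: the dictionary and function names are made up for the example, the sample titles are hypothetical, and the two crosswalks contain just the title mappings described in this section.

```python
# Forward crosswalk: six MARC title fields collapse into one element (many-to-one).
MARC_TO_SIMPLE_DC = {
    tag: "Title" for tag in ("210", "222", "240", "242", "245", "246")
}

# Reverse crosswalk: the single Title element can only go to one MARC field.
SIMPLE_DC_TO_MARC = {"Title": "245"}


def translate(pairs, crosswalk):
    """Rename the field of each (field, value) pair according to the crosswalk."""
    return [(crosswalk[field], value) for field, value in pairs]


# Hypothetical title data for one serial.
marc_titles = [
    ("210", "J. Exp. Biol."),
    ("222", "Journal of experimental biology"),
    ("245", "The Journal of Experimental Biology"),
]

dc = translate(marc_titles, MARC_TO_SIMPLE_DC)
back = translate(dc, SIMPLE_DC_TO_MARC)

print(dc)
print(back)
print(back == marc_titles)  # the title types were lost on the way
```

The values survive the round trip, but every one of them comes back tagged 245, which is why the forward and reverse directions need separate crosswalks.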

Difficulties in mapping

Other mapping problems arise when:
  • One scheme has a single element that must be split up, with different parts of it placed in multiple elements in the second scheme ("one-to-many" mapping)
  • One scheme allows an element to be repeated, while the other allows that element to appear only once, with multiple terms in it
  • Schemes have different data formats (e.g., John Doe or Doe, John)
  • An element in one scheme is indexed but the equivalent element in the other scheme is not
  • Schemes use different controlled vocabularies
  • Schemes change their standards over time
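Two of the problems listed above, differing name formats and repeatable versus single-occurrence elements, are commonly handled with small conversion helpers applied alongside the crosswalk. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the name conversion naively treats the last word as the surname, and a semicolon is assumed as the separator between terms.

```python
def invert_name(name):
    """Convert 'John Doe' to 'Doe, John'; leave already-inverted names alone.

    Naive assumption: the last whitespace-separated token is the surname.
    """
    if "," in name:
        return name
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def collapse_repeats(values, separator="; "):
    """Pack a repeatable element into one occurrence holding several terms."""
    return separator.join(values)


def expand_terms(value, separator=";"):
    """Opposite direction: split one multi-term element into repeated elements."""
    return [term.strip() for term in value.split(separator) if term.strip()]


print(invert_name("John Doe"))
print(collapse_repeats(["Whales", "Whaling"]))
print(expand_terms("Whales; Whaling"))
```

Helpers like these are heuristics: a surname with several words, or a term that itself contains a semicolon, will be converted incorrectly, which is one reason some mapping problems cannot be fully fixed.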


Some of these problems are simply not fixable. As Karen Coyle says in "Crosswalking Citation Metadata: The University of California's Experience,"

"The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. When metadata is crosswalked between two or more unrelated sources, there will be data elements that cannot be reconciled in an ideal manner. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise in order to reach a practical conclusion to projects."

Examples

MARC to Dublin Core (Library of Congress)
http://loc.gov/marc/marc2dc.html

Dublin Core to MARC21 (Library of Congress)
http://www.loc.gov/marc/dccross.html

Dublin Core to UNIMARC (UKOLN)
http://www.ukoln.ac.uk/metadata/interoperability/dc_unimarc.html

TEI to and from MARC
http://purl.oclc.org/NET/teiinlibraries

FGDC to USMARC (Alexandria)
http://www.alexandria.ucsb.edu/public-documents/metadata/fgdc2marc.html

ONIX to MARC21 (Library of Congress)
http://www.loc.gov/marc/onix2marc.html

VRA to MARC (Indiana University)
http://php.indiana.edu/%7Efryp/marcmap.html

Metadata Mappings (MIT Library)
http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Mapping Between Metadata formats (UKOLN)
http://www.ukoln.ac.uk/metadata/interoperability/

International Metadata Standard Mappings (Academia Sinica)
http://www.sinica.edu.tw/%7Emetadata/standard/mapping-foreign_eng.htm

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 