Extracting Metadata for Preservation (EMP)
The Extracting Metadata for Preservation (EMP) project, new in ECHO DEPository Phase II,
developed a metadata creation and extraction tool.
As the volume of digital content grows, so does the need to create
metadata more efficiently. Our approach was to
provide machine assistance for metadata creation using linguistic
technology. Building on work at OCLC, at the Illinois Department of
Computer Science, and at the University of Maryland, EMP developed
stand-alone open-source tools, or web services, for automated metadata creation and extraction.
Specifically, we developed a generalized metadata tool architecture and built a Named Entity Metadata Extraction tool. Development was based on two approaches:
- to extract names from existing structured marked-up text (metadata extraction)
- to extract names from free text (metadata creation)
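The difference between the two approaches can be illustrated with a short Python sketch. It does not reproduce the EMP tools. It assumes that names in marked-up text are tagged with a TEI-style `persName` element (a hypothetical choice of tag), and for free text it substitutes a naive capitalized-word heuristic for the linguistic technology the project relied on. The function names are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: marked-up sources tag names with a TEI-style <persName>
# element. The tag is illustrative, not taken from the EMP tools.
NAME_TAG = "persName"


def extract_from_markup(xml_text: str) -> list[str]:
    """Metadata extraction: collect names that the markup already identifies."""
    root = ET.fromstring(xml_text)
    names = []
    for el in root.iter(NAME_TAG):
        # itertext() also gathers text from nested children such as
        # <forename> and <surname>; split/join collapses extra whitespace.
        names.append(" ".join("".join(el.itertext()).split()))
    return names


# Naive stand-in for a real named-entity recognizer: runs of two or more
# capitalized words. It misses single-word names and can match non-names.
CAP_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def create_from_free_text(text: str) -> list[str]:
    """Metadata creation: guess names in unmarked prose."""
    return CAP_RUN.findall(text)


if __name__ == "__main__":
    doc = ("<p>Letter from <persName>Jane Addams</persName> to "
           "<persName><forename>Ellen</forename> <surname>Starr</surname>"
           "</persName>.</p>")
    print(extract_from_markup(doc))
    print(create_from_free_text("Jane Addams founded Hull House in Chicago."))
```

The markup path only reads what an encoder has already identified, so its quality depends on the markup. The free-text path has to guess, which is why a heuristic like this one misses single-word names such as "Chicago" and also returns phrases such as "Hull House" that are not personal names.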
The key deliverables in EMP were:
- Development of a documented open-source tool for high quality Named Entity metadata extraction and creation
  - This work encompassed mapping to authority files (specifically, WorldCat Identities, created by OCLC) and involved developing external metadata profiles and a machine-learning approach.
  - In addition, use cases/scenarios were explored to drive system development.
- Development of a general metadata tool architecture extensible to other types of metadata tools
- An evaluation of the new tools and an analysis comparing them with existing Named Entity metadata tools
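A general metadata tool architecture with authority-file mapping can be pictured as a common tool interface plus a separate mapping stage. The Python sketch below is a hypothetical design, not the EMP architecture. `MetadataTool`, `AuthorityMapper`, `run_pipeline`, and the other names are invented for illustration. A small dictionary stands in for an authority file such as WorldCat Identities (no real service is queried), and the identifier `id-0001` is a placeholder.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRecord:
    kind: str                           # e.g. "name" or "date"
    value: str                          # text as found in the document
    authority_id: Optional[str] = None  # set when matched to an authority record


class MetadataTool(ABC):
    """Hypothetical common interface: any metadata tool plugs in here."""

    @abstractmethod
    def extract(self, text: str) -> list[MetadataRecord]: ...


class CapitalizedNameTool(MetadataTool):
    """Naive name finder; a stand-in for a real Named Entity recognizer."""
    pattern = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

    def extract(self, text: str) -> list[MetadataRecord]:
        return [MetadataRecord("name", m) for m in self.pattern.findall(text)]


class YearTool(MetadataTool):
    """A second, non-name tool showing the interface is not name-specific."""
    pattern = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

    def extract(self, text: str) -> list[MetadataRecord]:
        return [MetadataRecord("date", y) for y in self.pattern.findall(text)]


def normalize(name: str) -> str:
    """Crude match key: lowercase letters only, tokens sorted, so that
    'Addams, Jane' and 'Jane Addams' produce the same key."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


class AuthorityMapper:
    """Attach authority IDs by exact match on the normalized key.
    The dict stands in for an authority file; no real lookup is made."""

    def __init__(self, authority: dict[str, str]):
        self.index = {normalize(k): v for k, v in authority.items()}

    def map(self, records: list[MetadataRecord]) -> list[MetadataRecord]:
        for r in records:
            if r.kind == "name":
                r.authority_id = self.index.get(normalize(r.value))
        return records


def run_pipeline(text: str, tools: list[MetadataTool],
                 mapper: AuthorityMapper) -> list[MetadataRecord]:
    """Run every registered tool, then map names to the authority file."""
    records = [r for tool in tools for r in tool.extract(text)]
    return mapper.map(records)


if __name__ == "__main__":
    mapper = AuthorityMapper({"Addams, Jane": "id-0001"})  # placeholder ID
    for rec in run_pipeline("Jane Addams founded Hull House in 1889.",
                            [CapitalizedNameTool(), YearTool()], mapper):
        print(rec)
```

Keeping authority mapping outside the individual tools means a new tool (dates here) can be added without touching the mapping code. Real authority matching would need far more than exact key lookup, for example handling name variants, dates of birth and death, and ambiguous matches.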