MetadataForums.com (a bimethods initiative) - http://www.metadataforums.com
Metadata Repositories vs. Metadata Registries
http://www.metadataforums.com/articles/257/1/Metadata-Repositories-vs-Metadata-Registries/Page1.html
Dan McCreary

 
By Dan McCreary
Published on 06/1/2009
 

For several years people have been using the terms metadata Registry and Repository inconsistently, imprecisely and almost interchangeably. I would like to weigh in on how these terms could be used more precisely so that organizations can manage metadata processes effectively.

First, let's take the definition of a Repository. Webster defines a repository as "…a place, room, or container where something is deposited or stored." Note that there is nothing in this definition about the quality of the things being stored, or about any process to check whether new incoming items are duplicates of things already in the repository. If I have 100 users, they could each define "Customer" as they see fit and put their own definition into the metadata repository. No problems.


On the other hand, let's take the word Registry. A Registry has the connotation of more than just a shared dumping ground. Registries have the additional capability to create workflow processes that check that new metadata is not a duplicate (for a given namespace). One of the definitions from Webster is "an official record book." Note the word "official".

A Repository is similar to the front porch of a house. No locks prevent new things from landing there. But a Registry is a protected back room where human-centric workflow processes are used to ensure that metadata items are non-duplicates, precise, consistent, concise, distinct, approved and unencumbered with business rules that prevent reuse across an enterprise. These registries have become the central foundation on which agility can be baked into many enterprise processes. The latest version of Kimball's Data Warehouse Lifecycle Toolkit (which is actually a very good read) even goes as far as to call its process "metadata-driven". That is not so different from the model-driven development world.
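To make the front-porch and back-room contrast concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names, the `submit`/`approve` methods and the case-insensitive duplicate rule are all hypothetical, and none of them come from ISO/IEC 11179 or from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Front porch: accepts every submission, with no checks at all."""
    items: list = field(default_factory=list)

    def deposit(self, owner, term, definition):
        # Nothing stops 100 users from depositing 100 "Customer" definitions.
        self.items.append((owner, term, definition))


class DuplicateTermError(Exception):
    """Raised when a term already exists in the given namespace."""


@dataclass
class Registry:
    """Back room: one approved definition per (namespace, term)."""
    approved: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    @staticmethod
    def _key(namespace, term):
        # Assumption: terms are compared case-insensitively, ignoring outer spaces.
        return (namespace, term.strip().lower())

    def submit(self, namespace, term, definition):
        # Duplicate check is scoped to the namespace, as described above.
        key = self._key(namespace, term)
        if key in self.approved or key in self.pending:
            raise DuplicateTermError(f"{term!r} already exists in {namespace!r}")
        self.pending[key] = definition

    def approve(self, namespace, term):
        # Stands in for the human-centric workflow step (a steward signs off).
        key = self._key(namespace, term)
        self.approved[key] = self.pending.pop(key)

    def lookup(self, namespace, term):
        # Only approved ("official") definitions are visible to consumers.
        return self.approved[self._key(namespace, term)]
```

With this sketch, a hundred calls to `Repository.deposit` for "Customer" all succeed, while a second `Registry.submit` of "Customer" in the same namespace raises `DuplicateTermError`. The same term can still be registered in a different namespace.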

Registries carry an implicit connotation of trust. They now serve as a central process for the creation of shared meaning across the enterprise. Definitions in a registry have been vetted by an enterprise-level organization that has the responsibility of enterprise data stewardship. They have a high probability of being consistent with industry best practices and vertical industry standards. Registries are the go-to source for creating canonical XML schemas, enterprise ontologies or conformed dimensions in an OLAP cube. Repositories, by contrast, hold personal or small departmental definitions of an isolated view of the world.

None of these ideas is really new. They are at the core of the ISO/IEC 11179 metadata registry standard. Note that they don't call it a repository standard! People are just now starting to understand how important Registries are in most enterprise-wide systems. The growth of Business Intelligence, Enterprise Data Warehouse terminology and Service Oriented Architectures is a good place to see the rise of repositories and registries. We now see service registries, portlet registries, model registries... the list goes on and on.

Much of the background on the differences between the use of repositories and registries can be traced back to the early days of object-oriented systems, in the 1995 book Succeeding with Objects by Adele Goldberg and Kenneth Rubin. This was one of the first books on enterprise reuse strategies. It defined the concept of enterprise asset reuse and the need for a trust-driven repository as a basis for reusing assets. The authors identified a multi-step process for reviewing new submissions to determine whether a submission duplicated existing assets. They showed how critical it is to classify items in a registry and to search the existing registry for duplicates before new items are added. If you can get a copy of the book, I would suggest you read the section on "Set Up a Process for Maintaining Reusable Assets" on page 245.
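The classify-then-search step can be sketched in a few lines of Python. This is not the procedure from the book, which is not reproduced here. It is a hypothetical illustration that assumes a registry is just a mapping from item name to classification label, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching a real steward or tool would apply. The function names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and unify separators so 'Customer_ID' and 'customer id' compare equal."""
    return " ".join(name.replace("_", " ").replace("-", " ").lower().split())


def find_candidate_duplicates(new_name, new_class, registry, threshold=0.8):
    """Return registered names in the same classification that resemble new_name.

    registry: dict mapping registered name -> classification label (assumed shape).
    """
    target = normalize(new_name)
    matches = []
    for name, classification in registry.items():
        if classification != new_class:
            continue  # only compare items that were classified the same way
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score >= threshold:
            matches.append((name, round(score, 2)))
    # Most similar candidates first
    return sorted(matches, key=lambda m: m[1], reverse=True)


def review_submission(new_name, new_class, registry):
    """Search before adding: look-alikes go to a human steward for a decision."""
    candidates = find_candidate_duplicates(new_name, new_class, registry)
    if candidates:
        return ("needs_steward_review", candidates)
    return ("accepted_as_candidate", [])
```

The point of the design is that the search only narrows the field. A submission with look-alikes is routed to a person rather than rejected automatically, which matches the human-centric review described above.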

The book then goes on to show how organizations can and should be structured to reuse these assets, and gives the pros and cons of different organizational structures and their impact on reuse. This is the basis for the data governance and data stewardship movement in many organizations today.

So the next time someone uses the word registry or repository in a conversation, ask them whether they are using a definition that is consistent with the corporate business term registry, or their own private definition from their own repository of imprecisely used buzzwords.