metadataforums.com (a bimethods initiative)

Introductory

Metadata: Use It or Lose It

You've probably heard the word "metadata," but what exactly is this mysterious information? Equally important, how should it be handled? Failure to adequately capture and preserve metadata associated with electronic discovery materials has been considered spoliation of evidence and grounds for significant discovery sanctions. Conversely, a failure to erase metadata from outgoing law firm correspondence can inadvertently communicate a wealth of inappropriate information to recipients. This article reveals how metadata is created, its purpose, and situations where metadata can assume significant importance.

What is metadata?

People normally define an electronic document, spreadsheet, e-mail message or other digital data as the content they can see, the information intentionally added by the document's author. However, almost all computer programs automatically add additional information to a document, usually storing it at the beginning or end of the electronic file where the program can easily find it. This information is generically known as "metadata."

When a file is attached to an e-mail message or copied from one location to another, much of its associated metadata is also transferred with the file. Some metadata, however, is automatically updated to reflect the new location or other actions involving the file. As a result, computer files that appear identical when viewed or printed may have different metadata.

Understanding Meta-Data

Any practicing attorney who uses a computer has probably heard the dreaded word "metadata" -- but what exactly is this mysterious information? Equally important, how should it be handled? Failure to adequately capture and preserve metadata associated with electronic discovery materials has been considered spoliation of evidence and grounds for significant discovery sanctions. Conversely, a failure to erase metadata from outgoing law firm correspondence can inadvertently communicate a wealth of inappropriate information to recipients. This article reviews how metadata is created, its purpose, and situations where metadata can assume significant importance.

What is metadata?

When most people think about an electronic document or an e-mail message, they define the file by the content that has been intentionally created by the document's author (for example, the text we type). However, to function more efficiently, virtually all computer programs automatically track additional information that relates to a document, usually storing it at the beginning or end of the file, where it can easily be found by the program. This information is generically known as "metadata."

When a file is attached to an e-mail message or copied from one location to another, much of its associated metadata is also transferred with the file. Some metadata, however, will automatically update to reflect the new location of the file. As a result, computer files that appear identical when printed out may have different metadata.
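
The effect described above can be observed directly. The following is a minimal sketch, assuming a standard Python installation; the file names are hypothetical, and the original file is deliberately backdated so the result does not depend on how precisely the file system records time. It copies a file and then checks that the content is identical while the "last modified" timestamp is not.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def copy_and_compare(workdir):
    """Copy a file, then report whether content and modification time match."""
    original = os.path.join(workdir, "memo.txt")        # hypothetical name
    duplicate = os.path.join(workdir, "memo_copy.txt")  # hypothetical name

    with open(original, "w", encoding="utf-8") as handle:
        handle.write("Privileged and confidential.\n")

    # Backdate the original to 2000-01-01 00:00:00 UTC (illustrative value).
    os.utime(original, (946684800, 946684800))

    # shutil.copy copies the contents and permission bits, not the timestamps,
    # so the duplicate receives a fresh modification time.
    shutil.copy(original, duplicate)

    same_content = sha256_of(original) == sha256_of(duplicate)
    same_mtime = os.stat(original).st_mtime == os.stat(duplicate).st_mtime
    return same_content, same_mtime


with tempfile.TemporaryDirectory() as workdir:
    same_content, same_mtime = copy_and_compare(workdir)
    print("identical content:", same_content)
    print("identical modification time:", same_mtime)
```

Other copy tools behave differently (some preserve timestamps), which is one reason the method used to collect files can matter as much as the files themselves.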

Many different forms of metadata exist, and each program tracks metadata that is appropriate for the particular data files it creates. Most word processing programs available today, for example, identify and record at least the following information:

  • Name of the user logged into the computer;
  • The number of characters and words in the document;
  • How often and for how long the document has been edited; and
  • Revisions that have been made to the document.
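
As one concrete illustration of the list above, files in the Office Open XML format (.docx) are ZIP packages that keep these properties in two small XML parts, `docProps/core.xml` and `docProps/app.xml`. The sketch below is an assumption-laden example, not a forensic tool: it builds a tiny stand-in package in memory with invented values (it is not a complete, openable .docx) and reads the properties back using only the Python standard library.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the two property parts of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Invented sample values, for illustration only.
CORE_XML = (
    '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
    "<dc:creator>J. Associate</dc:creator>"
    "<cp:lastModifiedBy>A. Partner</cp:lastModifiedBy>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
).format(**NS)

APP_XML = (
    '<Properties xmlns="{ep}">'
    "<TotalTime>95</TotalTime>"
    "<Words>1200</Words>"
    "<Characters>6840</Characters>"
    "</Properties>"
).format(**NS)


def build_sample_package():
    """Return the bytes of a ZIP holding only the two property parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", CORE_XML)
        package.writestr("docProps/app.xml", APP_XML)
    return buffer.getvalue()


def read_properties(package_bytes):
    """Extract author, revision, and editing statistics from the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        core = ET.fromstring(package.read("docProps/core.xml"))
        app = ET.fromstring(package.read("docProps/app.xml"))
    return {
        "author": core.findtext("dc:creator", namespaces=NS),
        "last_modified_by": core.findtext("cp:lastModifiedBy", namespaces=NS),
        "revision": int(core.findtext("cp:revision", namespaces=NS)),
        "editing_minutes": int(app.findtext("ep:TotalTime", namespaces=NS)),
        "words": int(app.findtext("ep:Words", namespaces=NS)),
        "characters": int(app.findtext("ep:Characters", namespaces=NS)),
    }


properties = read_properties(build_sample_package())
for name, value in properties.items():
    print(f"{name}: {value}")
```

To inspect a real document, the same `read_properties` function could be given the bytes of an actual .docx file, although real files may omit some of these elements.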

E-mail programs like Microsoft Outlook can associate each e-mail message with literally hundreds of pieces of metadata, such as the folder where the message is stored, whether the message was forwarded, and whether recipients opened a message that they were sent. Many of these metadata fields, however, remain empty unless certain actions take place; in most cases, the majority of possible metadata fields are empty.
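
Outlook's internal fields are not directly visible in a plain message, but the standard Internet headers that travel with every e-mail give a feel for this kind of metadata. The sketch below is only an illustration: the message, addresses, and identifiers are invented, and the "forwarded" check is a rough heuristic based on the subject prefix rather than anything Outlook itself records.

```python
from email import message_from_string

# An invented message; every address and identifier here is hypothetical.
RAW_MESSAGE = """\
From: sender@example.com
To: recipient@example.com
Subject: FW: Draft settlement terms
Date: Mon, 03 Mar 2003 09:15:00 -0500
Message-ID: <abc123@example.com>
In-Reply-To: <xyz789@example.com>
Disposition-Notification-To: sender@example.com

Please see the attached draft.
"""

message = message_from_string(RAW_MESSAGE)

# Every header is a piece of metadata that accompanies the visible body.
header_metadata = {name: value for name, value in message.items()}

# Heuristic only: many mail clients prefix forwarded subjects with FW: or FWD:.
was_forwarded_hint = message["Subject"].upper().startswith(("FW:", "FWD:"))

# This header asks the recipient's mail client to send a read receipt.
read_receipt_requested = message["Disposition-Notification-To"] is not None

for name, value in header_metadata.items():
    print(f"{name}: {value}")
print("looks forwarded:", was_forwarded_hint)
print("read receipt requested:", read_receipt_requested)
```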

In addition to the metadata that is written into a data file by word processing (or spreadsheet or e-mail) software, the computer's operating system also tracks information about the files it stores. This data includes:

  • Document size;
  • Date and time the document was last saved and last accessed;
  • Location where the document has been stored; and
  • Which users have rights to access the document.
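
The items above correspond closely to what an operating system reports for any file. The following is a minimal sketch using Python's standard `os.stat` call; the file name is hypothetical, and the permission string shown is a Unix-style summary (on Windows, detailed access rights live in access control lists that this call does not expose).

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect the operating-system metadata for one file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "last_saved": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        "location": os.path.abspath(path),
        # Unix-style permission summary, e.g. "-rw-r--r--".
        "permissions": stat.filemode(info.st_mode),
    }


with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "brief.txt")  # hypothetical file name
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("hello")
    report = describe_file(sample)

for name, value in report.items():
    print(f"{name}: {value}")
```

Note that none of this information is stored inside the file itself, which is why it can change when a file is moved or copied even though the content does not.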

Questions Meta-Data Can Answer

The world of information technology has "grown up" dramatically in the last fifteen years -- the span of my comparatively short career. From the days of punching cards and feeding deck readers at midnight in the university computer lab to the world of dot-coms, electronic business, and business intelligence, one might believe one has seen it all.

But not even close ... One can only imagine what the next fifteen years have in store for us. Post-Y2K and for the foreseeable future, the need to manage data, information, and knowledge, and to do so quickly, will become (if it has not already) THE business driver.

The Power of Meta-Data

Data within your database is just that: data. It becomes information only when you can effectively extract it and distribute it in an understandable form. The use of meta-data has come a long way; it allows information to flow freely and puts the ability to extract information into the hands of end users.

When I first started my career in data processing, I began as a COBOL programmer. Back then, meta-data was strictly “data about data,” and we had to rely on FDs, copybooks, and a scarce set of documentation that comprised our meta-data. Such resources told us where data was located and what particular objects (tables) and columns meant.

Today, meta-data has grown into a complete subject area encompassing such terms as Knowledge Management, Corporate Data Dictionaries, and Enterprise Meta-Data Repositories. All of these classifications have the single goal to categorize the underlying information buried within databases so that end users can get to information faster. They do this by employing tools that allow end users to unlock the mysteries of what information is available and where it is located.
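
At its simplest, a data dictionary of this kind is a lookup from a business term to the places where the corresponding data lives. The sketch below is purely illustrative: every database, table, and column name is hypothetical, and a real enterprise repository would hold far more (definitions, owners, lineage, and so on).

```python
# A toy data dictionary: business term -> physical locations of the data.
# All database, table, and column names below are hypothetical.
DATA_DICTIONARY = {
    "customer name": [
        {"database": "retail_db", "table": "CUST_MASTER", "column": "CUST_NM"},
        {"database": "wholesale_db", "table": "ACCOUNT", "column": "ACCT_NAME"},
    ],
    "customer postal code": [
        {"database": "retail_db", "table": "CUST_ADDR", "column": "ZIP_CD"},
    ],
}


def locate(term):
    """Return fully qualified column names recorded for a business term."""
    entries = DATA_DICTIONARY.get(term.lower(), [])
    return [f"{e['database']}.{e['table']}.{e['column']}" for e in entries]


print(locate("Customer Name"))
print(locate("unknown term"))
```

Even this toy version shows the value of the idea: an end user can ask for "customer name" without knowing that two systems store it under different column names.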

Imagine that you are cornered in the break room by the CEO or called into a sales meeting and asked to produce a report that describes demographic information for current customers. This task seems quick and easy, at least at face value. But you quickly remember that you have distributed databases that control different product lines, and the way objects have been defined within those databases is quite different. To compound the issue, your CEO is starting to describe “customers” as anyone who has purchased products or services from you, including potential customers who are not yet stored in a database but are tracked in multiple spreadsheets maintained by your sales reps. No doubt, it is becoming obvious that these requirements will keep changing, and you begin to feel like you are trying to hit a moving target.

The term "metadata" has been used for many years now. It dates back to the 1960s, 1970s, and 1980s, when "metadata" was used to describe the COBOL, VSAM, and IMS copybooks that ran on IBM mainframe systems. In those days, Fortune 2000 companies would purchase and implement something called a "corporate data dictionary" to help them better manage their copybook definitions, generate COBOL records and DDL, etc. Today, the corporate data dictionary has evolved into what we now call an "enterprise metadata repository" or "metadata management system". The term "metadata" was also embraced and popularized by industry groups such as DAMA, and by software vendor initiatives from IBM (AD/Cycle), Microsoft, OMG/CWM, CDIF, and others. The term "metadata" is also widely used within the geospatial, document management, and military software markets.

Defining Meta Tags is much easier than explaining how they are used, and by which engines. The reason is that very few engines clearly lay out what they do and do not look at, and how much emphasis they put on any one factor. So, we’ll start with the easy part.

Meta Tags are lines of HTML code embedded in web pages that give search engines information about your site. These "tags" contain keywords, descriptions, copyright information, site titles and more. They are among the many things that search engines look at when evaluating a web site.
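
To make this concrete, the sketch below embeds a small invented web page in a Python string and pulls out its meta tags with the standard library's `html.parser` module. The page content is hypothetical, and the code only shows how such tags can be read; it says nothing about how any particular search engine weighs them.

```python
from html.parser import HTMLParser

# An invented page; the site name and tag values are hypothetical.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example Widgets Inc.</title>
    <meta name="description" content="Hand-built widgets since 1999.">
    <meta name="keywords" content="widgets, gadgets, hand-built">
    <meta name="copyright" content="Example Widgets Inc.">
  </head>
  <body><p>Welcome!</p></body>
</html>
"""


class MetaTagCollector(HTMLParser):
    """Collect the name/content pairs of every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name and "content" in attributes:
            self.meta[name.lower()] = attributes["content"]


collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)

for name, content in collector.meta.items():
    print(f"{name}: {content}")
```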

The prefix “meta” comes from the Greek and can indicate change, as in metamorphosis; or it can mean beyond or after, as in metaphysics. In information technology usage, the word meta-data has come to be used as a definition or description of data: a small indicator that encompasses and points to a larger piece of information. The library card catalog is the standard metaphor for meta-data: each card represented and led the user to a much larger body of information, the book or other item cataloged.

This article is not intended to define or debate the differences between structured and unstructured data. This author considers structured data to be tabular or delimited by nature and recorded in a file or database table. For the purpose of this article, unstructured data will be referred to as "artifacts". Artifacts include data, documents, and content recorded in electronic format that can be managed and leveraged for the benefit of your company, your customers, your suppliers, etc. Artifacts include word processing files, HTML files (web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, e-mails ... any data that is not in tabular or delimited format. Some people call this recorded knowledge. Some people call this web content. Some people call this documents, as in document management. Everybody calls it valuable. For this article, that is the definition of unstructured data.

Over the next few years, many companies will have the unenviable task of completely rebuilding their decision support systems. This is occurring because many of these systems were built with flawed architectures. The architecture used to build the meta data repository is every bit as critical to its long-term viability as the architecture for the decision support system is. By taking the time to build a sound architecture, you enable your repository effort to grow and mature over time to support all of your company’s meta data needs.

Who Needs Metadata?

A few weeks ago, I was asked to help oversee a client's business intelligence project that was running behind schedule. By the time I joined the project, the requirements and design had been completed, and the delivery team had finished most of its development work. The ETL process was working properly, and several reports had been developed. However, the project had been stalled for several weeks, having made little or no progress. The reason for the delay was that each of the reports that had been developed needed to be "certified."

Approximately 15 reports had been requested for the first release of the system. The requirements team had been told that most of these would replace reports that users were currently receiving. The team was provided with a set of the existing reports, which were mainly Microsoft Excel spreadsheets. The reports looked simple enough. Each column had a heading that identified the data it contained, and the team was told where the data was located. An estimate was made for how long the reports would take to develop. So what was the problem? In a word, metadata.