MetaShare Project Website
A Collaborative Platform for Metadata Management
Metadata by nature is collaborative. Knowledge is spread out across an organization. The business users have knowledge about what a data item means and technical architects know how a data item is transformed on its way into the data warehouse.
Users are learning more all the time. They are given tasks that teach them more about the business every day. This knowledge needs to be gathered for others to use. Users need to be empowered share this knowledge in the easiest way possible. By giving them instant access to a consolidated information base, they can learn and integrate their new knowledge into the enterprise.
Metadata is something that takes active management. Metadata collaboration is an active process where an organization must share, collaborate, and manage the metadata life cycle. If an organization doesn't manage their metadata, it will get out of control.
Metadata can be presented in a fashion that enables collaboration. To enable this collaboration, we have to present the metadata in a fashion that allows users to update the metadata and bring it up to date. Wikipedia is a good example. Users submit articles, updates, and corrections to the Wikipedia online encyclopedia. Editors monitor the changes and moderate the changes to keep the quality high.
Leveraging the Kimball Spreadsheet & MetaShare
Ralph Kimball is one of the founders of data warehouse architecture. Two of his colleagues, Warren Thornthwaite and Joy Mundy (and probably others) have collaborated to create a technique for gathering requirements for a data warehouse and using those requirements to build DDL scripts to play into SQL Server to begin the process of physicalizing the data warehouse.
This worksheet is extensive and when used to its fullest includes all kinds of metadata about a data warehouse. The spreadsheet is set up to create DDL scripts for tables, views, relationships and other physical structures. It is also set up to leverage extended properties in the database to hold important pieces of metadata. After the spreadsheet is completed, the CREATE scripts are played into the database server. How can we expose this rich set of metadata to users and technical architects?
There is a tool called MetaShare that takes a SQL Server 2005 database and extracts not only the metadata out of the SQL Server catalog that describes such things as tables in databases, columns in tables, and column data types, but also the metadata stored in the extended properties that were created by the CREATE scripts generated from the Kimball Spreadsheet. Here is the list of extended properties that are supported by the Kimball spreadsheet and MetaShare out of the box:
MetaShare and the Kimball Audit Dimension
Mr. Kimball’s group has architected a structure for tracking changes in the warehouse. This includes the execution of SSIS packages and how, when, and what data those packages changed. This powerful feature of a data warehouse gets special treatment in the output of MetaShare.
There are special metadata attributes allocated to describing whether a table or column is associated with the Audit Subsystem. These attributes and tables are moved to the bottom of a MetaShare generated page in a special location so that they are easy to find and well as easy to ignore. This keeps all of the business attributes all together at the top of the page.
To read more about the Audit Subsystem, take a look at the Microsoft Data Warehouse Toolkit book written by Warren and Joy.
Using the MetaShare Platform to Update Metadata
Updating the Metadata Attributes
The pages created by MetaShare are separated into two major sections. The top of the page is created and maintained by MetaShare. The bottom is reserved for user-generated content. If the extended properties metadata requires a change, there is a companion tool web application that allows for easy updating of the Kimball spreadsheet metadata attributes after the database is deployed and in production. It allows you to choose a database on a server and navigate to the metadata attributes for the tables and columns and update them accordingly.
User Generated Content
The bottom of every page that is generated by MetaShare is reserved for user content. This is where collaboration begins to come into play. Users are encouraged to add information they know about an item or correct information they think is incorrect about that item. Users should feel empowered to make changes to the pages for two reasons. First of all, the collaboration platform retains document history. Any documents changed by a user are saved before editing begins. Comparisons between versions of a document become easy allowing for active management and moderating of changes by a metadata steward.
Managing the Metadata Lifecycle
The MetaShare metadata collaboration platform provides a framework for controlling and managing the content of the published documents. The platform is designed with an open philosophy. Mistakes are easily corrected because it provides a process by which verification of recent additions and changes to documents is easy and manageable. Every page includes a “Recent Changes” page that shows all versions of a document across time and allows for the comparison of the content of any two documents. If edits to a document are deemed incorrect, they can be reversed by restoring the document to the previous version.
Extracting SSIS Metadata
User generated content must be actively managed. If it is not, the documents will become disorganized and end up containing incorrect information. Users can subscribe to alerts that are generated when a document changes. Someone in the organization can be assigned as the metadata steward and can monitor changes to the documents. If it becomes necessary, a standing meeting can be called where stakeholders get together to talk about the changes to the documents and double check the accuracy of the information added. Wikipedia follows a very similar lifecycle through the use of moderators or editors.Utilization of the same technique will keep the metadata documents up to date and correct.
MetaShare does more than expose and help maintain your database metadata. It can also search your SSIS server for packages that are the source of the the columns in your dimensions and facts. By providing a folder on the SSIS server in which to search, MetaShare will automatically generate a document to display information about the column’s lineage inside that package. It lists the SSIS task name, task type, and the description as defined in BI Studio. Spending time and effort fully documenting SSIS packages becomes worthwhile because the data collected inside them becomes available to users.