SSC Meta-Data System
Statistical packages continue to be the software of choice for the storage and management of much statistical data. However, there are a number of situations, particularly with complex data structures, or long-term data collection, where relational database management systems (RDBMS) provide a better solution.
A problem for statistical users is that an RDBMS does not provide the meta-data facilities which we expect from statistical packages. Constructing appropriate data structures for meta-information is not too difficult (though inappropriate choices are often made), but functionality is also needed to support the use of the information.
Our primary objective is to add information about the meaning of the data stored in a statistical database. Once we start down this route it immediately becomes necessary to link to the information about the structure of the application maintained by the host system, so further facilities to support this access are included, though not elaborated in this paper.
This system tackles two separate issues, relating to two separate types of meta-data, one specific to statistical data, and the other applicable to databases in general. Following the terminology of H-J Lenz we refer to these as Conceptual Meta-data (CMD) and Technical Meta-data (TMD).
The CMD component provides features beyond those normally found in a relational database, relating to the meaning of the information stored in a database (its content). The system provides a conceptual model for the structure of this additional information (Scales, Codes, Labels, Recodes, etc.), tables in which to store it, and functionality to input, manage and retrieve it. The design of this component is generic, though the particular implementation makes substantial use of the facilities of the host RDBMS (in this case MS Access).
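As an illustration of the kind of conceptual model described above, the following is a minimal sketch in Python rather than MS Access. The class names follow the terms used in the text (Scale, Code, Recode), but every attribute, method and data value is an assumption made for this example and is not taken from the SSC implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    """One permitted value on a scale, with its human-readable label."""
    value: int
    label: str
    is_missing: bool = False  # assumption: missing values are flagged per code


@dataclass
class Scale:
    """A named scale and the codes defined on it."""
    name: str
    description: str
    codes: dict[int, Code] = field(default_factory=dict)

    def add_code(self, value: int, label: str, is_missing: bool = False) -> None:
        self.codes[value] = Code(value, label, is_missing)

    def label_for(self, value: int) -> str:
        """Return the label for a stored value, or the value as text if uncoded."""
        code = self.codes.get(value)
        return code.label if code else str(value)


@dataclass
class Recode:
    """A mapping from the codes of one scale onto the codes of another."""
    source: Scale
    target: Scale
    mapping: dict[int, int]

    def apply(self, value: int) -> int:
        return self.mapping[value]


# Hypothetical example data, invented for this sketch.
education = Scale("Education", "Highest level of education completed")
education.add_code(1, "None")
education.add_code(2, "Primary")
education.add_code(3, "Secondary")
education.add_code(9, "Not stated", is_missing=True)

schooled = Scale("Schooled", "Any formal education")
schooled.add_code(0, "No")
schooled.add_code(1, "Yes")
schooled.add_code(9, "Not stated", is_missing=True)

to_schooled = Recode(education, schooled, {1: 0, 2: 1, 3: 1, 9: 9})

print(education.label_for(2))
print(schooled.label_for(to_schooled.apply(3)))
```

In a relational implementation each of these classes would correspond to one or more tables; the sketch only shows how the concepts relate to one another.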
The TMD component relates to the structure of the host application, and makes use of information already held by the host system. The purpose is to improve the ease of access to information about the tables and columns in the application, and the queries, forms and reports used to run the application (its technical structure). The objective is to present this information in ways which are particularly useful for the management and maintenance of the host application. The facilities provided are mainly queries and reports for accessing this information. This component is highly dependent on the structure of the host system.
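The idea of reading technical meta-data from information the host system already holds can be sketched as follows. This example uses Python's standard `sqlite3` module and SQLite's own catalogue (`sqlite_master` and `PRAGMA table_info`) purely as a stand-in for the host RDBMS; it does not show how MS Access exposes its catalogue, and the table names, column names and the `list_columns` helper are hypothetical.

```python
import sqlite3


def list_columns(conn):
    """Return (table, column, declared type) for every table, ordered by table name."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    result = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            result.append((table, column, col_type))
    return result


# A toy "host application" with two tables (invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Household (HouseholdID INTEGER PRIMARY KEY, District TEXT)"
)
conn.execute(
    "CREATE TABLE Person ("
    "PersonID INTEGER PRIMARY KEY, HouseholdID INTEGER, Education INTEGER)"
)

columns = list_columns(conn)
for table, column, col_type in columns:
    print(f"{table}.{column}: {col_type}")
```

Because the information comes from the host's own catalogue, nothing has to be entered twice; the cost, as noted above, is that such code depends heavily on the host system.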
An interface is needed between the CMD and TMD components, so that the meaning information (in the CMD component) can be related to the actual information and structures in the host database. This interface is kept as simple as possible, and is achieved by associating the names of Fields in the host application with the corresponding Scales in the CMD system. For this purpose we define a Field to be any structure in the host application which can hold or return information, specifically columns in tables and queries, and data controls in forms and reports.
In addition, the programmer of the host system must be able to access and use the information in the CMD component, and this is provided as a collection of procedures which can be called from the host. For the database administrator we provide a collection of Forms and Reports for managing the content of the CMD system.
We adopt the Global Relation approach to naming, which means that the same name always means the same thing. The CMD system therefore does not need to know exactly where a name comes from, only that it is the name of a Field.
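The interface between Fields and Scales, together with the naming convention, can be sketched as a simple lookup keyed on the bare Field name. This is a Python illustration only: the dictionaries, the function names `scale_for_field` and `label_for`, and all the data are assumptions made for this example, not the procedures supplied with the system.

```python
# Hypothetical CMD content: each Scale maps stored codes to labels.
SCALES = {
    "Education": {1: "None", 2: "Primary", 3: "Secondary", 9: "Not stated"},
    "YesNo": {0: "No", 1: "Yes"},
}

# The interface: Field name -> Scale name. The key is the Field name alone,
# with no table, query, form or report qualifier, because the same name is
# taken to mean the same thing wherever it appears.
FIELD_SCALES = {
    "Education": "Education",
    "HeadEducation": "Education",
    "OwnsLand": "YesNo",
}


def scale_for_field(field_name: str) -> str:
    """Return the name of the Scale associated with a Field."""
    try:
        return FIELD_SCALES[field_name]
    except KeyError:
        raise LookupError(
            f"Field {field_name!r} has no Scale in the CMD system"
        ) from None


def label_for(field_name: str, value: int) -> str:
    """Return the label for a value held in a Field, or the value as text if uncoded."""
    codes = SCALES[scale_for_field(field_name)]
    return codes.get(value, str(value))


print(label_for("HeadEducation", 3))
print(label_for("OwnsLand", 1))
```

A caller in the host application passes only the Field name and the stored value; it never needs to say which table, query or form the Field belongs to.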
Our objective is to provide the basic meta-data facilities found in statistical packages (labeling, missing values, etc.), while still making the best use of the standard facilities provided by the relational model. We wish to provide:
Where feasible we wish to provide better facilities than statistical packages.
We make no attempt to integrate large-scale external meta-data (for example, information about origins or ownership of data), nor to manage time-varying versions of definitions. It is possible to envisage such extensions to the structure presented here, but they have not been appropriate within the bounds of the current project. That is why we describe this as a simple system.
The CMD tables and queries are held in a database separate from the host application, referred to as the Meta database, and usually called Meta.MDB (under MS Access). The supporting routines, forms and reports are in the Meta Library (called SSC Meta.MDB). For the TMD component the queries and tables must be present in the host application, but again the routines, forms and reports which access this are in the Meta Library. For a new application (a new set of meta-data) the meta-data tables must be initialised with information needed by the system. This is most easily done by copying from a template database.
The Meta database is linked to the host application only through the names of Fields, so it can be used by more than one application. This might be useful where an organisation has a set of applications which make use of the same underlying concepts but which provide different functionality or support different processes. By using a common meta-data collection you can be sure that all the applications are using the same definitions, and any changes are immediately available to all applications. In this situation the CMD component will be shared by several host applications, each of which will have its own TMD facilities.
Both the conceptual and the technical meta-data components have their own conceptual meta-data. This is stored in the Meta database, and is already present in the template. The Scales used in this way have a special status flag so that they can easily be identified.
The Meta database and the Library also both have their own technical meta-data. This is stored in the appropriate database and is only accessible when that database is opened directly. Thus the TMD facilities can be used in the system's own databases to document the internal structure of these databases, as well as in the host application.
Page last updated 18 June, 2003.