University of Southampton
MSc in Official Statistics
Stat6050 Statistical Computing
February 2009
This module can be taken either as an option for students registered for the MSc, or as an independent Short Course - the last presentation of the module was in the week beginning 16th February 2009. Anyone wishing to register for this or other modules should contact:
Social Statistics Programme Support Office, University of Southampton, Southampton SO17 1BJ
Tel: 023 8059 3562 or email: MOffStat
Some presentations and handouts from this module are available. If you are interested in any of these materials, please contact Andrew Westlake.
The module provides some insight into the IT issues and choices that relate to Official Statistics. The main objective of the module is to help participants to contribute constructively to discussions when new systems are being designed, and to manage operational systems.
The module will cover some standard ideas from computer science and databases, such as the Relational Database model and object-based system design using UML (the Unified Modelling Language), discussing their application to statistical systems. It will pay attention to statistical metadata and statistical dissemination systems. It emphasises the importance of generalisation in the design of systems, stepping back from the immediate details in order to capture the underlying structure of the issues being addressed.
It is not a module about statistical analysis software, nor about data capture.
Participants will understand the issues involved in the construction and operation of systems for processing, analysis and dissemination of data in official statistics.
As an assessment project, students will apply the principles learnt to reviewing a planned or existing system with which they are involved or familiar.
Statistical Objectives
Technologies
Concepts
Design Components
Examples include
Click on a title to see or download course materials (where available)
| Time | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 9.30 – 10.00 | | Statistical Systems; Statistical Production Systems | System Design | Modelling in UML; Precision and Complexity | Project |
| 10.00 – 11.00 | Introduction; Overview | | | | |
| 11.00 – 11.30 | Break | Break | Break | Break | |
| 11.30 – 12.30 | Database Methods | Relational databases | Statistical Metadata | Roundup and review | |
| 12.30 – 14.00 | Break | Break | Break | Break | |
| 14.00 – 15.00 | Database Design | Database manipulation exercise | Process design exercise | UML Design exercise | |
| 15.00 – 15.30 | Break | | | | |
| 15.30 – 16.00 | Database design exercise | Break | Break | Break | |
| 16.00 – 17.00 | Workflow and Process management at ONS (Andrea Staggemeier, Information Management Group, ONS) | Integration of statistical packages at ONS (Andrea Staggemeier) | Discussion / feedback | | |
The assessment task for this module is the analysis of some statistical processing project, using the tools and concepts presented in the module. The assessment will be marked on the quality of the analysis and the clarity of the presentation.
Students are encouraged to bring their own proposals for projects, since real involvement in the outcome of the project is important for the quality of the analysis. The proposal could relate to the review or reanalysis of an existing system, to a development project currently in progress, or to a completely new system. There will be an opportunity to review proposals and focus them to the scale and requirements for assessment.
A number of more abstract projects will be available for participants who do not have their own proposals, but these will not be supported by detailed practical descriptions.
Nothing here is a prior reading requirement. Some exploration of application modelling ideas and of OLAP / Data Warehouse ideas on the web might be useful. Investigative visits to some statistical web sites (such as www.statistics.gov.uk and www.doh.gov.uk) would also be useful.
Scanning Wikipedia for information about the various terms and methods mentioned here can also be useful, as can installing the trial version of Visio 2007.
· Date (2004). Introduction to Database Systems, 8th Edition. Addison Wesley. ISBN: 0-321-19784-4. This is the standard ‘bible’ for relational database systems, hard work, but important if you want a deep understanding of the strengths and limitations of relational systems.
· Date & Darwen (1997). A guide to the SQL Standard, 4th Edition, Addison-Wesley. ISBN: 0 201 96426 0
· Dowling (2000). Database design and management using Access. Continuum International Publishing Group; ISBN: 0826453902 (or 1844801098). A cheap book that works through a development project using Access. A bit out of date because of developments in Access.
· Booch, Jacobson, & Rumbaugh (2005). The Unified Modelling Language User Guide (2nd Edition). ISBN: 0321267974
· Fowler (2004). UML Distilled: a brief guide to the standard object modelling language (3rd Edition). Pearson Education, ISBN: 0-321-19368-7. This is an excellent though terse guide to the core content and use of UML, aimed at readers with some programming background.
· Beyer & Holtzblatt (1998). Contextual Design: A Customer-Centered Approach to Systems Designs. Academic Press. ISBN: 1558604111
· Kruchten (2004). The Rational Unified Process – an Introduction (3rd Edition). Addison Wesley. ISBN: 0321197704.
· Ambler (2004). The Object Primer: Agile Model-driven Development with UML 2.0. Cambridge University Press, ISBN: 0-521-54018-6.
· McConnell (1996). Rapid Development – taming wild software schedules. Microsoft Press. ISBN: 1556159005
· Reed (2000). Developing Applications with Visual Basic and UML. ISBN: 0 201 61579 7.
· Sheridan & Sekula (1999). Iterative UML development using VB6. ISBN 1 75622701 9. This book introduces the ideas of iterative development and works through some projects.
· OMG Unified Modeling Language - http://www.uml.org/ Home page for UML, with links to tutorials.
· OMG Systems Modeling Language - http://www.omgsysml.org/ This is an adopted extension to UML, designed to provide additional facilities for modelling general systems, not just software.
· DDI – Data Documentation Initiative Codebook standard. www.ddialliance.org An international XML standard for the use of lifecycle information about social science research data. Version 3.0 was published in April 2008. DDI facilitates the automation of documentation and production systems for the delivery of social science data.
· SDMX – Statistical Data and Metadata Exchange – the BIS, ECB, EUROSTAT, IMF, OECD, UN, and the World Bank have joined together to focus on business practices in the field of statistical information that would allow more efficient processes for exchange and sharing of data and metadata within the current scope of collective activities. www.sdmx.org
· ODaF – the Open Data Foundation – is dedicated to the adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data, focussing on the development of tools for DDI and SDMX. www.opendatafoundation.org
· DCMI – An internationally approved standard for textual metadata for documentary resources.
· GMS – Government Metadata Standard. www.govtalk.gov.uk (search for GMS).
A
· MetaNet – EU funded project to integrate initiatives on statistical metadata. www.epros.ed.ac.uk/metanet
Microsoft Access (all versions) is a good example of a relational database system, suitable for projects up to moderate scale (in terms of complexity and number of users, as well as physical size).
Microsoft Office. The Pivot Table component in Excel (2000 and later) is a good demonstration of the manipulation facilities developed by non-statisticians for data cubes (the earlier versions have less general functionality).
Microsoft Visio (Professional edition) contains UML and database modelling tools, as well as general diagram facilities. A free 60-day evaluation version of Visio Professional can be downloaded from Microsoft at http://trial.trymicrosoftoffice.com/trialukireland/product.aspx?sku=3082931&culture=en-GB.
Microsoft Project provides the classic project management tools, including PERT and Gantt charts. Evaluation versions are available from the Microsoft web site above. There are also a number of heavyweight project management systems available from other companies.
Rational Rose is the market leader in UML diagram and development support, and was taken over by IBM some years ago. Various suites are available (such as for Analysts or Developers) containing additional tools, including a requirements management database and code generation tools. Various presentations and evaluations are available from www.rational.com
Together is another UML modelling tool, originally more focussed on Java and round-trip code generation. The TogetherSoft company was swallowed by Borland a couple of years ago, and is now integrated. www.borland.com
Poseidon is a commercial implementation of UML 2, and includes a free Community Edition that excludes more advanced features. www.gentleware.com
hyperModel is a UML 2 tool for developing XML schemas as UML Class diagrams. It is free and can be downloaded from www.xmlmodeling.com.
Beyond 20/20 is a dissemination and manipulation tool for multi-way tables (data cubes) aimed at statistical users. There are versions for independent use with downloaded files, and for building web dissemination servers. It has been used by a number of statistical offices, including ONS (Neighbourhood Statistics), Unesco, Statistics Canada and the US Census Bureau. The developer site, at www.beyond2020.com, has various demos and descriptions, including downloads.
Nesstar is an infrastructure for data dissemination via the Internet. Nesstar Explorer offers an end user interface for searching, analysing and downloading data and documentation. Nesstar Server offers tools and resources for making data and documentation available via the Internet. It makes heavy use of the DDI Codebook metadata standard. www.nesstar.org
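For readers who want a feel for the two ideas that recur in the notes above (relational storage, as in Access, and the manipulation of multi-way tables or data cubes, as in Excel Pivot Tables and Beyond 20/20), the following is a minimal sketch in Python using only the standard library. It is not taken from the course materials: the table name `person`, the columns `region`, `sex` and `age`, and all data values are invented for illustration, and SQLite stands in for whichever relational system is actually in use.

```python
import sqlite3

# Hypothetical microdata, one row per respondent (values invented for illustration).
ROWS = [
    ("North", "F", 34), ("North", "M", 41), ("North", "F", 29),
    ("South", "M", 52), ("South", "F", 47), ("South", "M", 38),
]


def build_cube(rows):
    """Load rows into an in-memory relational table and aggregate to cube cells."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (region TEXT, sex TEXT, age INTEGER)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    # Grouping by the two classifying dimensions yields one row per cell,
    # carrying two measures: a count and a mean age.
    cells = con.execute(
        "SELECT region, sex, COUNT(*), AVG(age) FROM person "
        "GROUP BY region, sex ORDER BY region, sex"
    ).fetchall()
    con.close()
    return {(region, sex): (n, mean_age) for region, sex, n, mean_age in cells}


def pivot(cube, measure=0):
    """Lay the cells out as a region-by-sex table, as a pivot table would."""
    regions = sorted({r for r, _ in cube})
    sexes = sorted({s for _, s in cube})
    return {
        r: {s: cube[(r, s)][measure] for s in sexes if (r, s) in cube}
        for r in regions
    }


cube = build_cube(ROWS)
table = pivot(cube)  # measure=0 selects the count
for region, row in table.items():
    print(region, row)
```

Calling `pivot(cube, measure=1)` lays out the mean age instead of the count. The point of the sketch is the separation of concerns: the microdata stay in a relational table, the cube is derived by aggregation, and the tabular layout is only a presentation of the cube.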
Andrew Westlake, 27-Jan-2009
Page last updated 27 February 2009.