Southampton University
MSc in Official Statistics

Stat6050 Statistical Computing
February 2009

This module can be taken either as an option by students registered for the MSc, or as an independent Short Course. The last presentation of the module was in the week beginning 16th February 2009. Anyone wishing to register for this or other modules should contact:

Social Statistics Programme Support Office, University of Southampton, Southampton SO17 1BJ
Tel: 023 8059 3562 or email: MOffStat

Some presentations and handouts from this module are available. If you are interested in any of these materials, please contact Andrew Westlake.

1. Overview

Aims and objectives

The module provides some insight into the IT issues and choices that relate to Official Statistics. The main objective of the module is to help participants to contribute constructively to discussions when new systems are being designed, and to manage operational systems.

The module will cover some standard ideas from computer science and databases, such as the Relational Database model and object-based system design using UML (the Unified Modelling Language), discussing their application to statistical systems. It will pay attention to statistical metadata and statistical dissemination systems. It emphasises the importance of generalisation in the design of systems, stepping back from the immediate details in order to capture the underlying structure of the issues being addressed.

It is not a module about statistical analysis software, nor about data capture.

Learning outcome

Participants will understand the issues involved in the construction and operation of systems for processing, analysis and dissemination of data in official statistics.

As an assessment project, students will apply the principles learnt to review a planned or existing system with which they are involved or familiar.

Summary of content

Statistical Objectives

Technologies

Concepts

Design Components

Examples include

2. Timetable

Click on a title to see or download course materials (where available)

Time

Monday

Tuesday

Wednesday

Thursday

Friday

9.30 – 10.00

 

Statistical Systems

Statistical Production Systems  
Process and Use of Structure

British Crime Survey

System Design
Project Specification and Management
User-centred design
Review of System Modelling
HIV and AIDS System

Data Structures and Objects

Modelling in UML
Structure, Behaviour, Activity, Requirements

Precision and Complexity

Project

10.00 – 11.00

Introduction
Assessment tasks
Handouts

Overview
Concepts: Databases, Functionality, Design, Abstraction
Methods: Structure, Process, Metadata
Tools: Relational DB, UML, XML

Roles of Database and Statistical systems

11.00 – 11.30 

Break

Break

Break

Break 

 

11.30 – 12.30

Database Methods
Pakistan Fertility Survey

Relational databases
Use of Access for DB

Statistical Metadata
Scope, purpose, relationship to other metadata.
Representation and presentation.

XML and Design

Roundup and review 

 

12.30 – 14.00

Break

Break

Break

Break

 

14.00 – 15.00

Database Design
Use of Visio for DB design

Database manipulation exercise
Use of Access for design and testing

Process design exercise

UML Design exercise

 

15.00 – 15.30

Break

 

15.30 – 16.00

Database design exercise

Break

Break

Break 

 

16.00 – 17.00

Workflow and Process management at ONS

Andrea Staggemeier, Information Management Group, ONS

Integration of statistical packages at ONS

Andrea Staggemeier

Discussion / feedback

 

3. Assessment

The assessment task for this module is an analysis of a statistical processing project, using the tools and concepts presented in the module. The assessment will be marked on the quality of the analysis and the clarity of the presentation.

Students are encouraged to bring their own proposals for projects, since real involvement in the outcome of the project is important for the quality of the analysis. The proposal could relate to the review or reanalysis of an existing system, to a development project currently in progress, or to a completely new system. There will be an opportunity to review proposals and to adjust them to the scale and requirements of the assessment.

A number of more abstract projects will be available for participants who do not have their own proposals, but these will not be supported by detailed practical descriptions.

4. References

Nothing here is a prior reading requirement. Some exploration of application modelling ideas and of OLAP / Data Warehouse ideas on the web might be useful. Investigative visits to some statistical web sites (such as www.statistics.gov.uk and www.doh.gov.uk) would also be useful.

Scanning Wikipedia for information about the various terms and methods mentioned here can also be useful, as can installing the trial version of Visio 2007.

Databases

·         Date (2004). An Introduction to Database Systems, 8th Edition. Addison Wesley. ISBN: 0-321-19784-4.
This is the standard ‘bible’ for relational database systems. It is hard work, but important if you want a deep understanding of the strengths and limitations of relational systems.

·         Date & Darwen (1997). A guide to the SQL Standard, 4th Edition, Addison-Wesley. ISBN: 0 201 96426 0

·         Dowling (2000). Database design and management using Access. Continuum International Publishing Group. ISBN: 0826453902 (or 1844801098).
A cheap book that works through a development project using Access. A bit out of date because of developments in Access.

Systems Design and UML

·         Booch, Jacobson, & Rumbaugh (2005). The Unified Modeling Language User Guide (2nd Edition). ISBN: 0321267974

·         Fowler (2004). UML Distilled: a brief guide to the standard object modelling language (3rd Edition). Pearson Education, ISBN: 0-321-19368-7.
This is an excellent though terse guide to the core content and use of UML, aimed at readers with some programming background.

·         Beyer & Holtzblatt (1998). Contextual Design: A Customer-Centered Approach to Systems Designs. Academic Press. ISBN: 1558604111

·         Kruchten (2004). The Rational Unified Process – an Introduction (3rd Edition). Addison Wesley. ISBN: 0321197704.

·         Ambler (2004). The Object Primer: Agile Model-driven Development with UML 2.0. Cambridge University Press, ISBN: 0-521-54018-6.

·         McConnell (1996). Rapid Development: Taming Wild Software Schedules. Microsoft Press. ISBN: 1556159005

·         Reed (2000). Developing Applications with Visual Basic and UML. ISBN: 0 201 61579 7.

·         Sheridan & Sekula (1999). Iterative UML development using VB6. ISBN 1 75622701 9.
This book introduces the ideas of iterative development and works through some projects.

·         Westlake (2002). ‘XML, and the Design of Standards’. ASC conference on ‘Open Standards: Breaking down the barriers’, September 2002 – www.asc.org.uk.

·         OMG Unified Modeling Language - http://www.uml.org/ Home page for UML, with links to tutorials.

·         OMG Systems Modeling Language - http://www.omgsysml.org/ This is an adopted extension to UML, designed to provide additional facilities for modelling general systems, not just software.

Statistical Metadata

·         DDI – Data Documentation Initiative Codebook standard. www.ddialliance.org
An international XML standard for lifecycle information about social science research data. Version 3.0 was published in April 2008. DDI facilitates the automation of documentation and production systems for the delivery of social science data.

·         SDMX – Statistical Data and Metadata Exchange – the BIS, ECB, EUROSTAT, IMF, OECD, UN, and the World Bank have joined together to focus on business practices in the field of statistical information that would allow more efficient processes for exchange and sharing of data and metadata within the current scope of collective activities. www.sdmx.org

·         ODaF – the Open Data Foundation – is dedicated to the adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data, focussing on the development of tools for DDI and SDMX. www.opendatafoundation.org

·         DCMI – Dublin Core Metadata Initiative. www.dublincore.org
An internationally approved standard for textual metadata for documentary resources.

·         GMS – Government Metadata Standard. www.govtalk.gov.uk (search for GMS).
A UK extension of Dublin Core, now at version 3.1.

·         MetaNet – EU funded project to integrate initiatives on statistical metadata. www.epros.ed.ac.uk/metanet

Software

Microsoft Access (all versions) is a good example of a relational database system, suitable for projects up to moderate scale (in terms of complexity and number of users, as well as physical size).

Microsoft Office. The Pivot Table component in Excel (2000 and later) is a good demonstration of the manipulation facilities developed by non-statisticians for data cubes (the earlier versions have less general functionality).

Microsoft Visio (Professional edition) contains UML and database modelling tools, as well as general diagram facilities. A free 60-day evaluation version of Visio Professional can be downloaded from Microsoft at http://trial.trymicrosoftoffice.com/trialukireland/product.aspx?sku=3082931&culture=en-GB.

Microsoft Project provides the classic project management tools, including PERT and Gantt charts. Evaluation versions are available from the Microsoft web site above. There are also a number of heavyweight project management systems available from other companies.

Rational Rose is the market leader in UML diagram and development support, and was taken over by IBM some years ago. Various suites are available (such as for Analysts or Developers) containing additional tools, including a requirements management database and code generation tools. Various presentations and evaluations are available from www.rational.com

Together is another UML modelling tool, originally more focussed on Java and round-trip code generation. TogetherSoft was acquired by Borland a couple of years ago, and the tool is now integrated into the Borland product range. www.borland.com

Poseidon is a commercial implementation of UML 2, and includes a free Community Edition that excludes more advanced features. www.gentleware.com

hyperModel is a UML 2 tool for developing XML schemas as UML Class diagrams. It is free and can be downloaded from www.xmlmodeling.com.

Beyond 20/20 is a dissemination and manipulation tool for multi-way tables (data cubes) aimed at statistical users. There are versions for independent use with downloaded files, and for building web dissemination servers. It has been used by a number of statistical offices, including ONS (Neighbourhood Statistics), Unesco, Statistics Canada and the US Census Bureau. The developer site, at www.beyond2020.com, has various demos and descriptions, including downloads.

Nesstar is an infrastructure for data dissemination via the Internet. Nesstar Explorer offers an end user interface for searching, analysing and downloading data and documentation. Nesstar Server offers tools and resources for making data and documentation available via the Internet. It makes heavy use of the DDI Codebook metadata standard. www.nesstar.org
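
Several of the tools listed above (the Excel Pivot Table, Beyond 20/20) manipulate multi-way tables, or data cubes. As a rough illustration of the idea only, the following Python sketch counts records over several dimensions and then collapses the result to a two-way table. The data, the dimension names and the functions build_cube and pivot are all invented for this example, and the sketch does not represent how any of the listed products are implemented.

```python
from collections import defaultdict

# Hypothetical micro-data: one record per respondent (all values invented).
records = [
    {"region": "North", "sex": "F", "year": 2007},
    {"region": "North", "sex": "F", "year": 2008},
    {"region": "North", "sex": "M", "year": 2007},
    {"region": "South", "sex": "F", "year": 2008},
    {"region": "South", "sex": "M", "year": 2008},
    {"region": "South", "sex": "M", "year": 2007},
]

DIMENSIONS = ("region", "sex", "year")


def build_cube(rows, dimensions):
    """Count the records falling in each combination of dimension values."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += 1
    return dict(cube)


def pivot(cube, dimensions, row_dim, col_dim):
    """Collapse the cube to a two-way table, summing over all other dimensions."""
    r = dimensions.index(row_dim)
    c = dimensions.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, count in cube.items():
        table[key[r]][key[c]] += count
    return {row: dict(cols) for row, cols in table.items()}


cube = build_cube(records, DIMENSIONS)
region_by_sex = pivot(cube, DIMENSIONS, "region", "sex")

# Print the two-way table, one row per region.
for region in sorted(region_by_sex):
    print(region, region_by_sex[region])
```

Choosing a different pair of dimensions in the call to pivot gives another view of the same cube, which is essentially what a pivot table interface allows a user to do interactively.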

Andrew Westlake, 27-Jan-2009



Page last updated 27 February 2009.