Data Analysis using Stata
The aim of the course is to give participants the understanding and experience needed to undertake a basic research project using Stata as the statistical tool. Stata is a comprehensive, integrated package for data management, analysis and graphics. Stata version 12 has a comprehensive graphical user interface, but the course focuses on Stata syntax (i.e. do-files).
The course will be presented in a way that introduces beginners to survey research and at the same time extends the capabilities of more experienced researchers. Sample datasets will be provided, but participants are encouraged to bring some of their own data, in Stata, Excel or ASCII format, for analysis. Teaching and practice will be closely integrated, and individual assistance will be provided as needed.
This course assumes that participants have:
- a reasonable understanding of statistics, sufficient to comprehend the material covered in the course outline (e.g. regression analysis)
- some familiarity with a PC environment including keyboard skills and understanding of folder and file structures
- some experience in using Microsoft Word and Excel or their equivalent
- some experience using a text editor such as Notepad or UltraEdit.
The course does not assume prior experience with Stata, SAS, SPSS or any other statistical package, although any such experience would be helpful. Participants will receive a copy of the course notes on the first day.
Preparing Stata datasets
Introduction to the Stata system. Data analysis and session management. Looking at Stata datasets. Sources of help. Basic commands. Modifying data, editing, recoding, checking and tidying. Stata do-files (syntax files). Generating new variables. Inputting data into Stata. Introduction to Stata graphics. Outputting results to Word, etc. Handling strings and dates. Handling missing data.
Starting the analysis
Initial univariate analysis: frequency distributions, exploratory data analysis. Initial bivariate analysis: cross-tabulations, correlations. t-tests and analysis of variance. Developing scales and indices: summated scales, factor analysis. More graphics including scatterplots, boxplots.
Regression analysis
Introduction to regression analysis: ordinary least squares. Checking assumptions with regression diagnostics. More graphics including regression diagnostics. Basic introduction to logistic regression.
Analysis of survey data
Introduction to sampling for surveys. Weighting observations. Analysis of survey data.
Sundry topics
Advanced dataset management and the benefits of additional (user-contributed) Stata procedures.
The statistical package Stata will be used as the main teaching tool and computational aid (previous experience is not assumed).
The instructor’s detailed course notes will serve as the course text.

