Computing for Data Analysis
Roger D. Peng
This course is about learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods.
Next session: 24 September 2012 (4 weeks long)
Workload: 3-5 hours per week
About the Course
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment, and we will discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.
About the Instructor(s)
Roger D. Peng is an associate professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and a Co-Editor of the Simply Statistics blog. He received his Ph.D. in Statistics from the University of California, Los Angeles and is a prominent researcher in the areas of air pollution and health risk assessment and statistical methods for spatial and temporal data. He created the course Statistical Programming at Johns Hopkins where it has been taught for the past 8 years. Dr. Peng is also a national leader in the area of methods and standards for reproducible research and is the Reproducible Research editor for the journal Biostatistics. His research is highly interdisciplinary and his work has been published in major substantive and statistical journals, including the Journal of the American Medical Association, Journal of the American Statistical Association, Journal of the Royal Statistical Society, and American Journal of Epidemiology. Dr. Peng is the author of more than a dozen software packages implementing statistical methods for environmental studies, methods for reproducible research, and data distribution tools. He has also given workshops, tutorials, and short courses in statistical computing and data analysis.
Recommended Background
Some familiarity with programming concepts will be useful, as well as basic knowledge of statistical reasoning. At Johns Hopkins, this course is taken by first-year graduate students in Biostatistics.
Suggested Readings
- Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
- S Programming (Statistics and Computing) by Brian D. Ripley and William N. Venables (Springer)
- Programming with Data: A Guide to the S Language by John M. Chambers (Springer)
Course Format
The course will consist of lecture videos broken into 8-10 minute segments. There will be two large programming assignments that will be graded. There will be approximately 3 hours of video content per week.
FAQ
- What resources will I need for this class? A computer is needed on which the R software environment can be installed (recent Mac, Windows, or Linux computers are sufficient).
- Is there a textbook for the class? There is no required textbook for the class and all materials will be provided. There are, however, a few suggested readings.
- How is this course different from “Data Analysis”? This course will focus on developing the programming skills necessary for managing data and for implementing statistical methods. The course will not focus on teaching properties of specific statistical algorithms unless they are used to demonstrate important programming techniques. Some of the topics covered in this course are relevant to the “Data Analysis” course but the two do not need to be taken in sequence.