The statistical package R offers a rich and flexible environment for data analysis and modeling. Due to its free availability and easy extensibility, it has become widely popular for implementing and distributing modern statistical procedures.
The course is intended for graduate students and researchers and has three goals:
For this purpose, the course will combine familiar material (elementary statistics, linear and generalized linear models, classical plots) with newer methods (machine learning, multiple testing, and visualization). The newer methods are by necessity only a small selection of those available; instead of trying to cover everything, we will aim at identifying suitable tools for a given problem.
The course is intended for students with previous training and/or experience in statistics: familiarity with statistical testing, linear models, and logistic regression is assumed.
The course is planned for 1.5 university credits (1 week of education/training), distributed as half-days over ten weeks. During the first eight weeks, each unit will consist of an introductory lecture, followed by a computer lab where students can immediately apply the new methods. During the ninth week, students will work in small groups (2-3 persons) on a project assignment, which they will present in the last week.
Attendance will be compulsory. Missed lectures or computer labs must be made up by completing and handing in extra reading and computing assignments. Grading will be based on performance during computer labs (50%) and the final project (50%). The passing grade will be 75%.
Any of the following books covers substantial parts of the course material, and can be easily supplemented with handouts:
In addition, the following books offer in-depth treatment of individual subjects from the course: