Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.

– from a 2018 report by the National Academies of Sciences, Engineering, and Medicine.

This course aims to provide the principal foundations for working with data at scale. We will cover shell programming, version control with git, basic SQL, a lot of R, and, time permitting, some more advanced topics (maybe Python, C++, or Docker).

Data analysts are in demand, particularly those who can walk the walk and not only talk the talk. This course takes a “hands-on, roll-up-your-sleeves” learning-by-doing approach, which can be highly rewarding for those willing to put in the required effort.

Course Objectives

After this course, students should be able to …

  • analyze and discuss code in a number of programming languages relevant to data science;
  • work fluently on the command line to manipulate quantities of files and data with ease;
  • use git for basic version control, collaboration, and publishing;
  • demonstrate a solid grasp of R as a language and environment for programming with data;
  • confidently approach new data problems, knowing a variety of tools from first-hand experience.