Data science tools


Welcome to my teaching page! My twin brother Shervine and I created the following set of illustrated study guides for the 15.003 class that I am currently teaching at MIT. They cover the main concepts in data retrieval, data manipulation, data visualization and productivity tips using SQL, R, Python, Git and Bash. They can (hopefully!) be useful to all future students of this course as well as to anyone interested in Data Science.


Data retrieval


Data retrieval with SQL
  • Filtering, conditions and data types
  • Types of joins (inner, full, left, right, cross)
  • Aggregations, window functions
  • Table manipulation
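
To give a flavor of the topics listed above, here is a minimal sketch that combines a filter, a left join and an aggregation in one query. It runs SQL through Python's built-in sqlite3 module so that it is self-contained. The tables, column names and values are hypothetical and made up for illustration; they are not taken from the study guides.

```python
import sqlite3

# Hypothetical tables and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER, score REAL);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO grades VALUES (1, 90), (1, 80), (2, 70);
""")

# WHERE filters rows, LEFT JOIN keeps students without any grade,
# and GROUP BY aggregates the grades per student.
rows = conn.execute("""
    SELECT s.name, COUNT(g.score) AS n_grades, AVG(g.score) AS avg_score
    FROM students AS s
    LEFT JOIN grades AS g ON g.student_id = s.id
    WHERE s.name <> 'Ben'
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
conn.close()

for name, n_grades, avg_score in rows:
    print(name, n_grades, avg_score)
```

Because the join is a left join, a student with no grades still appears in the result, with a count of 0 and a NULL average (None in Python). An inner join would drop that row.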

Data manipulation


Data manipulation with R
  • Filtering, conditions and data types
  • Types of joins (inner, full, left, right, cross)
  • Aggregations, window functions
  • Data frame transformations
Data manipulation with Python
  • Filtering, conditions and data types
  • Types of joins (inner, full, left, right, cross)
  • Aggregations, window functions
  • Data frame transformations
R-Python conversion for data manipulation
  • Conversion made easy between R (tidyr, dplyr, lubridate) and Python (pandas, numpy, datetime)

Data visualization


Data visualization with R
  • Scatterplots, line plots, histograms
  • Boxplots, maps
  • Customized legend
Data visualization with Python
  • Scatterplots, line plots, histograms
  • Boxplots, maps
  • Customized legend
R-Python conversion for data visualization
  • Conversion made easy between R (ggplot2) and Python (matplotlib, seaborn)

Engineering tips


Engineering tips with Git, Bash
  • Version control with Git
  • Working in the terminal with Bash
  • Mastering editors with Vim