R Programming
R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. [ Source ]
- GLMs from Quick-R
- Model Validation: Interpreting Residual Plots
- Beginner’s guide to R from ComputerWorld
- Quick-R
- Stack Overflow questions tagged R
- RStudio Keyboard Shortcuts
- Non-Standard Evaluation Explanation
- ggplot2 Documentation
- Shiny Documentation & Tutorial
- plotly R library
- googleVis library
- rCharts library
- xgboost R package
- R for Cats
- Tidy Data
- LASSO vs. Ridge vs. Elastic Net with glmnet
Python
Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than possible in languages such as C++ or Java. [ Source ]
- Python 3 Documentation
- Introduction to Python for Econometrics, Statistics and Data Analysis (PDF)
- Django web framework
- Python Machine Learning Book (Repository)
- scikit-learn machine learning map guide
- PyData conference series
- PyCon 2017 conference
SQL
SQL or Structured Query Language is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). [ Source ]
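As a rough illustration of what "managing data held in a relational database" looks like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The `resources` table, its columns, and the function name are hypothetical and exist only for this example; the SQL itself (`CREATE TABLE`, `INSERT`, `SELECT ... GROUP BY`) is standard.

```python
import sqlite3


def count_resources_by_section(rows):
    """Load (section, title) rows into an in-memory table and count them per section.

    The table and column names are made up for this sketch.
    """
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE resources (section TEXT, title TEXT)")
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO resources VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT section, COUNT(*) FROM resources "
            "GROUP BY section ORDER BY section"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("R", "Quick-R"),
        ("Python", "Python 3 Documentation"),
        ("Python", "Django web framework"),
    ]
    for section, n in count_resources_by_section(sample):
        print(section, n)
```

The same query would run largely unchanged against other relational databases; only the connection code is specific to SQLite.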
Datasets
- Awesome Public Datasets (fork)
- SF Open Data Portal
- CDC: Chronic Disease and Health Promotion Open Data
- Colorado Open Data Portal
- Open Data Network
- UCI Machine Learning Repository
- Connecticut Open Data Portal
Tools
- RStudio
- Atom Text Editor
- Import.io
- Jupyter
- Rodeo IDE for Python
- Tor Browser
- HTML GitHub Preview
- RSeek.org Search Engine
Blogs
- Civis Analytics Data Science
- RStudio Blog
- yHat Blog
- DataTau
- Simply Statistics
- Adventures in Data by Oliver Keyes
- Data Science 101
- R-bloggers
- FiveThirtyEight
- Data Elixir
- An aggregate of more data science blogs on GitHub
Books
Non-technical
- The Signal & the Noise by Nate Silver
- Big Data by Viktor Mayer-Schönberger and Kenneth Cukier
- Algorithms to Live By: The Computer Science of Human Decisions
Technical
- R for Data Science by Garrett Grolemund and Hadley Wickham
- R Packages by Hadley Wickham
- Advanced R by Hadley Wickham - Chapters of Interest: Data Structures, Object-Orientated Field Guide, Style Guide
- An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- R Graphics Cookbook by Winston Chang
- ggplot2: Elegant Graphics for Data Analysis (Use R!)
- Python for Data Analysis by Wes McKinney
Online Courses/Tutorials
most courses are ish
- Stat545: Data wrangling, exploration, and analysis with R
- RStudio resources
- The Johns Hopkins Data Science Specialization on Coursera
- Udacity Exploratory Data Analysis
- Stanford University StatLearning: Statistical Learning
General Computing
- Ten Simple Rules for Reproducible Computational Research
- Emoji Cheat Sheet for Markdown
- Version Control
- Happy Git and GitHub for the useR by Jenny Bryan!
- Git Cheat sheet (PDF)
- GitHub for Cats
- GitHub, Hello World
- GitHub for Everyone
- Git, the simple guide
- Try Git, interactive browser lessons
- Literate Programming
Machine Learning Resources 🤖 📈📉
Books & Papers 📚
- The Mythos of Model Interpretability
- Towards A Rigorous Science of Interpretable Machine Learning
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
- Applied Predictive Modeling
Online Courses/Tutorials 🎒
- Machine Learning Algorithmic Deep Dive
- Practical Deep Learning for Coders
- Machine Learning Crash Course with TensorFlow APIs (Google)
- Introduction to Machine Learning Problem Framing (Google)
- Data Preparation and Feature Engineering in ML (Google)
Software Tools 🗜️
- scikit-learn: Machine Learning in Python
- mlr: Machine Learning in R
- H2O: Fast Scalable Machine Learning API for R & Python // H2O University Lessons
- Extreme Gradient Boosting: optimized distributed gradient boosting library for Python, R, Java, Scala, C++
Ron Weasley is also mystified by the length of this resource list