Resources
Visualization Tools / Libraries / Frameworks
Web Development
- W3Schools: A quick reference for HTML, CSS, Bootstrap, jQuery, JavaScript, and AngularJS.
- Bootstrap: An open-source toolkit for developing with HTML, CSS, and JavaScript.
- jQuery: A JavaScript library for quick and interactive web development.
- AngularJS: A JavaScript framework for quick and interactive web development.
- Tutorial Resources: Code School, Codecademy, CodingGround, KeyLines, Lynda.com (free for LTU students)
Teamwork / Version Control
- GitHub: A web-based hosting service for Git version-control repositories.
- Jupyter Notebook: Part of Project Jupyter, which develops open-source software, open standards, and services for interactive computing.
Data Science / Data Mining
- R: Data Mining Algorithms In R.
- RapidMiner: A software platform for data science teams that unites data prep, machine learning, and predictive model deployment.
- Weka: A collection of machine learning algorithms for data mining tasks
- Orange: Open-source machine learning and data visualization for novices and experts.
- KNIME: An open source data analytics, reporting and integration platform
Deep Learning
- TensorFlow: An open-source machine learning framework by Google.
- PyTorch: An open-source deep learning platform by Facebook.
- Keras: The Python deep learning library, now also included in TensorFlow as tf.keras.
- Anaconda: A Python data science platform.
- Colab: Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
Python for Data Science
- Data Wrangling: NumPy, SciPy, Pandas
- Visualization: Matplotlib, Seaborn, Bokeh, Plotly
- Machine Learning: scikit-learn, Keras, TensorFlow, Theano
- Data Scraping: Scrapy, Beautiful Soup
- NLP: NLTK, Gensim, spaCy
R for Data Science
- Visualization: ggplot2, googleVis
- Data Wrangling: plyr, data.table
- Missing Values: missForest, missMDA
- Outlier Detection: outliers, evir, mvoutlier
- Geocoding: ggmap
- Feature Selection: Features, RRF
- Dimension Reduction: FactoMineR, CCP
- Regression: car, randomForest, RMliner, CORElearn
- Classification: caret, bigrf
- Clustering: CBA, RankCluster
- Time Series: forecast, ltsa
- Survival: survival, BaSTA
- General Model Validation: lsmeans, Comparison
- Regression Validation: RegTest, ACD
- Classification Validation: BinomTools, DAIM
- Clustering Validation: ClustEval, SigClust
- ROC Analysis: pROC, timeROC
Natural Language Processing
- NLTK: A platform for building Python programs to work with human language data.
- GATE: General Architecture for Text Engineering (GATE) is a Java suite of tools for text mining.
- CoreNLP: Provides a set of human language technology tools.
- Stanford Parser: A program that works out the grammatical structure of sentences.
- OpenNLP: The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text.
- MALLET: A Java-based package for statistical NLP, document classification, clustering, topic modeling, and other machine learning applications to text.
- LIWC (Linguistic Inquiry and Word Count): A widely used tool for computerized text analysis.
Data Sources
- Kaggle: A platform for predictive modeling and analytics competitions.
- Tableau Sample Data: Sample data sets and materials from Tableau for beginners and more advanced users.
- General Data: Data.world, Google Public Data, AWS Public Datasets, UNdata, Airbnb, OECD Data, Forbes BigData, Gapminder World, Climate Data, Dataverse
- Financial Data: SEC Financial Statement Data Sets, World Bank, Quandl
- Health Data: GHDx, Global Health Observatory, UNICEF, KeyLines, Centers for Disease Control and Prevention
- Government Data: Food Environment Atlas (US), US Census, USA Spending, Centers for Disease Control (US), Data.gov, College Scorecard Data, NASA's Data Portal, Migration Data Hub, Open Alberta (Canada), Open Government (Canada), Data.Gov.UK (UK), UK Data Service (UK)
- City Data: Data Driven Detroit, Chicago Data Portal, Detroit Open Data, Los Angeles Open Data, NYC Open Data, Seattle Open Data, Atlanta Open Data