Resources

Visualization Tools / Libraries / Frameworks


Web Development


Teamwork / Version Control

  • GitHub: A web-based hosting service for Git version control repositories
  • Jupyter Notebook: From Project Jupyter, which develops open-source software, open standards, and services for interactive computing

Data Science / Data Mining

  • R: Data Mining Algorithms In R.
  • RapidMiner: A software platform for data science teams that unites data prep, machine learning, and predictive model deployment.
  • Weka: A collection of machine learning algorithms for data mining tasks
  • Orange: Open-source machine learning and data visualization for novices and experts.
  • KNIME: An open source data analytics, reporting and integration platform

Deep Learning

  • TensorFlow: An open-source machine learning framework by Google
  • PyTorch: An open source deep learning platform by Facebook
  • Keras: The Python deep learning library, now included in TensorFlow as tf.keras.
  • Anaconda: A Python Data Science Platform
  • Colab: Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

Python for Data Science

  • Data Wrangling: NumPy, SciPy, Pandas
  • Visualization: Matplotlib, Seaborn, Bokeh, Plotly
  • Machine Learning: scikit-learn, Keras, TensorFlow, Theano
  • Data Scraping: Scrapy, Beautiful Soup
  • NLP: NLTK, Gensim, spaCy

R for Data Science

  • Visualization: ggplot2, googleVis
  • Data Wrangling: plyr, data.table
  • Missing Values: missForest, missMDA
  • Outlier Detection: outliers, evir, mvoutlier
  • Geocoding: ggmap
  • Feature Selection: Features, RRF
  • Dimension Reduction: FactoMineR, CCP
  • Regression: car, randomForest, rminer, CORElearn
  • Classification: caret, bigrf
  • Clustering: CBA, RankCluster
  • Time Series: forecast, ltsa
  • Survival: survival, BaSTA
  • General Model Validation: lsmeans, Comparison
  • Regression Validation: RegTest, ACD
  • Classification Validation: binomTools, DAIM
  • Clustering Validation: clusteval, SigClust
  • ROC Analysis: pROC, timeROC

Natural Language Processing

  • NLTK: A platform for building Python programs to work with human language data.
  • GATE: General Architecture for Text Engineering, a Java suite of tools for text mining
  • CoreNLP: Stanford's set of human language technology tools.
  • Stanford Parser: A program that works out the grammatical structure of sentences.
  • OpenNLP: Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
  • MALLET: A Java-based package for statistical NLP, document classification, clustering, topic modeling, and other machine learning applications to text.
  • LIWC: Linguistic Inquiry and Word Count, a widely used dictionary-based program for computerized text analysis.

Data Sources




© 2020 Dr. Chih Hao Ku at c.ku17@csuohio.edu