Welcome to TwitterML

A project to analyse text streams (tweets or documents) using big data and machine learning. It uses Apache Spark to build textual metrics, then runs the text through various classification models (built with SciKit-Learn) to evaluate its sentiment.
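To make the "textual metrics" step concrete, here is a minimal sketch of the word-frequency aggregation that such a pipeline typically starts with. On Spark this is usually written with the RDD methods `flatMap`, `map` and `reduceByKey`. The version below assumes nothing beyond the Python standard library so that it runs anywhere, and it only mirrors the map/reduce shape. The tokenizer regex and the helper names `tokenize` and `text_metrics` are hypothetical, not taken from the project.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lower-case, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def text_metrics(texts: Iterable[str]) -> Counter:
    # "Map" step: one Counter of token frequencies per text.
    per_text = map(lambda t: Counter(tokenize(t)), texts)
    # "Reduce" step: merge the partial counts, as reduceByKey would across partitions.
    return reduce(lambda a, b: a + b, per_text, Counter())


tweets = [
    "Loving the new release, great work!",
    "The release is late again. Not great.",
]
counts = text_metrics(tweets)
print(counts.most_common(3))
```

On a cluster, the per-text `Counter` corresponds to the map stage and the `reduce` merge to the shuffle-and-combine stage. The totals are the same; only the work is distributed.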

[Figures: waffle chart, word cloud, learning curve, ROC curves across k folds]

Features

  • Classifier Builder - standalone tool to configure classifiers and train them using pre-classified samples
  • Text Classify - a standalone program for classifying the sentiment of text using NLTK and SciKit-Learn classifiers
  • Document Scanner - a program for classifying text documents on the Spark platform
  • Twitter-Kafka Publisher - reads tweets from Twitter and pumps them into a Kafka server (where they can be consumed by our Twitter Consumer programs)
  • Twitter Analyser - reads tweets from Kafka and analyses the text on the Spark platform
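The Classifier Builder and Text Classify tools both come down to training a model on pre-classified samples and then scoring new text. The project itself uses NLTK and SciKit-Learn classifiers. What follows is a dependency-free multinomial Naive Bayes stand-in (a technique both libraries offer, as NLTK's `NaiveBayesClassifier` and scikit-learn's `MultinomialNB`), not the project's own code. The class name `NaiveBayesSentiment`, its methods, and the toy training data are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lower-case, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training samples per label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                       # every token seen during training

    def train(self, samples):
        # samples: iterable of (text, label) pairs, i.e. pre-classified examples.
        for text, label in samples:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def classify(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior: log P(label).
            score = math.log(n_docs / total_docs)
            n_tokens = sum(self.word_counts[label].values())
            for token in tokenize(text):
                # Smoothed log likelihood log P(token | label); unseen tokens stay non-zero.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [
    ("I love this, great work", "pos"),
    ("what a fantastic day", "pos"),
    ("this is terrible and slow", "neg"),
    ("I hate the new update", "neg"),
]
model = NaiveBayesSentiment().train(training)
print(model.classify("great update, love it"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and add-one smoothing keeps a single unseen word from zeroing out a label. A real classifier builder would also hold out samples to measure accuracy, which is what the learning-curve and k-fold ROC plots summarise.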