Please use this identifier to cite or link to this item:
Title: Clustering of Twitter Data using Semi-Supervised Parallel K-Means Clustering
Authors: John, J.
Keywords: Computer Science & Engineering
Issue Date: 2014
Abstract: Twitter is currently the leading micro-blogging social network and has attracted a large body of research. This thesis proposes a data analysis technique for finding groups of similar Twitter messages, or tweets. Analyzing these groups makes it possible to extract user emotions or thoughts that appear to be associated with specific events; the technique also has applications in news aggregation services such as Google News. Clusters produced by traditional unsupervised methods are often topically incoherent, so this thesis employs a semi-supervised approach, combined with a parallelised K-Means clustering technique to improve efficiency. As a case study, a data set of over 10,000 tweets was collected using the Twitter4J Java library and the Twitter API's OAuth authentication, represented in the Vector Space Model using TF-IDF weighting, and then successfully clustered with this technique.
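The pipeline outlined in the abstract (TF-IDF vectors, then K-Means with some supervision) can be illustrated with a minimal sketch. This is not the thesis's implementation, which used Java. It is written in Python for brevity and rests on several assumptions: whitespace tokenisation, cosine similarity as the closeness measure, and supervision supplied as a few hand-labelled seed tweets that initialise the centroids. The function names (`tfidf_vectors`, `seeded_kmeans`) and the sample tweets are hypothetical, and parallelisation is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokeniser
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vecs):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vecs:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vecs) for t, w in total.items()}


def seeded_kmeans(vectors, seeds, iterations=10):
    """K-Means whose centroids start from labelled seeds (an assumed form
    of semi-supervision). seeds maps cluster id -> list of document indices."""
    centroids = {
        c: mean_vector([vectors[i] for i in idx]) for c, idx in seeds.items()
    }
    labels = []
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(centroids, key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Update step: recompute each centroid from its current members.
        for c in centroids:
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical sample data: two topics, one labelled seed tweet per topic.
tweets = [
    "earthquake hits city tonight",
    "strong earthquake shakes city",
    "team wins football final",
    "football final ends team celebrates",
]
labels = seeded_kmeans(tfidf_vectors(tweets), {"quake": [0], "sport": [2]})
print(labels)
```

In this sketch the assignment step is independent for each tweet, which is the part of K-Means most commonly distributed across workers; how the thesis parallelises the algorithm is not stated in the abstract.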
Appears in Collections: 01. CSE

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.