|Title:||Clustering of Twitter Data using Semi-Supervised Parallel K-Means Clustering|
|Keywords:||Computer Science & Engineering|
|Abstract:||Twitter is currently the leading micro-blogging social network and has attracted a large body of research. This thesis proposes a data analysis technique for finding groups of similar Twitter messages, or tweets. Analyzing these groups can reveal user emotions or opinions associated with specific events, and the technique can also be applied to news aggregation services such as Google News. Because clusters produced by traditional unsupervised methods are often incoherent from a topical perspective, this thesis employs a semi-supervised approach combined with a parallelized K-Means clustering technique to improve efficiency. As a case study, a data set of over 10,000 tweets was collected using the Twitter4J Java library and the Twitter API's OAuth authentication. The tweets were represented in the vector space model using TF-IDF weighting and then successfully clustered with this technique.|
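The abstract names the pipeline (TF-IDF vectors, semi-supervised K-Means, a parallel step) but not its details. The following Python sketch illustrates that pipeline under stated assumptions and is not the thesis's Java implementation. It assumes a *seeded* K-Means variant, where a few hand-labelled tweets initialize the centroids and stay fixed in their clusters. It also assumes cosine similarity on L2-normalized vectors (spherical K-Means) and a smoothed IDF. All function names (`tfidf_vectors`, `seeded_parallel_kmeans`, etc.) and the sample tweets are hypothetical. The `ThreadPoolExecutor` split of the assignment step only shows the data-parallel structure. Under CPython's GIL, pure-Python threads do not run this arithmetic concurrently, so real speedups would need processes or a distributed framework.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    # Assumption: lowercase word tokens only; no stop-word removal or stemming.
    return re.findall(r"[a-z0-9#@]+", text.lower())


def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


def tfidf_vectors(docs):
    """One L2-normalized sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms present in every tweet keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [normalize({t: tf * idf[t] for t, tf in Counter(toks).items()}) for toks in tokenized]


def cosine(u, v):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(members):
    total = Counter()
    for vec in members:
        total.update(vec)  # element-wise sum of sparse vectors
    return normalize(dict(total))  # normalizing the sum equals normalizing the mean


def nearest(vec, centroids):
    return max(range(len(centroids)), key=lambda j: cosine(vec, centroids[j]))


def seeded_parallel_kmeans(vectors, seeds, k, max_iter=50, workers=4):
    """seeds maps document index -> cluster id; every cluster needs at least one seed."""
    centroids = [centroid([vectors[i] for i, c in seeds.items() if c == j]) for j in range(k)]
    if any(not c for c in centroids):
        raise ValueError("every cluster needs at least one non-empty seed tweet")
    n = len(vectors)
    size = max(1, math.ceil(n / workers))
    chunks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each worker labels one slice of the documents.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda idx: [nearest(vectors[i], centroids) for i in idx], chunks)
            new_labels = [lab for part in parts for lab in part]
        # Seed constraint: labelled tweets never leave their cluster.
        for i, c in seeds.items():
            new_labels[i] = c
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute centroids, keeping the old one if a cluster empties.
        for j in range(k):
            members = [vectors[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = centroid(members)
    return labels


tweets = [
    "earthquake hits the city tonight",
    "strong earthquake felt downtown",
    "new phone launch event today",
    "phone launch was amazing",
    "city shaken by earthquake",
    "waiting for the phone launch",
]
vecs = tfidf_vectors(tweets)
labels = seeded_parallel_kmeans(vecs, seeds={0: 0, 2: 1}, k=2)
print(labels)  # → [0, 0, 1, 1, 0, 1]
```

In this sketch the seeds do two jobs. They fix the initial centroids, which removes the random-initialization variance of plain K-Means. They also anchor each cluster to a topic chosen by a human. This is one common way to make clusters topically coherent; the thesis may use a different form of supervision.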
|Appears in Collections:||01. CSE|
Files in This Item:
|Clustering of Twitter Data using Semi-Supervised Parallel K-Means Clustering.pdf||1.14 MB||Adobe PDF|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.