So here is the thing: how are people around the world sharing on Twitter? Which devices or services do they usually use to share their check-ins, photos, videos, or updates on Twitter?
This is a really simple analysis I did on data that I’ve been gathering for almost 2 months now (around 97 million tweets from the US and 5.5 million tweets from France at the time of this study) to get some answers to the above question using Hadoop batch processing.
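To give a feel for the kind of counting involved, here is a minimal sketch in plain Python of a map step and a reduce step that tally tweets by source. It is not the actual Hadoop job. It assumes one JSON-encoded tweet per line and that the tweet's `source` field holds either a plain name such as `web` or an HTML anchor whose text is the client name. The function names (`source_name`, `map_tweet`, `reduce_counts`, `count_sources`) are made up for illustration.

```python
import json
import re
from collections import Counter

# Matches the anchor text in a `source` value such as
# '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
_ANCHOR_TEXT = re.compile(r"<a[^>]*>(.*?)</a>", re.IGNORECASE)


def source_name(raw_source):
    """Return the client name from a tweet's `source` field (hypothetical helper)."""
    text = raw_source or ""
    match = _ANCHOR_TEXT.search(text)
    if match:
        return match.group(1).strip() or "unknown"
    return text.strip() or "unknown"


def map_tweet(line):
    """Map step: turn one JSON line into a (source, 1) pair, or None if unusable."""
    try:
        tweet = json.loads(line)
    except ValueError:
        return None  # skip lines that are not valid JSON
    if not isinstance(tweet, dict):
        return None  # skip JSON values that are not tweet objects
    return source_name(tweet.get("source")), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per source."""
    counts = Counter()
    for pair in pairs:
        if pair is not None:
            name, n = pair
            counts[name] += n
    return counts


def count_sources(lines):
    """Run the map step and the reduce step over an iterable of JSON lines."""
    return reduce_counts(map_tweet(line) for line in lines)
```

Calling `count_sources(open("tweets.json"))` on a file of tweets (the file name is only an example) returns a `Counter`, so `.most_common(10)` lists the ten most used sources. In a real Hadoop job the map and reduce steps would run as separate tasks across the cluster; here they are chained in one process just to show the logic.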
I have 4 EC2 instances up and running 24×7 to track tweets (from the Twitter Public Streaming API) and store them in a MongoDB replica set. One of the nodes is an application server that I built on the Node.js stack to process and visualise the stream in real time as it comes into the system. Currently I get an average of 100 tweets/s, a minimum of 30-40/s, and a maximum of 180-220/s. More than one Twitter account is tracking tweets at the same time, by location and by different keywords. That’s why I sometimes get more than 1% of the entire stream!
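For the curious, throughput figures like these can be derived from tweet timestamps. The sketch below is only an illustration, not the code running on my servers. It assumes each tweet carries a `created_at` string in the format `Wed Aug 27 13:08:45 +0000 2008`, and the function name `rate_summary` is made up for this example.

```python
from collections import Counter
from datetime import datetime

# Assumed format of a tweet's `created_at` field, e.g. "Wed Aug 27 13:08:45 +0000 2008"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def rate_summary(created_at_values):
    """Return average, minimum and maximum tweets per second, or None if empty."""
    # Bucket tweets by the Unix second in which they were created.
    per_second = Counter(
        int(datetime.strptime(value, TWITTER_TIME_FORMAT).timestamp())
        for value in created_at_values
    )
    if not per_second:
        return None

    first, last = min(per_second), max(per_second)
    span_seconds = last - first + 1
    total = sum(per_second.values())

    # Seconds with no tweets count as zero when looking for the minimum.
    quietest = min(per_second.get(second, 0) for second in range(first, last + 1))
    busiest = max(per_second.values())

    return {
        "average": total / span_seconds,
        "minimum": quietest,
        "maximum": busiest,
    }
```

This walks every second between the first and last tweet, which is fine for a short window. For weeks of data you would want to aggregate in the database or in a batch job instead.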