Source of Tweets in France and United States

So here is the thing: how are people sharing on Twitter around the world? Which devices or services do they usually use to share their check-ins, photos, videos, or updates on Twitter?

This is a really simple analysis, done with Hadoop batch processing, to get some answers to the question above. It runs on data I have been gathering for almost 2 months now: around 97 million tweets from the US and 5.5 million tweets from France at the time of this study.
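The Hadoop job itself is not shown here. As a minimal sketch, assuming each tweet is stored as one JSON object per line and that the client name comes from the Streaming API's `source` field (an HTML anchor such as `<a href="..." rel="nofollow">Twitter for iPhone</a>`, or the plain string `web`), the counting could look like this. It simulates the map and reduce phases in-process; `source_name`, `mapper` and `reducer` are hypothetical names, and under Hadoop Streaming the mapper would instead print tab-separated `name<TAB>1` lines to stdout.

```python
import json
import re
from collections import defaultdict
from html import unescape

# Hypothetical sketch, not the original Hadoop job.
TAG_RE = re.compile(r"<[^>]+>")


def source_name(tweet):
    """Return the client name from a tweet's `source` field.

    The field is either an HTML anchor whose text is the client name,
    or a plain string such as 'web'. Missing values map to 'unknown'.
    """
    raw = tweet.get("source") or "unknown"
    return unescape(TAG_RE.sub("", raw)).strip() or "unknown"


def mapper(lines):
    """Map phase: one JSON tweet per line -> (source name, 1)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield source_name(json.loads(line)), 1


def reducer(pairs):
    """Reduce phase: sum the counts emitted for each source name."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample tweets, purely for illustration.
    sample = [
        '{"source": "<a href=\\"http://twitter.com/download/iphone\\" rel=\\"nofollow\\">Twitter for iPhone</a>"}',
        '{"source": "web"}',
        '{"source": "web"}',
    ]
    counts = reducer(mapper(sample))
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(name, n)
```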

I have 4 EC2 instances running 24×7 that track tweets from the Twitter Public Streaming API and store them in a MongoDB replica set. One of the nodes is an application server I built on a Node.js stack to process and visualise the stream in real time as it enters the system. Currently the stream averages 100 tweets/s, with a minimum of 30-40 tweets/s and a maximum of 180-220 tweets/s. More than one Twitter account tracks tweets at the same time, by location and by different keywords. That’s why I sometimes get more than 1% of the entire stream!
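Average, minimum and maximum rates like these can be derived from arrival times. A minimal sketch, assuming the collector records a millisecond timestamp for each tweet it receives; `per_second_stats` is a hypothetical helper, not part of the Node.js application described above.

```python
from collections import Counter


def per_second_stats(timestamps_ms):
    """Return (average, minimum, maximum) tweets per second.

    Arrival times (milliseconds since the epoch) are bucketed into whole
    seconds; seconds inside the observed window with no arrivals count as 0.
    """
    if not timestamps_ms:
        return 0.0, 0, 0
    buckets = Counter(ts // 1000 for ts in timestamps_ms)
    first, last = min(buckets), max(buckets)
    counts = [buckets.get(second, 0) for second in range(first, last + 1)]
    return sum(counts) / len(counts), min(counts), max(counts)


if __name__ == "__main__":
    # Made-up arrival times, purely for illustration.
    arrivals = [1000, 1500, 2000, 4999]
    print(per_second_stats(arrivals))  # (1.0, 0, 2)
```

A live monitor would more likely compute these over a recent window (for example the last minute) rather than over the whole history.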


European Conference on Complex Systems 13

I was in Barcelona this September to share our research project with other scientists and researchers at ECCS’13. Our track was Big Data in Complex Systems, and the title of my talk was: Real-time Twitter Processing and Visualization on Geo-location Map by Using Cloud-based Platform.

Before my presentation at ECCS’13 on Big Data in Complex Systems
Me presenting our work at the Big Data in Complex Systems satellite of ECCS’13.
Me and Prof. Eugene

How I monitor Ubuntu Servers on Amazon EC2

Sometimes you get to the point where you have been using Amazon CloudWatch, or the Detailed Monitoring option on your EC2 instance, and you know it will cost you to create alerts, such as an alarm that emails you when your average CPU goes over 50% (or whatever threshold you defined). And when you get that email, you realize you have no idea why your CPU is over 50% or 60%!

So you still need something else to monitor your server: to see what went wrong, or simply what was happening at the moment that triggered the alert.
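For context on what an alarm like "CPU over 50%" measures: on Linux, CPU utilisation is commonly derived from two samples of the aggregate `cpu` line in `/proc/stat`, as the share of non-idle time between them. The sketch below assumes a Linux host; the helper names are hypothetical, and it only illustrates the calculation rather than reproducing what CloudWatch or Zabbix do internally. Note that it tells you how busy the CPU was, not why, which is exactly the gap described above.

```python
import os
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already included in user/nice, so only the
    # first eight columns are summed.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return idle, sum(values)


def cpu_busy_percent(before, after):
    """Percentage of non-idle CPU time between two /proc/stat 'cpu' samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (elapsed - (idle1 - idle0)) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)
    second = read_cpu_line()
    print(f"CPU busy over the last second: {cpu_busy_percent(first, second):.1f}%")
```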

Next, I will show you how I use Zabbix and sysstat to see far more detail than a simple CloudWatch graph.

Analyse and Visualise

For a few weeks now I have been interested in open-source analysis and visualisation tools, and so far my favourite is Gephi. Gephi was selected for Google Summer of Code in 2009, 2010, and 2011.

Today I played a little with some of the existing datasets that Gephi lets you download, to try out some of its features.

Here is what the official website says about Gephi:

Gephi is an interactive visualisation and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.

Runs on Windows, Linux and Mac OS X. Gephi is open-source and free.

OK, here are some screenshots I took while working on the existing datasets. Later I will look for data from outside this world that would be fun to analyse and visualise 🙂

[Two screenshots of Gephi, taken on 2013-01-24]
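To bring your own data into Gephi instead of the bundled datasets, one option is its spreadsheet (CSV) import, which accepts an edge table with `Source` and `Target` columns and an optional `Weight` column. A minimal sketch, assuming the input is a list of hypothetical (from, to) pairs such as Twitter mentions; `edges_to_gephi_csv` is an illustrative helper, not part of Gephi.

```python
import csv
import io
from collections import Counter


def edges_to_gephi_csv(pairs):
    """Collapse repeated (source, target) pairs into weighted edges and
    return CSV text with a Source,Target,Weight header."""
    weights = Counter(pairs)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(weights.items()):
        writer.writerow([src, dst, weight])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up mention pairs, purely for illustration.
    mentions = [("alice", "bob"), ("alice", "bob"), ("bob", "carol")]
    print(edges_to_gephi_csv(mentions), end="")
```

Saving the returned text to a `.csv` file gives an edge table that can then be opened through Gephi's spreadsheet import.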
