Building a real-time news analytics application

It is always interesting to find out what is happening around the world as it happens. Knowing about something as it happens is called “real time”, or more accurately “near real time”, because network latency and the time needed to process, visualize, and stream the data always introduce some delay.

Now imagine being able to monitor most international news agencies on Twitter in real time. For the last few months I have been developing an application that not only delivers important news and highlights from around the world as they happen, but also keeps the data for a 24-hour window, so you can always go back and read important news and highlights that have already happened.
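There are several ways to enforce a 24-hour window like this. In MongoDB, a TTL index (`expireAfterSeconds=86400` on a date field) lets the server delete old documents automatically. The logic itself is simple enough to sketch in memory. This is a minimal sketch, not the application's actual code: `RollingNewsStore` is a hypothetical helper, and it assumes tweets arrive roughly in chronological order with timezone-aware timestamps.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # the 24-hour window described above


class RollingNewsStore:
    """Hypothetical in-memory store that keeps only the last 24 hours of tweets.

    Assumes tweets arrive roughly in chronological order, so the oldest
    (and therefore first to expire) items are always at the left end.
    """

    def __init__(self, retention: timedelta = RETENTION) -> None:
        self.retention = retention
        self._items: deque = deque()  # (created_at, text) pairs, oldest first

    def add(self, created_at: datetime, text: str) -> None:
        self._items.append((created_at, text))
        self.prune(created_at)

    def prune(self, now: datetime) -> None:
        # Drop every tweet older than the retention window.
        cutoff = now - self.retention
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def texts(self) -> list:
        return [text for _, text in self._items]


# Made-up timestamps, purely for illustration.
t0 = datetime(2015, 6, 1, 8, 0, tzinfo=timezone.utc)
store = RollingNewsStore()
store.add(t0, "old headline")
store.add(t0 + timedelta(hours=25), "new headline")
print(store.texts())
```

Adding a tweet 25 hours after the first one evicts the first, so only "new headline" remains.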

My study has shown that most news agencies either publish their highlights and breaking news on Twitter first and then develop them into full articles on their websites, or publish the article and share it on Twitter at the same time.

Twitter is known for its real-time nature, and sharing something that happened days ago defeats its purpose. Therefore, most news agencies try to publish their news and highlights on Twitter as soon as possible.

I take advantage of this by monitoring over 300 official news agencies on Twitter in real time, and in this post I will share some of the work I have been doing.

Some of the technologies that I use in the application architecture:

Amazon Web Services (AWS), MongoDB, ElasticSearch, Node.js, and Objective-C for mobile devices.

For more information about the visualization, you can refer to my post on LinkedIn.

This figure shows the total number of news items per 15 minutes. I also tried 5-minute and 30-minute windows, but in the end I found that a 15-minute window is a reasonable time frame for following the news in real time. For instance, as can be seen in the figure below, there were 85 news items in one 15-minute window at that specific time.
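In an ElasticSearch setup this kind of chart usually comes from a `date_histogram` aggregation. The bucketing idea behind it fits in a few lines of Python. This is a minimal sketch, assuming each news item has a timezone-aware UTC timestamp; `window_start` and `news_per_window` are hypothetical names used only for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def window_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its N-minute window (UTC)."""
    size = minutes * 60
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def news_per_window(timestamps, minutes=15):
    """Count news items per N-minute window, ordered by window start."""
    counts = Counter(window_start(ts, minutes) for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up timestamps, purely for illustration.
base = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
stamps = [base.replace(minute=m) for m in (1, 7, 14, 16, 44)]
for start, n in news_per_window(stamps).items():
    print(start.strftime("%H:%M"), n)
```

With these sample timestamps the loop prints three windows: 12:00 with 3 items, 12:15 with 1, and 12:30 with 1. Comparing window sizes, as with the 5- and 30-minute experiments, only means changing the `minutes` argument.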

The source of the tweets is always interesting to me. Although the data here covers only news agencies and reporters, it is still valuable to know which platforms or devices these people prefer to tweet from.
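In the Twitter API v1.1 tweet format, the client that posted a tweet is reported in the `source` field, usually as an HTML anchor such as `<a href="..." rel="nofollow">Twitter for iPhone</a>` (or plain `web`). A minimal sketch, assuming that format, extracts the client name and counts how often each one appears. `source_name` and `count_sources` are hypothetical helpers, not part of any Twitter library.

```python
import re
from collections import Counter

# Text between the end of the opening <a ...> tag and the closing </a>.
_ANCHOR_TEXT = re.compile(r">([^<]*)</a>")


def source_name(source_html: str) -> str:
    """Extract the client name from a tweet's `source` field.

    Falls back to the raw value when it is not an anchor (e.g. "web").
    """
    match = _ANCHOR_TEXT.search(source_html)
    return match.group(1) if match else source_html


def count_sources(sources):
    """Return (client, count) pairs, most common client first."""
    return Counter(source_name(s) for s in sources).most_common()


# Made-up sample values, purely for illustration.
samples = [
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    "web",
]
print(count_sources(samples))
```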

Another factor in each tweet (news item or highlight) that can serve as a metric of its importance to the community is the number of retweets. The application updates retweet counts in near real time, and if you visualize how many retweets (RTs) happened per 10 minutes, you get something like this:
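One way to keep retweet counts current from a live stream is to use the fact that, in the Twitter API v1.1 format, a retweet event embeds the original tweet under `retweeted_status`, including its `id_str` and its current `retweet_count`. The sketch below assumes that format. `RetweetTracker` is a hypothetical class: it keeps the highest known count per original tweet and tallies retweet events into 10-minute windows, which is the data behind a chart like this one.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 10 * 60  # 10-minute windows, matching the retweet chart


class RetweetTracker:
    """Hypothetical tracker fed with retweet events from a tweet stream.

    Each event is assumed to look like a Twitter API v1.1 retweet, whose
    `retweeted_status` holds the original tweet with its `id_str` and
    current `retweet_count`.
    """

    def __init__(self) -> None:
        self.latest_count: dict = {}  # original tweet id -> best-known retweet_count
        self.per_window: Counter = Counter()  # window start (epoch s) -> retweets seen

    def on_retweet(self, event: dict, seen_at: datetime) -> None:
        original = event["retweeted_status"]
        tweet_id = original["id_str"]
        # Keep the highest count seen, so a late, stale event never lowers it.
        self.latest_count[tweet_id] = max(
            self.latest_count.get(tweet_id, 0), original["retweet_count"]
        )
        # Floor the arrival time to its 10-minute window and count the event.
        epoch = int(seen_at.timestamp())
        self.per_window[epoch - epoch % WINDOW_SECONDS] += 1


# Made-up events, purely for illustration.
tracker = RetweetTracker()
noon = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
tracker.on_retweet({"retweeted_status": {"id_str": "1", "retweet_count": 5}}, noon.replace(minute=3))
tracker.on_retweet({"retweet_status" if False else "retweeted_status": {"id_str": "1", "retweet_count": 7}}, noon.replace(minute=12))
print(tracker.latest_count, dict(tracker.per_window))
```

Taking the maximum rather than the latest value is a design choice for streams where events may arrive slightly out of order.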

The number of news items each agency publishes on its Twitter account is another interesting aspect:

I ran a simple top-N on the hashtags in these news items to get a rough idea of what they were talking about:
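In ElasticSearch a top-N like this is typically a `terms` aggregation on a hashtag field. The same idea in plain Python is a frequency count. This is a minimal sketch, assuming Twitter API v1.1-style tweet dicts, where hashtags appear under `entities.hashtags` as objects with a `text` field. `top_hashtags` is a hypothetical helper, and lower-casing is an assumed normalization so that differently capitalized tags are counted together.

```python
from collections import Counter


def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags as (tag, count) pairs.

    Assumes Twitter API v1.1-style dicts: tweet["entities"]["hashtags"]
    is a list of {"text": ...} objects. Tags are lower-cased so that
    #Breaking and #breaking count as the same hashtag.
    """
    counts = Counter(
        tag["text"].lower()
        for tweet in tweets
        for tag in tweet.get("entities", {}).get("hashtags", [])
    )
    return counts.most_common(n)


# Made-up tweets, purely for illustration.
tweets = [
    {"entities": {"hashtags": [{"text": "Breaking"}, {"text": "Election"}]}},
    {"entities": {"hashtags": [{"text": "breaking"}]}},
    {"entities": {"hashtags": []}},
    {},
]
print(top_hashtags(tweets, 5))
```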

It is a rough analysis, but it still gives you some idea. For instance, I had no idea what #takeoffjustlogo was until I saw it here and searched for it. In the future I would run the top-N over a smaller time frame to learn more about trending hashtags, but even this coarse view is useful.

Last but not least, as can be seen in this figure, I filtered the data to the time frame with the highest number of retweets to see what caused such a peak:
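Finding that time frame amounts to summing retweets per window, picking the window with the largest total, and listing the tweets inside it. This is a minimal sketch, assuming a hypothetical list of `(created_at, retweet_count, text)` tuples with timezone-aware UTC timestamps and 10-minute windows. `peak_window` is an illustrative name only.

```python
from datetime import datetime, timedelta, timezone


def peak_window(tweets, minutes=10):
    """Find the N-minute window whose tweets gathered the most retweets.

    `tweets` is a non-empty list of (created_at, retweet_count, text) tuples
    with timezone-aware timestamps (a hypothetical shape for illustration).
    Returns the window start (UTC) and the tweets inside it, most retweeted first.
    """
    size = minutes * 60
    totals = {}
    for created_at, retweets, _ in tweets:
        epoch = int(created_at.timestamp())
        start = epoch - epoch % size
        totals[start] = totals.get(start, 0) + retweets
    best = max(totals, key=totals.get)  # raises ValueError if tweets is empty
    window_start = datetime.fromtimestamp(best, tz=timezone.utc)
    window_end = window_start + timedelta(seconds=size)
    inside = [t for t in tweets if window_start <= t[0] < window_end]
    return window_start, sorted(inside, key=lambda t: t[1], reverse=True)


# Made-up data, purely for illustration.
base = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
data = [
    (base.replace(minute=2), 10, "a"),
    (base.replace(minute=13), 300, "b"),
    (base.replace(minute=17), 50, "c"),
    (base.replace(minute=25), 5, "d"),
]
start, inside = peak_window(data)
print(start.strftime("%H:%M"), [text for _, _, text in inside])
```

Here the 12:10 window wins with 350 retweets, and its most retweeted tweet is the natural first candidate for what caused the peak.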

And there it is:

The iPhone app is available on the App Store: 24hours News