How the ISCPIF (CNRS) Big Data platform scaled from zero to billions of records within 6 months.

This talk covers our use of Elasticsearch, MongoDB, Redis, RabbitMQ, and scalable, highly available Web services built on a Big Data architecture.

This presentation was given at Université Paris-Sud (LAL, Bâtiment 200) at an event organized by ARGOS.



Big Data: What I am thankful for this Thanksgiving!

Big Data platform at ISCPIF: this list is what I am thankful for this Thanksgiving.

  • Elasticsearch: You know, for search!

This is the heart and soul of our discovery and curiosity. The first step in any data analysis is finding the right data. Behind every great data discovery there must be a great search engine 🙂 This is where Elasticsearch comes to the rescue, with advanced real-time analytics and powerful full-text search built on Apache Lucene.
The ISCPIF Big Data platform relies on Elasticsearch and its built-in distribution and high-availability features to index over 900 million documents, making data discovery and exploration easier for our researchers and scientific partners.

So I am thankful to Elasticsearch, which over the past 2 years has made my life much easier and made me a better engineer.
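Full-text search in Lucene, and therefore in Elasticsearch, is answered from an inverted index that maps each term to the documents containing it. The toy Python sketch below illustrates that idea only; the lowercase, split-on-non-word-characters tokenizer and the AND-only query are simplifying assumptions, far simpler than Lucene's analyzers and relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a toy analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents.
docs = {
    1: "Climate change and extreme weather",
    2: "Weather forecast for Paris",
    3: "Climate research at ISCPIF",
}
index = build_index(docs)
print(sorted(search(index, "climate weather")))  # [1]
```

Looking up a term is a dictionary access rather than a scan over every document, which is what keeps search fast as the collection grows.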

Paris before and during the terrorist attacks


Night of terror

Twitter data shows some upsetting statistics about the Paris terrorist attacks of 13 November.

As can be seen, the query for terrorist attacks returns no results before 22h30. Unfortunately, this shows that even without any knowledge of the event itself (from the media or the news), it is possible to infer that something related to the queried terms must have happened.
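The reasoning above comes down to counting matching tweets per time bucket and noticing when a query that used to return nothing suddenly starts matching. A minimal Python sketch, assuming tweets are available as (timestamp, text) pairs; the substring matching, the 30-minute buckets and the sample tweets are all hypothetical choices, not the query behind the original charts.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(tweets, query, start, bucket=timedelta(minutes=30)):
    """Count tweets whose text contains `query`, per fixed-width time bucket."""
    counts = Counter()
    for ts, text in tweets:
        if ts >= start and query.lower() in text.lower():
            # Index of the bucket the tweet falls into, relative to `start`.
            counts[(ts - start) // bucket] += 1
    return counts


def first_active_bucket(counts, start, bucket=timedelta(minutes=30)):
    """Start time of the first bucket with at least one match, or None."""
    if not counts:
        return None
    return start + min(counts) * bucket


# Made-up sample data.
start = datetime(2015, 11, 13, 20, 0)
tweets = [
    (datetime(2015, 11, 13, 20, 15), "Lovely evening in Paris"),
    (datetime(2015, 11, 13, 22, 40), "Reports of an attack"),
    (datetime(2015, 11, 13, 22, 55), "Another attack reported"),
]
counts = bucket_counts(tweets, "attack", start)
print(first_active_bucket(counts, start))  # 2015-11-13 22:30:00
```

In Elasticsearch the same per-bucket counts would typically come from a date histogram aggregation over a full-text query instead of a Python loop.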

Paris after the attacks


Building real-time news analytics

It is always interesting to find out exactly what is happening around the world as it happens. Knowing about something as it happens is called “real time”, or more accurately “near real time”, because of network latency and the delays in processing, streaming and visualizing the data.

Now imagine you could monitor most international news agencies on Twitter in real time. For the last few months I have been developing an application that not only delivers important news and highlights from around the world as they happen, but also persists the data for a 24-hour window, so you can always go back and read the important news and highlights you missed.
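A 24-hour window like the one described above can be modeled as a time-ordered queue that drops items once they are older than the window. The Python sketch below is a minimal in-memory illustration under the assumption that tweets arrive roughly in time order; the class name `SlidingWindowStore` is hypothetical and not part of the application described here.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindowStore:
    """Keep items only while they are younger than `window` (24 hours by default)."""

    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self._items = deque()  # (timestamp, item) pairs, oldest first

    def add(self, ts, item):
        # Assumes items are added in (roughly) increasing timestamp order.
        self._items.append((ts, item))

    def evict(self, now):
        """Drop items that fell out of the window, relative to `now`."""
        cutoff = now - self.window
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def recent(self, now):
        """Return all items still inside the window, oldest first."""
        self.evict(now)
        return [item for _, item in self._items]


store = SlidingWindowStore()
t0 = datetime(2015, 1, 1, 8, 0)
store.add(t0, "headline A")
store.add(t0 + timedelta(hours=20), "headline B")
print(store.recent(t0 + timedelta(hours=25)))  # ['headline B']
```

When the data already lives in MongoDB, a TTL index with `expireAfterSeconds: 86400` is one way to get similar 24-hour expiry on the server side; the post does not say which approach the application actually uses.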


Building a Search Engine with ElasticSearch

I’ve been working with ElasticSearch for a while now, and I gotta say it’s a powerful open-source engine for building distributed real-time search and analytics.

It also spreads shards and replicas across machines to make your architecture reliable and scalable. It’s not just a simple full-text search engine, even though it does that perfectly.
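To make shards and replicas concrete: Elasticsearch routes each document to a primary shard using a formula documented as shard = hash(routing) % number_of_primary_shards (routing defaults to the document id), and it never allocates a replica on the same node as its primary. The Python sketch below mimics both ideas under loudly stated assumptions: `zlib.crc32` stands in for the Murmur3 hash recent Elasticsearch versions use, and the round-robin layout is a toy, not Elasticsearch's real shard allocator.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a primary shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in for Elasticsearch's Murmur3 hash (an assumption);
    the routing value here is simply the document id.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards, num_replicas, nodes):
    """Toy round-robin layout that never puts two copies of a shard on one node."""
    if num_replicas + 1 > len(nodes):
        # Elasticsearch would leave such replicas unassigned; the toy just refuses.
        raise ValueError("need at least num_replicas + 1 nodes")
    layout = {}
    for shard in range(num_primary_shards):
        # First node holds the primary, the next distinct nodes hold the replicas.
        layout[shard] = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
    return layout


layout = allocate(3, 1, ["node-a", "node-b", "node-c"])
print(layout)
# {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c'], 2: ['node-c', 'node-a']}
print(layout[shard_for("tweet-42", 3)])  # nodes holding this document's shard
```

With one replica per shard, losing any single node still leaves a full copy of every shard on the other two nodes, which is where the reliability comes from. The modulo in the routing formula is also why the number of primary shards cannot simply be changed after an index is created.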

My ElasticSearch setup is connected to my MongoDB replica set and indexes tweets as they are streamed from the Twitter API. I track tweets for a few projects (health, news, UN Global Pulse, etc.) and index some of them in ES on the fly.
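One common way to index streamed tweets in batches is Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. The Python sketch below only builds that request body. The index name `tweets-news`, the chosen fields and the sample tweets are hypothetical, it assumes a recent Elasticsearch version without mapping types, and it is not necessarily how the MongoDB-to-ES connection in this setup works. The body would then be sent as a POST to `/_bulk` with `Content-Type: application/x-ndjson`.

```python
import json


def bulk_body(tweets, index_name):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines = []
    for tweet in tweets:
        # Using the tweet id as _id makes re-indexing the same tweet idempotent.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": tweet["id_str"]}}))
        lines.append(json.dumps({"text": tweet["text"], "created_at": tweet["created_at"]}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Made-up sample tweets.
batch = [
    {"id_str": "1", "text": "Breaking news", "created_at": "2015-01-01T08:00:00Z"},
    {"id_str": "2", "text": "More news", "created_at": "2015-01-01T08:01:00Z"},
]
body = bulk_body(batch, "tweets-news")  # hypothetical index name
print(body)
```

Batching many tweets into one request avoids paying an HTTP round trip per tweet, which matters when the stream spikes.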


5 Minutes with Viral News on Twitter: Golden Globes, Grammy Awards and State Of The Union 2014

Our MongoDB replica set server stats during viral tweets (Golden Globes, Grammys, and State of the Union 2014)



Some of the popular news accounts on Twitter can be retweeted up to 1,500 times within 5 minutes.
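Measuring how often a tweet is retweeted within 5 minutes amounts to counting, for each original tweet, the retweets whose timestamps fall inside that window, then keeping the best figure per account. A minimal Python sketch, assuming tweets and retweets have already been collected as plain tuples; the function name, account handles and sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def retweets_within(originals, retweets, window=timedelta(minutes=5)):
    """Per account, the highest number of retweets any single tweet got within `window`.

    originals: dict tweet_id -> (account, created_at)
    retweets:  iterable of (retweeted_tweet_id, created_at)
    """
    per_tweet = defaultdict(int)
    for tweet_id, ts in retweets:
        if tweet_id not in originals:
            continue  # retweet of a tweet we are not tracking
        _, posted = originals[tweet_id]
        if posted <= ts <= posted + window:
            per_tweet[tweet_id] += 1
    best = {}
    for tweet_id, count in per_tweet.items():
        account = originals[tweet_id][0]
        best[account] = max(best.get(account, 0), count)
    return best


# Made-up sample data.
t0 = datetime(2014, 1, 12, 20, 0)
originals = {"a1": ("@news_a", t0), "b1": ("@news_b", t0)}
retweets = [("a1", t0 + timedelta(minutes=m)) for m in (1, 2, 4, 7)]
retweets.append(("b1", t0 + timedelta(minutes=3)))
print(retweets_within(originals, retweets))  # {'@news_a': 3, '@news_b': 1}
```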