JEDI – Multivac

15 March 2018 @ 14:00 - 18:00

We have built a whole set of tools that bring Data Science and Big Data together, hand in hand, right to your doorstep.

 

Introduction to the Multivac Data Science Lab:

We have designed and deployed a Hadoop cluster spanning more than 30 servers in our private cloud. Hadoop YARN manages the compute resources of all these machines, and Hadoop HDFS provides distributed storage across them (highly available, fault-tolerant, etc.). On top of this Hadoop cluster, we have also deployed Apache Spark 2.2.

In addition, we have configured and deployed two web-based notebooks, Apache Zeppelin and Hue, which enable data-driven, interactive data analytics and visualization. They support multiple languages, including Scala, Spark SQL, Python, R, Hive, and Markdown. Both also integrate with Apache Spark, so you can take advantage of its fast, in-memory, distributed data processing engine to speed up your data science workflow.

We believe this makes Big Data development and data science much easier for any research project dealing with large-scale data.

Website: https://multivac.iscpif.fr

 

In this training:

Part I

  • Overview of Multivac Platform
    • Architecture
    • Real-time API engine
    • Data Science Lab
    • Projects
  • Introduction to Apache Hadoop
  • Introduction to Apache Spark
    • Hands-on with Spark’s programming APIs (DataFrame/SQL, Datasets, RDD)
    • Overview of Spark architecture: Core, Streaming, Standalone Mode, DAG
  • How to work with the interactive Spark Shell
  • How to submit jobs with spark-submit
    • IntelliJ
  • Introduction to “Interactive Spark Notebooks”
    • Apache Zeppelin
    • Hue UI / Notebooks

Part II

  • Exploring Wikipedia Page Views
    • Complex SQL on large-scale data (over 60 billion rows)
  • Introducing Stanford CoreNLP with Spark
  • Introducing the Spark NLP library
  • Machine Learning with Spark
    • Feature engineering
    • ML pipeline: string indexer, vector assembler, random forest classifier, multiclass classification evaluator, etc.
  • Uber: Trip Clustering
  • Building a Movie Recommender
    • ALS (alternating least squares) algorithm and 100 million movie ratings from the Netflix Prize
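The movie-recommender session relies on ALS. As a rough, self-contained illustration (not the training's own material), the Python sketch below factorizes a list of (user, item, rating) triples with alternating least squares. It fixes the item factors and solves a regularized least-squares problem for each user, then swaps the roles and repeats. The function names (`als`, `update`, `solve`, `predict`), the rank, the plain λI penalty, and the toy data are all hypothetical choices for illustration. Spark MLlib's distributed `ALS` estimator works on the same principle but runs in parallel and additionally scales the regularization term by each user's or item's number of ratings.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed, ratings_by_key, rank, lam):
    """Hold one side's factors fixed and solve (V^T V + lam*I) u = V^T r for each key."""
    new = {}
    for key, entries in ratings_by_key.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in entries:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        new[key] = solve(a, b)
    return new


def als(ratings, rank=2, lam=0.1, iterations=20, seed=0):
    """Alternate between user-factor and item-factor updates (hypothetical helper)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}  # random init
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # fix items, solve users
        items = update(users, by_item, rank, lam)  # fix users, solve items
    return users, items


def predict(users, items, u, i):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


if __name__ == "__main__":
    toy = [("alice", "m1", 5.0), ("alice", "m2", 3.0), ("bob", "m1", 4.0),
           ("bob", "m3", 1.0), ("carol", "m2", 2.0), ("carol", "m3", 5.0)]
    users, items = als(toy, rank=2, lam=0.1, iterations=30)
    print(round(predict(users, items, "alice", "m3"), 2))  # unseen pair
```

Each half-step is an independent small least-squares problem per user (or per item), which is what makes ALS easy to parallelize on a cluster.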

Level of difficulty: 
Intermediate to advanced

Language:
This training will be presented in English

Requirements:

Laptop (preferably Linux or macOS)

Chrome, Firefox, or Safari browser

Training by:

Maziyar Panahi, Big Data engineer and Cloud architect at ISC-PIF/CNRS

Details

Date:
15 March 2018
Time:
14:00 - 18:00

Organizer

ISC-PIF

Venue

ISC-PIF
113 rue Nationale
75013 Paris, France
