MULTIVAC: DATA SCIENCE LAB
We have built a platform that brings Data Science and Big Data, hand in hand, right to your doorstep.
Introduction to Multivac Data Science Lab:
We have designed and deployed a Hadoop cluster spanning more than 30 servers inside our private cloud. This gives us Hadoop YARN for resource management and Hadoop HDFS for distributed storage across all of those machines (with high availability, fault tolerance, etc.). We have also deployed Apache Spark (versions 1.6+ and 2.2) on top of our Hadoop cluster.
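As a brief sketch of how a user interacts with such a cluster from the command line (these commands assume a configured Hadoop/Spark client; the file names are placeholders):

```shell
# Inspect your storage on HDFS
hdfs dfs -ls /user/$USER             # list your HDFS home directory
hdfs dfs -put data.csv /user/$USER/  # copy a local file into HDFS

# See the nodes YARN is managing
yarn node -list

# Start Spark on top of YARN
spark-shell --master yarn --deploy-mode client
```

Because storage (HDFS) and compute (YARN) are shared, any Spark job you launch can read the same distributed data from any node in the cluster.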
In addition, we have configured two web-based notebooks, Apache Zeppelin and Hue, which enable data-driven, interactive data analytics and visualisation. They support multiple languages, including Scala, Spark SQL, Python, R, Hive and Markdown. Both Apache Zeppelin and Hue also integrate with Apache Spark, making it possible to take advantage of its fast, in-memory, distributed data-processing engine to enhance your data science workflow.
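To give a feel for the notebook workflow, here is a hedged sketch of Zeppelin paragraphs mixing Markdown, Scala and Spark SQL in one note (the HDFS path and the `tweets` table are hypothetical examples, not actual lab data):

```
%md
## Word counts on the cluster

%spark
// Paragraph bound to the Spark interpreter (Scala);
// sc is the SparkContext Zeppelin provides
val counts = sc.textFile("hdfs:///user/demo/notes.txt")
               .flatMap(_.split("\\s+"))
               .map(w => (w, 1))
               .reduceByKey(_ + _)
counts.take(10).foreach(println)

%sql
-- Spark SQL paragraph; assumes a table `tweets` was registered earlier
SELECT lang, count(*) AS n FROM tweets GROUP BY lang ORDER BY n DESC
```

Each `%interpreter` prefix switches the language of that paragraph, and `%sql` results can be rendered directly as charts in the notebook.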
We believe this makes Big Data development and data science much easier for any research project dealing with large-scale data.
In this training:
- Overview of Multivac Data Science Lab
- Opening accounts in Multivac DSL
- Introduction to Apache Spark
- How to work with the interactive Spark shell
- How to submit jobs with spark-submit
- Introduction to “Interactive Spark Notebooks”
- Introduction to Apache Zeppelin and Hue
- Work on some samples:
- Machine learning: LDA, regression, classification, etc.
- NLP: Stanford CoreNLP
- See some Kaggle competitions
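The two ways of running Spark covered in the agenda can be sketched as follows (the class name, jar and resource settings are placeholders, not a real lab application):

```shell
# Interactive exploration: a Scala REPL with a SparkContext on YARN
spark-shell --master yarn

# The Python equivalent
pyspark --master yarn

# Batch work goes through spark-submit; tune resources to your job
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --executor-memory 4G \
  --num-executors 10 \
  my-app.jar
```

The shell suits step-by-step analysis during the training, while spark-submit is how finished jobs are handed to the cluster to run unattended.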