From data fusion to data driven visualizations of embryogenesis – Presentation by Paul Villoutreix (Princeton University)

Paul Villoutreix is a former resident of the ISC-PIF who defended his thesis, “Randomness and variability in animal embryogenesis, a multi-scale approach”, in 2015 (PhD in Mathematical Biology, Université Paris Descartes). Since then, he has been a postdoc in the Stas Shvartsman lab at Princeton University. While passing through Paris, he came to share his experience with us during an Iscoffee Break and to talk about his project, the Embryo Digital Atlas, a platform for visualizing complex data related to embryogenesis that pays particular attention to aesthetics.

“Embryogenesis is a fascinating process. It is the process by which a single fertilized cell is turned into a multi-cellular organism. It is a really beautiful phenomenon, studied in various species, from the sea urchin, to the fruit fly, to the zebrafish, to the chicken, to the mouse, to the human.

Recent technological developments in microscopy imaging have transformed embryology into a data-intensive science. As an interdisciplinary scientist, I am interested in topics at the intersection of developmental biology, mathematics and data art. Currently, I am a postdoc in the Stas Shvartsman lab at Princeton University and a visiting fellow of the Center for Data Arts at the New School.

In addition, I am leading the Embryo Digital Atlas, an open-source, web-based platform for visualizing complex experimental datasets of embryogenesis in an easy and beautiful way. This is supported by the Mozilla Science Lab and you can contribute here. Thanks Abby for the interview! Here is a blog post describing The Embryo Digital Atlas’ journey to the Global Sprint.”

Animal embryogenesis is a multivariate process:

Changes in morphology

Tomer et al., Nature Methods, 2012

Changes in patterns of gene expression

Lim et al., Current Biology, 2015

Anatomical descriptions are works of art

Ramón y Cajal (1852-1934)

Drawing of Purkinje cells (A) and granule cells (B) from pigeon cerebellum – 1899

Ernst Haeckel (1834 – 1919)

Ascidian – Kunstformen der Natur – 1904

Henry Gray (1827 – 1861)

Superficial dissection of the right side of the neck, showing the carotid and subclavian arteries. Anatomy of the human body – 1858

Molecular and interactive visualization

The HIV gag polyprotein (shown in red) is translated from the HIV RNA genome (in yellow) by cellular ribosomes.

David S. Goodsell (2015)

Gael McGill (Digizyme)

Can we build accurate visualizations of developing embryos that can be data-driven, interactive and visually appealing?

Data fusion: an algorithm to merge complex and heterogeneous datasets in fly embryogenesis

Data fusion amounts to completing the matrix

A mapping between morphology and chemical signal

We learn a mapping between images of morphology and stained images
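
To illustrate the idea of treating data fusion as matrix completion, here is a minimal sketch. It is not the algorithm presented in the talk: it assumes a toy matrix with one row per embryo sample and one column per measured variable, and it fills each gap by borrowing the value from the most similar row (a plain nearest-neighbor rule). The column meanings, the numbers, and the function names `distance` and `complete` are all hypothetical.

```python
import math

# Toy data matrix: one row per embryo sample, one column per variable.
# None marks a value that was not measured in that sample.
# Hypothetical columns: morphology feature, signal A, signal B.
samples = [
    [0.10, 0.20, None],
    [0.50, None, 0.70],
    [0.90, 0.80, None],
    [0.15, None, 0.30],
    [0.55, 0.50, None],
    [0.85, None, 0.95],
]


def distance(row_a, row_b):
    """Euclidean distance over the columns observed in both rows."""
    shared = [(a, b) for a, b in zip(row_a, row_b)
              if a is not None and b is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((a - b) ** 2 for a, b in shared))


def complete(matrix):
    """Fill each missing entry with the value from the nearest row that has it."""
    completed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate rows: any other sample in which column j was measured.
            donors = [r for k, r in enumerate(matrix)
                      if k != i and r[j] is not None]
            if not donors:
                continue  # nothing to borrow from, so the gap stays
            nearest = min(donors, key=lambda r: distance(row, r))
            completed[i][j] = nearest[j]
    return completed


completed = complete(samples)
for row in completed:
    print(row)
```

In this toy case the morphology column is the only variable shared by every sample, so it acts as the common coordinate that lets values measured in one embryo be transferred to another. A real pipeline would need a learned, smoother mapping than a single nearest neighbor.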

An accurate movie is obtained from the reconstruction

Towards interactive and visually appealing visualizations

The Embryo Digital Atlas – an Open Source visualization platform

Data Visualization – Choice of colors

Chemical signals can be described by molecule type, spatial position and concentration. Each measurement is encoded as the color of a pixel: the hue gives the molecule type and the brightness gives the concentration.
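
The color encoding can be sketched in a few lines with Python's standard `colorsys` module. This is only an illustration under assumptions: the molecule names, the hue assigned to each one, and the helper `encode_pixel` are hypothetical, and the platform itself may map colors differently.

```python
import colorsys

# Hypothetical assignment of one hue (0..1 on the color wheel) per molecule type.
HUES = {"molecule_a": 0.0, "molecule_b": 0.5}


def encode_pixel(molecule, concentration, max_concentration=1.0):
    """Return an (r, g, b) tuple of 0-255 integers for one measurement."""
    hue = HUES[molecule]
    # Normalize the concentration, clamp it to [0, 1], and use it as brightness.
    value = min(max(concentration / max_concentration, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return tuple(round(channel * 255) for channel in (r, g, b))


# The pixel's position in the image carries the spatial position of the signal.
print(encode_pixel("molecule_a", 1.0))
print(encode_pixel("molecule_b", 0.25))
```

Keeping saturation fixed leaves hue free to identify the molecule, so two signals at the same concentration differ only in color, not in brightness.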

Data Visualization – How to interact with the datasets – Exploded view

Drawing by Sam Galison

To enable wide use of this platform as a generic visualization tool, researchers need to be able to visualize and possibly share their own datasets, which requires a data storage solution such as Omero and a unified file format.

An online library of datasets that people can use to learn, to share knowledge through collaborative tagging of the datasets, and to generate figures and videos and extract measurements with the platform.

Datasets so far:

  • 2D cross sections of the developing fly
  • The Transparent Human Embryo database
  • 3D visualization of the developing fly?

Towards a credit system behind data fusion?

Data-driven visualizations can be obtained by aggregating datasets from heterogeneous sources:

  • Various labs
  • Various teams
  • Various experimental systems
  • Published or unpublished papers

-> How to give credit to people, based on how much their datasets have been used?
-> How to guarantee the source of data?

Using Blockchain on Data

Address data by content and keep track of its history and integrity.
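
A minimal sketch of what content addressing with a tamper-evident history could look like, using only the Python standard library. It is an illustration, not the design of any existing system: the record fields, the lab names, and the functions `content_address`, `append_record` and `verify` are hypothetical, and a real blockchain adds distribution and consensus on top of this hash chain.

```python
import hashlib
import json


def content_address(data: bytes) -> str:
    """Address data by its content: the SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def append_record(history, data: bytes, source: str):
    """Append a record that points to the dataset and to the previous record."""
    previous = history[-1]["record_hash"] if history else None
    record = {
        "data_hash": content_address(data),
        "source": source,
        "previous": previous,
    }
    # Hash the record itself so that any later change to it can be detected.
    serialized = json.dumps(record, sort_keys=True).encode()
    record["record_hash"] = content_address(serialized)
    history.append(record)
    return record


def verify(history) -> bool:
    """Check every record hash and every link to the previous record."""
    previous = None
    for record in history:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        expected = content_address(json.dumps(body, sort_keys=True).encode())
        if record["previous"] != previous or record["record_hash"] != expected:
            return False
        previous = record["record_hash"]
    return True


history = []
append_record(history, b"raw images, fly embryo", source="lab A")
append_record(history, b"fused dataset", source="lab B")
print(verify(history))
```

Because each record stores the hash of the one before it, changing the source or content of an early dataset invalidates the chain from that point on, which is the property that makes the history trustworthy without a central authority.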

Can we replace the role of journals, which serve as a trusted third party, with a distributed system?

Bitcoin is a monetary system that overcomes the need for a central bank by using a distributed cryptographic protocol based on a blockchain.

Related content-addressed, hash-linked data structures are also used in Git (the version control system behind GitHub) … and in the InterPlanetary File System (IPFS).