Data scientist at Bio-Prodict

Bio-Prodict delivers solutions that guide scientific research in the fields of protein engineering, molecular design and DNA diagnostics. We apply novel approaches to the mining, storage and analysis of protein data and combine these with state-of-the-art analysis methods and visualization tools to create custom-built information systems for protein superfamilies.

  • Professional
  • Data science
  • Experience

I am currently employed at Bio-Prodict as a medior Data Scientist, where I use state-of-the-art machine learning techniques to develop novel solutions for bioinformatics problems.

I work primarily on the Helix product, in a team that builds on the results of my internships to predict the pathogenicity of protein variants.

Highlights

Skills used

Python
Backend programming
Visualization
Machine learning
Scikit-learn
Keras
PyTorch
Data engineering
NoSQL databases (MongoDB, Google Datastore)
Google Cloud Platform (Kubernetes, Google Compute)
SQL databases (Postgres, SQLite, MySQL)
Scientific reading/writing

Media

Performance plot showing Helix as the best product. Performance is measured as the Matthews correlation coefficient (MCC) on an independent dataset of genes that are not present in any other dataset. Our ensemble outperforms all competitors.
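
For readers unfamiliar with the metric in the plot, here is a minimal sketch of how the Matthews correlation coefficient is computed from the four confusion-matrix counts of a binary classifier (for example pathogenic versus benign). The function name `mcc` and the counts in the example are hypothetical and chosen purely for illustration; they are not Helix results, and this is not the evaluation code behind the plot.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
score = mcc(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {score:.3f}")
```

Unlike plain accuracy, the MCC uses all four counts, so it stays informative when one class is much more common than the other.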