Data scientist at Bio-Prodict

Bio-Prodict focuses on delivering solutions that guide scientific research in protein engineering, molecular design and DNA diagnostics. We apply novel approaches to the mining, storage and analysis of protein data and combine these with state-of-the-art analysis methods and visualization tools to create custom-built information systems for protein superfamilies.

I am currently employed at Bio-Prodict as a mid-level (medior) Data Scientist, where I use state-of-the-art machine learning techniques to develop novel solutions to bioinformatics problems.

My main focus is developing the product itself. I work in a team that builds on the results of my internships to predict the pathogenicity of protein variants.


Skills used

Backend programming
Machine learning
Data engineering
NoSQL databases (MongoDB, Google Datastore)
Google Cloud Platform (Kubernetes, Compute Engine)
SQL databases (Postgres, SQLite, MySQL)
Scientific reading/writing


Performance plot showing our product as the best performer: Matthews correlation coefficient (MCC) on an independent dataset of genes that were not present in any training dataset. Our ensemble outperforms all competitors.
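For readers unfamiliar with the metric in the plot, the following is a minimal Python sketch of the Matthews correlation coefficient computed from binary confusion-matrix counts (for example, pathogenic vs. benign variants). MCC takes all four confusion-matrix cells into account, which is one reason it is often preferred over accuracy when classes are imbalanced. The function name and the example counts are hypothetical. They are not taken from the evaluation described here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]: 1 means perfect prediction, 0 means no better
    than random, and -1 means total disagreement. If any marginal sum is zero,
    the denominator vanishes; this sketch then returns 0.0 by convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts: positives = pathogenic variants, negatives = benign.
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 3))
```

When predictions and labels are available as arrays rather than counts, scikit-learn's `sklearn.metrics.matthews_corrcoef(y_true, y_pred)` computes the same quantity.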