talk: Machine learning for predicting chronic diseases


UMBC CSEE Colloquium

Machine learning techniques for predicting chronic diseases

Vladimir Korolev

1:00pm Friday, 5 April 2013, ITE 227, UMBC

In recent years we have seen an explosion of cheap genetic tests, which has led to the emergence of personalized medicine. Personalized medicine is defined as the practice of medicine that is tailored to the specifics of the individual patient. My work addresses the problem of predicting an individual’s predisposition towards certain chronic diseases based on their genetic makeup. Such predictions allow for more selective administration of invasive tests such as biopsies, which are known to cause health problems themselves.

Recently NIH has conducted a number of Genome-Wide Association Studies that resulted in massive datasets containing subjects’ genetic makeup, labeled with clinical data including the occurrence of chronic diseases. Unfortunately, given the relatively small number of patients in such studies and the vast number of genes possessed by human beings, these datasets cannot be analyzed with traditional statistical predictive models, which require a large number of samples (patients) with very few features per sample.

My work attempts to solve this problem by employing state-of-the-art machine learning techniques. In the past year I have built a software system capable of crunching multi-terabyte-scale datasets to refactor the NIH data into a form that is palatable to modern big data systems. I have run the initial stages of feature selection. I will present the current state of the work and future plans. Another goal of this work is to ensure the repeatability of the experiments and the flexibility to run with any similar dataset from current and future studies.

Vlad Korolev is a PhD student in the UMBC Computer Science program. His research interests are in the areas of personalized medicine, machine learning and large-scale data processing. Vlad has considerable industry experience, specializing in IT security, large-scale data processing and the organization of software development processes.

