Automating Data Science for the Genetic Analysis of Common Diseases

Edward Rose Professor of Informatics and Director of the Penn Institute for Biomedical Informatics, University of Pennsylvania

The genetic architecture of common human diseases has been shaped by evolution, yielding complex relationships between genotype and phenotype. Epistasis, or non-additive gene-gene interaction, is one of these complexities. Statistical patterns of epistasis are difficult to model with parametric approaches such as logistic regression, partly because of the curse of dimensionality that comes with analyzing combinations of genotypes. Machine learning methods offer an alternative with improved power to detect epistasis. A significant challenge for any machine learning analysis is constructing a data science pipeline, which includes data integration, feature processing, feature selection, feature construction, analysis method selection, and parameter selection. These many choices can be daunting to…
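To make the two difficulties above concrete, here is a minimal illustrative sketch. It is not taken from the article. It assumes two biallelic SNPs in Hardy-Weinberg equilibrium with minor allele frequency 0.5, with genotypes coded 0, 1, 2. The penetrance values `LOW` and `HIGH` are hypothetical, chosen so that risk depends only on the combination of genotypes, following an XOR-like parity pattern. With these values, every single-locus (marginal) risk comes out the same, so a test that looks at one SNP at a time sees no signal. The joint table is clearly informative. The helper `n_genotype_combinations` is also hypothetical. It shows how the number of multilocus genotype cells grows as 3^k with the number of SNPs k.

```python
from fractions import Fraction

# Assumed genotype frequencies under Hardy-Weinberg equilibrium with minor
# allele frequency 0.5 (genotype = number of minor alleles: 0, 1 or 2).
GENOTYPE_FREQ = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Hypothetical penetrance values P(disease | genotype pair), for illustration only.
LOW, HIGH = Fraction(1, 10), Fraction(3, 10)


def penetrance(g1: int, g2: int) -> Fraction:
    """Risk is elevated only when the total minor-allele count is odd (XOR-like)."""
    return HIGH if (g1 + g2) % 2 == 1 else LOW


def marginal_penetrance(snp: int, g: int) -> Fraction:
    """P(disease | genotype g at one SNP), averaged over the other SNP's genotypes."""
    total = Fraction(0)
    for other, freq in GENOTYPE_FREQ.items():
        pair = (g, other) if snp == 1 else (other, g)
        total += freq * penetrance(*pair)
    return total


def n_genotype_combinations(k: int) -> int:
    """Number of multilocus genotype cells for k biallelic SNPs (3 genotypes each)."""
    return 3 ** k


if __name__ == "__main__":
    # Single-locus view: identical risk across genotypes, so no main effect is visible.
    for snp in (1, 2):
        print(f"SNP{snp} marginal risk:", [str(marginal_penetrance(snp, g)) for g in (0, 1, 2)])
    # Two-locus view: risk clearly depends on the genotype combination.
    print("joint risk table (rows = SNP1 genotype, columns = SNP2 genotype):")
    for g1 in (0, 1, 2):
        print(g1, [str(penetrance(g1, g2)) for g2 in (0, 1, 2)])
    # Curse of dimensionality: genotype cells to estimate as more SNPs are combined.
    for k in (2, 5, 10):
        print(f"{k} SNPs -> {n_genotype_combinations(k)} genotype combinations")
```

Running the script shows a marginal risk of 1/5 for every genotype at both SNPs. The joint table alternates between 1/10 and 3/10. Ten SNPs already give 59049 genotype cells, and each cell must be populated with enough samples to estimate its risk.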


Link to Full Article: Automating Data Science for the Genetic Analysis of Common Diseases