Machine Learning on noisy genome data. Scikit-learn python

4 hours ago by

USA SoCal

QVINTVS_FABIVS_MAXIMVS370 wrote:

I want to classify data using three dimensions; let's call them A, B, and C.


B and C are almost always positively correlated. The sum B+C and A are usually negatively correlated. However, C is usually an “all or none” statistic: we observe it in some samples but not in others.

With this in mind, I chose to classify the data using Linear Discriminant Analysis (LDA) from the scikit-learn Python library: http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html
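
For reference, a minimal sketch of a plain LDA fit on three features. The data here are synthetic stand-ins that only mimic the correlations described above (an assumption, not the real data set). The import path is the one used by newer scikit-learn releases, where the class lives in `sklearn.discriminant_analysis` as `LinearDiscriminantAnalysis` rather than in `sklearn.lda`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (assumption): two classes, three features.
y = rng.integers(0, 2, size=n)
B = y + rng.normal(scale=0.5, size=n)
C = B + rng.normal(scale=0.3, size=n)          # B and C positively correlated
A = -(B + C) + rng.normal(scale=0.5, size=n)   # A negatively correlated with B + C
X = np.column_stack([A, B, C])

# Fit a linear discriminant and predict back on the training data.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
pred = lda.predict(X)
train_accuracy = lda.score(X, y)
```

With two classes, `lda.coef_` holds one weight per feature, which is a convenient way to see how much A, B, and C each contribute to the linear decision rule.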

I’m not entirely married to LDA, but my PI would like to keep a linear model.

I would like to train on the data but apply a per-feature weight, as expressed in this pseudo-code:

    lda = LDA()
    # `weights` is a hypothetical argument (pseudo-code only); LDA.fit does not accept it
    lda.fit(trainX, trainY, weights=(None, None, "all_or_none"))
    # "all_or_none" means: when C is absent, do NOT penalize the prediction
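
One possible way to approximate the "all or none" idea while keeping a linear model is to split C into two columns: its value (filled with a neutral constant when absent) and a 0/1 presence flag. The sketch below assumes that an absent C is stored as `NaN`; the helper `encode_all_or_none` and the synthetic data are hypothetical, introduced only for illustration, and this is not a built-in scikit-learn option.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def encode_all_or_none(c, fill):
    """Hypothetical helper: turn C into (filled value, presence flag) columns."""
    c = np.asarray(c, dtype=float)
    present = ~np.isnan(c)
    filled = np.where(present, c, fill)   # absent C gets the neutral fill value
    return np.column_stack([filled, present.astype(float)])

rng = np.random.default_rng(1)
n = 300

# Synthetic stand-in data (assumption), with C missing in roughly 40% of samples.
trainY = rng.integers(0, 2, size=n)
B = trainY + rng.normal(scale=0.5, size=n)
C = B + rng.normal(scale=0.3, size=n)
A = -(B + C) + rng.normal(scale=0.5, size=n)
C[rng.random(n) < 0.4] = np.nan

# Use the mean of the observed training C as the fill; reuse it for new data.
c_fill = np.nanmean(C)
trainX = np.column_stack([A, B, encode_all_or_none(C, c_fill)])

lda = LinearDiscriminantAnalysis().fit(trainX, trainY)
```

Because absent values sit at the mean of the observed C, they pull a sample toward neither class through that column, and the presence flag gets its own coefficient. LDA assumes roughly Gaussian features, which a 0/1 flag is not, so this is only a rough approximation of the behaviour asked for.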

I’m a little naive in machine learning, so maybe there’s another way to do this in scikit-learn?

Thanks!
