University of Helsinki
Multilabel Classification of Drug Activity
Tuesday 12th October at 11:00-12:00
Foster Court 219
We present a multilabel learning approach for molecular classification, an important task in drug discovery. Our model takes as input a description of a molecule and predicts its activity against a set of cancer cell lines in one shot. Statistical dependencies between the outputs are encoded by a Markov network whose nodes are cell lines and whose edges represent similarity according to an auxiliary dataset. Molecules are represented by graph kernels. Max-margin training is employed to separate correct multilabels from incorrect ones with a large margin. Efficient training of the model is ensured by conditional gradient optimization on the marginal dual polytope, using loopy belief propagation to find the steepest feasible ascent directions. In our experiments, the MMCRF method outperforms the support vector machine with state-of-the-art graph kernels on a dataset comprising the cancer inhibition potential of drug-like molecules against a large number of cancer cell lines.
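To illustrate the conditional gradient idea mentioned in the abstract, here is a minimal sketch on a toy problem. It is not the speaker's implementation: the feasible set is a plain probability simplex rather than the marginal dual polytope, a simple argmax stands in for the loopy belief propagation step, and all names (`conditional_gradient`, `simplex_oracle`, `target`) are hypothetical.

```python
def conditional_gradient(grad, oracle, x0, steps=2000):
    """Maximize a concave function over a polytope using only a linear oracle."""
    x = list(x0)
    for k in range(steps):
        g = grad(x)
        # The oracle returns the vertex s that maximizes the inner product <g, s>,
        # i.e. the steepest feasible ascent direction is s - x.
        s = oracle(g)
        gamma = 2.0 / (k + 2.0)  # standard diminishing step size
        # A convex combination of feasible points stays feasible.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


def simplex_oracle(g):
    """Linear maximization over the probability simplex: pick the best vertex.

    Assumption: this argmax is a stand-in for the inference step that would
    search a much larger polytope in a structured model.
    """
    best = max(range(len(g)), key=lambda i: g[i])
    return [1.0 if i == best else 0.0 for i in range(len(g))]


# Toy concave objective f(x) = -0.5 * ||x - target||^2 (hypothetical).
target = [0.2, 0.5, 0.3]


def grad(x):
    return [t - xi for xi, t in zip(x, target)]


x_opt = conditional_gradient(grad, simplex_oracle, [1.0, 0.0, 0.0])
```

Each iterate is a convex combination of vertices, so no projection step is needed; this is what makes the scheme attractive when the feasible set can only be accessed through an inference routine.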
Aalto University School of Science and Technology, Finland
Transcription factor target identification from limited data using Gaussian process models
Thursday 14th October at 11:00-12:30
Torrington Place (1-19), Basement LT
We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation and the model likelihood serves as a score to rank targets. The expression profile of the TF is modelled as a sample from a Gaussian process prior distribution that is integrated out using a non-parametric Bayesian procedure. This results in a parsimonious model with relatively few parameters which can be applied to short time series data sets without noticeable over-fitting. We assess our method using genome-wide Chromatin Immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. I will also present some more recent work on extending the model for joint regulation by several TFs, as well as properly accounting for the experimental structure in typical time series gene expression assays.
This is joint work with Neil D. Lawrence, Magnus Rattray and Michalis Titsias.
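The ranking idea in the abstract above (fit a simple model per gene, use its likelihood as the score) can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the method of the talk: the TF profile is treated as known rather than integrated out under a Gaussian process prior, the linear form dx/dt = basal + sensitivity * f(t) - decay * x is an assumed model of transcriptional regulation, parameters are chosen by a coarse grid search, and the data and all names are synthetic and hypothetical.

```python
import math
from itertools import product


def simulate_target(tf, times, basal, sensitivity, decay, x0):
    """Euler-integrate dx/dt = basal + sensitivity * f(t) - decay * x (assumed form)."""
    x = [x0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        rate = basal + sensitivity * tf[i - 1] - decay * x[-1]
        x.append(x[-1] + dt * rate)
    return x


def gaussian_log_likelihood(observed, predicted, noise_sd):
    """Log-likelihood under independent Gaussian observation noise."""
    const = -0.5 * math.log(2.0 * math.pi * noise_sd ** 2)
    return sum(const - (o - p) ** 2 / (2.0 * noise_sd ** 2)
               for o, p in zip(observed, predicted))


def score_gene(observed, tf, times, noise_sd=0.1):
    """Best log-likelihood over a small, hypothetical parameter grid."""
    grid = product([0.0, 0.1, 0.2], [0.0, 0.5, 1.0, 1.5], [0.25, 0.5, 1.0])
    return max(
        gaussian_log_likelihood(
            observed,
            simulate_target(tf, times, b, s, d, observed[0]),
            noise_sd,
        )
        for b, s, d in grid
    )


# Synthetic data: a bump-shaped TF profile, one gene driven by it, one unrelated.
times = [0.5 * i for i in range(21)]
tf = [math.exp(-((t - 4.0) ** 2) / 4.0) for t in times]
genes = {
    "gene_a": simulate_target(tf, times, 0.1, 1.0, 0.5, 0.2),  # follows the model
    "gene_b": [1.0 - 0.05 * t for t in times],                 # linear decline
}

# Rank putative targets by score, highest first.
ranking = sorted(genes, key=lambda g: score_gene(genes[g], tf, times), reverse=True)
```

In this sketch the gene generated from the model ranks above the unrelated one; a full treatment would also account for uncertainty in the TF profile, which is the part the Gaussian process handles.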