


Programme for LICSB (full abstracts listed below)

Wednesday 1st April

11:00-11:40  Keynote 1  Lachlan Coin
11:40-12:00  Talk 1  Yahya Anvar
12:00-12:20  Talk 2  Elisa Loza Reyes
Lunch
14:00-14:20  Talk 3  Pavel Mistrik
14:20-14:40  Talk 4  Paul Kirk
14:40-15:00  Talk 5  Michalis Titsias
15:00-15:20  Talk 6  Keith Harris
Tea/Coffee
15:50-16:30  Keynote 2  Sophie Lebre
16:30-16:50  Talk 7  Peter Milner
16:50-17:10  Talk 8  Tina Toni
17:10-17:30  Talk 9  Christopher Penfold
17:30-19:00  Drinks Reception

Thursday 2nd April

9:00-9:40  Keynote 3  Andrew Golightly
9:40-10:00  Talk 10  Vladislav Vyshemirsky
10:00-10:20  Talk 11  Catherine Higham
Tea/Coffee
10:50-11:10  Talk 12  Frank Dondelinger
11:10-11:50  Keynote 4  Carsten Wiuf

Division of Epidemiology, Public Health and Primary Care, Imperial College London
High-resolution multi-platform copy number variation genotyping and imputation using a haplotype hidden Markov model
Leiden University Medical Center, Center for Human and Clinical Genetics, Postzone: S04P, Postbus 9600, 2300 RC Leiden, The Netherlands
The Identification of Informative Genes from Multiple Datasets with Increasing Complexity
In microarray data analysis, factors such as data quality, biological variation, and the increasingly multilayered nature of more complex biological systems degrade the ability of models to generate regulatory networks that capture the interactions among genes. There is an urgent need for a methodology that overcomes these bottlenecks. This paper is the initial part of a project to explore this. We show that genes that are highly predictive (low error) and consistent (low variance) across multiple independent datasets related to muscle differentiation are more likely to be involved in fundamental biological functions. Furthermore, we show that simpler and more informative datasets can be used to model interactions among genes in more complex datasets and can differentiate more informative genes from others. We do this using a novel computational technique comprising several steps: (1) ordering the datasets based on their level of noise and informativeness; (2) evaluating three classifiers of increasing complexity using independent test set validation; (3) comparing the different gene selections and the influence of increasing the complexity by adding randomly selected genes; (4) assessing the ability of different models to separate the most informative genes from uninformative ones using ranking statistics; and (5) investigating the use of more informative and simpler datasets to model more complex ones.
University of Bath, Department of Mathematical Sciences, Claverton Down, BA2 7AY, Bath, UK
Detecting Evolutionary Inter-Gene Heterogeneity in Borrelia burgdorferi
Borrelia burgdorferi is one of the bacterial species responsible for the most prevalent vector-borne disease in the temperate zone of the northern hemisphere, Lyme borreliosis [1]. Phylogenetic analyses of B. burgdorferi are now based on a concatenation of several housekeeping genes that are assumed to evolve according to one evolutionary pattern. This is a strong assumption and, when untrue, inferences are a compromise between different phylogenetic signals. We have designed a Bayesian mixture model under a missing data formulation to automatically recover the evolutionary pattern of each site in a DNA alignment. Evolutionary consistency among a set of genes can be argued whenever most of the sites are allocated to the same evolutionary class. Only in this case will a concatenation of genes produce valid inferences. In this study we demonstrate consistency in the evolution of eight housekeeping genes and evolutionary inconsistency between these housekeeping genes and the gene encoding the immunodominant outer surface protein C. Our method is a suitable indicator of evolutionary agreement or disagreement when employing large-scale gene concatenations, not only in B. burgdorferi, but for any phylogenetic analysis.
[1] Margos, G. et al., 2008. Proceedings of the National Academy of Sciences of the USA, 105(25): 8730-8735.
UCL EAR Institute, 332 Gray's Inn Road, London, WC1X 8EE
Prediction of the role of mutations in gap junction genes from a large-scale computational model of the cochlea of the inner ear
Mutations in the GJB2 gene, which encodes the connexin 26 (Cx26) protein, are the most common cause of non-syndromic forms of deafness. Cx26 is a building block of gap junctions (GJ), which establish electrical intercellular connectivity between cells in distinct cochlear compartments. Cochlear circulation of ions such as potassium (K+) and metabolites such as IP3 is essential for normal hearing: animal models of Cx26 deficiency in the organ of Corti (one of the compartments) seem to suggest that the underlying problem is the death of sensory cells (outer and inner hair cells, OHC and IHC, respectively) due to failed K+ homeostasis. However, this mechanism may not be the only one. In search of alternative mechanisms we have used a large-scale three-dimensional model of mechanoelectrical transduction of sound in the cochlea (Mistrik et al., 2009). A careful analysis revealed that reduced GJ conductivity in the organ of Corti would decrease the receptor potential across the OHC basolateral membrane. As OHC electromotility is crucial for the sound amplification that gives the cochlea its sensitivity and frequency selectivity, we conclude that the reduction of OHC somatic electromotility could represent an additional pathological mechanism in Cx26-related forms of deafness.
Centre for Bioinformatics, Biochemistry Building, Imperial College London, SW7 2AZ
Gaussian process regression bootstrapping
Both mechanistic and empirical modelling techniques are employed in systems biology. The former construct models whose structure explicitly describes components of the biological system under investigation, while the latter make predictions on the strength of patterns in the data. Although empirical models such as Gaussian process regression (GPR) do not directly help us to elucidate the processes that generated a given data set, they can nevertheless form part of a strategy for testing and investigating hypotheses and mechanistic models. In our work, we exploit the predictive power of GPR in order to generate plausible simulated data sets from experimentally obtained time-course data. This amounts to a parametric bootstrap (in which the parametric model is a multivariate normal) that implicitly takes into account the time dependence in the data. Having obtained bootstrap samples, we fit mechanistic models to both the original and simulated data. The variability amongst these fitted models reveals the sensitivity of the fit to uncertainty in the data. We use this approach to investigate the effects of data uncertainty upon parameter estimates in a model of a signalling pathway and upon gene network inference.
School of Computer Science, Kilburn Building, University of Manchester, Manchester, M13 9PL
Estimation of Multiple Transcription Factor Activities using ODEs and Gaussian Processes
Recently, ordinary differential equations (ODEs) have been used to infer the concentration of a single transcription factor (TF) protein from time series expression data of a set of target genes. For instance, this has been applied to uncover the concentration of the p53 protein; see Barenco et al. (2006). In the present work, we propose a framework to estimate multiple TFs from a set of observed gene expressions that are co-regulated by these TFs. We assume that the connectivity network (which describes which TFs regulate each of the genes) is partially and probabilistically observed. For example, such side information can be available through a technique such as Chromatin Immunoprecipitation (ChIP). The objective of inference is to estimate the structure of the subnetwork and the concentrations of the transcription factor proteins continuously in time, as well as to infer the type of regulation in each network link (i.e. activation, repression or non-regulation).
This multiple-TF framework uses Gaussian process priors to model the unobserved TF activities continuously in time, as considered in Lawrence et al. (2007) for the single-TF case. The ODE model of transcriptional regulation using multiple TFs is based on the following linear differential equation:
dy_j(t)/dt = B_j + S_j g(f_1(t), ..., f_R(t); w_j) - D_j y_j(t),
where y_j(t) denotes the expression of the jth gene at time t, (B_j, S_j, D_j) are the kinetic parameters of the equation, each f_r(t) is a TF concentration function, w_j are the connectivity weights between the gene and the TFs, and g is a sigmoid-type (e.g. Michaelis-Menten) function. Given a set of observations of the gene expression at discrete time points, the parameters {B_j, S_j, w_j, D_j} and the protein concentration functions {f_r(t)} are estimated using a fully Bayesian methodology that employs a Markov chain Monte Carlo algorithm.
Gaussian process priors are placed on the functions {f_r(t)}, while the connectivity weights {w_j} are given sparse priors so that the prior side information about the network connectivity is taken into account. The whole framework is currently being applied to subnetworks in the yeast cell-cycle gene expression data of Spellman et al. (1998) and Orlando et al. (2008), using the ChIP connectivity information provided in Lee et al. (2002). This is joint work with Magnus Rattray and Neil Lawrence.
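
As an illustration of the forward model only (not the inference scheme), the sketch below integrates the equation above for a single gene with a simple Euler scheme. The logistic form of g applied to a weighted sum of TF concentrations, the two TF profiles and all parameter values are assumptions made for this example and are not taken from the abstract.

```python
import math

def sigmoid(x):
    """Logistic function, used here as one possible sigmoid-type g (an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_expression(tf_profiles, weights, B, S, D, y0, t_end, dt=0.01):
    """Euler integration of dy/dt = B + S*g(f_1(t),...,f_R(t); w) - D*y."""
    y, t = y0, 0.0
    trajectory = [(t, y)]
    for _ in range(int(round(t_end / dt))):
        # Weighted sum of TF concentrations passed through the sigmoid
        activation = sigmoid(sum(w * f(t) for w, f in zip(weights, tf_profiles)))
        y += dt * (B + S * activation - D * y)
        t += dt
        trajectory.append((t, y))
    return trajectory

# Hypothetical TF concentration profiles: one constant, one decaying
tf_profiles = [lambda t: 1.0, lambda t: math.exp(-t)]
# Sign convention assumed here: positive = activation, negative = repression, 0 = no regulation
weights = [2.0, -1.0]
traj = simulate_expression(tf_profiles, weights, B=0.1, S=1.0, D=0.5, y0=0.0, t_end=40.0)
final_time, final_expression = traj[-1]
```

Once the second TF has decayed, the expression level settles at (B + S g) / D, which gives a quick sanity check on the integration.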
Room 320, Sir Alwyn Williams Building, 18 Lilybank Gardens, Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK.
Definition of Valid Proteomic Biomarkers: Bayesian Solutions to a Currently Unmet Challenge.
Clinical proteomics is suffering from high hopes generated by reports on apparent biomarkers, most of which could not be later substantiated via validation. This has brought into focus the need for improved methods of finding a panel of clearly defined biomarkers. To examine this problem, urinary proteome data was collected from healthy adult males and females, and analysed to find biomarkers that differentiated between genders. We believe that models that incorporate sparsity in terms of variables are desirable for biomarker selection, as proteomics data typically contains a huge number of variables (peptides) and few samples, making the selection process potentially unstable. This suggested the application of the two-level hierarchical Bayesian probit regression model that Bae and Mallick (2004) proposed for variable selection, which used three different priors for the variance of the regression coefficients (inverse Gamma, exponential and Jeffreys) to incorporate different levels of sparsity in their model. We have also developed an alternative method for biomarker selection that combines model-based clustering and sparse binary classification. By averaging the features within the clusters obtained from model-based clustering, we define "superfeatures" and use them to build a sparse probit regression model, thereby selecting clusters of similarly behaving peptides, aiding interpretation.
LSIIT - UMR 7005, University of Strasbourg, Pôle API, Bd Sébastien Brant - BP 10413, 67412 Illkirch cedex, FRANCE
Time-varying genetic network inference using informative priors
School of Mathematics & Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU
Moment closure and block updating for parameter inference in stochastic biological models
This talk will tackle one of the key problems in the new science of systems biology: inference for the rate parameters underlying complex stochastic kinetic biochemical network models, using partial, discrete and noisy time-course measurements of the system state. Although inference for exact stochastic models is possible, it is computationally intensive even for relatively small networks. We explore the Bayesian estimation of stochastic kinetic rate parameters using approximate models, based on moment closure analysis of the underlying stochastic process. By assuming a Gaussian distribution and using moment-closure estimates of the first two moments, we can greatly increase the speed of parameter inference. The parameter space can be efficiently explored by embedding this approximation into an MCMC procedure. We impute the missing species using a bridge updating scheme where each proposed move is a bridge of length m. We investigate how the choice of m affects the efficiency of the sampling in an autoregulatory gene network.
Centre for Bioinformatics, Biochemistry Building, Imperial College London, South Kensington Campus, SW7 2AZ London, UK
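
The idea of replacing the exact stochastic model with a Gaussian built from its first two moments can be sketched on a toy system. The example below uses a hypothetical immigration-death process (0 -> X at rate k, X -> 0 at rate gamma*X), chosen because its moment equations are linear and need no closure assumption; the autoregulatory network in the talk would require a genuine closure step. The process, parameter values and function names are all assumptions for illustration.

```python
import math

def moment_odes(mean, var, k, gamma):
    """Mean and variance equations for the immigration-death process."""
    d_mean = k - gamma * mean
    d_var = k + gamma * mean - 2.0 * gamma * var
    return d_mean, d_var

def propagate_moments(mean0, var0, k, gamma, t_end, dt=0.001):
    """Euler integration of the two moment equations from time 0 to t_end."""
    mean, var = mean0, var0
    for _ in range(int(round(t_end / dt))):
        d_mean, d_var = moment_odes(mean, var, k, gamma)
        mean += dt * d_mean
        var += dt * d_var
    return mean, var

def gaussian_loglik(x_obs, mean, var):
    """Log-density of a Gaussian approximation to the state distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x_obs - mean) ** 2 / var)

def approx_loglik(x_obs, k, gamma, t_end):
    """Approximate log-likelihood of one observation, starting from a known state of 0."""
    mean, var = propagate_moments(0.0, 0.0, k, gamma, t_end)
    return gaussian_loglik(x_obs, mean, var)

mean_inf, var_inf = propagate_moments(0.0, 0.0, k=10.0, gamma=0.5, t_end=40.0)
ll_good = approx_loglik(20.0, k=10.0, gamma=0.5, t_end=40.0)
ll_poor = approx_loglik(20.0, k=5.0, gamma=0.5, t_end=40.0)
```

A function such as `approx_loglik` is the piece that would sit inside an MCMC acceptance ratio in place of the intractable exact likelihood.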
Bayesian model selection: mechanistic models of Erk MAP kinase phosphorylation dynamics
ABC SMC is a Bayesian parameter inference algorithm which is based on efficient simulation of mechanistic models. We have adapted it for model selection by defining it on an extended parameter space (M, \theta). The model selection ABC SMC algorithm chooses the best model for the system from the set of available models, balancing the fit to the data and the complexity of the model. Here we apply it to the phosphorylation dynamics of Erk MAP kinase. It has been demonstrated that in vitro phosphorylation and dephosphorylation of MAPK occur through a distributive mechanism (Burack 1997, Ferrell 1997, Zhao 2001). Recently, novel experimental techniques based on automated high-throughput immunostaining and image processing have allowed for collection of data based on populations of individual cells in vivo (Ozaki et al., in preparation). We are going to examine four different hypotheses:
1) distributive phosphorylation and dephosphorylation
2) processive phosphorylation and dephosphorylation
3) distributive phosphorylation, processive dephosphorylation
4) processive phosphorylation, distributive dephosphorylation
These are modeled by kinetic ODE models, and we employ a Bayesian model selection tool based on the ABC SMC algorithm (Toni et al., 2009) to determine the most likely mechanisms of phosphorylation and dephosphorylation occurring in the Erk signaling pathway in vivo.
Systems Biology Centre, Coventry House, University of Warwick, Coventry CV4 7AL, United Kingdom
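
Sampling on the extended space (M, \theta) can be illustrated with plain ABC rejection, which is a much simpler relative of the ABC SMC algorithm and is used here only as a sketch. The two candidate models, the uniform priors, the distance and the tolerance are hypothetical choices for this example and are unrelated to the kinetic models of Erk phosphorylation.

```python
import math
import random

TIMES = [0.0, 0.5, 1.0, 1.5, 2.0]

def model_1(theta):
    """Hypothetical candidate 1: exponential decay."""
    return [math.exp(-theta * t) for t in TIMES]

def model_2(theta):
    """Hypothetical candidate 2: linear decay."""
    return [1.0 - theta * t for t in TIMES]

MODELS = {1: model_1, 2: model_2}

def distance(a, b):
    """Euclidean distance between simulated and observed trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abc_model_selection(observed, n_draws, epsilon, rng):
    """ABC rejection on (M, theta): keep draws whose simulation is within epsilon."""
    accepted = []
    for _ in range(n_draws):
        m = rng.choice(sorted(MODELS))   # uniform prior over models
        theta = rng.uniform(0.0, 2.0)    # uniform prior over the parameter
        if distance(MODELS[m](theta), observed) <= epsilon:
            accepted.append((m, theta))
    counts = {m: 0 for m in MODELS}
    for m, _ in accepted:
        counts[m] += 1
    total = len(accepted)
    # Approximate posterior model probabilities are the accepted fractions
    posterior = {m: counts[m] / total for m in MODELS} if total else {}
    return posterior, accepted

rng = random.Random(0)
observed = model_1(1.0)  # synthetic "data" generated from model 1
posterior, accepted = abc_model_selection(observed, n_draws=5000, epsilon=0.1, rng=rng)
```

The fraction of accepted draws that carry each model label approximates that model's posterior probability; the SMC version reaches small tolerances more efficiently by passing a population of (M, \theta) particles through a decreasing sequence of tolerances.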
Exploring experimental designs for network inference using perturbations and a Bayesian sequential learning strategy
Modern approaches to systems biology call for a tightly coupled iterative cycle of computational modelling and independent experimental validation of model predictions. A Bayesian formulation of model inference should be exceptionally amenable to this type of experimental paradigm. Given prior knowledge that has been encoded into a model, we can train the model on data from experiment A. The result is a posterior distribution over, say, gene regulatory networks which can act as a prior for the next model, trained on data from experiment B. The Bayesian model at each stage can be seen as a distillation of the experimental data obtained up to that point, and since it is a probabilistic model it can be used as an expert prior for the model trained on the next data set. A Bayesian sequential learning strategy can therefore be employed, instead of waiting for all the data to be collected before training the first model. We explore this paradigm using simulated data from a realistic in silico model network and experimental microarray time series data sets studying stress responses in Arabidopsis and E. coli.
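
The sequential use of a posterior as the next prior can be sketched with a discrete distribution over a handful of candidate networks. The three candidate structures and the likelihood values below are invented for illustration, and the two experiments are assumed to be conditionally independent given the network.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses from a prior and per-hypothesis likelihoods."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical candidate networks on two genes, with a flat prior
prior = {"A->B": 1 / 3, "B->A": 1 / 3, "no edge": 1 / 3}

# Hypothetical likelihoods p(data | network) for experiments A and B
lik_A = {"A->B": 0.30, "B->A": 0.10, "no edge": 0.05}
lik_B = {"A->B": 0.20, "B->A": 0.25, "no edge": 0.02}

# Sequential learning: the posterior after A becomes the prior for B
post_A = bayes_update(prior, lik_A)
post_AB = bayes_update(post_A, lik_B)

# Batch learning: wait for both experiments, then update once
lik_joint = {h: lik_A[h] * lik_B[h] for h in prior}
post_batch = bayes_update(prior, lik_joint)

best_network = max(post_AB, key=post_AB.get)
```

Under the independence assumption the sequential and batch posteriors coincide, which is why nothing is lost by training before all the data have been collected.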
School of Mathematics and Statistics, Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK
TBA
Sir Alwyn Williams Building, University of Glasgow, Glasgow, G12 8QQ
Bayesian Hypotheses Testing in Raman Spectroscopy
Surface enhanced resonance Raman spectroscopy (SERRS) can be used to detect a wide range of biochemical species by employing a specific set of nanoparticle probes. New data obtained using this technology will significantly improve our abilities to understand biological systems by enabling high throughput measurements of protein concentrations. Analysis of spectra produced by SERRS is often done manually, and a solid statistical approach to interpreting such results is very important to draw valid conclusions. We model data obtained using SERRS using Gaussian Processes. This modelling approach enables computing marginal likelihoods over different covariance functions of GPs, and therefore consistent hypothesis testing can be performed. We investigate several important problems in analytical biochemistry:
• Whether the spectroscopic response of analytes changes in time, or the observed variations can be explained by measurement errors.
• Whether it is possible to measure the differences in concentrations of an analyte given practical variability of the measurement.
• Which frequency bands are most informative for measuring the concentration of a given protein with high confidence.
We additionally develop a calibration procedure based on GP regression of the spectroscopic data, using Markov Chain Monte Carlo to marginalise over the hyperparameters of the covariance function.
Faculty of Biomedical and Life Sciences, University of Glasgow, Anderson College Building, 56 Dumbarton Road, Glasgow G11 6NU, UK
Inference in a probabilistic model of dynamic DNA
Microsatellites are simple sequence repeats present in both coding and non-coding regions of the genome. DNA instability at some microsatellites is the underlying genetic defect in a number of human diseases including myotonic dystrophy type 1 (DM1). New quantitative data, collected by single molecule analysis of repeat length in blood cells from 145 DM1 patients, reveal the extent and nature of the genetic variation within and between patients (Morales, PhD thesis, 2006). This dataset of thousands of de novo mutations provides a unique opportunity to examine the underlying mechanism of mutation, which is thought to be a universal biological process that is simply amplified in the disease case. We are developing discrete mathematical models and stochastic simulation techniques that capture key features of the mutation mechanism underlying repeat length evolution. We derive analytical expressions for the length distribution of an adapted birth and death process and employ Bayesian techniques to calibrate our models against the biological data and test model hypotheses. Our work aims to improve prognostic information for patients, as well as providing a deeper understanding of the underlying biological process. In particular we will provide evidence that a previous model (Kaplan et al., 2007) can be improved by introducing a non-zero contraction rate.
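
A birth and death process on repeat length can be simulated with the Gillespie algorithm, as in the minimal sketch below. The linear per-repeat expansion and contraction rates, the lower bound on length and all numerical values are assumptions for illustration; this is neither the adapted process developed in the talk nor the model of Kaplan et al.

```python
import random

def simulate_repeat_length(n0, expansion_rate, contraction_rate, t_end, rng, n_min=1):
    """Gillespie simulation of a linear birth-and-death process on repeat length.

    Each repeat unit expands at expansion_rate and contracts at contraction_rate
    per unit time; contractions are blocked once the length reaches n_min.
    """
    n, t = n0, 0.0
    while True:
        birth = expansion_rate * n
        death = contraction_rate * n if n > n_min else 0.0
        total = birth + death
        if total == 0.0:
            break                       # no further events are possible
        t += rng.expovariate(total)     # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < birth:
            n += 1                      # expansion by one repeat
        else:
            n -= 1                      # contraction by one repeat
    return n

rng = random.Random(1)
# Expansion only, versus expansion with a non-zero contraction rate
expansion_only = [simulate_repeat_length(50, 0.02, 0.0, 10.0, rng) for _ in range(200)]
with_contraction = [simulate_repeat_length(50, 0.02, 0.01, 10.0, rng) for _ in range(200)]
```

With a zero contraction rate the simulated lengths can never fall below the starting length, so shorter alleles in such a simulation can only arise when contractions are allowed.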
Room 3613, Biomathematics and Statistics Scotland, James Clerk Maxwell Building, The King's Buildings, Edinburgh EH9 3JZ
Beyond Molecular Biology – Applying Gene Regulation Network Inference Methods in Ecology
Reconstructing gene regulation networks from gene expression data is an important task in molecular biology for which various network inference methods have been developed. In ecology, species interaction networks serve a similar purpose, in that they show how different species relate to each other. We have investigated the possibility of applying the methods that were developed for gene regulation networks to reconstruct species interaction networks from species abundance data. We used a Lotka-Volterra-style simulation model to produce synthetic data based on species interaction networks, and then tried to reconstruct the original network from this data using Bayesian networks, LASSO (Least Absolute Shrinkage and Selection Operator) and SBR (Sparse Bayesian Regression). We also developed extensions to these methods for dealing with the problem of spatial autocorrelation. Our experiments showed that we can retrieve many species interactions, while keeping the false positive rate low. We compared the different methods, and found that LASSO and Bayesian networks perform best.
Bioinformatics Research Center, University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Aarhus C, Denmark
Temporal Development and Collapse of an Arctic Plant-Pollinator Network
Topology and linkage rules of plant-pollinator networks have received much attention lately. One aspect that is difficult to study is the temporal dynamics of the network, as it requires observation of how the network changes over time. Here we study an Arctic plant-pollinator network in two consecutive years using mathematical models and describe the temporal dynamics (daily assembly and disassembly of links) by simple statistical distributions. Among other things, we demonstrate that the dynamics are strikingly similar in both years despite a strong turnover in the composition of the pollinator community, and that the day-to-day development of the network correlates poorly with (available) weather parameters.