Division of Molecular Biosciences
Department of Life Sciences
Faculty of Natural Sciences

Programme for LICSB (full abstracts listed below)

Wednesday 1st April
1100-1140 Keynote 1 Lachlan Coin
1140-1200 Talk 1 Yahya Anvar
1200-1220 Talk 2 Elisa Loza Reyes
Lunch
1400-1420 Talk 3 Pavel Mistrik
1420-1440 Talk 4 Paul Kirk
1440-1500 Talk 5 Michalis Titsias
1500-1520 Talk 6 Keith Harris
Tea/Coffee
1550-1630 Keynote 2 Sophie Lebre
1630-1650 Talk 7 Peter Milner
1650-1710 Talk 8 Tina Toni
1710-1730 Talk 9 Christopher Penfold
1730-1900 Drinks Reception
Thursday 2nd April
0900-0940 Keynote 3 Andrew Golightly
0940-1000 Talk 10 Vladislav Vyshemirsky
1000-1020 Talk 11 Catherine Higham
Tea/Coffee
1050-1110 Talk 12 Frank Dondelinger
1110-1150 Keynote 4 Carsten Wiuf

Abstracts

Dr Lachlan Coin

    Division of Epidemiology, Public Health and Primary Care, Imperial College London

    High resolution multi-platform copy number variation genotyping and imputation using a haplotype hidden Markov model

Mr S. Yahya Anvar

    Leiden University Medical Center, Center for Human and Clinical Genetics, Postzone: S-04-P, Postbus 9600, 2300 RC Leiden, The Netherlands

    The Identification of Informative Genes from Multiple Datasets with Increasing Complexity

    In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems reduce the ability of models to generate regulatory networks that represent and capture the interactions among genes. There is an urgent need to develop a methodology to overcome these bottlenecks. This paper is the initial part of a project to explore this, where we show that highly predictive (low error) and consistent (low variance) genes in multiple independent datasets related to muscle differentiation are more likely to be involved in more fundamental biological functions. Furthermore, we show that simpler and more informative datasets are able to model interactions among genes in more complex datasets and can differentiate more informative genes from others. We do this using a novel computational technique which includes several steps: (1) ordering the datasets based on their level of noise and informativeness; (2) evaluating three classifiers with increasing complexity using independent test set validation; (3) comparing the different gene selections and the influence of increasing the complexity by adding randomly selected genes; (4) assessing the ability of different models to separate the most informative genes from uninformative ones using ranking statistics; and (5) investigating the use of more informative and simpler datasets to model more complex ones.

Ms Elisa Loza Reyes

    University of Bath, Department of Mathematical Sciences, Claverton Down, BA2 7AY, Bath, UK

    Detecting Evolutionary Inter-Gene Heterogeneity in Borrelia burgdorferi

    Borrelia burgdorferi is one of the bacterial species responsible for the most prevalent vector-borne disease in the temperate zone of the northern hemisphere, Lyme borreliosis [1]. Phylogenetic analyses of B. burgdorferi are now based on a concatenation of several housekeeping genes that are assumed to evolve according to one evolutionary pattern. This is a strong assumption and, when untrue, inferences are a compromise between different phylogenetic signals. We have designed a Bayesian mixture model under a missing data formulation to automatically recover the evolutionary pattern of each site in a DNA alignment. Evolutionary consistency among a set of genes can be argued whenever most of the sites are allocated to the same evolutionary class. Only in this case will a concatenation of genes produce valid inferences. In this study we demonstrate consistency in the evolution of eight housekeeping genes and evolutionary inconsistency between these housekeeping genes and the gene encoding the immunodominant outer surface protein C. Our method is a suitable indicator of evolutionary agreement or disagreement when employing large-scale gene concatenations, not only in B. burgdorferi, but for any phylogenetic analysis.

    [1] Margos, G. et al., 2008. Proceedings of the National Academy of Sciences of the USA, 105(25): 8730-8735.

Dr Pavel Mistrik

    UCL EAR Institute, 332 Gray's Inn Road, London, WC1X 8EE

    Prediction of the role of mutations in gap junction genes from a large scale computational model of the cochlea of the inner ear

    Mutations in the GJB2 gene, which encodes the connexin 26 (Cx26) protein, are the most common source of nonsyndromic forms of deafness. Cx26 is a building block of gap junctions (GJ), establishing electrical intercellular connectivity between cells in distinct cochlear compartments. Cochlear circulation of ions such as potassium (K+) and metabolites such as IP3 is essential for normal hearing: animal models of Cx26 deficiency in the organ of Corti (one of the compartments) seem to suggest that the underlying problem is the death of sensory cells (outer and inner hair cells, OHC and IHC, respectively) due to failed K+ homeostasis. However, this mechanism may not be the only one. In search of alternative mechanisms we have used a large scale three-dimensional model of mechano-electrical transduction of sound in the cochlea (Mistrik et al., 2009). A careful analysis revealed that reduced GJ conductivity in the organ of Corti would decrease the receptor potential across the OHC basolateral membrane. As OHC electromotility is crucial for the sound amplification that underlies cochlear sensitivity and frequency selectivity, we conclude that the reduction of OHC somatic electromotility could represent an additional pathological mechanism in the Cx26 related forms of deafness.

Mr Paul Kirk

    Centre for Bioinformatics, Biochemistry Building, Imperial College London, SW7 2AZ

    Gaussian process regression bootstrapping

    Both mechanistic and empirical modelling techniques are employed in systems biology. The former construct models whose structure explicitly describes components of the biological system under investigation, while the latter make predictions on the strength of patterns in the data. Although empirical models such as Gaussian process regression (GPR) do not directly help us to elucidate the processes that generated a given data set, they can nevertheless form part of a strategy for testing and investigating hypotheses and mechanistic models. In our work, we exploit the predictive power of GPR in order to generate plausible simulated data sets from experimentally obtained time-course data. This amounts to a parametric bootstrap (in which the parametric model is a multivariate normal) that implicitly takes into account the time-dependence in the data. Having obtained bootstrap samples, we fit mechanistic models to both the original and simulated data. The variability amongst these fitted models reveals the sensitivity of the fit to uncertainty in the data. We use this approach to investigate the effects of data uncertainty upon parameter estimates in a model of a signalling pathway and upon gene network inference.
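The bootstrap idea described in this abstract can be sketched as follows. This is an illustrative sketch only, not the speaker's implementation: it assumes a squared-exponential covariance with fixed hyperparameters (`length_scale`, `signal_var`, `noise_var` are made-up values that would normally be estimated), and all function names are hypothetical. Replicate data sets are drawn from the multivariate normal predictive distribution of the GP at the observed time points.

```python
import math
import random


def rbf_kernel(xs, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance matrix for the time points xs."""
    return [[signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)
             for b in xs] for a in xs]


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_bootstrap(times, values, n_samples=100, noise_var=0.05, seed=0):
    """Return the GP posterior mean and bootstrap replicates of the data."""
    rng = random.Random(seed)
    n = len(times)
    K = rbf_kernel(times)
    Ky = [[K[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    L = cholesky(Ky)
    alpha = solve_cholesky(L, values)
    mean = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    # cols[j] holds Ky^{-1} K[:, j], used for the posterior covariance.
    cols = [solve_cholesky(L, [K[i][j] for i in range(n)]) for j in range(n)]
    # Predictive covariance of new noisy data: K - K Ky^{-1} K + noise_var * I.
    pred = [[K[i][j] - sum(K[i][k] * cols[j][k] for k in range(n))
             + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    Lp = cholesky(pred)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples.append([mean[i] + sum(Lp[i][k] * z[k] for k in range(i + 1))
                        for i in range(n)])
    return mean, samples
```

Each replicate in `samples` could then be passed to whatever mechanistic model fit is of interest, and the spread of the fitted parameters inspected.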

Dr Michalis Titsias

    School of Computer Science, Kilburn Building, University of Manchester, Manchester, M13 9PL

    Estimation of Multiple Transcription Factor Activities using ODEs and Gaussian Processes

    Recently, ordinary differential equations (ODEs) have been used to infer the concentration of a single transcription factor (TF) protein from time series expression data of a set of target genes. For instance, this has been applied to uncover the concentration of the p53 protein; see Barenco et al. (2006). In the present work, we propose a framework to estimate multiple TFs from a set of observed gene expressions that are co-regulated by these TFs. We assume that the connectivity network (which describes which TFs regulate each of the genes) is partially and probabilistically observed. For example, such side information can be available through a technique such as chromatin immunoprecipitation (ChIP). The objective of inference is to estimate the structure of the sub-network and the concentration of the transcription factor proteins continuously in time, as well as to infer the type of regulation in each network link (i.e. activation, repression or non-regulation). This multiple-TF framework uses Gaussian process priors to model the unobserved TF activities continuously in time, as considered in Lawrence et al. (2007) for the single-TF case. The ODE model of transcriptional regulation using multiple TFs is based on the following linear differential equation:

        dy_j(t)/dt = B_j + S_j g(f_1(t), ..., f_R(t); w_j) - D_j y_j(t),

    where y_j(t) denotes the expression of the jth gene at time t, (B_j, S_j, D_j) are the kinetic parameters of the equation, each f_r(t) is a TF concentration function, w_j are the connectivity weights between the gene and the TFs, and g is a sigmoid (e.g. Michaelis-Menten) type of function. Given a set of observations of the gene expression at discrete time points, the parameters {B_j, S_j, w_j, D_j} and the protein concentration functions {f_r(t)} are estimated by using a full Bayesian methodology that employs a Markov chain Monte Carlo algorithm. Gaussian process priors are placed on the functions {f_r(t)}, while the connectivity weights {w_j} are given sparse priors so that the side prior information about the network connectivity is taken into account. The whole framework is currently applied to sub-networks in the yeast cell-cycle gene expression data of Spellman et al. (1998) and Orlando et al. (2008), using the ChIP connectivity information provided in Lee et al. (2002). This is joint work with Magnus Rattray and Neil Lawrence.
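To make the ODE above concrete, here is a minimal forward-simulation sketch for a single gene. It is not the inference procedure from the abstract: the TF profiles are supplied as known functions rather than inferred, g is assumed to be a logistic function of the weighted sum of TF concentrations (one possible sigmoid choice, not confirmed by the abstract), and the function names and parameter values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic squashing function; an assumed choice for g."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_expression(tf_profiles, weights, basal, sensitivity, decay,
                        y0=0.0, t_end=10.0, dt=0.01):
    """Euler-integrate dy/dt = B + S * g(f_1(t), ..., f_R(t); w) - D * y.

    tf_profiles: list of callables f_r(t) giving TF concentration at time t.
    weights:     connectivity weights w_r (positive = activation,
                 negative = repression, zero = no regulation).
    Returns a list of (time, expression) pairs.
    """
    n_steps = int(round(t_end / dt))
    y = y0
    trajectory = [(0.0, y)]
    for k in range(n_steps):
        t = k * dt
        drive = sum(w * f(t) for w, f in zip(weights, tf_profiles))
        dydt = basal + sensitivity * sigmoid(drive) - decay * y
        y += dt * dydt
        trajectory.append(((k + 1) * dt, y))
    return trajectory
```

With constant TF concentrations the trajectory should relax towards the steady state (B + S g) / D, which gives a quick sanity check on the kinetic parameters.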

Dr Keith James Harris

    Room 320, Sir Alwyn Williams Building, 18 Lilybank Gardens, Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK.

    Definition of Valid Proteomic Biomarkers: Bayesian Solutions to a Currently Unmet Challenge.

    Clinical proteomics is suffering from high hopes generated by reports on apparent biomarkers, most of which could not later be substantiated via validation. This has brought into focus the need for improved methods of finding a panel of clearly defined biomarkers. To examine this problem, urinary proteome data were collected from healthy adult males and females, and analysed to find biomarkers that differentiated between genders. We believe that models that incorporate sparsity in terms of variables are desirable for biomarker selection, as proteomics data typically contain a huge number of variables (peptides) and few samples, making the selection process potentially unstable. This suggested the application of the two-level hierarchical Bayesian probit regression model that Bae and Mallick (2004) proposed for variable selection, which used three different priors for the variance of the regression coefficients (inverse Gamma, exponential and Jeffreys) to incorporate different levels of sparsity in their model. We have also developed an alternative method for biomarker selection that combines model-based clustering and sparse binary classification. By averaging the features within the clusters obtained from model-based clustering, we define “superfeatures” and use them to build a sparse probit regression model, thereby selecting clusters of similarly behaving peptides, aiding interpretation.

Dr Sophie Lebre

    LSIIT - UMR7005, University of Strasbourg, Pôle API, Bd Sébastien Brant - BP 10413, 67412 Illkirch cedex, FRANCE

    Time-varying genetic network inference using informative priors

Mr Peter Milner

    School of Mathematics & Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU

    Moment closure and block updating for parameter inference in stochastic biological models

    This talk will tackle one of the key problems in the new science of systems biology: inference for the rate parameters underlying complex stochastic kinetic biochemical network models, using partial, discrete and noisy time-course measurements of the system state. Although inference for exact stochastic models is possible, it is computationally intensive even for relatively small networks. We explore the Bayesian estimation of stochastic kinetic rate parameters using approximate models, based on moment closure analysis of the underlying stochastic process. By assuming a Gaussian distribution and using moment-closure estimates of the first two moments, we can greatly increase the speed of parameter inference. The parameter space can be efficiently explored by embedding this approximation into an MCMC procedure. We impute the missing species using a bridge updating scheme where each proposed move is a bridge of length m. We investigate how the choice of m affects the efficiency of the sampling in an auto-regulatory gene network.
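As a rough illustration of the Gaussian approximation built from the first two moments, the sketch below uses a toy immigration-death network (production at rate k1, degradation at rate k2 per molecule). This network is an assumption chosen for simplicity and is not the auto-regulatory network from the talk; because it is linear, its moment equations are already closed, whereas nonlinear networks would need an explicit closure step that is not shown here. Function names are hypothetical.

```python
import math


def propagate_moments(k1, k2, mean0, var0, t_end, dt=0.001):
    """Euler-integrate the mean and variance ODEs of an immigration-death process.

    dm/dt = k1 - k2 * m
    dv/dt = k1 + k2 * m - 2 * k2 * v
    """
    n_steps = int(round(t_end / dt))
    m, v = mean0, var0
    for _ in range(n_steps):
        dm = k1 - k2 * m
        dv = k1 + k2 * m - 2.0 * k2 * v
        m, v = m + dt * dm, v + dt * dv
    return m, v


def gaussian_loglik(observation, mean, var, noise_var=0.0):
    """Log-density of a noisy observation under the Gaussian approximation."""
    total_var = var + noise_var
    return -0.5 * (math.log(2.0 * math.pi * total_var)
                   + (observation - mean) ** 2 / total_var)
```

A likelihood of this form is cheap to evaluate, so it could be placed inside an MCMC loop over (k1, k2) in place of an exact stochastic simulation.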

Ms Tina Toni

    Centre for Bioinformatics, Biochemistry Building, Imperial College London, South Kensington Campus, SW7 2AZ London, UK

    Bayesian model selection: mechanistic models of Erk MAP kinase phosphorylation dynamics

    ABC SMC is a Bayesian parameter inference algorithm which is based on efficient simulation of mechanistic models. We have adapted it for model selection by defining it on an extended parameter space (M, \theta). The model selection ABC SMC algorithm chooses the best model for the system given the set of available models, balancing the fit to the data and the complexity of the model. Here we apply it to the phosphorylation dynamics of Erk MAP kinase. It has been demonstrated that in vitro phosphorylation and dephosphorylation of MAPK occur through a distributive mechanism (Burack 1997, Ferrell 1997, Zhao 2001). Recently, novel experimental techniques based on automated high-throughput immunostaining and image processing have allowed for collection of data based on populations of individual cells in vivo (Ozaki et al., in preparation). We are going to examine four different hypotheses, each modelled by a kinetic ODE model:

    1) distributive phosphorylation and dephosphorylation
    2) processive phosphorylation and dephosphorylation
    3) distributive phosphorylation, processive dephosphorylation
    4) processive phosphorylation, distributive dephosphorylation

    We employ a Bayesian model selection tool based on the ABC SMC algorithm (Toni et al., 2009) to determine the most likely mechanisms of phosphorylation and dephosphorylation occurring in the Erk signalling pathway in vivo.
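The idea of treating the model indicator M as an extra parameter can be sketched with a plain ABC rejection sampler, which is a simpler stand-in for the sequential (SMC) scheme named in the abstract. Everything concrete here is an assumption for illustration: the two toy decay models are not the Erk phosphorylation models, the priors are uniform, and the tolerance `epsilon` is arbitrary.

```python
import math
import random

TIMES = [0.0, 1.0, 2.0, 3.0, 4.0]


def exponential_decay(theta):
    """Toy model 1: y(t) = exp(-theta * t)."""
    return [math.exp(-theta * t) for t in TIMES]


def linear_decay(theta):
    """Toy model 2: y(t) = max(0, 1 - theta * t)."""
    return [max(0.0, 1.0 - theta * t) for t in TIMES]


MODELS = {1: exponential_decay, 2: linear_decay}


def distance(a, b):
    """Euclidean distance between a simulated and an observed time course."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_model_selection(data, n_draws=5000, epsilon=0.15, seed=0):
    """Draw (M, theta) from the prior and keep draws whose simulation fits."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        m = rng.choice(sorted(MODELS))   # uniform prior over models
        theta = rng.uniform(0.0, 2.0)    # uniform prior over the rate
        if distance(MODELS[m](theta), data) < epsilon:
            accepted.append((m, theta))
    return accepted


def model_probabilities(accepted):
    """Approximate posterior model probabilities from the accepted draws."""
    if not accepted:
        raise ValueError("no draws accepted; increase epsilon or n_draws")
    counts = {m: 0 for m in MODELS}
    for m, _ in accepted:
        counts[m] += 1
    return {m: c / len(accepted) for m, c in counts.items()}
```

The fraction of accepted draws carrying each model label approximates the posterior model probability; an SMC version would reach a small tolerance through a sequence of intermediate populations rather than in a single pass.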

Mr Christopher Andrew Penfold

    Systems Biology Centre, Coventry House, University of Warwick, Coventry CV4 7AL, United Kingdom

    Exploring experimental designs for network inference using perturbations and a Bayesian sequential learning strategy

    Modern approaches to systems biology call for a tightly coupled iterative cycle of computational modelling and independent experimental validation of model predictions. A Bayesian formulation of model inference should be exceptionally amenable to this type of experimental paradigm. Given prior knowledge that has been encoded into a model, we can train the model on data from experiment A. The result is a posterior distribution over, say, gene regulatory networks which can act as a prior for the next model, trained on data from experiment B. The Bayesian model at each stage can be seen as a distillation of the experimental data obtained up to that point, and since it is a probabilistic model it can be used as an expert prior for the model trained on the next data set. A Bayesian sequential learning strategy can therefore be employed, instead of waiting for all the data to be collected before training the first model. We explore this paradigm using simulated data from a realistic in silico model network and experimental microarray time series data sets studying stress responses in Arabidopsis and E. coli.
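The "posterior becomes the next prior" step can be illustrated with a deliberately simple conjugate example. The sketch below assumes a Beta prior on a single probability (for instance, hypothetically, the probability that one regulatory edge is active) and binary outcomes from two experiments; it is a toy stand-in for the network models in the abstract, and the counts are invented for illustration.

```python
def update_beta(prior, successes, failures):
    """Conjugate update of a Beta(a, b) prior with binary outcomes."""
    a, b = prior
    return (a + successes, b + failures)


def beta_mean(params):
    """Posterior mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)


# Hypothetical counts: experiment A then experiment B, starting from a flat prior.
prior = (1, 1)
after_a = update_beta(prior, successes=7, failures=3)
after_b = update_beta(after_a, successes=2, failures=8)

# Training once on the pooled data gives the same posterior.
pooled = update_beta(prior, successes=9, failures=11)
```

In this conjugate setting the sequential and pooled posteriors coincide exactly, which is the property that makes it reasonable to start modelling before all experiments are complete.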

Dr Andrew Golightly

    School of Mathematics and Statistics, Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK

    TBA

Dr Vladislav Vyshemirsky

    Sir Alwyn Williams Building, University of Glasgow, Glasgow, G12 8QQ

    Bayesian Hypotheses Testing in Raman Spectroscopy

    Surface enhanced resonance Raman spectroscopy (SERRS) can be used to detect a wide range of biochemical species by employing a specific set of nanoparticle probes. New data obtained using this technology will significantly improve our ability to understand biological systems by enabling high throughput measurements of protein concentrations. Analysis of spectra produced by SERRS is often done manually, and a solid statistical approach to interpreting such results is very important for drawing valid conclusions. We model data obtained using SERRS with Gaussian processes (GPs). This modelling approach enables computing marginal likelihoods over different covariance functions of GPs, and therefore consistent hypothesis testing can be performed. We investigate several important problems in analytical biochemistry:

    • Whether the spectroscopic response of analytes changes in time, or whether the observed variations can be explained by measurement errors.
    • Whether it is possible to measure the differences in concentrations of an analyte given the practical variability of the measurement.
    • Which frequency bands are most informative for measuring the concentration of a given protein with high confidence.

    We additionally develop a calibration procedure based on GP regression of the spectroscopic data, using Markov chain Monte Carlo to marginalise over the hyper-parameters of the covariance function.
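The first question in the list above (does the response change in time?) can be phrased as a comparison of GP marginal likelihoods under two covariance functions. The sketch below is an assumption-laden illustration, not the authors' method: the "static" hypothesis is encoded as an unknown constant level plus noise, the "time-varying" hypothesis as a squared-exponential kernel plus noise, and all hyperparameter values are fixed by hand rather than marginalised.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(K, y):
    """log N(y | 0, K) computed through the Cholesky factor of K."""
    n = len(y)
    L = cholesky(K)
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def static_cov(times, level_var=1.0, noise_var=0.05):
    """Hypothesis 'no change in time': constant level plus measurement noise."""
    n = len(times)
    return [[level_var + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def time_varying_cov(times, length_scale=1.0, signal_var=1.0, noise_var=0.05):
    """Hypothesis 'changes in time': squared-exponential kernel plus noise."""
    n = len(times)
    return [[signal_var * math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
             + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]
```

The difference between the two log marginal likelihoods is a log Bayes factor; a positive value for the time-varying covariance would favour a genuine change in the response over pure measurement error.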

Ms Catherine F Higham

    Faculty of Biomedical and Life Sciences, University of Glasgow, Anderson College Building, 56 Dumbarton Road, Glasgow G11 6NU, UK

    Inference in a probabilistic model of dynamic DNA

    Microsatellites are simple sequence repeats present in both coding and non-coding regions of the genome. DNA instability at some microsatellites is the underlying genetic defect in a number of human diseases including myotonic dystrophy type 1 (DM1). New quantitative data, collected by single molecule analysis of repeat length in blood cells from 145 DM1 patients, reveal the extent and nature of the genetic variation within and between patients (Morales, PhD thesis, 2006). This dataset of thousands of de novo mutations provides a unique opportunity to examine the underlying mechanism of mutation, which is thought to be a universal biological process that is simply amplified in the disease case. We are developing discrete mathematical models and stochastic simulation techniques that capture key features of the mutation mechanism underlying repeat length evolution. We derive analytical expressions for the length distribution of an adapted birth and death process and employ Bayesian techniques to calibrate our models against the biological data and test model hypotheses. Our work aims to improve prognostic information for patients, as well as providing a deeper understanding of the underlying biological process. In particular we will provide evidence that a previous model (Kaplan et al., 2007) can be improved by introducing a non-zero contraction rate.
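A birth and death process for repeat length can be simulated exactly with the Gillespie algorithm. The sketch below is a generic linear birth-death simulator, assuming that expansions and contractions each occur at a rate proportional to the current repeat length; the adapted process in the talk may differ (for example in thresholds or rate forms), and the function name and parameters are hypothetical.

```python
import random


def simulate_repeat_length(n0, expansion, contraction, t_end, rng):
    """Gillespie simulation of a linear birth-death process.

    n0:          initial number of repeats
    expansion:   per-repeat rate of gaining one repeat (birth)
    contraction: per-repeat rate of losing one repeat (death)
    Returns the repeat length at time t_end.
    """
    n, t = n0, 0.0
    per_repeat_rate = expansion + contraction
    if per_repeat_rate <= 0.0:
        return n  # no events can occur
    while n > 0:
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(per_repeat_rate * n)
        if t > t_end:
            break
        # Choose expansion or contraction in proportion to their rates.
        if rng.random() < expansion / per_repeat_rate:
            n += 1
        else:
            n -= 1
    return n
```

Setting `contraction` to zero recovers a pure expansion model, so comparing simulated length distributions with and without contraction gives a direct feel for what a non-zero contraction rate changes.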

Mr Frank Dondelinger

    Room 3613, Biomathematics and Statistics Scotland, James Clerk Maxwell Building, The King's Buildings, Edinburgh EH9 3JZ

    Beyond Molecular Biology – Applying Gene Regulation Network Inference Methods in Ecology

    Reconstructing gene regulation networks from gene expression data is an important task in molecular biology for which various network inference methods have been developed. In ecology, species interaction networks serve a similar purpose, in that they show how different species relate to each other. We have investigated the possibility of applying the methods that were developed for gene regulation networks to reconstruct species interaction networks from species abundance data. We used a Lotka-Volterra style simulation model to produce synthetic data based on species interaction networks, and then tried to reconstruct the original network from this data using Bayesian networks, LASSO (Least Absolute Shrinkage and Selection Operator) and SBR (Sparse Bayesian Regression). We also developed extensions to these methods for dealing with the problem of spatial autocorrelation. Our experiments showed that we can retrieve many species interactions, while keeping the false positive rate low. We compared the different methods, and found that LASSO and Bayesian networks perform best.

Professor Carsten Wiuf

    Bioinformatics Research Center, University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Aarhus C, Denmark

    Temporal Development and Collapse of an Arctic Plant-Pollinator Network

    Topology and linkage rules of plant-pollinator networks have received much attention lately. One aspect that is difficult to study is the temporal dynamics of the network, as it requires observation of how the network changes over time. Here we study an Arctic plant-pollinator network in two consecutive years using mathematical models and describe the temporal dynamics (daily assembly and disassembly of links) by simple statistical distributions. Among other things, we demonstrate that the dynamics are strikingly similar in both years despite a strong turnover in the composition of the pollinator community, and that the day-to-day development of the network correlates poorly with the (available) weather parameters.