Bayesian Bioinformatics Applications:
BALSA is a Bayesian algorithm for local sequence alignment that accounts for the uncertainty in all unknown variables by summing, in its forward recursions, over a series of scoring matrices, a series of gap parameters, and all possible alignments.
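The forward-sum idea can be sketched as follows. This is a minimal illustration, not BALSA's code: it assumes a toy match/mismatch scoring function in natural-log units, a linear gap cost, and a pair-HMM-style recursion (states `M`, `X`, `Y`) that adds up exp(score) over every local alignment beginning and ending with an aligned pair. The results are unnormalized weights, not probabilities. Averaging them with prior weights over a series of (scoring, gap) settings stands in for integrating out parameter uncertainty; `match_mismatch`, `local_forward_sum`, and `averaged_sum` are hypothetical names.

```python
import math
from typing import Callable, Sequence, Tuple

# A scoring function returns a log-odds score (natural-log units) for a residue pair.
Score = Callable[[str, str], float]


def match_mismatch(match: float, mismatch: float) -> Score:
    """Hypothetical toy 'scoring matrix': one score for identities, one for mismatches."""
    return lambda x, y: match if x == y else mismatch


def local_forward_sum(a: str, b: str, score: Score, gap: float) -> float:
    """Sum of exp(alignment score) over every local alignment of a and b that
    starts and ends with an aligned pair; each gap position costs `gap`."""
    n, m = len(a), len(b)
    # M: ends with a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    g = math.exp(-gap)
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The leading 1.0 lets a new local alignment start at this pair.
            M[i][j] = math.exp(score(a[i - 1], b[j - 1])) * (
                1.0 + M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1])
            X[i][j] = g * (M[i - 1][j] + X[i - 1][j])
            Y[i][j] = g * (M[i][j - 1] + Y[i][j - 1])
            total += M[i][j]  # every aligned pair is also a possible end point
    return total


def averaged_sum(a: str, b: str,
                 settings: Sequence[Tuple[Score, float]],
                 prior: Sequence[float]) -> float:
    """Prior-weighted average of the forward sum over (scoring function, gap cost) settings."""
    return sum(w * local_forward_sum(a, b, s, g) for (s, g), w in zip(settings, prior))
```

For example, `averaged_sum("ACGT", "ACGA", [(match_mismatch(1.0, -1.0), 2.0), (match_mismatch(2.0, -2.0), 3.0)], [0.5, 0.5])` weights two hypothetical settings equally. Replacing the max of Smith-Waterman with a sum is what lets every alignment, not just the best one, contribute.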
The choice of scoring matrix and gap penalty parameters remains an important problem in sequence alignment. The Bayesian Phylogenetic Footprinter bypasses this choice: instead of requiring a fixed set of parameter settings, it returns the Bayesian posterior probability of the number of gaps and of each scoring matrix in any series of interest.
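Once each setting in a series has a log marginal likelihood, its posterior probability follows from Bayes' rule by normalization. The sketch below assumes those log marginals are already available (for example, from a forward sum over alignments) and uses a uniform prior unless one is supplied. `posterior_over_settings` is a hypothetical helper; the same normalization applies to any discrete series, such as candidate numbers of gaps.

```python
import math
from typing import List, Optional


def posterior_over_settings(log_marginals: List[float],
                            log_priors: Optional[List[float]] = None) -> List[float]:
    """Turn log p(data | setting) values into posterior probabilities p(setting | data)."""
    k = len(log_marginals)
    if log_priors is None:
        log_priors = [-math.log(k)] * k  # assumed uniform prior over the series
    joint = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    top = max(joint)  # subtract the max before exponentiating (log-sum-exp trick)
    weights = [math.exp(x - top) for x in joint]
    total = sum(weights)
    return [w / total for w in weights]
```

The max shift keeps the computation stable when the log marginals are large negative numbers, as they typically are for long sequences.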
The Gibbs Motif Sampler identifies motifs, that is, conserved regions, in DNA or protein sequences.
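A much-simplified version of Gibbs site sampling is shown below. It assumes uppercase DNA over `ACGT`, exactly one motif occurrence of fixed width per sequence, and background frequencies estimated from all input letters; the actual Gibbs Motif Sampler handles far more (protein alphabets, multiple sites, and so on). `gibbs_site_sampler` is a hypothetical name. Each step holds one sequence out, builds a position-specific profile from the current sites in the others, and resamples the held-out site in proportion to its motif-to-background odds.

```python
import random
from collections import Counter
from typing import List

ALPHABET = "ACGT"  # assumption: uppercase DNA only


def gibbs_site_sampler(seqs: List[str], width: int, iters: int = 200,
                       pseudo: float = 0.5, seed: int = 0) -> List[int]:
    """Simplified Gibbs site sampler assuming exactly one motif site of length
    `width` in every sequence; returns the final sampled start positions."""
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Background frequencies from all letters (simplification: sites are not excluded).
    counts = Counter("".join(seqs))
    total = sum(counts[c] for c in ALPHABET)
    bg = {c: (counts[c] + pseudo) / (total + pseudo * len(ALPHABET)) for c in ALPHABET}
    norm = len(seqs) - 1 + pseudo * len(ALPHABET)  # column total of the profile
    for _ in range(iters):
        for z, seq in enumerate(seqs):
            # Build a profile from the current sites of every other sequence.
            prof = [{c: pseudo for c in ALPHABET} for _ in range(width)]
            for k, other in enumerate(seqs):
                if k != z:
                    for w in range(width):
                        prof[w][other[pos[k] + w]] += 1
            # Weight every candidate start in the held-out sequence by its
            # motif-to-background odds, then draw a new start in proportion.
            weights = []
            for x in range(len(seq) - width + 1):
                odds = 1.0
                for w in range(width):
                    c = seq[x + w]
                    odds *= (prof[w][c] / norm) / bg[c]
                weights.append(odds)
            pos[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Because the chain is stochastic and can settle on a shifted copy of the motif, it is common to run several seeds and compare the resulting alignments.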
RNAG is a global RNA secondary structure alignment program. It is a blocked Gibbs sampler for predicting the consensus secondary structure of unaligned RNA sequences; as a blocked sampler, it has a theoretical advantage in convergence time. The algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). The samples it draws are used to characterize the posterior space of structures more fully and to assess the uncertainty of predictions.
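The two-block alternation can be illustrated on a toy joint distribution. This is not RNAG: the "structures" and "alignments" below are hypothetical labels in a small probability table, whereas RNAG's blocks are entire structures and alignments drawn from much richer conditionals. The sketch only shows that alternately sampling S | A and A | S produces draws whose frequencies approach the joint distribution. `TOY_JOINT`, `two_block_gibbs`, and `empirical_frequencies` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy joint distribution P(S, A) over two "structures" and two "alignments".
TOY_JOINT: Dict[Tuple[str, str], float] = {
    ("hairpin", "aln1"): 0.4, ("hairpin", "aln2"): 0.1,
    ("open", "aln1"): 0.1, ("open", "aln2"): 0.4,
}


def two_block_gibbs(joint: Dict[Tuple[str, str], float], n_samples: int,
                    burn_in: int = 1000, seed: int = 0) -> List[Tuple[str, str]]:
    """Alternate draws from P(S | A) and P(A | S); return the post-burn-in (S, A) pairs."""
    rng = random.Random(seed)
    s_states = sorted({s for s, _ in joint})
    a_states = sorted({a for _, a in joint})
    s, a = max(joint, key=joint.get)  # start from a state with positive probability
    samples = []
    for t in range(burn_in + n_samples):
        # Block 1: S | A = a. Joint probabilities with A fixed are proportional to
        # the conditional, so they serve as unnormalized weights.
        s = rng.choices(s_states, weights=[joint.get((x, a), 0.0) for x in s_states])[0]
        # Block 2: A | S = s, drawn the same way.
        a = rng.choices(a_states, weights=[joint.get((s, y), 0.0) for y in a_states])[0]
        if t >= burn_in:
            samples.append((s, a))
    return samples


def empirical_frequencies(samples: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Fraction of samples falling on each (S, A) pair."""
    counts = Counter(samples)
    return {k: v / len(samples) for k, v in counts.items()}
```

With enough samples, `empirical_frequencies(two_block_gibbs(TOY_JOINT, 20000))` should land close to the table's probabilities, and the same draws can be used to estimate the uncertainty of any summary of S.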
EBIR is an exact Bayesian algorithm for both variable selection and model averaging. When the number of predictor variables is not too large, its fully Bayesian approach provides a complete characterization of the posterior ensemble of possible sub-models and, consequently, the marginal probability of including each predictor variable. The model can therefore be used for variable selection, model averaging, and examination of the shape of the posterior space.
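Exact enumeration is feasible only when the number of predictors p is small, since there are 2^p sub-models. The sketch below enumerates them all for linear regression and reports each predictor's marginal inclusion probability. It is not EBIR's algorithm: it assumes a uniform prior over models and swaps in Zellner's g-prior, whose Bayes factor against the intercept-only model has the closed form BF = (1 + g)^((n - 1 - k)/2) / (1 + g(1 - R^2))^((n - 1)/2) for a model with k predictors, using g = n as an assumed default. `inclusion_probabilities` is a hypothetical name.

```python
import itertools
from typing import List, Optional, Tuple

import numpy as np


def inclusion_probabilities(X: np.ndarray, y: np.ndarray, g: Optional[float] = None
                            ) -> Tuple[np.ndarray, np.ndarray, List[Tuple[int, ...]]]:
    """Enumerate all 2**p sub-models of a linear regression and return
    (marginal inclusion probabilities, posterior model probabilities, models)."""
    n, p = X.shape
    g = float(n) if g is None else g  # assumed default: unit-information g = n
    Xc = X - X.mean(axis=0)  # centering absorbs the intercept shared by all models
    yc = y - y.mean()
    tss = float(yc @ yc)
    models = list(itertools.product((0, 1), repeat=p))
    log_bf = np.empty(len(models))
    for m, gamma in enumerate(models):
        idx = [j for j in range(p) if gamma[j]]
        r2 = 0.0
        if idx:
            beta, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
            resid = yc - Xc[:, idx] @ beta
            r2 = 1.0 - float(resid @ resid) / tss
        k = len(idx)
        # log Bayes factor of this model against the intercept-only model (g-prior).
        log_bf[m] = 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
    w = np.exp(log_bf - log_bf.max())  # uniform model prior, stabilized exponentiation
    post = w / w.sum()
    incl = np.array(models, dtype=float).T @ post  # posterior mass of models containing j
    return incl, post, models
```

The same posterior model probabilities that give the inclusion probabilities can also weight each sub-model's predictions, which is the model-averaging use.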
The validation and reproducibility of scientific results have recently been the subject of intense discussion in the scientific community, and there is an urgent need for statistical tools that evaluate reproducibility quantitatively. To help address this need, we apply a Bayesian hierarchical model to assess the reproducibility of validation experiments that evaluate the top-tier predictions of high-throughput genomic studies. On this site, you will find software for assessing the reproducibility of validation experiments carried out in multiple replicates, and software for designing such experiments.
Many computational approaches are unable to find the majority of experimentally verified binding sites without also finding many false positives. Phyloscan overcomes this difficulty by exploiting two key features of functional binding sites: (i) these sites are typically more conserved evolutionarily than are non-functional DNA sequences; and (ii) these sites often occur two or more times in the promoter region of a regulated gene.
This web page calculates the effective species count for a user-supplied phylogenetic tree and user-supplied nucleotide substitution model. The effective species count measures how efficiently sequences for the species in the leaves of the phylogenetic tree can be used to reconstruct the equilibrium distribution that governs each multiply aligned DNA sequence position.
Measuring the statistical significance of extreme sequence alignment scores is key to many applications, but it is difficult. To approximate alignment score significance precisely, we draw random samples directly from a well-chosen importance-sampling probability distribution.
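The importance-sampling principle can be shown on a deliberately simplified score: a gap-free alignment of fixed length scoring +1 per match and -1 per mismatch, with positions matching independently with probability p. Real alignment scores maximize over gapped local alignments, so this is only a stand-in, and the tilting rule (shift the match probability so the expected score equals the threshold) is an assumption of the sketch rather than the method described here. `tail_prob_is` and `tail_prob_exact` are hypothetical names; the exact binomial sum is included only to check the estimate.

```python
import math
import random


def tail_prob_is(length: int, p: float, threshold: int,
                 n_draws: int = 20000, seed: int = 0) -> float:
    """Importance-sampling estimate of P(S >= threshold) for a toy gap-free score
    S = (#matches) - (#mismatches) over `length` independent positions."""
    # Assumed tilting rule: choose the proposal match probability q so that
    # length * (2q - 1) = threshold, i.e. the proposal's mean score hits the threshold.
    q = min(max((threshold / length + 1.0) / 2.0, p), 0.999)
    log_ratio_match = math.log(p / q)                 # log likelihood ratio per match
    log_ratio_mismatch = math.log((1 - p) / (1 - q))  # log likelihood ratio per mismatch
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_draws):
        k = sum(rng.random() < q for _ in range(length))  # matches drawn under q
        if 2 * k - length >= threshold:
            # Reweight each hit by p(draw) / q(draw) so the estimate stays unbiased.
            acc += math.exp(k * log_ratio_match + (length - k) * log_ratio_mismatch)
    return acc / n_draws


def tail_prob_exact(length: int, p: float, threshold: int) -> float:
    """Exact binomial tail, used only to check the estimate."""
    return sum(math.comb(length, k) * p ** k * (1 - p) ** (length - k)
               for k in range(length + 1) if 2 * k - length >= threshold)
```

For length 50, p = 0.25, and threshold 20, the exact tail probability is roughly 3e-11, so ordinary Monte Carlo sampling would almost never observe a hit, while the tilted proposal lands at or above the threshold about half the time.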