**DESCRIPTION OF COURSES**

**AS 608 ADVANCED BIOINFORMATICS (2L+1P) III
(Pre-requisite**

**Objectives**

This course aims to expose students to advanced statistical and computational techniques in bioinformatics and to prepare them to understand bioinformatics principles and their applications.

**Theory**

UNIT I

Genomic databases and analysis of high-throughput data sets. Analysis of DNA sequences, sequence annotation, ESTs, SNPs. BLAST and related sequence comparison methods. The EM algorithm and other statistical methods for discovering common motifs in biosequences. Multiple alignment and database search using motif models, ClustalW and other tools. Concepts in phylogeny. Gene prediction based on codons and decision trees. Classificatory analysis, neural networks, genetic algorithms, pattern recognition, hidden Markov models.
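
Unit I lists hidden Markov models alongside gene prediction without fixing a particular model. The following is a minimal sketch of Viterbi decoding in log space, assuming a hypothetical two-state model (a GC-rich state `H` and an AT-rich state `L`) whose start, transition and emission probabilities are invented purely for illustration and are not part of the course material.

```python
import math

# Hypothetical toy model (illustration only): "H" = GC-rich state, "L" = AT-rich state.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq, computed in log space."""
    # score[s] = best log-probability of any path ending in state s at the current position
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    backptr = []  # one dict per later position: state -> best predecessor state
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # pick the predecessor that maximises the path score into s
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (score[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final state to recover the whole path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("GCGCGCATATAT", STATES, START, TRANS, EMIT))  # → HHHHHHLLLLLL
```

Working in log space avoids numerical underflow on long sequences; a real gene finder would use many more states (codon positions, introns, intergenic regions) estimated from annotated data.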

UNIT II

Computational analysis of protein sequence, structure and function. Modelling protein families. Expression profiling by microarray/gene chip, proteomics, etc. Multiple alignment of protein sequences. Modelling and prediction of protein structure. Designer proteins. Drug design.
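
Progressive multiple aligners such as ClustalW start from pairwise alignments of every pair of sequences. As a minimal sketch, the Needleman-Wunsch global alignment score can be computed by dynamic programming; the scoring scheme here (match +1, mismatch -1, linear gap -2) is an assumption chosen for readability, standing in for a real substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b with a linear gap penalty (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag,                 # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    return F[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch("HEAGAWGHEE", "HEAGAWGHEE"))  # → 10
    print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE"))
```

Only the score is returned; recovering the alignment itself needs a traceback through `F`, and the pairwise scores would then feed the guide tree of a progressive method.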

UNIT III

Markov chains (Markov chains with no absorbing states; higher-order Markov dependence; patterns in sequences; Markov chain Monte Carlo: the Hastings-Metropolis algorithm and simulated annealing; Markov chains with absorbing states). Bayesian techniques and the use of Gibbs sampling. Advanced topics in the design and analysis of DNA microarray experiments.
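
Unit III names the Hastings-Metropolis algorithm. The sketch below is a random-walk version for a one-dimensional target, assuming a standard normal target chosen purely for illustration; the function name and its parameters (`step`, `burn_in`, `seed`) are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D target density.

    log_target returns the log of the (possibly unnormalised) target density.
    Returns the post-burn-in samples and the acceptance rate over all iterations.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    samples, accepted = [], 0
    total = n_samples + burn_in
    for i in range(total):
        # Gaussian random-walk proposal. It is symmetric, so the Hastings
        # correction q(x | x') / q(x' | x) equals 1 and drops out.
        proposal = x + rng.gauss(0.0, step)
        log_pp = log_target(proposal)
        # Accept with probability min(1, pi(x') / pi(x)).
        if rng.random() < math.exp(min(0.0, log_pp - log_px)):
            x, log_px = proposal, log_pp
            accepted += 1
        if i >= burn_in:
            samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / total


if __name__ == "__main__":
    # Illustrative target: standard normal, log density up to an additive constant.
    draws, rate = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0,
                                      n_samples=20000, step=2.4)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"mean={mean:.3f} var={var:.3f} acceptance={rate:.2f}")
```

Only the ratio of target densities enters the acceptance step, which is why the normalising constant of a posterior is never needed. Gibbs sampling can be viewed as the special case in which each proposal is a draw from a full conditional and is always accepted.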

UNIT IV

Computationally intensive methods (classical estimation methods, bootstrap estimation and confidence intervals, hypothesis testing, multiple hypothesis testing). Evolutionary models (models of nucleotide substitution). Phylogenetic tree estimation (distances; tree reconstruction: ultrametric and neighbor-joining cases; surrogate distances; tree reconstruction: parsimony and maximum likelihood; modelling, estimation and hypothesis testing). Neural networks (universal approximation properties; priors and likelihoods; learning algorithms: backpropagation; sequence encoding and output interpretation; prediction of protein secondary structure; prediction of signal peptides and their cleavage sites; applications to DNA and RNA nucleotide sequences). Analysis of SNPs and haplotypes.
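
Unit IV combines bootstrap confidence intervals with models of nucleotide substitution. As a minimal sketch tying the two together, the code below computes the Jukes-Cantor (JC69) distance $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$ from the proportion $p$ of differing sites, and a percentile bootstrap interval obtained by resampling alignment columns. It assumes two gap-free, already aligned sequences; the function names and the example sequences are hypothetical.

```python
import math
import random


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc69_distance(p):
    """Jukes-Cantor (JC69) distance d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # at or beyond saturation the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_ci(seq1, seq2, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the JC69 distance, resampling alignment columns."""
    rng = random.Random(seed)
    n = len(seq1)
    reps = []
    for _ in range(n_boot):
        cols = [rng.randrange(n) for _ in range(n)]  # draw n sites with replacement
        b1 = "".join(seq1[i] for i in cols)
        b2 = "".join(seq2[i] for i in cols)
        reps.append(jc69_distance(p_distance(b1, b2)))
    reps.sort()
    lower = reps[int(n_boot * alpha / 2)]
    upper = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical alignment: 100 sites, every tenth site substituted.
    seq_a = "ACGT" * 25
    seq_b = "".join("T" if i % 10 == 0 else c for i, c in enumerate(seq_a))
    d = jc69_distance(p_distance(seq_a, seq_b))  # p = 0.1, so d is about 0.1073
    print(d, bootstrap_ci(seq_a, seq_b))
```

Resampling columns treats sites as independent, which is the same assumption behind Felsenstein's bootstrap support values for phylogenetic trees.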

**Practicals**

Genomic databases and analysis of high-throughput data sets. BLAST and related sequence comparison methods. Statistical methods to discover common motifs in biosequences. Multiple alignment and database search using motif models, ClustalW, classificatory analysis, neural networks, genetic algorithms, pattern recognition, hidden Markov models. Computational analysis of protein sequences. Expression profiling by microarray/gene chip, proteomics. Modelling and prediction of protein structure. Bayesian techniques and the use of Gibbs sampling. Analysis of DNA microarray experiments. Analysis of a single DNA sequence and of multiple DNA or protein sequences. Computationally intensive methods, multiple hypothesis testing, phylogenetic tree estimation, analysis of SNPs and haplotypes.

**Suggested Readings**

- Baldi, P. and Brunak, S. 2001. *Bioinformatics: The Machine Learning Approach*. MIT Press.
- Baxevanis, A.D. and Ouellette, B.F.F. 2004. *Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins*. John Wiley.
- Duda, R.O., Hart, P.E. and Stork, D.G. 1999. *Pattern Classification*. John Wiley.
- Ewens, W.J. and Grant, G.R. 2001. *Statistical Methods in Bioinformatics*. Springer.
- Jones, N.C. and Pevzner, P.A. 2004. *An Introduction to Bioinformatics Algorithms*. The MIT Press.
- Koski, T. 2001. *Hidden Markov Models for Bioinformatics*. Kluwer Academic Publishers.
- Krane, D.E. and Raymer, M.L. 2002. *Fundamental Concepts of Bioinformatics*. Benjamin/Cummings.
- Krawetz, S.A. and Womble, D.D. 2003. *Introduction to Bioinformatics: A Theoretical and Practical Approach*. Humana Press.
- Lesk, A.M. 2002. *Introduction to Bioinformatics*. Oxford University Press.
- Linder, E. and Seefeld, K. 2005. *R for Bioinformatics*. O’Reilly and Associates.
- Percus, J.K. 2001. *Mathematics of Genome Analysis*. Cambridge University Press.
- Sorensen, D. and Gianola, D. 2002. *Likelihood, Bayesian and MCMC Methods in Genetics*. Springer.
- Tisdall, J.D. 2001. *Mastering Perl for Bioinformatics*. O’Reilly and Associates.
- Wang, J.T.L., Zaki, M.J., Toivonen, H.T.T. and Shasha, D. 2004. *Data Mining in Bioinformatics*. Springer.
- Wu, C.H. and McLarty, J.W. 2000. *Neural Networks and Genome Informatics*. Elsevier.
- Wunschiers, R. 2004. *Computational Biology: Unix/Linux, Data Processing and Programming*. Springer.
- Yang, M.C.C. 2000. *Introduction to Statistical Methods in Modern Genetics*. Taylor and Francis.