Igor Pruenster (University of Turin)
TITLE: Bayesian nonparametric methods for prediction in EST analysis
ABSTRACT: Expressed sequence tag (EST) analyses
are an important tool for gene identification in organisms. Given a
preliminary EST survey from a certain cDNA library, various features of a
possible additional sample have to be predicted. For instance, interest
may lie in estimating the number of new genes to be detected, the gene
discovery rate at each additional read and the probability of not
re-observing certain specific genes present in the initial sample. We
propose a Bayesian nonparametric approach for prediction in
EST analysis based on nonparametric priors
inducing Gibbs-type exchangeable random partitions and derive estimators
for the relevant quantities. Several EST datasets are analysed using
the two-parameter Poisson-Dirichlet process, the most notable
Gibbs-type prior. Our proposal has clear
advantages over frequentist nonparametric methods, which become unstable
when prediction is required for large future samples.
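
As a rough illustration of the quantities mentioned above, the following minimal sketch uses the standard predictive rule of the two-parameter Poisson-Dirichlet process: after n reads showing k distinct genes, the next read is a new gene with probability (theta + sigma*k)/(theta + n). The abstract states no formulas, so the function names, the parameters sigma and theta (treated here as known, with 0 <= sigma < 1 and theta > -sigma), and the numbers in the usage lines are assumptions made for illustration only, not the estimators derived in the talk.

```python
def new_gene_probability(n, k, sigma, theta):
    """Probability that read n+1 is a gene not among the k already seen,
    under the two-parameter Poisson-Dirichlet predictive rule."""
    return (theta + sigma * k) / (theta + n)


def expected_new_genes(n, k, m, sigma, theta):
    """Expected number of new genes in m additional reads.

    The predictive rule is linear in the number of distinct genes, so the
    expectation can be propagated one read at a time.
    """
    expected_total = float(k)
    for i in range(m):
        expected_total += (theta + sigma * expected_total) / (theta + n + i)
    return expected_total - k


def expected_discovery_rate(n, k, m, sigma, theta):
    """Expected probability that read n+m+1 reveals a new gene."""
    expected_k = k + expected_new_genes(n, k, m, sigma, theta)
    return (theta + sigma * expected_k) / (theta + n + m)


if __name__ == "__main__":
    # Hypothetical survey: 1000 reads, 400 distinct genes, assumed parameters.
    n, k, sigma, theta = 1000, 400, 0.6, 50.0
    print(new_gene_probability(n, k, sigma, theta))
    print(expected_new_genes(n, k, 2000, sigma, theta))
    print(expected_discovery_rate(n, k, 2000, sigma, theta))
```

With sigma = 0 the rule reduces to the Dirichlet process case, where the probability of a new gene no longer depends on k.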