David Bickel, Ottawa Institute of Systems Biology

David R. Bickel, PhD
University of Ottawa
Ottawa Institute of Systems Biology
Department of Biochemistry, Microbiology, and Immunology

News: Statomics Lab

More Recent Research and Software

(The page you are now viewing is no longer updated or maintained.)

Research and Software as of 1 July 2012

Updated citations and hyperlinks

Area	Research interests	Papers and software
Foundations of statistics	Non-Bayesian and partially-Bayesian posterior distributions. Objective quantification of the strength of evidence in data.	Bayes/non-Bayes continuum | Confidence posterior distributions | Strength of statistical evidence
Statistical genomics	Analysis of data from genome-wide association studies. Estimation of the degree of differential gene, protein, or metabolite expression. Local false discovery rate estimation and cluster analysis to interpret data of microarray scales and smaller scales.	Evidence for genome-wide association | Empirical Bayes methods for enrichment, proteomics, and metabolomics | Methods for levels of differential gene expression | Differential expression detection & testing multiple hypotheses | Gene network reconstruction & co-expression inference
Older statistical methodology & applications	Robust estimation of the mode. Intermittent and fractal time-series and stochastic point processes, with applications to heart rate variability, activity counts, and DNA evolution theory.	Robust mode & skewness estimation | Fractal stochastic models of DNA evolution | Stochastic intermittency | Heart rate variability

“...there is a strong element of intellectual arrogance present in all of us, not excluding myself. We tend to think that we are "God's gift to the world," and that we are highly original in posing problems and suggesting solutions.”
— O. Kempthorne, p. 451, discussion on Lindley, D. V. (1971) The estimation of many parameters (with discussion). In: Foundations of Statistical Inference, eds. V. P. Godambe and D. A. Sprott, Toronto: Holt, Rinehart & Winston, pp. 435-455.

Evidence in Genome-Wide Association Studies

Y. Yang and D. R. Bickel, “Minimum description length and empirical Bayes methods of identifying SNPs associated with disease,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 74, available at biostats.bepress.com/cobra/ps/art74 (2010). Full preprint | Software

Methods for Levels of Differential Gene Expression

Z. Montazeri, C. M. Yanofsky, and D. R. Bickel [the first two authors contributed equally], “Shrinkage estimation of effect sizes as an alternative to hypothesis testing followed by estimation in high-dimensional biology: Applications to differential gene expression,” Statistical Applications in Genetics and Molecular Biology 9 (1) 23 (2010). Article | Software | Supplementary material | Draft

C. M. Yanofsky and D. R. Bickel, “Validation of differential gene expression algorithms: Application comparing fold change estimation to hypothesis testing,” BMC Bioinformatics 11, 63 (2010). Article | Draft

D. R. Bickel, “Correcting the estimated level of differential expression for gene selection bias: Application to a microarray study,” Statistical Applications in Genetics and Molecular Biology 7 (1) 10 (2008). Article

D. R. Bickel, “Degrees of differential gene expression: Detecting biologically significant expression differences and estimating their magnitudes,” Bioinformatics 20, 682-688 (2004). Abstract and main article | Supplementary material | Software (Statomics)

application of a likelihood method for quantifying the strength of evidence

Gene Network Reconstruction and Inference of Gene Coexpression

D. R. Bickel, Z. Montazeri, P.-C. Hsieh, M. Beatty, S. J. Lawit, and N. J. Bate, “Gene network reconstruction from transcriptional dynamics under kinetic model uncertainty: A case for the second derivative,” Bioinformatics 25, 772-779 (2009). Open access (PDF) | Supplement & software | Data

D. R. Bickel, “Probabilities of spurious connections in gene networks: Application to expression time series,” Bioinformatics 21, 1121-1128 (2005). Abstract and main article | Uncorrected version | Supplementary material (corrected) | Scalable Fig. 3 (corrected) | Software (Statomics 0.4 fixed bug)

D. R. Bickel, “Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically,” Bioinformatics 19, 818-824 (2003). Abstract and article | Software (PLATO) | Full preprint

Small-Scale Estimators of the Local False Discovery Rate

D. R. Bickel, “Minimum description length methods of medium-scale simultaneous inference,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1009.5981 (2010). Full preprint

Z. Yang, Z. Li, and D. R. Bickel, “Empirical Bayes estimation of posterior probabilities of enrichment,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1201.0153 (2011). Full preprint | 2010 seed | Software

D. R. Bickel, “Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1106.4490 (2011). Full preprint

D. R. Bickel, “Small-scale inference: Empirical Bayes and confidence methods for as few as a single comparison,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1104.0341 (2011). Full preprint

more empirical Bayes papers

Multiple Hypothesis Testing and Applications to Differential Gene Expression

D. R. Bickel, “Estimating the null distribution to adjust observed confidence levels for genome-scale screening,” Biometrics 67, 363-370 (2011). Abstract and article | French abstract | Supplementary material | Full preprint

M. Guo, S. Yang, M. Rupe, B. Hu, D. R. Bickel, L. Arthur, and O. Smith, “Genome-wide allele-specific expression analysis using Massively Parallel Signature Sequencing (MPSS) reveals cis- and trans-effects on gene expression in maize hybrid meristem tissue,” Plant Molecular Biology 66, 551-563 (2008).

D. R. Bickel, “Error-rate and decision-theoretic methods of multiple testing: Which genes have high objective probabilities of differential expression?” Statistical Applications in Genetics and Molecular Biology 3 (1) 8 (2004). Article | S-PLUS software | R software | Full preprint [earlier version cited as 'Bickel, D. R. (2003), "Selecting an optimal rejection region for multiple testing: A decision-theoretic alternative to FDR control, with an application to microarrays," Tech. rep., Medical College of Georgia']

D. R. Bickel, “On 'Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates': Does a large number of tests obviate confidence intervals of the FDR?” Technical Report, Medical College of Georgia, arXiv:q-bio.GN/0404032 (2004). Technical report | Software (Statomics)

D. R. Bickel, “Reliably determining which genes have a high posterior probability of differential expression: A microarray application of decision-theoretic multiple testing,” Technical Report, Medical College of Georgia, arXiv:q-bio.QM/0402048 (2004). Technical report | S-PLUS software | R software

D. R. Bickel, “Microarray gene expression analysis: Data transformation and multiple-comparison bootstrapping,” Computing Science and Statistics 34, 383-400, Interface Foundation of North America (Proceedings of the 34th Symposium on the Interface, Montréal, Québec, Canada, April 17-20, 2002). Abstract | Full article | Software (BioinfoStat)

D. R. Bickel, “Conservative identification of differentially expressed genes using cDNA or oligonucleotide microarrays: Inference about values of high probability density,” 2002 Proceedings of the American Statistical Association, Biometrics Section [CD-ROM], American Statistical Association: Alexandria, VA (2002). Abstract | Full article

Strength of Statistical Evidence

D. R. Bickel, “A predictive approach to measuring the strength of statistical evidence for single and multiple comparisons,” Canadian Journal of Statistics 39, 610–631 (2011). Full article | Revised preprint | 2010 draft

D. R. Bickel, “The strength of statistical evidence for composite hypotheses: Inference to the best explanation,” Statistica Sinica 22, 1147-1198 (2012). Full article | 2010 version

D. R. Bickel, “Measuring support for a hypothesis about a random parameter without estimating its unknown prior,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1101.0305 (2011). Full preprint

more likelihood paradigm papers

Game-theoretic Strategies for Sets of Posteriors

D. R. Bickel, “Game-theoretic probability combination with applications to resolving conflicts between statistical methods,” International Journal of Approximate Reasoning 53, 880-891 (2012). Full article | 2011 preprint

D. R. Bickel, “Controlling the degree of caution in statistical inference with the Bayesian and frequentist approaches as opposite extremes,” Electronic Journal of Statistics 6, 686-709 (2012). Full article (open access) | 2011 preprint

D. R. Bickel, “Blending Bayesian and frequentist methods according to the precision of prior information with applications to hypothesis testing,” Technical Report, Ottawa Institute of Systems Biology, available from http://goo.gl/kCVUs (2012). 2012 preprint | 2011 preprint

more "imprecise probability" papers

Confidence Posterior Distributions

D. R. Bickel, “Coherent frequentism: A decision theory based on confidence sets,” Communications in Statistics – Theory and Methods 41, 1478-1496 (2012). Full article (open access) | 2009 preprint

D. R. Bickel, “Empirical Bayes interval estimates that are conditionally equal to unadjusted confidence intervals or to default prior credibility intervals,” Statistical Applications in Genetics and Molecular Biology 11 (3), art. 7 (2012). Full article | 2010 preprint

D. R. Bickel, “A prior-free framework of coherent inference and its derivation of simple shrinkage estimators,” Technical Report, Ottawa Institute of Systems Biology, available from http://goo.gl/aUSLr (2012). 2012 preprint

D. R. Bickel, “A frequentist framework of inductive reasoning,” Technical Report, Ottawa Institute of Systems Biology, arXiv:math.ST/0602377 (2009). Christmas revision

more confidence distribution papers

Robust Estimation of the Mode and Skewness

D. R. Bickel and R. Frühwirth (contributed equally), “On a Fast, Robust Estimator of the Mode: Comparisons to Other Robust Estimators with Applications,” Computational Statistics and Data Analysis 50, 3500-3530 (2006). Full preprint | Mode estimation software

D. R. Bickel, “Robust estimators of the mode and skewness of continuous data,” Computational Statistics and Data Analysis 39, 153-163 (2002). Abstract | Full preprint

D. R. Bickel, “Robust and efficient estimation of the mode of continuous data: The mode as a viable measure of central tendency,” Journal of Statistical Computation and Simulation 73, 899-912 (2003); peer-reviewed preprint: InterStat, November 2001, http://interstat.stat.vt.edu/interstat/articles/2001/abstracts/n01001.html-ssi. Abstract | Full article

Fractal Stochastic Process Models of DNA Evolution

The following papers describe fractal models of evolution and their compatibility with DNA data.

D. R. Bickel and B. J. West, “Multiplicative and fractal processes in DNA evolution,” Fractals 6, 211-217 (1998). This paper provides an introduction to molecular evolution and its assumptions. Abstract

B. J. West and D. R. Bickel, “Fractional-difference stochastic model of evolutionary substitutions in DNA sequences,” Physics Letters A 256, 188-196 (1999). This paper also discusses assumptions made in models of DNA evolution. Abstract

D. R. Bickel, “Implications of fluctuations in substitution rates: Impact on the uncertainty of branch lengths and on relative-rate tests,” Journal of Molecular Evolution 50, 381-390 (2000). Abstract

D. R. Bickel and B. J. West, “Molecular evolution modeled as a fractal Poisson process in agreement with mammalian sequence comparisons,” Molecular Biology and Evolution 15, 967-977 (1998). Abstract

D. R. Bickel and B. J. West, “Molecular evolution modeled as a fractal renewal point process in agreement with the dispersion of substitutions in mammalian genes,” Journal of Molecular Evolution 47, 551-556 (1998). Abstract

B. J. West and D. R. Bickel, “Molecular evolution modeled as a fractal stochastic process,” Physica A 249, 544-552 (1998). Abstract

Stochastic Intermittency Publications

D. R. Bickel, “Smoothing before estimating uncertainty, scaling, and intermittency: Application to short heart rate signals,” Fractals 11, 245-252 (2003). Abstract | Full preprint

D. R. Bickel and D. Lai, “Asymptotic distribution of time-series intermittency estimates: applications to economic and clinical data,” Computational Statistics and Data Analysis 37, 419-431 (2001). Abstract | Full preprint

D. R. Bickel, “Estimating the intermittency of point processes with applications to human activity and viral DNA,” Physica A 265, 634-648 (1999). Abstract

D. R. Bickel, “Simple estimation of intermittency in multifractal stochastic processes: Biomedical applications,” Physics Letters A 262, 251-256 (1999). Abstract

D. R. Bickel, “Rest quantified by a fractal dimension of movement events: A biomedical application of intermittency estimation,” Fractals 8, 1-6 (2000). Abstract | Full article

D. R. Bickel, “Generalized entropy and multifractality of time-series: Relationship between order and intermittency,” Chaos, Solitons & Fractals 13, 491-497 (2002). Abstract

Heart Rate Variability

D. R. Bickel, M. T. Verklan, and J. Moon, “Detection of anomalous diffusion using confidence intervals of the scaling exponent with application to preterm neonatal heart rate variability,” Physical Review E 58, 6440-6448 (1998). Abstract | Full article

M. T. Verklan, D. R. Bickel, and J. Moon, “Heart rate variability of preterm neonates quantified by energy entropy,” Nursing and Health Sciences 1, 103-111 (1999). Abstract

Other publications

D. R. Bickel, comment on "Sequential Monte Carlo for Bayesian Computation" (P. Del Moral, A. Doucet, A. Jasra) in Bayesian Statistics 8 (Oxford Science Publications, 2007, p. 140), available as arXiv:math.ST/0606557 (2006). | Abstract: Is there a class of static inference problems for which the backward-kernel approach is better suited than a mixture transition kernel that automatically adapts to the target distribution?

C. A. Ordonez, D. R. Bickel, V. C. Venezia, F. D. McDaniel, S. E. Matteson, and M. I. Molina, “Electronic ion energy loss calculations on the basis of the binary encounter approximation,” Journal of Nuclear Materials 264, 133-140 (1999). Abstract | This article includes a description of the accept-reject Monte Carlo simulations of D. R. Bickel, The Stopping Power of Amorphous and Channelled Silicon at all Energies as Computed with the Binary Encounter Approximation, MA thesis, University of North Texas, Denton, Texas (1994).

The two gene expression sections include work applying empirical Bayes (decision-theoretic) and BIC methodology.

Selected Abstracts

D. R. Bickel, “Robust and efficient estimation of the mode of continuous data: The mode as a viable measure of central tendency,” Journal of Statistical Computation and Simulation 73, 899-912 (2003); peer-reviewed preprint: InterStat, November 2001, http://interstat.stat.vt.edu/interstat/articles/2001/abstracts/n01001.html-ssi.

Although a natural measure of the central tendency of a sample of continuous data is its mode (the most probable value), the mean and median are the most popular measures of location due to their simplicity and ease of estimation. The median is often used instead of the mean for asymmetric data because it is closer to the mode and is insensitive to extreme values in the sample. However, the mode itself can be reliably estimated by first transforming the data into approximately normal data by raising the values to a real power, and then estimating the mean and standard deviation of the transformed data. With this method, two estimators of the mode of the original data are proposed: a simple estimator based on estimating the mean by the sample mean and the standard deviation by the sample standard deviation, and a more robust estimator based on estimating the mean by the median and the standard deviation by the standardized median absolute deviation. Both of these mode estimators were tested using simulated data drawn from normal (symmetric), lognormal (asymmetric), and Pareto (very asymmetric) distributions. The latter two distributions were chosen to test the generality of the method since they are not power transforms of the normal distribution. Each of the proposed estimators of the mode has a much lower variance than the mean and median for the two asymmetric distributions. When outliers were added to the simulations, the more robust of the two proposed mode estimators had a lower bias and variance than the median for the asymmetric distributions, especially when the level of contamination approached the 50% breakdown point. It is concluded that the mode is often a more reliable measure of location than the mean or median for asymmetric data. The proposed estimators also performed well relative to previous estimators of the mode. 
While different estimators are better under different conditions, the proposed robust estimator is reliable for a wide variety of distributions and contamination levels.

Full article

D. R. Bickel, “Microarray gene expression analysis: Data transformation and multiple-comparison bootstrapping,” Computing Science and Statistics 34, 383-400, Interface Foundation of North America (Proceedings of the 34th Symposium on the Interface, Montréal, Québec, Canada, April 17-20, 2002).

A simple transform function is proposed to preprocess the intensity of gene expression, where the intensity can be that of a colored dye for cDNA microarrays or a gauge of probe matching for oligonucleotide arrays. A new measure of skewness is introduced to show that the transform function effectively reduces the asymmetry of intensity values for Affymetrix data of Golub et al. (1999). This transform approaches a logarithmic transform for large intensities, but approaches a linear transform for small intensities, so that the effect of spurious ratios of small intensities is avoided. When the intensity is the average difference (AD) score, the suggested transform function preserves the stochastic nature of AD values rather than resetting negative values to an arbitrary positive value. A conservative estimator of the fold-change based on this transform is proposed. After the B-cell ALL and AML data of Golub et al. (1999) was transformed, a nonparametric bootstrapping method found that the number of genes considered differentially expressed is 48 when controlling the family-wise error rate at the 5% level and 572 when controlling the false-discovery rate at the 1% level.

Full article
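
The abstract above does not state the formula of the transform. Purely as an illustration, the inverse hyperbolic sine has the two limiting behaviours described: it is roughly linear near zero, roughly logarithmic for large arguments, and it accepts negative values without truncation. The sketch below uses it as a stand-in; the function name and the `scale` parameter are assumptions of this example, not definitions from the paper.

```python
import math

def log_linear_transform(intensity, scale=1.0):
    """Stand-in transform: asinh(intensity / scale).

    Near zero it behaves like intensity / scale (linear); for large values
    it behaves like log(2 * intensity / scale) (logarithmic). Negative
    inputs are allowed. `scale` is a hypothetical tuning parameter.
    """
    return math.asinh(intensity / scale)

# Negative, small, and large intensities (made-up values for illustration).
for x in (-20.0, 0.5, 50.0, 5000.0):
    print(x, round(log_linear_transform(x, scale=100.0), 4))
```

Because the stand-in is linear near zero, a ratio of two small, noisy intensities is not inflated the way it would be under a plain logarithm.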

D. R. Bickel, “Conservative identification of differentially expressed genes using cDNA or oligonucleotide microarrays: Inference about values of high probability density,” 2002 Proceedings of the Joint Statistical Meetings of the American Statistical Association (Biometrics Section).

Many methods of identifying differential expression in genes depend on testing the null hypothesis of equal mean expression for each gene across two groups, even though a difference in the mean does not imply any difference in the distribution center. This can lead to many genes considered differentially expressed that might only differ in the tails of their expression distributions. A more conservative approach is to specifically test whether distributions differ in a parameter of location that does not depend on the tails. This can be accomplished by bootstrapping outlier-rejecting estimators of location parameters. Genes identified as differentially expressed can then be used in classification. In distinguishing microarrays from patients with different types of leukemia, the expression values of many more genes were found to differ in their means than were found to differ in their central values. The data was preprocessed using a transform that approaches a logarithmic transform for large intensities, but approaches a linear transform for small intensities, so that the effect of spurious ratios of small intensities was avoided; negative AD values were not arbitrarily truncated.

Full article

D. R. Bickel, “Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically,” Bioinformatics 19, 818-824 (2003).

Motivation: The success of each method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive clustering of genes are robust against violations of model assumptions. Results: A measure of dissimilarity that combines advantages of the Euclidean distance and the correlation coefficient is introduced. The measure can be made robust using a rank order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to the data of DeRisi et al. (1997), showing that rank-based methods perform better than log-based methods.

Full article | Software (PLATO)

D. R. Bickel, “Robust estimators of the mode and skewness of continuous data,” Computational Statistics and Data Analysis 39, 153-163 (2002).

Measures of location based on the shortest half sample, including the shorth and the location of the least median of squares, are more robust than the median to outliers, but less robust to contamination near the location. Although such measures can estimate the mode, the proposed estimator of the mode, based on densest half ranges, has a much lower bias while having similar robustness. Like the median, this mode estimator has the highest breakdown point possible: the estimator has meaning if less than half the sample consists of outliers. The mode is more robust than the median in that the mode estimates are unaffected by outliers, whereas the median is influenced by each outlier. Robustness in this sense is quantified by the rejection point, the largest absolute value that is not rejected, which is low for the mode but infinite for the median. Even though the median is changed less by contamination near the location than is the mode, outliers generally pose more of a problem to estimation than contamination near the location, so the mode is more robust for data that may have a large number of outliers. A robust estimator of skewness is based on this mode estimator. Copyright (c) 2002 Elsevier Science B.V. All rights reserved.

Full article
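
One simplified reading of a "densest half range" estimator can be sketched as follows: repeatedly keep the window, half as wide as the current range, that contains the most points, and stop when only a few points remain. This is not the published algorithm; the tie-breaking rule, the small-sample rules, and the name `half_range_mode` are assumptions made for illustration.

```python
from bisect import bisect_right

def half_range_mode(values, beta=0.5):
    """Simplified half-range mode sketch (assumed details, see lead-in).

    Repeatedly keeps the densest window whose width is `beta` times the
    current range, until at most three points remain. Ties go to the
    narrowest of the equally dense windows.
    """
    x = sorted(values)
    if not x:
        raise ValueError("need at least one value")
    while True:
        n = len(x)
        if n == 1 or x[0] == x[-1]:
            return x[0]
        if n == 2:
            return (x[0] + x[1]) / 2
        if n == 3:
            left, right = x[1] - x[0], x[2] - x[1]
            if left < right:
                return (x[0] + x[1]) / 2
            if right < left:
                return (x[1] + x[2]) / 2
            return x[1]
        width = beta * (x[-1] - x[0])
        best_key, best_slice = None, None
        for i in range(n):
            j = bisect_right(x, x[i] + width)  # points in [x[i], x[i] + width]
            key = (-(j - i), x[j - 1] - x[i])  # most points first, then narrowest
            if best_key is None or key < best_key:
                best_key, best_slice = key, (i, j)
        i, j = best_slice
        if j - i == n:  # guard against rounding that keeps every point
            return (x[0] + x[-1]) / 2
        x = x[i:j]

sample = [0, 5, 5, 5, 100]
print(half_range_mode(sample))  # prints 5
```

In this sketch the outlier at 100 is discarded in the first pass, so making it larger does not change the estimate, which matches the sense of robustness the abstract describes.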

D. R. Bickel, “Smoothing before estimating uncertainty, scaling, and intermittency: Application to short heart rate signals,” Fractals 11, 245-252 (2003).

Three aspects of time series are uncertainty (dispersion at a given time scale), scaling (time-scale dependence), and intermittency (inclination to change dynamics). Simple measures of dispersion are the mean absolute deviation and the standard deviation; scaling exponents describe how dispersions change with the time scale. Intermittency has been defined as a difference between two scaling exponents. After taking a moving average, these measures give descriptive information, even for short heart rate records. For this data, dispersion and intermittency perform better than scaling exponents.

Full article
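
To make the idea of a scaling exponent concrete, the following minimal sketch estimates how two dispersion measures of a series grow with the time scale and reports the difference between the two exponents. It is only one possible reading of the abstract: measuring dispersion on lagged increments, fitting a log-log slope, skipping the moving average, and the function names are all assumptions, and the paper's exact definition of intermittency may differ.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def dispersion_scaling(series, lags):
    """Return (h_abs, h_sd): scaling exponents of two dispersion measures.

    For each lag, dispersion is measured on the increments
    series[t + lag] - series[t], once as the mean absolute increment and
    once as the root-mean-square increment (both assumed choices).
    Increments that are all zero would make the logarithm undefined.
    """
    mean_abs, rms = [], []
    for lag in lags:
        inc = [series[t + lag] - series[t] for t in range(len(series) - lag)]
        mean_abs.append(sum(abs(d) for d in inc) / len(inc))
        rms.append(math.sqrt(sum(d * d for d in inc) / len(inc)))
    return loglog_slope(lags, mean_abs), loglog_slope(lags, rms)

# A steadily rising series: both exponents are 1, so their difference is ~0.
ramp = [float(t) for t in range(200)]
h_abs, h_sd = dispersion_scaling(ramp, lags=[1, 2, 4, 8, 16])
print(h_abs - h_sd)
```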

D. R. Bickel and D. Lai, “Asymptotic distribution of time-series intermittency estimates: applications to economic and clinical data,” Computational Statistics and Data Analysis 37, 419-431 (2001).

The intermittency of a time series can be defined as its normalized difference in scaling parameters. We establish the central limit theorem for the estimates of intermittency under the null hypothesis of a random walk. Simulations of random walks indicate that the distribution of intermittency estimates is slightly negatively skewed and positively biased, but that the skewness and bias approach zero as the length n of the random walks increases. We provide a formula by which the sample variance of the intermittency estimates of these simulations can be used to approximate the standard error of the intermittency for any large n. These results can be used to test whether the intermittency estimate of an observed long time series is significantly greater than zero, the intermittency of a random walk. This test reveals that the intermittency estimates of the S&P 500 index and of the heart rate of a human adult are significantly positive. The hypothesis testing proposed in this paper can also be applied to other observed time series to determine whether their intermittency estimates are sufficiently high for the series to be considered intermittent, or whether their estimates are small enough to be consistent with a random walk. Copyright (c) 2001 Elsevier Science B.V. All rights reserved.

Full article

D. R. Bickel, “Rest quantified by a fractal dimension of movement events: A biomedical application of intermittency estimation,” Fractals 8, 1-6 (2000).

The intermittency of a time-series, i.e. the extent to which it departs from slowly-varying, unifractal dynamics, can often be quantified by simple scale-free statistics. For fractal point processes, singular measures, and certain other models that describe physical and biological phenomena, the correlation co-dimension, C₂, quantifies intermittency. C₂ of human activity during the night quantifies restfulness in that it is negatively correlated with the average activity level. However, C₂ of activity appears to be more sensitive to the use of steroids than the average activity level. Copyright (c) 2000 World Scientific Publishing Company.

Full article

D. R. Bickel, “Generalized entropy and multifractality of time-series: Relationship between order and intermittency,” Chaos, Solitons & Fractals 13, 491-497 (2002).

The intermittency of a time-series is its tendency to have large departures from its characteristic dynamics. The quantification of intermittency has applications to the study of physical, biological, and economic phenomena. Intermittency has been quantified by multifractality, the extent to which generalized Hurst exponents differ. As an alternative descriptor of intermittent processes, we present a nonextensive measure of order, based on the Tsallis entropy of a sequence of symbols corresponding to the time-series. Like multifractality, nonextensive order increases with intermittency. Nonextensive order has the advantage that it does not assume scaling in the time-series, whereas a scaling region has to be identified in order to estimate multifractality. However, unlike multifractality, nonextensive order requires the selection of parameters used to generate subsequences of symbols from the time-series.
Both nonextensive order and multifractality can distinguish time-series that have different levels of intermittency. In distinguishing simulated point processes of D=0.1 from those of D=0.5, nonextensive order and multifractality performed about equally well and nonextensive order performed better than its extensive counterpart. Multifractality more accurately distinguished processes with D=0.5 from those of D=0.9. Which statistic better describes a time-series depends on the specific application. Copyright (c) 2002 Elsevier Science B.V. All rights reserved.

B. J. West and D. R. Bickel, “Fractional-difference stochastic model of evolutionary substitutions in DNA sequences,” Physics Letters A 256, 188-196 (1999).

The number of molecular substitutions occurring in a DNA sequence over a given time is described by a fractional difference random walk model. This is an empirically motivated stochastic model of molecular evolution and does not address the detailed evolutionary mechanisms that lead to the substitution of nucleotides. This fractal stochastic process yields a Fano factor, the ratio of the variance to the mean in the number of molecular substitutions, that increases as a power law in time. This prediction agrees with the observed statistics across 49 different genes in mammals. The fractional-difference model of molecular evolution is episodic and can be made consistent with the punctuated equilibrium model of macroevolution. Copyright (c) 1999 Elsevier Science B.V. All rights reserved.
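
The ratio of the variance to the mean described above is simple to compute from a list of counts. The snippet below is a minimal sketch with made-up counts; using the sample variance and the name `fano_factor` are assumptions of this example. For counts from a homogeneous Poisson process the ratio would be close to 1, so larger values indicate overdispersion.

```python
from statistics import mean, variance

def fano_factor(counts):
    """Ratio of the sample variance to the sample mean of event counts."""
    return variance(counts) / mean(counts)

# Hypothetical substitution counts for one gene across several lineages.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(fano_factor(counts), 3))  # prints 0.914
```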

D. R. Bickel, “Implications of fluctuations in substitution rates: Impact on the uncertainty of branch lengths and on relative-rate tests,” Journal of Molecular Evolution 50, 381-390 (2000).

Many tests of the lineage-dependence of substitution rates, computations of the error of evolutionary distances, and simulations of molecular evolution assume that the rate of evolution is constant in time within each lineage descended from a common ancestor. However, estimates of the index of dispersion of numbers of mammalian substitutions suggest that the rate has time-dependent variations consistent with a fractal-Gaussian-rate Poisson process, which assumes common descent without assuming rate constancy. While this model does not affect certain relative-rate tests, it substantially increases the uncertainty of branch lengths. Thus, fluctuations in the rate of substitution cannot be neglected in calculations that rely on evolutionary distances, such as the confidence intervals of divergence times and certain phylogenetic reconstructions. The fractal-Gaussian-rate Poisson process is compared and contrasted with previous models of molecular evolution, including other Poisson processes, the fractal renewal process, a Lévy-stable process, a fractional-difference process, and a log-Brownian process. The fractal models are more compatible with mammalian data than the non-fractal models considered, and they may also be better supported by Darwinian theory. Although the fractal-Gaussian-rate Poisson process has not been proven to have better agreement with data or theory than the other fractal models, its Gaussian nature simplifies the exploration of its impact on evolutionary distance errors and relative-rate tests. Copyright (c) 2000 Springer-Verlag New York Inc.

D. R. Bickel and B. J. West, “Molecular evolution modeled as a fractal Poisson process in agreement with mammalian sequence comparisons,” Molecular Biology and Evolution 15, 967-977 (1998).

The fractal doubly-stochastic Poisson process (FDSPP) model of molecular evolution, like other doubly-stochastic Poisson models, agrees with the high estimates for the index of dispersion found from sequence comparisons. Unlike certain previous models, the FDSPP also predicts a positive geometric correlation found between the index of dispersion and the mean number of substitutions. Such a relationship is statistically proven herein using comparisons between 49 mammalian genes. There is no characteristic rate associated with molecular evolution according to this model, but there is a scaling relationship in rates according to a fractal dimension of evolution. The FDSPP is a suitable replacement for the homogeneous Poisson process in tests of the lineage-dependence of rates and in estimating confidence intervals for divergence times. As opposed to other fractal models, this model can be interpreted in terms of Darwinian selection and drift. Copyright (c) 1998 Society for Molecular Biology and Evolution.

D. R. Bickel and B. J. West, “Molecular evolution modeled as a fractal renewal point process in agreement with the dispersion of substitutions in mammalian genes,” Journal of Molecular Evolution 47, 551-556 (1998).

A fractal renewal point process (FRPP) is used to model molecular evolution in agreement with the relationship between the variance and mean numbers of nonsynonymous and synonymous substitutions in mammals. Like other episodic models such as the doubly-stochastic Poisson process, this model accounts for the large variances observed in amino acid substitution rates, but unlike certain other episodic models, it also accounts for the increase in the index of dispersion with the mean number of substitutions in Ohta's (1995) data. We find that this correlation is significant for nonsynonymous substitutions at the 1% level and for synonymous substitutions at the 10% level, even after removing lineage effects and when using Bulmer's (1989) unbiased estimator of the index of dispersion. This model is simpler than most other overdispersed models of evolution in the sense that it is fully specified by a single interevent probability distribution. Interpretations in terms of chaotic dynamics and in terms of chance and selection are discussed. Copyright (c) 1998 Springer-Verlag New York Inc.

B. J. West and D. R. Bickel, “Molecular evolution modeled as a fractal stochastic process,” Physica A 249, 544-552 (1998).

Modeling the rate of nucleotide substitutions in DNA as a dichotomous stochastic process with an inverse power-law correlation function describes evolution by a fractal stochastic process (FSP). This FSP model agrees with recent findings on the relationship between the variance and mean number of synonymous and nonsynonymous substitutions in 49 different genes in mammals, that being a power-law increase in the ratio of the variance to the mean, the index of dispersion, with the number of substitutions in a protein. The probability of a given number of substitutions occurring in a time t is determined by a fractional diffusion equation whose solution is a truncated Lévy distribution implying that evolution is a Lévy process in time and yields the same functional behavior for the variance in the number of substitutions as the FSP model. In addition to obtaining these relationships, the FSP model implies lognormal statistics for the index of dispersion as a function of the mean number of substitutions in a protein, which is confirmed in the regression of the FSP model to data. Lognormal statistics suggest that molecular evolution can be viewed as a multiplicative stochastic process, rather than the linear additive process of Darwinian selection and drift. Copyright (c) 1998 Elsevier Science B.V. All rights reserved.
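
The index of dispersion named in the abstract above is the ratio of the variance to the mean of the substitution counts. The following is a minimal sketch of the plain sample estimator, not the bias-corrected estimator or the regression used in the papers; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import statistics


def index_of_dispersion(counts):
    """Sample variance divided by sample mean of event counts."""
    mean = statistics.fmean(counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    return statistics.variance(counts) / mean


# Hypothetical substitution counts for one gene across lineages (not real data).
example_counts = [12, 30, 7, 45, 18]
dispersion = index_of_dispersion(example_counts)
```

A homogeneous Poisson process has an index of dispersion near 1, so values well above 1 indicate the overdispersion that the fractal models are meant to explain.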

D. R. Bickel, “Estimating the intermittency of point processes with applications to human activity and viral DNA,” Physica A 265, 634-648 (1999).

The intermittency of a point process is the extent to which the number of events in a time window has pronounced departures from typical values. Combining point process and multifractal formalisms indicates that the correlation codimension can be used to quantify intermittency. The correlation codimension is easily estimated and is simply related to other second order scaling exponents, such as those of the Fano factor and spectral density. The correlation codimensions are derived for various uncorrelated, fractal, and fractal-rate point processes. In addition, the estimation of intermittency as the correlation codimension of experimental events is illustrated with applications to experimental data. Human activity during bed rest is highly intermittent, while other human activity and viral DNA composition are non-intermittent. Copyright (c) 1999 Elsevier Science B.V. All rights reserved.
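
The Fano factor mentioned in the abstract above is the variance-to-mean ratio of event counts in windows of a given length, examined as a function of window length. The sketch below estimates it for a simulated homogeneous Poisson process, for which it should stay near 1 at every window length. It is not the correlation codimension estimator of the paper; the function names, rate, duration, and window lengths are assumptions made for illustration.

```python
import random
import statistics


def fano_factor(event_times, window, duration):
    """Variance-to-mean ratio of counts in consecutive windows of length `window`."""
    n_windows = int(duration // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int(t // window)
        if k < n_windows:  # ignore events past the last complete window
            counts[k] += 1
    return statistics.variance(counts) / statistics.fmean(counts)


def simulate_poisson(rate, duration, seed=0):
    """Event times of a homogeneous Poisson process on [0, duration)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential waiting time between events
        if t >= duration:
            return times
        times.append(t)


times = simulate_poisson(rate=5.0, duration=4000.0)
fano_by_window = {w: fano_factor(times, w, 4000.0) for w in (1.0, 4.0, 16.0)}
```

For a fractal or fractal-rate point process the Fano factor would instead grow as a power of the window length, which is the kind of second-order scaling the abstract refers to.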

D. R. Bickel, “Simple estimation of intermittency in multifractal stochastic processes: Biomedical applications,” Physics Letters A 262, 251-256 (1999).

A number of physical and biological phenomena are intermittent in the sense that they tend to have large departures from their typical dynamics. The intermittency of a multifractal can be qualified and quantified by differential or nondifferential multifractality, the extent to which the generalized Hurst exponents differ. Multifractality is related to the generalized dimension of a singular measure, but also applies to other signals, including noises, walks, anomalous diffusion, and point processes. Multifractality has uses in data-model and data-data comparisons; e.g., the multifractality of the heart rate reveals the inadequacy of unifractal models and distinguishes healthy subjects from those with heart failure. In addition, the multifractality of human activity quantifies restfulness at night. Copyright (c) 1999 Elsevier Science B.V. All rights reserved.
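
One common way to estimate generalized Hurst exponents, assumed here rather than taken from the paper, is to fit the power-law growth of the q-th absolute moment of displacements with the lag, so that the moment scales as lag^(q H(q)). The sketch below applies this to a simulated Gaussian random walk, a unifractal signal for which H(q) should be close to 0.5 for every q. The names `generalized_hurst` and `spread`, the lags, and the choice of q = 1 and q = 2 are hypothetical.

```python
import math
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def generalized_hurst(walk, q, lags=(1, 2, 4, 8, 16)):
    """Estimate H(q) from the scaling of the q-th absolute moment of displacements."""
    log_lags, log_moments = [], []
    for lag in lags:
        moment = statistics.fmean(
            abs(walk[i + lag] - walk[i]) ** q for i in range(len(walk) - lag)
        )
        log_lags.append(math.log(lag))
        log_moments.append(math.log(moment))
    return fit_slope(log_lags, log_moments) / q


# Simulated uncorrelated Gaussian random walk (illustrative data only).
rng = random.Random(2)
walk = [0.0]
for _ in range(20000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

h1 = generalized_hurst(walk, 1)
h2 = generalized_hurst(walk, 2)
spread = abs(h1 - h2)  # near zero for a unifractal signal
```

A multifractal signal would show a clearly nonzero difference between exponents at different q, which is the sense in which the abstract speaks of the extent to which the generalized Hurst exponents differ.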

D. R. Bickel and B. J. West, “Multiplicative and fractal processes in DNA evolution,” Fractals 6, 211-217 (1998).

Darwin's theory of evolution by natural selection revolutionized science in the nineteenth century. Not only providing a new paradigm for biology, the theory formed the basis for analogous interpretations of complex systems studied by other disciplines, such as sociology and psychology. With the subsequent linking of macroscopic phenomena to microscopic processes, the Darwinian interpretation was adopted to patterns observed in molecular evolution by assuming that natural selection operates fundamentally at the level of DNA. Thus, patterns of molecular evolution have important implications in many fields of science. Although the evolution rate of a given gene seems to be of approximately the same order of magnitude in all species, genes appear to differ in rate from one another by orders of magnitude, a fact which standard theory does not adequately explain. An understanding of the statistics of rates across different genes may shed light on this problem. The evolution rates of mammalian DNA, based on recent estimates of numbers of nonsynonymous substitutions in 49 genes of humans, rodents, and artiodactyls, are studied. We find that the rate variations are better described by lognormal statistics, as would be the case for a multiplicative process, than by Gaussian statistics, which would correspond to a linear, additive process. Thus, we introduce a multiplicative evolution statistical hypothesis (MESH), in which the theoretical explanation of these statistics requires the evolution of different substitution rates in different genes to be a multiplicative process in that each rate results from the interaction of a number of interdependent contingency processes. Lognormal statistics lend support to fractal process models of DNA substitutions, including anomalous diffusion processes and fractal stochastic point processes, such as the fractal renewal process and the fractal doubly-stochastic Poisson process. 
The realization of a fractal process is a random self-similar time series with a power-law autocorrelation function, spectral density, and Fano factor over many time scales. Copyright (c) 1998 World Scientific Publishing Company.
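
One simple way to ask whether positive rates are better described by lognormal than by Gaussian statistics is to compare the maximized log-likelihoods of the two fits. The sketch below does this on simulated data; it is an assumed illustration, not necessarily the comparison made in the paper, and the function names and simulated sample are hypothetical.

```python
import math
import random
import statistics


def normal_loglik(values):
    """Maximized Gaussian log-likelihood (mean and variance set to their MLEs)."""
    n = len(values)
    mu = statistics.fmean(values)
    var = statistics.pvariance(values, mu)  # population variance is the MLE
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(values):
    """Maximized lognormal log-likelihood for strictly positive values."""
    logs = [math.log(v) for v in values]
    # Gaussian fit on the log scale, plus the Jacobian term for the log transform.
    return normal_loglik(logs) - sum(logs)


# Simulated positive "rates" produced by a multiplicative mechanism (not real data).
rng = random.Random(3)
rates = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

prefers_lognormal = lognormal_loglik(rates) > normal_loglik(rates)
```

Both models have two fitted parameters, so the log-likelihoods can be compared directly without a complexity penalty.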

The scaling exponent of the root mean square (rms) displacement quantifies the roughness of fractal or multifractal time series; it is equivalent to other second-order measures of scaling, such as the power-law exponents of the spectral density and autocorrelation function. For self-similar time series, the rms scaling exponent equals the Hurst parameter, which is related to the fractal dimension. A scaling exponent of 0.5 implies that the process is normal diffusion, which is equivalent to an uncorrelated random walk; otherwise, the process can be modeled as anomalous diffusion. Higher exponents indicate that the increments of the signal have positive correlations, while exponents below 0.5 imply that they have negative correlations. Scaling exponent estimates of successive segments of the increments of a signal are used to test the null hypothesis that the signal is normal diffusion, with the alternate hypothesis that the diffusion is anomalous. Dispersional analysis, a simple technique which does not require long signals, is used to estimate the scaling exponent from the slope of the linear regression of the logarithm of the standard deviation of binned data points on the logarithm of the number of points per bin. Computing the standard error of the scaling exponent using successive segments of the signal is superior to previous methods of obtaining the standard error, such as that based on the sum of squared errors used in the regression; the regression error is more of a measure of the deviation from power-law scaling than of the uncertainty of the scaling exponent estimate. Applying this test to preterm neonate heart rate data, it is found that time intervals between heart beats can be modeled as anomalous diffusion with negatively correlated increments. 
This corresponds to power spectra between 1/f and 1/f², whereas healthy adults are usually reported to have 1/f spectra, suggesting that the immaturity of the neonatal nervous system affects the scaling properties of the heart rate. Copyright (c) 1998 The American Physical Society.
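
The procedure described above can be sketched as follows: bin the increments, regress the log standard deviation of the bin means on the log bin size, and repeat on successive segments to obtain a standard error. The sketch assumes the usual dispersional-analysis convention that the exponent equals one plus the regression slope, and it uses a one-sample t-like statistic against 0.5 as a stand-in for the test; the bin sizes, segment count, and function names are hypothetical, and the data are simulated white noise rather than heart rate recordings.

```python
import math
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def dispersional_exponent(increments, bin_sizes=(1, 2, 4, 8, 16)):
    """Scaling exponent from the log-log slope of SD of bin means vs. bin size."""
    log_m, log_sd = [], []
    for m in bin_sizes:
        n_bins = len(increments) // m
        means = [statistics.fmean(increments[i * m:(i + 1) * m]) for i in range(n_bins)]
        log_m.append(math.log(m))
        log_sd.append(math.log(statistics.stdev(means)))
    return 1.0 + fit_slope(log_m, log_sd)  # assumed convention: exponent = 1 + slope


def segment_estimate(increments, n_segments=8):
    """Mean exponent and its standard error from successive segments."""
    seg_len = len(increments) // n_segments
    estimates = [
        dispersional_exponent(increments[k * seg_len:(k + 1) * seg_len])
        for k in range(n_segments)
    ]
    std_err = statistics.stdev(estimates) / math.sqrt(n_segments)
    return statistics.fmean(estimates), std_err


# Uncorrelated increments, i.e. normal diffusion, so the exponent should be near 0.5.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_hat, h_se = segment_estimate(white_noise)
t_stat = (h_hat - 0.5) / h_se  # large |t_stat| would suggest anomalous diffusion
```

The standard error here comes from the spread of the segment estimates rather than from the regression residuals, which is the distinction the abstract draws.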

Full article

M. T. Verklan, D. R. Bickel, and J. Moon, “Heart rate variability of preterm neonates quantified by energy entropy,” Nursing and Health Sciences 1, 103-111 (1999).

Identifying variables predictive of neurobehavioral sequelae is a key objective in the study of high-risk neonates. Examination of heart rate variability (HRV) characteristics may be a finer discriminator of the neonate's response to physiologic stressors than the mean heart rate. The energy entropy of the heartbeat tachogram, computed in four different domains, was used to quantify the HRV in 13 preterm neonates. The entropies of energies were computed from 1024 interbeat time intervals obtained once per week from 26 to 35 weeks post-conceptional age (PCA). The energy entropy computed in three of the domains, like the standard deviation of intervals, distinguished between the 10 neonates that were measured at 35 weeks PCA with 100% specificity and 67% sensitivity, but did not distinguish between healthy and unhealthy neonates at earlier ages. The findings suggest that energy entropy may be a discerning measure of physiologic stress in the preterm infant, although future research is needed to refine the test and determine statistical significance. Copyright (c) 1999 The Japanese Urological Association.

Electronic ion energy loss calculations on the basis of the binary encounter approximation are presented for protons in oxygen, nitrogen and silicon. Calculations using both an analytical approach as well as a Monte Carlo approach are found to agree well with experimental data even down to energies below the stopping cross section maximum. Energy loss calculations for protons in silicon under channeling conditions are included and predictions are made for channeling in the <110> direction at low energies (5 to 500 keV). Copyright (c) 1999 Elsevier Science B.V. All rights reserved.

Last modified September 4, 2013

Author information. personal web page

This web site is not affiliated with the University of Ottawa.