# 3. Ensemble Statistics, Probability, Stochastic Processes, and their Temporal Counterparts

Theme: For stationary and cyclostationary time series, a wrong turn in their mathematical modeling was taken almost a century ago. Today, Academia should engage in remediation to overcome the detrimental influence on the teaching and practice of time-series analysis in Science and Engineering.

The objective of this page is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met with from those indoctrinated in the theory of Stochastic processes, to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 7, Notes on the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this page 3 in perspective.

The macroscopic world that our five senses experience—sight, hearing, smell, taste and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such things varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these mathematical functions. For this reason, developing an intuitive real-world understanding of time-series analysis, and as an example spectral analysis of time-records of data from the physical world, requires that continuous-time models and mathematics of continua be used.

Unfortunately, this is at odds with the technology that has been developed in the form of computer applications and digital signal processing (DSP) hardware for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena or of continuous-time and continuous-amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available, that is, the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of accuracy of the data representations subjected to analysis and processing arises. Then, the number of discrete-amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. Discretization of both the time-series data values and the time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.

Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website.

Certainly, for non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because the approximation of analog data with digital representations introduces an additional layer of complexity that is not directly related to the principles of analog spectral analysis, these principles, which are independent of the implementation technology, are more transparent and easier to grasp with the continuous-time theory.

Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract and forces a detachment of the theory from the real-world data to be analyzed or processed, and this is so even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In such cases, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.5, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And, when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
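The claim that a time average on a single record can stand in for a Monte Carlo ensemble average can be illustrated with a small sketch. Everything in it is an assumption made for illustration only: the first-order autoregressive model, the function names `ar1_record`, `time_average_power`, and `ensemble_average_power`, and the record lengths. It compares the average power obtained from one long record with the average power obtained across many independent short runs at a fixed time.

```python
import random

def ar1_record(n, a, rng, burn_in=50):
    """Return n samples of x[k] = a*x[k-1] + w[k], with w unit-variance Gaussian."""
    x, out = 0.0, []
    for k in range(n + burn_in):
        x = a * x + rng.gauss(0.0, 1.0)
        if k >= burn_in:          # discard the start-up transient
            out.append(x)
    return out

def time_average_power(record):
    """Finite-time average of x(t)**2 over a single record."""
    return sum(v * v for v in record) / len(record)

def ensemble_average_power(records, k):
    """Average of x(k)**2 across many independent records at one fixed time k."""
    return sum(r[k] ** 2 for r in records) / len(records)

rng = random.Random(0)
a = 0.5
model_power = 1.0 / (1.0 - a * a)          # average power of the assumed AR(1) model

single_record = ar1_record(20000, a, rng)  # one long record
time_avg = time_average_power(single_record)

ensemble = [ar1_record(20, a, rng) for _ in range(4000)]   # Monte Carlo runs
ens_avg = ensemble_average_power(ensemble, k=10)

print(f"model {model_power:.3f}  time average {time_avg:.3f}  ensemble average {ens_avg:.3f}")
```

Both averages should land close to the model value; the single-record time average needs no ensemble at all, which is the point made in the paragraph above.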

For this reason, it is the Content Manager’s tenet that, for the sake of pedagogy, the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that exist, and that explain the other fact (that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs), are non-pedagogical. They are based, directly or indirectly, on economics:

1. The transition in philosophy, from truly academic education to vocational training in schools of engineering (and in other fields of study as well), that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution).

2. Economic considerations in the standard degree programs in engineering and other technical fields (B.S., M.S., and Ph.D. degrees) limit the amount of coursework that can be required for each subject in a discipline.

3. Economic considerations of the students studying engineering limit the number of courses they take beyond what is required for the degree they seek; the motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity.

4. The motivations of professors and industry instructors are affected by faculty-rating systems, which are in turn affected by university-rating systems: numbers of employable graduates produced each year reign, and industry defines “employability”.

Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few years. Industry can discard those vocationally trained employees who peter out and hire a new batch.

In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.

FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from anything empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.

BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t − s) (with some abuse of notation due to the use of x to denote two distinct functions), so that every ensemble member is a time translate of the single record. Then the time averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be replaced with the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which is the model adopted for the stochastic theory of statistical spectral analysis, this “trick” (which is rarely if ever mentioned in courses on the subject in the manner it is here) is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet, it has almost no operational advantage. To clarify those operational purposes where this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.5. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim by those who are stuck in their ways that the FOT probability theory has no measure-theoretic basis.
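A minimal sketch of this construction, under stated assumptions, may make it concrete. The ensemble members are taken to be translates of one record (the code uses `x(t + s)` for indexing convenience; the direction of the shift only reorders the members), and `member`, `t0`, `shifts`, and `xi` are hypothetical names. The ensemble average at a fixed time is then the same number as the time average over the corresponding stretch of the record, and the ensemble probability is the same number as the fraction of time.

```python
import random

rng = random.Random(1)
record = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # the single record x(t)

def member(t, s):
    """Value at time t of ensemble member s, defined as the translate x(t + s)."""
    return record[t + s]

t0 = 100                      # fixed observation time
shifts = range(4000)          # ensemble index s
xi = 0.5                      # threshold for the probability that x <= xi

# Ensemble average and ensemble probability at the fixed time t0 ...
ens_mean_square = sum(member(t0, s) ** 2 for s in shifts) / len(shifts)
ens_probability = sum(member(t0, s) <= xi for s in shifts) / len(shifts)

# ... coincide with the time average and the fraction of time over the record.
segment = record[t0:t0 + len(shifts)]
time_mean_square = sum(v * v for v in segment) / len(segment)
fraction_of_time = sum(v <= xi for v in segment) / len(segment)

print(ens_mean_square, time_mean_square)
print(ens_probability, fraction_of_time)
```

The agreement here is an identity rather than an approximation, because the ensemble was built from the record itself.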

The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time series for cyclostationary processes and something similar to that for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of this Page 3. On Page 3.5 it is shown that the differences referred to here are in some cases advantageous for the FOT-probabilistic model and in other cases disadvantageous.

The history of the development of time-series analysis can be partitioned into the earlier empirically driven work, focused primarily on methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period, but centered primarily on nonstationary processes rather than stationary processes. The development of time-series analysis theory and methodology for cyclostationary and related stochastic processes and their non-stochastic time-series counterparts came along later, during the latter half of the 20th century, and extends to the present.

##### Mathematically Driven Development of Probability Spaces and Stochastic Processes as the Preferred Conceptual/Mathematical Basis for Time Series Analysis (1900-1950)
• Josiah Willard Gibbs (Ensemble Average)
• Henri Léon Lebesgue (Probability Space)
• Marian von Smoluchowski (Brownian Motion)
• Albert Einstein (Brownian Motion)
• Norbert Wiener (Brownian Motion)
• Aleksandr Jakovlevich Khinchin (Stochastic Process)
• Herman Ole Andreas Wold (Stochastic Process)
• Andrei Nikolaevich Kolmogorov (Stochastic Process)
• Harald Cramér (Stochastic Process)
• Joseph L. Doob (Stochastic Process)

##### Empirically Driven Development of Time-Series Analysis Methodology (1650-1950)
• Isaac Newton (1642-1727)
• Leonhard Euler (1707-1783)
• Joseph Louis Lagrange (1736-1813)
• Christopher H. D. Buys-Ballot (1817-1890)
• George Gabriel Stokes (1819-1903)
• Sir Arthur Schuster (1851-1934)
• John Henry Poynting (1852-1914)
• Albert Abraham Michelson (1852-1931)
• George Udny Yule (1871-1951)
• Evgeny Evgenievich Slutsky (1880-1948)
• Karl Johann Stumpff (1895-1970)
• Herman Ole Andreas Wold (1908-1992)
• Charles Goutereau (18XX-19XX)
• Norbert Wiener (1894-1964)
• Percy John Daniell (1889-1946)
• Maurice Stevenson Bartlett (1910-2002)
• Ralph Beebe Blackman (1904-1990)

## 3.1 Fraction-of-time Probability for Time-Series that Exhibit Cyclostationarity

The following article, FRACTION-OF-TIME PROBABILITY FOR TIME-SERIES THAT EXHIBIT CYCLOSTATIONARITY, Signal Processing, Vol. 23, No. 3, pp. 273-292, by William A. Gardner and William A. Brown [JP34], was published in 1991, 5 years after this novel probability theory was introduced in the book [Bk2]. Thirty years later, this article remains the single most complete and easy-to-read accounting of this probability theory aimed at a readership of statistical time-series analysis practitioners. For this reason, it is incorporated here as part of this Page 3, as an encouragement to readers to make this their first detailed encounter with this novel probability theory. In comparison with other worthy sources on this theory, including primarily the originating book [Bk2], the 2006 survey paper [JP64], the 2006 development of a measure-theory foundation [J24], and the most recent and most comprehensive treatment of cyclostationarity in general, the 2019 book [B2], this treatment is both concise and quite complete.

The next section, 3.2, is a compilation of other treatments, and starts off with Section IV of the presentation slides from the kick-off plenary lecture at the first international workshop on cyclostationarity, held in 1992. These presentation slides comprise a summary of parts of the introductory chapter of the book [Bk5] that came out of the workshop and is a good candidate for the next treatment to be considered by the interested student.

Fraction-of-time Probability for Time-Series that Exhibit Cyclostationarity

## 3.2 Compilation of Discussions of Fraction-of-Time vs Stochastic Probabilistic Theories

This page consists of a compilation of unpublished essays, workshop presentations, published articles, brief technical notes, communications between collaborators, etc. on the pros and cons of a paradigm shift from the stochastic process theory of cyclostationarity to the Fraction-of-Time probabilistic theory of cyclostationarity.

The first item is the set of slides used for Section IV of the opening Plenary Lecture for the first international Workshop on Cyclostationarity. To repeat an explanation given on Page 2, some readers may wonder why this is appropriate considering that this workshop was held 30 years ago, in 1992. I consider this appropriate because I developed these slides specifically for a broad group of highly motivated students. I say they were students solely because they traveled from far and wide specifically to attend this educational program. In fact, the participants of the workshop were mostly senior researchers in academia, industry, and government laboratories. Knowing the workshop was a success and knowing all the topics covered are as important today as they were then, I have chosen this presentation as ideal for the purposes of this website. In particular, the theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity is in about the same state it was in 30 years ago, with the important exception of progress on measure-theoretic considerations of these two alternative theories that is reported on Page 3.5. That is, many of the questions raised in 1992, particularly those involving stochastic-process models and surrounding the concept of cycloergodicity, remain unanswered, though a few have been addressed in published journal papers that are cited throughout Page 3.

The unavoidable absence of detail in the presentation slides for Sec. IV presented below is made up for, to the extent that progress has been achieved in the ensuing 30 years, throughout this Page 3 and the sources linked to herein.

Because the theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity summarized in this Section IV of the Plenary Lecture is a relatively technical subject, it is recommended that students consider this section to be only a concise overview and that they follow up on it with Chapter 1 in the book [Bk5]. This chapter not only describes the duality between the stochastic and nonstochastic theories of cyclostationarity, but also derives the nonstochastic FOT-probabilistic theory from an inquiry into the nature of the property of time functions that is responsible for the defining characteristic of cyclostationarity: that finite-strength sine waves can be generated from cyclostationary functions by subjecting the functions to time-invariant nonlinear transformations. This inquiry leads naturally to the definitions of cyclic probabilistic moments and cyclic probability distributions and, more generally, cyclic expectation; and, in Chapter 2 of the book [Bk5], cyclic probabilistic cumulants. This is to be contrasted with the stochastic theory of cyclostationarity in which these key probabilistic quantities are simply posited on the basis of mathematical considerations only, with not even a mention of generating sine waves, which is a key characteristic of the physical manifestation of cyclostationarity.
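The sine-wave generation property can be seen in a short numerical sketch. The signal model (white Gaussian amplitude multiplying a cosine), the squaring nonlinearity, the frequencies, and the helper name `sine_wave_strength` are all assumptions chosen for illustration; they are not taken from [Bk5]. The signal itself contains no sine-wave component, but its square contains one at twice the carrier frequency.

```python
import cmath
import math
import random

def sine_wave_strength(samples, alpha):
    """Magnitude of the finite-time average of samples[n] * exp(-i*2*pi*alpha*n)."""
    acc = sum(v * cmath.exp(-2j * math.pi * alpha * n) for n, v in enumerate(samples))
    return abs(acc) / len(samples)

rng = random.Random(2)
f0 = 0.05          # carrier frequency in cycles per sample (assumed value)
n_samples = 20000

# Zero-mean random amplitude times a cosine: an amplitude-modulated signal.
x = [rng.gauss(0.0, 1.0) * math.cos(2 * math.pi * f0 * n) for n in range(n_samples)]
x_squared = [v * v for v in x]     # time-invariant nonlinear transformation

strength_x_f0 = sine_wave_strength(x, f0)               # no sine wave in x itself
strength_sq_2f0 = sine_wave_strength(x_squared, 2 * f0) # sine wave regenerated at 2*f0
strength_sq_off = sine_wave_strength(x_squared, 0.137)  # nothing at an unrelated frequency

print(strength_x_f0, strength_sq_2f0, strength_sq_off)
```

For unit-variance amplitude the regenerated component has complex amplitude 1/4 at frequency 2*f0, while the other two measurements shrink toward zero as the averaging time grows.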

The direct relevance of this discussion to the primary subject of this website is the claim herein that science and engineering were done great harm by mathematicians’ hard sell of the stochastic process model to the exclusion of the non-stochastic time-series model that came before.

With a brief look ahead at Page 7, one can surmise that this hard sell reflects inadequate Right-Brain (RB) activity, which would have been required to reveal the absence of a necessity to use such unrealistic and overly abstract models. This is something that has unnecessarily burdened teachers and students alike, and of course practicing engineers and scientists, with the challenge to each and every one of them to bring to bear the considerable RB activity required to make sense of the huge conceptual gap between the reality from nature of a single time series of measured/observed data and the mathematical fiction of a typically infinite ensemble of hypothetical time series together with a probability law (a mathematical creation) governing the ensemble average over all the fictitious time series. All these poor unsuspecting individuals were left to close this conceptual gap on their own, being armed with nothing more than a mathematical theorem, which only rarely can be applied in practice, that gives the mathematical condition on a stochastic process model under which its ensemble averages equal (in an abstract sense; i.e., with probability equal to 1) the time averages over individual time series in the ensemble. This condition on the probability law ensures that expected values of a proposed stochastic process mathematically calculated (a Left-Brain (LB) activity) from the mathematical model equal time averages measured from a single time-series member of the ensemble, assumed to be the time series that actually exists in practice. But this equality imposes another condition, namely that we mathematically take the limit of the time average as the amount of averaging time approaches infinity. Thus, the theorem, called the Ergodic Theorem, doesn’t actually address reality, because one never has an infinitely long segment of time-series data.
Moreover, the theorem is of little-to-no operational utility because the condition on the probability law can only rarely be tested for a given specific stochastic process model. Thus, most users of stochastic process theory rely conceptually on what is called the Ergodic Hypothesis by which one simply assumes the condition of the Ergodic Theorem is satisfied for whatever stochastic process model one chooses to work with. Faith of this sort has no place in science and engineering.
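For reference, the kind of statement at issue can be written in one common textbook form. This is a sketch under assumptions: g denotes a hypothetical measurement function with finite expected value, x(t, s) denotes ensemble member s of a stationary process, and the equality is asserted only when the process satisfies the ergodicity condition.

```latex
\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g\bigl(x(t+u, s)\bigr)\, du
  \;=\; E\bigl\{ g\bigl(x(t)\bigr) \bigr\}
  \qquad \text{for all ensemble members } s \text{ except a set of probability zero.}
```

Both features criticized above are visible in this form: the left side requires an infinitely long record, and the equality holds only with probability 1 under a condition on the probability law.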

In my opinion, acceptance of all this gibberish and going forward with the stochastic process concept as the only option for mathematically modeling real time-series data requires abandonment of RB thinking. There really is no way to justify this abstraction of reality as a necessary evil. The fraction-of-time probabilistic model of single time series is an alternative option that avoids departing so far from the reality of measured/observed time-series data, its empirical statistical analysis, the mathematical modeling of the time series, and the results of the analysis. The wholesale adoption by academicians of the stochastic process foisted upon them by mathematicians suggests these academicians, as well as the mathematicians, suffer from low-level RB activity. These general remarks are backed up by a detailed mathematical comparative analysis presented on Page 3.5.

## 3.3 The Hierarchy of Non-Stochastic Theories of Time Series

Statistical metrics for time series such as mean, bias, variance, coefficient of variation, covariance, and correlation coefficient can be defined using finite-time averages as replacements for expected values in well-known probabilistic metrics. These statistical metrics also can be arrived at from nothing more than a little thought, without any reference to probability or expected value. In fact, all of these statistical metrics were in use long before the probabilistic theory of stochastic processes was developed.
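As a minimal sketch of this idea, the metrics named above can be computed with finite-time averages and nothing else. The function names are hypothetical, and the tiny records `x` and `y` are made-up numbers used only so the results can be checked by hand.

```python
import math

def time_mean(x):
    """Finite-time average of the record."""
    return sum(x) / len(x)

def time_covariance(x, y):
    """Finite-time average of the product of deviations from the time means."""
    mx, my = time_mean(x), time_mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def time_variance(x):
    return time_covariance(x, x)

def coefficient_of_variation(x):
    """Temporal standard deviation relative to the magnitude of the time mean."""
    return math.sqrt(time_variance(x)) / abs(time_mean(x))

def correlation_coefficient(x, y):
    return time_covariance(x, y) / math.sqrt(time_variance(x) * time_variance(y))

x = [1.0, 2.0, 3.0, 4.0]      # made-up record
y = [2.0, 4.0, 6.0, 8.0]      # a scaled copy of x

print(time_mean(x), time_variance(x), coefficient_of_variation(x))
print(time_covariance(x, y), correlation_coefficient(x, y))
```

No expected value or ensemble appears anywhere; each quantity is an arithmetic average over the record at hand.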

In the book [Bk2], such non-probabilistic statistical metrics are used for statistical spectral analysis. The resultant theory for understanding how to perform and study statistical spectral analysis is the lowest level in a hierarchy of non-stochastic theories of statistical spectral analysis and, more generally, time-series analysis. This level is referred to as the purely empirical non-probabilistic theory.

The next level up in the hierarchy is referred to as the purely empirical FOT-probabilistic theory, where, as explained elsewhere, FOT stands for Fraction-of-Time. This theory is defined in the presentation below. The third and highest level in the hierarchy is referred to as the non-stochastic FOT-probabilistic theory. This theory is fully developed in the book [Bk2].

In this section, the terms purely empirical, probabilistic, and stochastic are defined and the three individual levels of the hierarchy are defined and illustrated. The following material was prepared for presentation at the 2021 On-Line Grodek Conference on Cyclostationarity.

#### 1. COMPLETING THE FAMILY OF NONSTOCHASTIC THEORIES OF TIME SERIES

Preliminary Definitions

Def: PURELY EMPIRICAL (THEORY/METHOD)

– Excludes ensembles of outcomes of hypothetical experiments

– Excludes mathematical limits as some parameter approaches infinity, such as averaging time

– Excludes all quantities (e.g., Expected Values) that are not identifiable as, or cannot be calculated/computed from, recorded physical measurements or observations

– Consequence: As applied to Statistical Spectral Analysis: the mathematical descriptions of calculations consist of primarily integral calculus including Fourier transform theory and/or a discrete-time counterpart

Def: STATISTICAL SPECTRUM

– A Statistical Spectrum is an empirically-averaged spectrum

– A Probabilistic Spectrum (e.g., the standard Power Spectral Density of a stochastic process) is a mathematical quantity

– Ex: The so-called Statistical Theory of Communications is more accurately a Probabilistic Theory of Communications. (The traditional name uses “statistical”, but the theory is mostly probabilistic.)
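The distinction can be made concrete with a sketch of an empirically averaged spectrum. The averaging method used here (averaging the periodograms of consecutive non-overlapping segments) is one common choice and is an assumption, as are the test signal, the segment length, and the function names `periodogram` and `statistical_spectrum`.

```python
import cmath
import math
import random

def periodogram(segment):
    """Periodogram |DFT|**2 / L of one segment, for frequency bins 0 .. L/2."""
    seg_len = len(segment)
    out = []
    for k in range(seg_len // 2 + 1):
        dft = sum(v * cmath.exp(-2j * math.pi * k * n / seg_len)
                  for n, v in enumerate(segment))
        out.append(abs(dft) ** 2 / seg_len)
    return out

def statistical_spectrum(record, seg_len):
    """Average the periodograms of consecutive non-overlapping segments."""
    n_seg = len(record) // seg_len
    total = [0.0] * (seg_len // 2 + 1)
    for i in range(n_seg):
        p = periodogram(record[i * seg_len:(i + 1) * seg_len])
        total = [t + v for t, v in zip(total, p)]
    return [t / n_seg for t in total]

rng = random.Random(3)
seg_len, n_seg, tone_bin = 32, 40, 4
# Made-up record: a sine wave centered on bin 4 plus unit-variance white noise.
record = [math.cos(2 * math.pi * tone_bin * n / seg_len) + rng.gauss(0.0, 1.0)
          for n in range(seg_len * n_seg)]

raw = periodogram(record[:seg_len])              # one unaveraged periodogram
averaged = statistical_spectrum(record, seg_len) # empirically averaged spectrum
peak_bin = max(range(len(averaged)), key=averaged.__getitem__)

print("peak bin:", peak_bin)
```

Everything here is computed from the recorded samples; no expected value is involved. The unaveraged periodogram `raw` is erratic from bin to bin, whereas the averaged spectrum settles near the noise level away from the tone.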

#### 2. HIERARCHY OF NONSTOCHASTIC THEORIES OF TIME SERIES

In order of Ease of Mathematization (proving existence of key quantities; level of mathematical sophistication required):

1. The Purely Empirical NON-probabilistic theory (finite time) for approximately S (stationary), CS (cyclostationary), and PCS (poly-cyclostationary) time series–introduced in 1987 for statistical spectral analysis

2. NEW: The Purely Empirical FOT-probabilistic theory (finite time) for approximately S, CS, and PCS time series–formally introduced in 2021

3. The Nonstochastic FOT-probabilistic theory (infinite time) for exactly S, CS, PCS, and almost cyclostationary (ACS) time series–introduced in 1987 for statistical spectral analysis

#### 3. THE FOT-PROBABILISTIC THEORY OF STATISTICAL SPECTRAL ANALYSIS IS NOT PURELY EMPIRICAL

– Existing theory for stationary and cyclostationary time series is not purely empirical because the key quantities in the theory are based on infinite limits of time averages and evaluation of these limits requires an analytical model of a time series, not just empirical measurements represented by mathematical symbols.

– The required property of joint relative measurability in the 2006 Leskow-Napolitano theory cannot be verified empirically, because it requires analytical calculation based on an analytical model of an infinitely long time series.

– Strictly speaking, no time series can be said to be “at hand” if it is infinitely long. This is a term we have used since my 1987 SSA book to refer to a single time series as distinguished from a hypothetical ensemble of time series. But this term is used loosely when applied to infinitely long time series.

– Similarly, Almost CS time series cannot be distinguished from PolyCS time series in a purely empirical theory. (PolyCS means exhibiting at most a finite number of harmonically unrelated cycle frequencies.)

#### 4. A PURELY EMPIRICAL NON-PROBABILISTIC THEORY OF STATISTICAL SPECTRAL ANALYSIS EXISTS

– The motivation for using infinitely-long time averages is that it enables exact quantification (analogous to expected values), not just approximation

– But all averages in a purely empirical theory must be based on finite-time averages. Finite-length time series can indeed be “at hand”.

– My 1987 SSA book presents an analytical nonprobabilistic theory of statistical (time-averaged) spectral analysis that approximately quantifies temporal and spectral resolution and reliability (repeatability over time) for finite-length time series. [see parts of Chapters 2, 3, 7, 11, and slide 5 here]

– But this theory does not use the concept of probability—the closest thing to it that is used is the calculated finite-time temporal coefficient of variation of time-dependent measurements, such as spectral density and spectral correlation density.

#### 5. A PURELY EMPIRICAL FOT-PROBABILISTIC THEORY OF STATISTICAL SPECTRAL ANALYSIS

– When the concept of probability is introduced in the 1987 SSA book, it is done in terms of infinite limits of time averages and it, therefore, forfeits the empiricism of the finite-time nonprobabilistic theory.

– We can easily construct a Purely Empirical FOT-Probabilistic theory as long as we accept approximate quantification of some of the relationships.

– Example 1: We can show that a statistical spectrum is approximately normal for sufficiently large averaging time, but we cannot prove it is asymptotically exactly normal, because infinite limits are outside the scope of the calculations allowed by an empirical theory

– Example 2: We can show that the difference between time-averaged and frequency-smoothed statistical spectral correlation measurements can be made small when the temporal/spectral resolution product is large, but we cannot prove it is asymptotically zero, because infinite limits are outside the scope . . .

#### 6. ELEMENTS OF THE PURELY EMPIRICAL FOT-PROBABILISTIC THEORY

– Definitions of finite-time FOT Cumulative Distributions for approximately S, CS, and PolyCS time series

– Same as those for exactly S, CS, and PolyCS time series, but without the limits

These FOT Cumulative Distributions can be computed from empirical data

Fundamental Theorems of Averaging and Sine-Wave Component Extraction: same
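That these distributions can be computed directly from data can be sketched as follows, for sampled records. The function names, the convention that the unit step counts samples with x <= xi, and the noise-free periodic test record are assumptions made for illustration.

```python
def stationary_fot_cd(record, xi):
    """Fraction of the record's samples with value <= xi (finite-time FOT CD)."""
    return sum(v <= xi for v in record) / len(record)

def periodic_fot_cd(record, period, phase, xi):
    """Fraction of the samples x[phase + n*period] with value <= xi."""
    samples = record[phase::period]
    return sum(v <= xi for v in samples) / len(samples)

# Made-up record: 25 repetitions of a 4-sample pattern, kept noise-free so the
# results can be checked by hand.
record = [0.0, 1.0, 2.0, 3.0] * 25

cd_stationary = stationary_fot_cd(record, xi=1.0)
cd_by_phase = [periodic_fot_cd(record, period=4, phase=p, xi=1.0) for p in range(4)]

print(cd_stationary)   # fraction of all time that x <= 1
print(cd_by_phase)     # the same fraction, resolved by position within the period
```

In this example, averaging the periodic distribution over the four phases returns the stationary distribution, which loses the dependence on time within the period.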

#### 7. FINITE-TIME FOT CUMULATIVE DISTRIBUTIONS (CDs)

Same as the existing infinite-time CD definitions, but without taking the limit as the averaging time approaches infinity:

– Approximately Stationary FOT CD

– Approximately Cyclostationary FOT CD

– Approximate α-Cyclic FOT CD

– Approximate -Periodic FOT CD

– Exact Relationship (see slide 8 for proof)

– Approximate Polyperiodic FOT CD
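The formulas for these distributions are not reproduced in the text above. As a hedged sketch only, finite-time definitions of the following kind are consistent with the descriptions given; the notation is an assumption and may differ from the slides: U is the unit step, T the averaging time, α a cycle frequency, T_0 a period, and the averaging time for the periodic case is taken to be T = (2N+1)T_0.

```latex
% Finite-time stationary FOT CD: fraction of time that x(u) <= xi
F_T(\xi) = \frac{1}{T} \int_{-T/2}^{T/2} U\bigl[\xi - x(u)\bigr]\, du

% Finite-time alpha-cyclic FOT CD: sine-wave-weighted time average
F_T^{\alpha}(\xi) = \frac{1}{T} \int_{-T/2}^{T/2} U\bigl[\xi - x(u)\bigr]\, e^{-i 2\pi \alpha u}\, du

% Finite-time T_0-periodic FOT CD: synchronized average over 2N+1 periods
F_{T_0}(t, \xi) = \frac{1}{2N+1} \sum_{n=-N}^{N} U\bigl[\xi - x(t + n T_0)\bigr]

% With T = (2N+1) T_0, the Fourier coefficients of the periodic CD are the cyclic CDs
\frac{1}{T_0} \int_{-T_0/2}^{T_0/2} F_{T_0}(t, \xi)\, e^{-i 2\pi k t / T_0}\, dt
  = F_T^{k/T_0}(\xi), \qquad k = 0, \pm 1, \pm 2, \ldots
```

The last line follows by substituting u = t + nT_0 in each term of the sum and noting that the 2N+1 intervals of length T_0 tile the interval of length T. It is one exact relation that holds under these assumed definitions, and it is not necessarily the relationship proved on slide 8.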

#### 8. DERIVATION OF

Notes:

1) The shift by is introduced in order to exclude at .

2) Because is discontinuous, the use of the Dirac Deltas as shown above requires special justification: because the approximately -periodic CD is defined for all time in an interval of length , the undefined finite values of time samples that occur at time points of discontinuity can be ignored if they occur only for times confined to a possibly-disjoint set of measure zero. Conditions on that guarantee this are being sought.

#### 9. FUNDAMENTAL THEOREM OF APPROXIMATE SINE-WAVE COMPONENT EXTRACTION

Application To Cyclic Moments

#### 10. ACCURACIES OF APPROXIMATIONS

Increase as the number of periods averaged over increases:
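The dependence of accuracy on the number of periods averaged can be illustrated numerically. In this sketch the periodic component, the noise level, and the names `synchronized_average` and `rms_error` are assumptions; the quantity estimated is the periodically time-varying mean of a made-up record.

```python
import math
import random

def synchronized_average(record, period, n_periods):
    """Average x[phase + n*period] over n_periods periods, for each phase."""
    return [sum(record[phase + n * period] for n in range(n_periods)) / n_periods
            for phase in range(period)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

rng = random.Random(4)
period = 16
truth = [math.sin(2 * math.pi * p / period) for p in range(period)]   # periodic mean
record = [truth[n % period] + rng.gauss(0.0, 1.0) for n in range(period * 400)]

errors = {n: rms_error(synchronized_average(record, period, n), truth)
          for n in (4, 40, 400)}

for n, err in errors.items():
    print(f"{n:4d} periods averaged: rms error {err:.3f}")
```

For this noise model the error is expected to shrink roughly in proportion to the inverse square root of the number of periods averaged.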

#### 11. CONCLUSION

– It has been shown that there exists an entirely empirical FOT-probabilistic theory of approximately stationary, cyclostationary, and poly-cyclostationary time series.

– All quantities occurring in the theory can be calculated from physically measured/observed time series data on finite intervals

– This theory should appeal to practitioners who actually analyze or otherwise process empirical data.

– Relative to the idealized non-empirical FOT-probabilistic theory of exactly cyclostationary, poly-cyclostationary, and almost cyclostationary time series, this new theory probably has some drawbacks, even if the users are restricted to empiricists. Approximate relationships can become messy relative to exact relationships. But, in such situations, one can always temporarily resort to the exact theory based on limits.

## 3.4 Ensembles in Wonderland

The 1987 book, Statistical Spectral Analysis: A Nonprobabilistic Theory, argues for more judicious use of the modern stochastic-process model (arising from the work of mathematicians in the 1930s, such as Khinchin, Kolmogorov, and others) and for adopting in its place, where appropriate, its more realistic predecessor: the time-series model first developed mathematically by Norbert Wiener in 1930 (see also page 59 of Wiener 1949, written in 1942, regarding the historical relationship between his and Kolmogorov’s approaches), which was briefly revisited in the 1960s by engineers before it was buried by mathematicians. The brief tongue-in-cheek essay Ensembles in Wonderland, published in IEEE Signal Processing Magazine, AP Forum, 1994 and reproduced below, is an attempt at satirizing the outrage of narrow-minded thinkers, exemplified by two outspoken skeptics, Neil Gerr and Melvin Hinich, who wrote scathing remarks and a book review characterizing this book as utter nonsense.

But first, let us consider the parallel to the book Alice in Wonderland; the following consists of excerpts taken from https://en.wikipedia.org/wiki/Alice's_Adventures_in_Wonderland : Martin Gardner and other scholars have shown the book Alice in Wonderland [written by Charles Lutwidge Dodgson under the pseudonym Lewis Carroll] to be filled with many parodies of Victorian popular culture. Since Carroll was a mathematician at Christ Church, it has been argued that there are many references and mathematical concepts in both this story and his later story Through the Looking Glass; examples include what have been suggested to be illustrations of the concept of a limit, number bases and positional numeral systems, the converse relation in logic, and the ring of integers modulo a specific integer. Deep abstraction of concepts, such as non-Euclidean geometry, abstract algebra, and the beginnings of mathematical logic, was taking over mathematics at the time Alice in Wonderland was being written (the 1860s). Literary scholar Melanie Bayley asserted in the magazine New Scientist that Alice in Wonderland in its final form was written as a scathing satire on the new modern mathematics that was emerging in the mid-19th century.

Today, Dodgson’s satire appears to be backward looking because, after all, there are strong arguments that modern mathematics has triumphed. Coming back to the topic of interest here, stochastic processes also have triumphed in terms of being wholly adopted in mathematics and science and engineering, except for a relatively small contingent of empirically-minded scientists and engineers. Yet, recent mathematical arguments, summarized in [B2], provide a sound mathematical basis for reversing this outcome, especially when the overwhelming evidence of practical and pragmatic and pedagogic and overarching conceptual advantages provided in the 1987 book is considered. The present dominance of the more abstract and less realistic stochastic process theory might be viewed as an example of the pitfalls of what has become known as groupthink or the inertia of human nature that resists changes in thinking, which is exemplified on Page 7.

Before presenting the several letters comprising the debate, including the standalone article “Ensembles in Wonderland”, the final letter to SP Forum in the debate is reproduced here first to provide hindsight, especially for interpreting “Ensembles in Wonderland”. The bracketed text, e.g., [text], below was added at the time this material was posted on this website to enhance clarity.

###### 3.4.1 Preliminary Material

July 2, 1995 (published in Nov 1995)

To the Editor:

Introduction

This is my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], and carried on by Mr. Neil Gerr through his letters to SP Forum.

In this letter, I supplement my previous remarks aimed at clarifying the precariousness of Hinich’s and Gerr’s position by explaining the link between my argument in favor of the utility of fraction-of-time (FOT) probability and the subject of a plenary lecture delivered at ICASSP ’94. In the process of discussing this link I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition–that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework–simply cannot be defended if argument is to be based on fact and logic.

David J. Thomson’s Transcontinental Waveguide Problem

To illustrate that the stochastic-process conceptual framework is often applied to physical situations where the time-average framework is a more natural choice, I have chosen an example from D. J. Thomson’s recent plenary lecture on the project that gave birth to the multiple-window method of spectral analysis [2]. The project that was initiated back in the mid-1960s was to study the feasibility of a transcontinental millimeter waveguide for a telecommunications transmission system potentially targeted for introduction in the mid-1980s. It was found that accumulated attenuation of a signal propagating along a circular waveguide was directly dependent on the spectrum of the series, indexed by distance, of the erratic diameters of the waveguide. So, the problem that Thomson tackled was that of estimating the spectrum for the more than 4,000-mile-long distance-series using a relatively small segment of this series that was broken into a number of 30-foot long subsegments. (It would take more than 700,000 such 30-foot sections to span 4,000 miles.) The spectrum had a dynamic range of over 100 dB and contained many periodic components, indicating the unusual challenge faced by Thomson.

When a signal travels down a waveguide (at the speed of light) it encounters the distance-series [consisting of the distances traveled as time progresses]. Because of the constant velocity, the distance-series is equivalent to a time-series. Similarly, the series of diameters that is measured for purposes of analysis is—due to the constant effective velocity of the measurement device—equivalent to a time-series [of measurements]. So, here we have a problem where there is one and only one long time-series of interest (which is equivalent to a distance-series)—there is no ensemble of long series over which average characteristics are of interest and, therefore, there is no obvious reason to introduce the concept of a stochastic process. That is, in the physical problem being investigated, there was no desire to build an ensemble of transcontinental waveguides. Only one (if any at all) was to be built, and it was the spectral density of distance-averaged (time-averaged) power of the single long distance-series (time-series) that was to be estimated, using a relatively short segment, not the spectral density of ensemble-averaged power. Similarly, if one wanted to analytically characterize the average behavior of the spectral density estimate (the estimator mean) it was the average of a sliding estimator over distance (time), not the average over some hypothetical ensemble, that was of interest. Likewise, to characterize the variability of the estimator, it was the distance-average squared deviation of the sliding estimator about its distance-average value (the estimator variance) that was of interest, not the variance over an ensemble. 
The only apparent reason for introducing a stochastic process model with its associated ensemble, instead of a time-series model, is that one might have been trained to think about spectral analysis of erratic data only in terms of such a conceptual artifice and might, therefore, have been unaware of the fact that one could think in terms of a more suitable alternative that is based entirely on the concept of time averaging over the single time-series. (Although it is true that the time-series segments obtained from multiple 30-foot sections of waveguide could be thought of as independent random samples from a population, this still does not motivate the concept of an ensemble of infinitely long time-series–a stationary stochastic process. The fact remains that, physically, the 30-foot sections represent subsegments of one long time-series in the communications system concept that was being studied.) [And even if Mr. Thomson was aware of the fact that one could conceptualize the problem entirely in terms of time averages, he had good reason to fear that this approach would be off-putting to his readers, all of whom were likely indoctrinated only in statistical spectral analysis theory couched in terms of stochastic processes, an unfortunate situation].
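The time-average notions of estimator mean and estimator variance described above can be sketched numerically. The Python fragment below slides a short-segment periodogram along one long record and then averages over slide position rather than over an ensemble. The white-noise stand-in for the diameter series, the segment length, the hop size, the frequency bin, and the function name `sliding_periodogram` are all assumptions for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_periodogram(x, seg_len, bin_index, hop=1):
    """Periodogram value at one frequency bin for a window sliding along x."""
    windows = sliding_window_view(x, seg_len)[::hop]
    spectra = np.fft.rfft(windows, axis=1)
    return np.abs(spectra[:, bin_index]) ** 2 / seg_len

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)           # stand-in for the single long series

est = sliding_periodogram(x, seg_len=64, bin_index=8, hop=16)
est_mean = est.mean()                       # time-average estimator mean
est_var = np.mean((est - est_mean) ** 2)    # time-average estimator variance
```

For this unit-variance white-noise stand-in, both `est_mean` and `est_var` should be close to 1, which matches the familiar observation that an unsmoothed periodogram has a standard deviation comparable to its mean.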

It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process (the model adopted by Thomson) except to accommodate lack of familiarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework. Somehow, he does not recognize the mental gyrations required to force this and other physical problems into the stochastic process framework.

Gerr’s Letter

Having explained the link between my argument in favor of the utility of FOT probability and Thomson’s work, let us return to Gerr’s letter. Mr. Gerr, in discussing what he refers to as “a battle of philosophies,” states that I have erred in likening skeptics to religious fanatics. But in the same paragraph we find him defensively trying to convince his readers that the “statistical/probabilistic paradigm” has not “run out of gas” when no one has even suggested that it has. No one, to my knowledge, is trying to make blanket negative statements about the value of what is obviously a conceptual tool of tremendous importance (probability) and no one is trying to denigrate statistical concepts and methods. It is only being explained that interpreting probability in terms of the fraction-of-time of occurrence of an event is a useful concept in some applications. To argue, as Mr. Gerr does again in the same paragraph, that in general this concept “has no obvious advantages” and using it is “like building a house without power tools: it can certainly be done, but to what end?” is, as I stated in my previous letter, to behave like a religious fanatic — one who believes there can be only One True Religion. This is a very untenable position in scientific research.

As I have also pointed out in my previous letter, Mr. Gerr is not at all careful in his thinking. To illustrate his lack of care, I point out that Gerr’s statement “Professor Gardner has chosen to work within the context of an alternative paradigm [fraction-of-time probability]”, and the implications of this statement in Gerr’s following remarks, completely ignore the facts that I have written entire books and many papers within the stochastic process framework, that I teach this subject to my students, and that I have always extolled its benefits where appropriate. If Mr. Gerr believes in set theory and logic, then he would see that I cannot be “within” paradigm A and also within paradigm B unless A and B are not mutually exclusive. But he insists on making them mutually exclusive, as illustrated in the statement “From my perspective, developing signal processing results using the fraction-of-time approach (and not probability/statistics) … .” (The parenthetical remark in this quotation is part of Mr. Gerr’s statement.) Why does Mr. Gerr continue to deny that the fraction-of-time approach involves both probability and statistics?

Another example of the lack of care in Mr. Gerr’s thinking is the convoluted logic that leads him to conclude “Thus, spectral smoothing of the biperiodogram is to be preferred when little is known of the signal a priori.” As I stated in my previous letter, it is mathematically proven* in [1] that the frequency smoothing and time averaging methods yield approximately the same result. Gerr has given us no basis for arguing that one is superior to the other and yet he continues to try to make such an argument. And what does this have to do with the utility of the fraction-of-time concept anyway? These are data processing methods; they do not belong to one or another conceptual framework.

To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing referred to above was first derived by using the fraction-of-time conceptual framework [1]. If there is no conceptual advantage to this framework, why wasn’t such a fundamental result derived during the half century of research based on stochastic processes that preceded [1]? The second example is taken from the first attempt to develop a theory of higher-order cyclostationarity for the conceptualization and solution of problems in communication system design. In [3], it is shown that a fundamental inquiry into the nature of communication signals subjected to nonlinear transformations led naturally to the fraction-of-time probability concept and to a derivation of the cumulant as the solution to a practically motivated problem. This is, to my knowledge, the first derivation of the cumulant. In all other work, which is based on stochastic processes (or non-fraction-of-time probability) and which dates back to the turn of the century, cumulants are defined, by analogy with moments, to be coefficients in an infinite series expansion of a transformation of the probability density function (the characteristic function), which has some useful properties. If there is no conceptual advantage to the fraction-of-time framework, why wasn’t the cumulant derived as the solution to the above-mentioned practical problem or some other practical problem using the orthodox stochastic-probability framework?

Conclusion

Since no one in the preceding year has entered the debate to indicate that they have new arguments for or against the philosophy and corresponding theory and methodology presented in [1], it seems fair to proclaim the debate closed. The readers may decide for themselves whether the resolution put forth in [1] was defeated or was upheld.

But regarding the skeptics, I sign off with a humorous anecdote:

When Mr. Fulton first showed off his new invention, the steamboat, skeptics were crowded on the bank, yelling ‘It’ll never start, it’ll never start.’

It did. It got going with a lot of clanking and groaning and, as it made its way down the river, the skeptics were quiet.

For one minute.

Then they started shouting. ‘It’ll never stop, it’ll never stop.’

— William A. Gardner

* A more detailed and tutorial proof of this fundamental equivalence is given in the article “The history and the equivalence of two methods of spectral analysis,” Signal Processing Magazine, July 1996, No. 4, pp. 20–23, which is copied into the Appendix farther down this Page.

References

1. W. A. Gardner. Statistical Spectral Analysis: A Nonprobabilistic Theory. Prentice-Hall, Englewood Cliffs, NJ, 1987.
2. D. J. Thomson. “An Overview of Multiple-window and quadratic-inverse spectrum estimation methods,” Plenary Lecture, Proceedings of 1994 International Conference on Acoustics, Speech and Signal Processing, pp. VI-185 – VI-194.
3. W. A. Gardner and C. M. Spooner. “The Cumulant Theory of Cyclostationary time-series, Part I: Foundation,” IEEE Transactions on Signal Processing, Vol. 42, December 1994, pp. 3387-3408.

Excerpts from earlier versions of above letter to the editor before it was condensed for publication:

April 15, 1995

Introduction

In this, my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], I shall begin by addressing two remarks in the opening paragraph of Mr. Neil Gerr’s last letter (in March 1995 SP Forum). In the first remark, Mr. Gerr suggests that the “bumps and bruises” he sustained by venturing into the “battle” [debate] were to be expected. But I think that such injuries could have been avoided if he had all the relevant information at hand before deciding to enter the debate. This reminds me of a story I recently heard:

Georgios and Melvin liked to hunt. Hearing about the big moose up north, they went to the wilds of Canada to hunt. They had hunted for a week, and each had bagged a huge moose. When their pilot Neil landed on the lake to take them out of the wilderness, he saw their gear and the two moose. He said, “I can’t fly out of here with you, your gear, and both moose.”

“Why not?” they asked.

“Because the load will be too heavy. The plane won’t be able to take off.”

They argued for a few minutes, and then Melvin said, “I don’t understand. Last year, each of us had a moose, and the pilot loaded everything.”

“Well,” said Neil, “I guess if you did it last year, I can do it too.”

So, they loaded the plane. It moved slowly across the lake and rose toward the mountain ahead. Alas, it was too heavy and crashed into the mountain side. No one was seriously hurt and, as they crawled out of the wreckage in a daze, the bumped and bruised Neil asked, “Where are we?”

Melvin and Georgios surveyed the scene and answered, “Oh, about a mile farther than we got last year.”

If Mr. Gerr had read the book [1] and put forth an appropriate level of effort to understand what it was telling him, he would have questioned Mr. Hinich’s book review and would have seen that the course he was about to steer together with the excess baggage he was about to take on made a crash inevitable.

A friend of mine recently offered me some advice regarding my participation in this debate. “Why challenge the status quo”, he said, “when everybody seems happy with the way things are.” My feeling about this is summed up in the following anecdote:

“Many years ago, a large American shoe manufacturer sent two sales reps out to different parts of the Australian outback to see if they could drum up some business among the aborigines. Sometime later, the company received telegrams from both agents.

The first one said, ‘No business. Natives don’t wear shoes.’

The second one said, ‘Great opportunity here–natives don’t wear shoes.’”

Another friend asked “why spend your time on this [debate] when you could be solving important problems.” I think Albert Einstein answered that question when he wrote:

“The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science”

This underscores my belief that we are overemphasizing “engineering training” in our university curricula at the expense of “engineering science.” It is this belief that motivates my participation in this debate. Instead of plodding along in our research and teaching with the same old stochastic process model for every problem involving time-series data, we should be looking for new ways to think about time-series analysis.

In the second remark in Mr. Gerr’s opening paragraph, regarding my response to Mr. Gerr’s October 1994 SP Forum letter in sympathy with “Hinich’s gleefully vicious no-holds-barred review” of [1], Mr. Gerr says “Even by New York standards, it [my response] seemed a bit much.” Well, I guess I was thinking about what John Hancock said, on boldly signing the Declaration of Independence:

There, I guess King George will be able to read that!

Like the King of England who turned a deaf ear to the messages coming from the new world, orthodox statisticians like Messrs. Hinich and Gerr, who are mired in tradition, seem to be hard of hearing–a little shouting might be needed to get through to them.

Nevertheless, I am disappointed to see no apparent progress, on Mr. Gerr’s part, in understanding the technical issues involved in his and Hinich’s unsupportable position that the time-average framework for statistical signal processing has, and I quote Gerr’s most recent letter, “no obvious advantages.” I hasten to point out, however, that this most recent position is a giant step back from the earlier even more indefensible position taken by Hinich in his book review, reprinted in April 1994 SP Forum, where much more derogatory language was used.

In this letter, I make a final attempt to clarify the precariousness of Hinich’s and Gerr’s position by explaining links between my arguments and the subjects of two plenary lectures delivered at ICASSP ’94. In the process of discussing these links and this paper, I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition, that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework, simply cannot be defended if argument is to be based on fact and logic.

I wish that Mr. Gerr would let go of the fantasy about “the field where the Fraction-of-Timers and Statisticians do battle.” There do not exist two mutually exclusive groups of people—one of which can think only in terms of fraction-of-time probability and the other of which call themselves Statisticians. How many times and in how many ways does this have to be said before Mr. Gerr will realize that some people are capable of using both fraction-of-time probability and stochastic process concepts, and of making choices between these alternatives by assessing the appropriateness of each for each particular application? Mr. Gerr’s “battle” of “fraction-of-time versus probability/statistics” simply does not exist. This insistence on a dichotomy of thought is strongly reminiscent of the difficulties some people have had accepting the proposition that the concept of fuzziness is a useful alternative to the concept of probability. The vehement protests against fuzziness are for most of us now almost laughable.

To quote Professor Lotfi Zadeh in his recent plenary lecture [2]:

“[although fuzzy logic] offers an enhanced ability to model real-world phenomena…[and] eventually fuzzy logic will pervade most scientific theories…the successes of fuzzy logic have also generated a skeptical and sometimes hostile reaction…Most of the criticisms directed at fuzzy logic are rooted in a misunderstanding of what it is and/or a lack of familiarity with it.”

I would not suggest that the time-average approach to probabilistic modeling and statistical inference is as deep a concept, as large a departure from orthodox thinking, or as broadly applicable as is fuzzy logic, but there are some definite parallels, and Professor Zadeh’s explanation of the roots of criticism of fuzzy logic applies equally well to the roots of criticism of the time-average approach as an alternative to the ensemble-average or, more accurately, the stochastic-process approach. In the case of fuzzy logic, its proponents are not saying that one must choose either conventional logic and conventional set theory or their fuzzy counterparts as two mutually exclusive alternative truths. Each has its own place in the world. Those opponents who argue vehemently that the unorthodox alternative is worthless can be likened to religious fanatics. This kind of intolerance should have no place in science. But it is all too commonplace and it has been so down through the history of science. So surely, one cannot expect to find its absence in connection with the time-average approach to probabilistic modeling and statistical inference. Even though experimentalists in time-series analysis (including communication systems analysis and other engineered-systems analysis) have been using the time-average approach (to various extents) for more than half a century, there are those like Gerr and Hinich who “see no obvious advantages.” This seems to imply that Mr. Gerr has one and only one interpretation of a time-average measurement on time series data—namely an estimate of some random variable in an abstract stochastic process model. To claim that this mathematical model is, in all circumstances, the preferred one is just plain silly.

David J. Thomson and the Transcontinental Waveguide – addition to published discussion:

[It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process except to accommodate unfamiliarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework.] It is correct in this case that a sufficiently capable person would obtain the same result using either framework, but it is incorrect to not recognize the mental gyrations required to force this physical problem into the stochastic process framework. My claim—and the reason I wrote the book [1]—is that our students deserve to be made aware of the fact that there are two alternatives. It is pigheaded to hide this from our students and force them to go through the unnecessary and sometimes confusing mental gyrations required to force-fit the stochastic process framework to real-world problems where it is truly an unnecessary and, possibly, even inappropriate artifice.

To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing, whose proof is outlined in the Appendix at the end of this letter, was first derived by using the fraction-of-time conceptual framework [1].

An Illustration of Blinding Prejudice

To further illustrate the extent to which Mr. Gerr’s prejudiced approach to scientific inquiry has blinded him, I have chosen one of his research papers on the subject of cyclostationary stochastic processes. In [5], Mr. Gerr (and his coauthor) tackle the problem of detecting the presence of cyclostationarity in an observed time-series. He includes an introduction and references sprinkled throughout that tie his work to great probabilists, statisticians, and mathematicians. (We might think of these as the “Saints” in Mr. Gerr’s One True Religion.) This is strange, since his paper is nothing more than an illustration of the application of a known statistical test (and a minor variation thereof) to synthetic data. It is even more strange that he fails to properly reference work that is far more relevant to the problem of cyclostationarity detection. But I think we can see that there is no mystery here. The highly relevant work that is not cited is authored by someone who champions the value of fraction-of-time probabilistic concepts. The fact that the relevant publications (known to Gerr) actually use the stochastic process framework apparently does not remove Mr. Gerr’s blinders. All he can see–it would seem–is that the author is known to argue (elsewhere) that the stochastic process framework is not always the most appropriate one for time-series analysis, and this is enough justification for Mr. Gerr to ignore the highly relevant work by this “heretic” author (author of the book [1] that Hinich all but said should be burned).

To be specific, Mr. Gerr completely ignores the paper [6] (published 1-1/2 years prior to the submission of Gerr’s paper) and the book [7] (published 4 years prior) wherein the problem of cyclostationarity detection is tackled using maximum-likelihood [6], maximum-signal-to-noise ratio [6], [7], and other optimality criteria, all of which lead to detection statistics that involve smoothed biperiodograms (and that also identify optimal smoothing) which are treated by Gerr as if they were ad hoc. Mr. Gerr also cites a 1990 publication (which does not appear in his reference list) that purportedly shows that the integrated biperiodogram (cyclic periodogram) equals the cyclic mean square value of the data (cf. (12)); but this is a special case of the much more useful result, derived much earlier than 1990, that the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram. The argument, by example, that Gerr proffers to show that (12) (the cyclic correlogram at zero lag) is sometimes a good test statistic and sometimes a bad one is trivialized by this Fourier transform relation (cf. [1]) and the numerous mathematical models for data for which the idealized quantities (cyclic autocorrelations, and cyclic spectral densities) in this relation have been explicitly calculated (cf. [1], [7]). These models include, as special cases, the examples that Gerr discusses superficially. The results in [1], [7] show clearly when and why the choice of zero lag made by Gerr in (12) is a poor choice. As another example, consider Mr. Gerr’s offhand remark that a Mr. Robert Lund (no reference cited) “has recently shown that for the current example (an AM signal with a square wave carrier) only lines [corresponding to cycle frequencies] spaced at even multiples of d=8 [the reciprocal of the period of the carrier] will have nonzero spectral (rz) measure.” This result was established in a more general form many years earlier in his coauthor’s Ph.D. dissertation (as well as in [1]) where one need only apply the extremely well-known fact that a symmetrical square wave contains only odd harmonics.
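The Fourier-transform relation between the cyclic periodogram and the cyclic correlogram, and its zero-lag special case, can be checked numerically in a discrete, circular setting. The sketch below is only an analogue of the continuous-time relation in [1]: the circular indexing, the asymmetric lag convention, the phase factor in `cyclic_correlogram`, and the test signal are all assumptions made so that the identity holds exactly for a finite record.

```python
import numpy as np

def cyclic_periodogram(x, m):
    """I_m[k] = X[k+m] * conj(X[k]) / N, with circular frequency indexing."""
    X = np.fft.fft(x)
    return np.roll(X, -m) * np.conj(X) / len(x)

def cyclic_correlogram(x, m):
    """Average over n of x[n+tau] * conj(x[n]) * exp(-i 2 pi m (n+tau) / N),
    with circular time indexing, for every lag tau."""
    N = len(x)
    n = np.arange(N)
    out = np.empty(N, dtype=complex)
    for tau in range(N):
        shifted = np.roll(x, -tau)                        # x[(n + tau) mod N]
        phase = np.exp(-2j * np.pi * m * (n + tau) / N)
        out[tau] = np.mean(shifted * np.conj(x) * phase)
    return out

# Hypothetical amplitude-modulated noise record; cycle frequency index m = 8.
rng = np.random.default_rng(3)
N, m = 64, 8
x = rng.standard_normal(N) * (1 + np.cos(2 * np.pi * np.arange(N) / 8))

from_periodogram = np.fft.ifft(cyclic_periodogram(x, m))
from_lag_products = cyclic_correlogram(x, m)

# Zero-lag special case: mean of the cyclic periodogram over frequency
# equals the cyclic mean-square value of the data.
cyclic_mean_square = np.mean(np.abs(x) ** 2 * np.exp(-2j * np.pi * m * np.arange(N) / N))
```

In this setting, `from_periodogram` and `from_lag_products` agree to rounding error at every lag, and the zero-lag entry equals `cyclic_mean_square`, which illustrates why the zero-lag identity is only one point of the more general relation.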

To go on, the coherence statistic that Gerr borrows from Goodman for application to cyclostationary processes has been shown in [7] to be nothing more than the standard sample statistic for the standard coherence function (a function of a single frequency variable) for two processes obtained from the one process of interest by frequency-shifting data transformations–except for one minor modification; namely, that time-averaged values of expected values are used in place of non-averaged expected values in the definition of coherence because the processes are asymptotically mean stationary, rather than stationary. Therefore, the well-known issues regarding frequency smoothing in these cross-spectrum statistics need not be discussed further, particularly in the haphazard way this is done by Gerr, with no reliance on analysis of specific underlying stochastic process models.

Continuing, the incoherent average (13) proposed by Gerr for use with the coherence statistic is the only novel contribution of this paper, and I claim that it is a poor statistic. The examples used by Gerr show that this “incoherent statistic” outperforms the “coherent statistic,” but what he does not recognize is that he chose the wrong coherent statistic for comparison. He chose the cyclic correlogram with zero lag (12), which is known to be a poor choice for his examples. For his example in Figure 9, zero lag produces a useless statistic, whereas a lag equal to T/2 is known to be optimum, and produces a “coherent statistic” that is superior to Gerr’s incoherent statistic. Thus, previous work [1], [7] suggests that a superior alternative to Gerr’s incoherent statistic is the maximum over a set of lag-indexed coherent statistics.
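The sensitivity of a lag-indexed coherent statistic to the choice of lag can be illustrated with a simple signal. The sketch below is not the example from the paper under discussion, whose details are not reproduced here. It uses an assumed rectangular-pulse binary signal with `T` samples per symbol, for which the squared signal is constant. The function name `cyclic_correlogram_at`, the asymmetric lag convention, and all parameters are hypothetical.

```python
import numpy as np

def cyclic_correlogram_at(x, alpha, lag, n_avg):
    """Time average of x[n+lag] * x[n] * exp(-i 2 pi alpha n) over n_avg samples."""
    n = np.arange(n_avg)
    return np.mean(x[n + lag] * x[n] * np.exp(-2j * np.pi * alpha * n))

T = 8                                        # samples per symbol (assumed)
rng = np.random.default_rng(2)
symbols = rng.choice([-1.0, 1.0], size=20_000)
x = np.repeat(symbols, T)                    # rectangular-pulse binary signal

n_avg = len(x) - T                           # whole periods, with room for lags < T
stats = np.array([
    abs(cyclic_correlogram_at(x, 1.0 / T, lag, n_avg)) for lag in range(T)
])
best_lag = int(np.argmax(stats))             # maximum over lag-indexed statistics
```

For this assumed signal the zero-lag statistic `stats[0]` vanishes, because the squared signal carries no periodicity, while the statistic peaks at a lag of half the symbol period. Taking the maximum over a set of lags, as suggested above, avoids having to know the best lag in advance.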

Finally, Mr. Gerr’s vague remarks about choosing the frequency-smoothing window-width parameter M are like stabs in the dark by comparison with the thorough and careful mathematical analysis carried out within–guess what–the time-average conceptual framework in [1] in which the exact mathematical dependence of bias and variance of smoothed biperiodograms on the data-tapering window shape, the spectral-smoothing window shape, and the ideal spectral correlation function for the data model are derived, and in which the equivalence between spectral correlation measurement and conventional cross-spectrum measurement is exploited to show how conventional wisdom [1, chapter 5, 7] applies to spectral correlation measurement [1, chapters 11, 13, 15].

In summary, Gerr’s paper is completely trivialized by previously published work of which he was fully aware. What appears to be his choice to “stick his head in the sand” because the author of much of this earlier highly relevant work was not a member of his One True Religion exemplifies what Gerr is trying to deny. Thus, I repeat: it is indeed appropriate to liken those (including Gerr) whom Gerr would like to call skeptics to religious fanatics who are blinded by their faith.

Conclusion

In closing this letter, I would like to request that Mr. Gerr refrain from writing letters to the editor on this subject. To say, as he does in his last letter, “There are many points on which Professor Gardner and I disagree, but only two that are worthy of further discussion,” is to try to worm his way out of the debate without admitting defeat. I claim to have used careful reasoning to refute beyond all reasonable doubt every point Mr. Gerr (and Mr. Hinich) has attempted to make. Since he has shown that he cannot provide convincing arguments based on fact and logic to support his position, he should consider the debate closed. To sum up the debate:

– The resolution, cited in the introductory section of my 2 July 1995 letter to the editor, in contrapositive form, was made by myself in [1].

– The resolution was challenged by Hinich and defended by myself in April 1994 SP Forum.

– Hinich’s challenge was supported and my defense was challenged by Gerr in October 1994 SP Forum.

– Gerr’s arguments were challenged by myself in January 1995 SP Forum.

– Gerr defended his arguments in March 1995 SP Forum.

– Gerr’s presumably-final defense was challenged and the final arguments in support of the resolution are made by myself in this letter.

APPENDIX from July 2, 1995 letter to Editor (published in Nov 1995)

– Proof of Equivalence Between Time-Averaged and Frequency-Smoothed Cyclic Periodograms

History and Equivalence of Two Methods of Spectral Analysis

Published in IEEE SIGNAL PROCESSING MAGAZINE, July 1996

The purpose of this article is to present a brief history of two methods of spectral analysis and to present, in a tutorial fashion, the derivation of the deterministic relationship that exists between these two methods.

History

Two of the oldest and currently most popular methods of measuring statistical (average) power spectral densities (PSD’s) are the frequency smoothing method (FSM) and the time averaging method (TAM). The FSM was thought to have originated in 1930 with Norbert Wiener’s work on generalized harmonic analysis [1], and to have been rediscovered in 1946 by Percy John Daniell [2]. But it was discovered only a few years ago (cf. [3]) that Albert Einstein had introduced the method in 1914 [4]. The currently popular method of deriving the FSM begins by showing that adjacent frequency bins in the periodogram have approximately the same correct mean values and the same large variances, and are approximately uncorrelated with each other. Then, it is observed that averaging these bins together retains the correct mean value, while reducing the variance.

The TAM is often attributed to a 1967 paper by P.D. Welch in the IEEE Transactions on Audio and Electroacoustics [5], but in fact the earliest known proposal of the TAM was by Maurice Stevenson Bartlett in 1948 [6]. The reasoning behind the TAM is similar to that for the FSM: the periodograms on adjacent segments of a data record have approximately the same correct mean values and the same large variances, and they are approximately uncorrelated with each other. Therefore, averaging them together will retain the correct mean value, while reducing the variance. (A more detailed historical account of the FSM, TAM, and other methods is given in [7].) Essentially, every spectral analysis software package available today includes either the FSM or the TAM, or both, often in addition to others. These other methods include, for example, the Fourier transformed tapered autocorrelation method, attributed to Ralph Beebe Blackman and John Wilder Tukey [8] (but used as early as 1898 by Albert A. Michelson [9]); and various model fitting methods that grew out of pioneering work by George Udny Yule in 1927 [10] and Gilbert Walker in 1931 [11].
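The two averaging schemes just described can be sketched in a few lines for the ordinary PSD case. This is only a minimal illustration under stated assumptions, not the procedure of any cited reference: the function names (`periodogram`, `psd_fsm`, `psd_tam`), the rectangular smoothing window, the non-overlapping untapered segments, and the first-order moving-average test signal are all choices made here for illustration.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram |FFT|^2 / N at the N FFT bins (unit sampling rate)."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def psd_fsm(x, m):
    """FSM sketch: circular moving average over m (odd) adjacent periodogram bins."""
    p = periodogram(x)
    h = m // 2
    return sum(np.roll(p, s) for s in range(-h, h + 1)) / (2 * h + 1)

def psd_tam(x, k):
    """TAM sketch (Bartlett style): average the periodograms of k disjoint segments."""
    seg_len = len(x) // k
    segs = x[: k * seg_len].reshape(k, seg_len)
    return np.mean(np.abs(np.fft.fft(segs, axis=1)) ** 2 / seg_len, axis=0)

rng = np.random.default_rng(0)
k, seg_len = 32, 256
w = rng.standard_normal(k * seg_len + 1)
x = w[1:] + 0.5 * w[:-1]              # hypothetical test signal (first-order moving average)

fsm_full = psd_fsm(x, k + 1)          # smoothing width of about 1/seg_len in frequency
fsm = fsm_full[::k]                   # sample at the TAM bin spacing 1/seg_len
tam = psd_tam(x, k)

# Theoretical PSD of the test signal for unit-variance white input
true_psd = 1.25 + np.cos(2 * np.pi * np.arange(seg_len) / seg_len)
print("mean relative error, FSM:", np.mean(np.abs(fsm / true_psd - 1)))
print("mean relative error, TAM:", np.mean(np.abs(tam / true_psd - 1)))
```

Both estimates preserve the total power of the record exactly (their averages over frequency equal the mean square of the data), and both scatter around the theoretical PSD far less than the raw periodogram does, which is the variance reduction described above.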

It is well known that both the FSM and the TAM yield PSD estimates that can be made to converge to the exact PSD in some probabilistic sense, such as in mean square, as the length of the data record processed approaches infinity. However, it is much less commonly known that these two methods are much more directly related to each other. The pioneering methods due to Michelson, Einstein, Wiener, Yule, and Walker were all introduced without knowledge of the concept of a stochastic process. But starting in the 1950s (based on the work of mathematicians such as Khinchin, Wold, Kolmogorov, and Cramér in the 1930s and 1940s), the stochastic-process point of view essentially took over. It appears as though this mathematical formalism, in which analysts focus on calculating means and variances and other probabilistic measures of performance, delayed the discovery of the deterministic relationship between the FSM and TAM for about 40 years. That is, apparently it was not until the non-stochastic approach to understanding statistical (averaged) spectral analysis was revived and more fully developed in [7] that a deterministic relationship between these two fundamental methods was derived.

The next section presents, in a tutorial fashion, the derivation of the deterministic relationship between the FSM and TAM, but generalized from frequency-smoothed and time-averaged versions of the periodogram to the same for the biperiodogram (also called the cyclic periodogram [7]). This deterministic relationship is actually an approximation of the time-averaged biperiodogram (TAB) by the frequency-smoothed biperiodogram (FSB) and, of course, vice versa. For evidence of the limited extent to which this deterministic relationship is known, the reader is referred to letters that have appeared in the SP Forum section of this magazine in the October 1994, January 1995, March 1995, and November 1995 issues.

Equivalence

Definitions

Let be a data-tapering window satisfying for , let be its autocorrelation

and let be its Fourier transform

Let be the sliding (in time ) complex spectrum of data seen through window

Similarly, let be a rectangular window of width , centered at the origin, and let be the corresponding sliding complex spectrum (without tapering). Also, let be the sliding cyclic correlogram for the tapered data

and let be the sliding cyclic correlogram without tapering

To complete the definitions, let and be the sliding biperiodograms (or cyclic periodograms) for the data

Derivation

It can be shown (using ) that (cf. [7, Chapter 11])

The above approximation, namely

for , becomes more accurate as the inequality grows in strength (assuming that there are no outliers in the data near the edges of the -length segment, cf. exercise 1 in [7, Chapt. 3] exercise 4b in [7, Chapt. 5], and Section B in [7, Chapt. 11]). For example, if the data is bounded by , , and , then it can be shown that the error in this approximation is worst-case bounded by . The first and last equalities above are simply applications of the cyclic-periodogram/cyclic-correlogram relation first established in [7, Chapter 11] together with the convolution theorem (which is used in the last equality).

Interpretation

The left-most member of the above string of equalities (and an approximation) is a biperiodogram of tapered data seen through a sliding window of length and time-averaged over a window of length . If this average is discretized, then we are averaging a finite number of biperiodograms of overlapping subsegments over the -length data record. (It is fairly well known that little is gained – although nothing but computational efficiency is lost – by overlapping segments more than about 50 percent.) The right-most member of the above string is a biperiodogram of un-tapered data seen through a window of length and frequency-smoothed along the anti-diagonal , using a smoothing window , for each fixed diagonal . Therefore, given a -length segment of data, one obtains approximately the same result, whether one averages biperiodograms on subsegments (TAM) or frequency smoothes one biperiodogram on the undivided segment (FSM). Given , the choice of determines both the width of the frequency smoothing windows in FSM and the length of the subsegments in TAM. Given and choosing , one can choose either of these two methods and obtain approximately the same result (barring outliers within of the edges of the data segment of length . By choosing (i.e., ), we see the biperiodograms reduce to the more common periodograms, and the equivalence then applies to methods of estimation of power spectral densities, rather than bispectra. Bispectra are also called cyclic spectral densities and spectral correlation functions [7]. As first proved in [7], the FSM and TAM spectral correlation measurements converge to exactly the same quantity, namely, the limit spectral correlation function (when it exists), in the limit as and , in this order. 
Further this limit spectral correlation function, also called the limit cyclic spectral density, is equal to the Fourier transform of the limit cyclic autocorrelation, as first proved in [7], where this relation is called the cyclic Wiener relation because it generalizes the Wiener relation between the PSD and autocorrelation from to

where

with .

In the special circumstance where the inequality cannot be satisfied because of the degree of spectral resolution (smallness of , that is required, there is no known general and provable argument that either method is superior to the other. It has been argued [e.g., by Gerr] that, since the TAM involves time averaging, it is less appropriate than the FSM for nonstationary data. The results presented here, however, show that, for , neither the TAM nor the FSM is more appropriate than the other for nonstationary data. And, when is not satisfied, there is no known evidence that favors either method for nonstationary data.

The derivation of the approximation between the FSM and TAM presented here uses a continuous-time model. However, a completely analogous derivation of an approximation between the discrete-time FSM and TAM is easily constructed. When the spectral correlation function is being measured for many values of the frequency-separation parameter, , the TAM, modified to what is called the FFT accumulation method (FAM), is much more computationally efficient than the FSM implemented with an FFT [12].
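A discrete-time counterpart of the two biperiodogram averages can be sketched as follows. This is a hedged illustration only: the function names `cyclic_tam` and `cyclic_fsm`, the asymmetric frequency pairing (bins j + a and j), the rectangular windows, and the amplitude-modulated test signal are assumptions made here, and the matched pairing of tapering window and smoothing window used in the derivation is not reproduced. The cycle frequency is restricted to the FFT grid (a bins of the segment FFT) so that no phase compensation between segments is needed.

```python
import numpy as np

def cyclic_tam(x, k, a):
    """Time-averaged sketch: mean over k disjoint segments of X_k[j + a] conj(X_k[j]) / seg_len."""
    seg_len = len(x) // k
    X = np.fft.fft(x[: k * seg_len].reshape(k, seg_len), axis=1)
    return np.mean(np.roll(X, -a, axis=1) * np.conj(X), axis=0) / seg_len

def cyclic_fsm(x, k, a):
    """Frequency-smoothed sketch: circular average of X[m + a*k] conj(X[m]) / N over k + 1 bins."""
    n = len(x)
    X = np.fft.fft(x)
    c = np.roll(X, -a * k) * np.conj(X) / n   # same cycle frequency a/seg_len = a*k/n
    h = k // 2
    return sum(np.roll(c, s) for s in range(-h, h + 1)) / (2 * h + 1)

rng = np.random.default_rng(3)
k, seg_len, q = 64, 128, 16
n = k * seg_len
f0 = q / seg_len                               # hypothetical carrier frequency on the FFT grid
x = rng.standard_normal(n) * np.cos(2 * np.pi * f0 * np.arange(n))

a = 2 * q                                      # cycle frequency alpha = 2 * f0, in segment bins
tam = cyclic_tam(x, k, a)
fsm_full = cyclic_fsm(x, k, a)
fsm = fsm_full[::k]                            # sample at the segment bin spacing
print("mean |TAM - FSM| over bins:", np.mean(np.abs(tam - fsm)))
print("average level (theory 0.25):", tam.mean().real, fsm_full.mean().real)
```

For this signal the expected value of both measurements is 0.25 at every frequency bin when the cycle frequency is twice the carrier frequency, and it is zero at cycle frequencies that are not present in the signal. With on-grid cycle frequencies and an integer number of segments, the two measurements also share exactly the same average over frequency, namely the lag-zero cyclic autocorrelation estimate for the whole record.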

William A. Gardner
Professor, Department of Electrical and Computer Engineering
University of California,
Davis, CA.

References

1. Wiener, N., “Generalized harmonic analysis,” Acta Mathematica, Vol. 55, pp. 117-258, 1930.
2. Daniell, P. J., “Discussion of ‘On the theoretical specification and sampling properties of autocorrelated time-series’,” J. Royal Statist. Soc., Vol. 8B, No. 1, pp. 27-97, 1946.
3. Gardner, W. A., “Introduction to Einstein’s contribution to time-series analysis,” IEEE Signal Processing Magazine, Vol. 4, pp. 4-5, 1987.
4. Einstein, A., “Méthode pour la détermination de valeurs statistiques d’observations concernant des grandeurs soumises à des fluctuations irrégulières,” Archives des Sciences Physiques et Naturelles, Vol. 37, pp. 254-256, 1914.
5. Welch, P. D., “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, Vol. AU-15, pp. 70-73, 1967.
6. Bartlett, M. S., “Smoothing periodograms from time-series with continuous spectra,” Nature, Vol. 161, pp. 686-687, 1948.
7. Gardner, W. A., Statistical Spectral Analysis: A Nonprobabilistic Theory. Englewood Cliffs, NJ: Prentice-Hall, 1987.
8. Blackman, R. B. and J. W. Tukey, The Measurement of Power Spectra, New York: AT&T, 1958 (Also New York: Dover, 1959).
9. Michelson, A. A. and S. W. Stratton, “A new harmonic analyzer,” American Journal of Science, Vol. 5, pp. 1-13, 1898.
10. Yule, G. U., “On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers,” Phil. Trans. Royal Soc. London A, Vol. 226, pp. 267-298, 1927.
11. Walker, G., “On periodicity in series of related terms,” Proceedings of the Royal Society, Vol. 131, pp. 518-532, 1931.
12. Roberts, R. S., W. A. Brown, and H. H. Loomis, Jr., “Computationally efficient algorithms for cyclic spectral analysis,” IEEE Signal Processing Magazine, Vol. 8, pp. 38-49, 1991.
###### 3.4.2: The debate

This section comprises the following letters to the editor of IEEE Signal Processing Magazine:

• 3.5 Measure Theoretic Discussion of Fraction-of-Time vs Stochastic Probability

Fraction-of-Time (FOT) Probability for time-series data is mathematically sound yet considerably less abstract than the more established or orthodox Stochastic-Process (SP) Probability. Consequently, FOT probability is pedagogically superior to SP probability. Yet, because FOT probability is still considered by many to be a heterodox theory, it is greeted with suspicion by those who find comfort in orthodoxy. So, it should come as no surprise that college professors in schools of engineering have for the most part not yet adopted this superior alternative in their curricula. But, for the broad class of time series that are accurately modeled as probabilistically stationary, cyclostationary, poly-cyclostationary, or almost cyclostationary, there is more to lose than there is to gain by sticking with tradition, a young 50-year-old tradition that appears to be simply an accident of history. Although the SP approach enjoys the benefit of being applicable to generally nonstationary processes, there are serious conceptual disadvantages to this orthodox approach in comparison with the heterodox FOT alternative when the focus is on single records of stationary time series and nonstationary time series that exhibit any of the three forms of cyclostationarity.

The following material consists of excerpts from the narrative portions of a feature article (Napolitano and Gardner 2021) that reviews the current state of comparison of the two alternative approaches to modeling random processes, FOT Probability and SP Probability. The basis for this review is a mathematically sound measure-theoretic analysis first developed in 2006 (Leśkow and Napolitano 2006). The term random processes is used here to mean time-series data modeled with either FOT probability or SP probability.

###### 3.5.1 Overview of Review Article on Fraction-of-Time Probability

Time series arising from measurements in many fields of physics, engineering, chemistry, biology, and econometrics, are commonly modeled as sample paths from an ensemble which, together with a probability measure, is called a stochastic process. Stationarity and ergodicity assumptions about this model are generally made for analytical convenience and mathematical tractability of the model. In the article (Napolitano and Gardner 2021), it is shown that a dichotomy, which can be very misleading in practice, exists between the properties of a stochastic process and those of its individual sample paths. This dichotomy can be eliminated by adopting the fraction-of-time (FOT) probability approach reviewed in (Napolitano and Gardner 2021), where a probabilistic model is constructed from a single time series without introducing the abstraction of the stochastic process. Two FOT-probability models are reviewed. The first considers probabilistic functions that do not depend on time and employs the relative measure on the real line as a probability measure and the time average as an expectation operator. The second considers periodic, poly-periodic, and almost periodic probabilistic functions and employs the operator that extracts the finite-strength additive sine-wave components of its argument as an expectation operator. This latter model is appropriate for describing time series originating from phenomena involving a combination of periodic and random phenomena. Such time series were named by their originators cyclostationary, poly-cyclostationary, and almost cyclostationary signals.

###### 3.5.2 Introduction and Historical Perspective on Fraction-of-Time Probability

In many fields of physics, engineering, chemistry, biology, and econometrics, randomness in time-series measurements and observations on phenomena being studied has typically been modeled by resorting to the abstract concept of a stochastic process. That is, an empirical time series is modeled as a representative (sample path or realization) of an ensemble of time series with “similar characteristics,” together with a probability measure defined on the set of ensemble members, namely, a stochastic process. A desirable property of the model is that the probabilistic functions comprising the stochastic process be estimable by measurements made on the single empirical time series.

That is, the “similar characteristics” of the sample paths should be such that the properties and, in particular, the probabilistic functions of the whole ensemble such as mean, covariance, and amplitude distribution, can be inferred by measurements made on any one of the sample paths, with the exception (an abstract mathematical detail) of a set of sample paths that occurs with zero probability. This desirable property is exactly the ergodicity property in classical stochastic process theory. This provides the all-important tie for signal processing applications between the model and that which can be measured. However, it is shown in this review article that this tie is not as strong a tie as one would like for empirical signal processing purposes.

The ergodicity concept was first treated mathematically in 1931 by Birkhoff (Birkhoff 1931) and in 1932 by von Neumann (von Neumann 1932) with reference to dynamical systems. They established conditions under which averaging, at a single time instant, a function of the variables of what was called the phase space across an ensemble of different copies of the same system is equivalent to averaging over time for a single system.

Subsequently, ergodicity has played a key theoretical role in multiple fields (Ashley 2015) and has been the subject of much thought and discussion. As examples, see the recent treatments in the fields of economics (Doctor, Wakker, and Wang 2020), atomic physics (Kindermann et al. 2017), and condensed matter systems (Mauro, Gupta, and Loucks 2007). Theoretical research on ergodicity has continued for nearly a century; see, for example, the more recent work by Boyles and Gardner (Boyles and Gardner 1983), Shields (Shields 1987), Katznelson and Weiss (Katznelson and Weiss 1982), and Gray (Gray 2009), and the latest work by the Authors reported in this review.

The adoption of the stochastic process model requires a substantial abstraction, the hypothesized mathematical existence of the ensemble which, in various circumstances, creates serious conceptual problems. In fact, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In such cases, knowing only a statistical (more appropriately, a probabilistic) theory of ensembles of data records (stochastic processes) is an impediment to intuitive real-world understanding of the principles of statistical analysis of single records of time-series data, e.g., statistical spectral analysis. A key drawback of the stochastic process model is that this model tells one nothing at all about a single record, as elucidated in this review. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. That is, it is not a theory of measurements such as finite-time averages on single time series, which are called statistics; rather, it is a probabilistic theory about hypothetical ensembles of time series. For stochastic processes, mixing assumptions (mixing being a type of asymptotic independence of time samples with increasing separation in time) are typically used as sufficient conditions for ergodicity. But such conditions often cannot be verified for the adopted mathematical models; generally, they are simply assumed to hold on the basis of little more than faith. It is explained in this review that properties of probabilistic functions defined in terms of ensemble averages in most cases do not correspond to similar properties of analogous functions defined in terms of time averages on single sample paths, even if the stochastic process is ergodic.
Consequently, there is a dichotomy between the properties of a stochastic process (the model) and the properties of its individual sample paths (the physical signals). The idea that averaging over an ensemble does not necessarily correspond to averaging over time, that is, the lack of ergodicity of some stochastic process models, is receiving increasing attention in various fields of study (Editorial 2019). Furthermore, it is explained in the following sections that the critical nature of some assumptions made in the classical probability theory of stochastic processes was already pointed out by Kolmogorov in his seminal work 90 years ago (Kolmogorov 1933, 15). An attempt to describe randomness without introducing an abstract sample space and probability was made by Kolmogorov himself by introducing the concept of complexity for a single random sequence. Such an alternative approach, however, did not enjoy the same success that probability theory did.

Moreover, unlike the straightforward easy-to-specify relative measure required to define a fraction-of-time probability, the probability measure required to define a stochastic process in the classical sense–which is a requirement for determining ergodicity of a specific stochastic process model–is an abstract entity that rarely gets specified in either practical or theoretical work on stochastic processes. Consequently, strict ergodicity is essentially never determined for specific stochastic processes. It is little more than a pacifier: the ergodicity condition is simply assumed to be satisfied except in a few cases for which it is easy to demonstrate that it is not satisfied. Examples are stochastic processes that are 1) a random constant (time invariant), 2) a sine wave with a random constant amplitude, and 3) a dynamical system defined by the evolution of its states throughout some state space in cases for which it can be shown that the system can, for some but not all ensemble members, at some point in time, get stuck for all future time inside a subset of state space, never again visiting any part of the rest of state space.
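The first two non-ergodic examples listed above are easy to reproduce numerically. The following is a minimal sketch, assuming a Gaussian-distributed random amplitude and the hypothetical variable names shown; it is meant only to make the gap between ensemble averages and single-record time averages concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_samples = 1000, 2000
amplitudes = rng.standard_normal(n_paths)        # one random draw per ensemble member

# Example 1: a random constant, x(t) = A for all t.
constant_paths = np.repeat(amplitudes[:, None], n_samples, axis=1)
time_averages = constant_paths.mean(axis=1)      # one time average per sample path
ensemble_average = constant_paths[:, 0].mean()   # average across paths at one time instant

# Example 2: a sine wave with random constant amplitude, x(t) = A cos(2 pi t / 20).
t = np.arange(n_samples)
sine_paths = amplitudes[:, None] * np.cos(2 * np.pi * t / 20)
time_avg_power = np.mean(sine_paths ** 2, axis=1)  # equals A^2 / 2 on each path

print("ensemble average of the random constant:", ensemble_average)
print("spread of time averages across paths:", time_averages.std())
print("spread of time-averaged power across paths:", time_avg_power.std())
```

Each sample path returns its own amplitude (or half its squared amplitude) as a time average, however long the record, so no single record reveals the ensemble mean or the ensemble-average power.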

These drawbacks of the abstract stochastic-process model of time-series data have led researchers to investigate more concrete models as an alternative. These models are based on properties of time averages of individual signals, instead of averages over hypothetical ensembles of signals. In this paper, this alternative approach to time-series analysis is reviewed, and new results on the concrete benefits of this alternative are reported. This approach is based on what is called fraction-of-time (FOT) probability, which was first introduced in earnest in 1987, with a comprehensive book (W. A. Gardner 1987b) devoted exclusively to this theme by one of the Authors, William Gardner, and the duality between these two alternative models was a theme of the earlier book (W. A. Gardner 1985) by Gardner. But important advances in the theoretical underpinnings of the FOT-probability theory of signals have been made quite recently.

In this approach, starting from a single function of time, a valid distribution function and all other familiar probabilistic parameters such as means, moments, and cumulants are constructed. That is, formulas for calculating these functions/parameters directly from time series are developed. The approach can be put in a rigorous measure-theoretic framework built on the conceptual foundation of relative measure introduced by Kac and Steinhaus (Kac 1959), (Kac and Steinhaus 1938). Developments were presented in (Urbanik 1958), (Fukuyama 1988). More recently, the concept was revisited by one of the Authors, Antonio Napolitano, and Leśkow in (Leśkow and Napolitano 2006). Very recent (unpublished) work by Gardner on generalizations of ergodicity to cycloergodicity that ameliorate some of the drawbacks of the standard assumptions of stationarity and ergodicity is included in this review.
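As a concrete illustration of constructing a distribution function from a single function of time, the sketch below estimates the time-invariant FOT distribution as the fraction of time the record lies at or below a threshold. The function names `fot_cdf` and `fot_mean`, the finite record length, and the sine-wave test record are assumptions made here; the theory itself is defined with limits as the averaging time approaches infinity.

```python
import numpy as np

def fot_cdf(x, xi):
    """Finite-time FOT distribution: fraction of time the record is at or below xi."""
    return np.mean(x <= xi)

def fot_mean(x):
    """Finite-time FOT expectation of the record itself: its time average."""
    return np.mean(x)

# A single record: 100 full periods of a unit-amplitude sine wave.
t = np.arange(100_000)
x = np.sin(2 * np.pi * t / 1000)

for xi in (-0.5, 0.0, 0.5):
    exact = 0.5 + np.arcsin(xi) / np.pi   # arcsine law for a sine wave
    print(f"xi = {xi:+.1f}: estimate {fot_cdf(x, xi):.4f}, arcsine law {exact:.4f}")
print("time-average mean:", fot_mean(x))
```

No ensemble is invoked anywhere: the distribution is a property of the one record, and for a sine wave it is the familiar arcsine law.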

A precursor of the spectral analysis of a single time-series is due to Einstein in his work reported in 1914 (Einstein 1987) (see (W. A. Gardner 1987a) for Gardner’s introduction to and technical commentary on this work and the comments of Yaglom in (Yaglom 1987)).

The FOT approach has its roots in Wiener’s work on generalized harmonic analysis (GHA) for single time series (Wiener 1930), (Wiener 1933, Chap. 4), (Wiener 1949, 37–45). Wiener’s GHA approach was followed by some of his students (Lee 1967, Chap. 2) and a few authors: Bochner, Hartman, Jessen, Kershner, and Wintner (Bochner and Jessen 1934), (Jessen and Wintner 1935), (Kershner and Wintner 1936), (Wintner 1932), Haviland (Haviland 1933), Bass and his students (Bass 1959), (Bass 1971, vol. III, pt. 5, chap. 1-3), (Bertrandias 1966), (Hien 1975), Furstenberg (Furstenberg 1960), Hofstetter (Hofstetter 1964), Benedetto (Benedetto 1996, Sec. 2.9), Mäkilä (Mäkilä 2004), and Nobel (Nobel 2004). Results have been presented in spectral analysis (Casinovi 2007), (Pfaffelhuber 1975), and for the prediction of individual sequences (Michaeli, Pohl, and Eldar 2011). Ziv developed a coding theory for individual sequences (Ziv 1978). The definition of jointly relatively measurable functions (Leśkow and Napolitano 2006), and the issues of quantile prediction (Leśkow and Napolitano 2002), central limit theorem (Dehay, Leśkow, and Napolitano 2013), and time average estimation (Dehay, Leśkow, and Napolitano 2018) were addressed by Dehay, Leśkow, and Napolitano.

Roughly speaking, in the above-mentioned papers, an underlying stationary model is assumed. In fact, the FOT expectation operator is the infinite time average and the probabilistic functions derived therefrom are time invariant.

The extension of the FOT approach and GHA to periodic phenomena was made by one individual, Gardner, in (W. A. Gardner 1987b, Part II), (W. A. Gardner and Brown 1991). Periodic phenomena are generated by the interaction of periodic mechanisms and random phenomena. The results are processes that are not periodic but whose probabilistic functions vary periodically with time. These signals have been referred to as cyclostationary or poly-cyclostationary if, respectively, only one periodicity or a finite number of incommensurate periodicities are present in the probabilistic functions. If the probabilistic functions are almost-periodic functions of time (which can generally be represented by infinite numbers of incommensurate periods), the signals have been named almost-cyclostationary (W. A. Gardner 1987b, Part II) and provide an alternative to almost-cyclostationary stochastic processes.

The extension of the FOT approach and GHA presented in (W. A. Gardner 1987b, Part II), (W. A. Gardner and Brown 1991) is of great importance in practice due to the ubiquity of science data generated by periodic phenomena. In communications, radar, sonar, and telemetry, periodicities in the statistical functions arise from the modulation by random data of sinusoidal carriers or periodic pulse trains (W. A. Gardner 1987b, Chap. 12), (Napolitano 2019, Chap. 7). In vibro-acoustic signals collected from mechanical machinery, periodicities in the statistics are due to rotations of gears, belts, and bearings (Napolitano 2019, Sec. 10.6). In econometrics, the weekly opening and closing of markets and the seasonal supply and demand of products give rise to periodicities in the statistical functions of prices and exchange rates (Napolitano 2019, Sec. 10.7). In radio astronomy, periodicities are due to the revolution and rotation of planets and pulsation of stars; and in human biological signals, they are due to the heart pulsation and other biological rhythms. Periodicities are present in genome sequences, diffusion processes of molecular dynamics, and signals encountered in neuroscience (see (Napolitano 2019, Chap. 10) and references therein, and also the results of an internet search producing 135,000 published research papers on cyclostationarity across essentially all fields of science and engineering (William A. Gardner 2018b)).
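The communications case mentioned above, random data modulating a sinusoidal carrier, gives perhaps the simplest numerical picture of a periodicity hidden in a second-order statistic. The sketch below is illustrative only; the function name `cyclic_autocorr`, the asymmetric lag convention, the white Gaussian data, and the carrier frequency are assumptions made here.

```python
import numpy as np

def cyclic_autocorr(x, alpha, lag=0):
    """Finite-time average of x[n + lag] x[n] exp(-i 2 pi alpha n), for real x and lag >= 0."""
    m = len(x) - lag
    n = np.arange(m)
    return np.mean(x[lag:] * x[:m] * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(4)
n_samples, f0 = 50_000, 0.1                      # hypothetical carrier frequency (cycles/sample)
data = rng.standard_normal(n_samples)            # random modulating data
x = data * np.cos(2 * np.pi * f0 * np.arange(n_samples))

print("alpha = 0      :", abs(cyclic_autocorr(x, 0.0)))     # average power, about 0.5
print("alpha = 2 * f0 :", abs(cyclic_autocorr(x, 2 * f0)))  # cycle frequency, about 0.25
print("alpha = 0.07   :", abs(cyclic_autocorr(x, 0.07)))    # not a cycle frequency, about 0
```

The signal contains no additive sine wave, yet its lag product contains one at twice the carrier frequency; the time average with the complex exponential weighting extracts the strength of that component from the single record.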

The extension of FOT and GHA to periodic phenomena made in (W. A. Gardner 1987b, Part II), (W. A. Gardner and Brown 1991) is not obvious since Gardner’s periodically or almost-periodically time-variant distribution is constructed from a single function of time by recognizing the non-obvious fact that the operator that extracts all the finite-strength additive sine-wave components of its argument is a valid expectation operator. Thus, when such an operator is applied to the indicator of the set of time instants where a function is below a threshold, it provides a valid periodically or almost-periodically time-variant cumulative distribution function. This definition of temporal expectation may, at first glance, seem arbitrary; however, it produces the unique dual to the periodically or almost-periodically time-variant cumulative distribution functions of classically defined stochastic processes. It also is a generalization of the time-average operation, which is the constant-component extraction operator.
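For the special case of a single known period, extracting all sine-wave components that are harmonics of that period amounts to synchronized averaging, so the periodically time-variant distribution can be sketched as below. This is a simplified illustration under assumptions made here: the period is an integer number of samples and is known, the record is finite, and the test signal (white Gaussian noise with a periodically varying scale) and the names `periodic_fot_cdf` and `normal_cdf` are hypothetical.

```python
import math
import numpy as np

def periodic_fot_cdf(x, period, xi):
    """Synchronized average of the indicator 1{x <= xi}: one CDF value per phase of the period."""
    n_periods = len(x) // period
    indicator = (x[: n_periods * period] <= xi).astype(float)
    return indicator.reshape(n_periods, period).mean(axis=0)

def normal_cdf(z):
    """Standard normal CDF, used only to check the estimate against theory."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(2)
period, n_periods = 8, 5000
n = period * n_periods
scale = 1.0 + 0.8 * np.cos(2 * np.pi * np.arange(n) / period)  # periodic scale factor
x = scale * rng.standard_normal(n)                              # single record

xi = 1.0
cdf_by_phase = periodic_fot_cdf(x, period, xi)
theory = np.array([normal_cdf(xi / s) for s in scale[:period]])
stationary_cdf = np.mean(x <= xi)   # plain time average: the constant component only

print("estimated CDF at xi = 1 for each phase:", np.round(cdf_by_phase, 3))
print("theoretical values                    :", np.round(theory, 3))
print("time-invariant FOT CDF                :", round(float(stationary_cdf), 3))
```

The plain time average of the indicator is the average of the periodic distribution over one period, which shows in miniature how the sine-wave extraction operator generalizes the time-average (constant-component extraction) operator.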

The extension of FOT and GHA to periodic phenomena overcomes the need to invoke cycloergodicity (Boyles and Gardner 1983), that is, the hypothesized and partially realized extension of the ergodicity property to the problem of estimation of statistical functions of stochastic processes with almost-periodic probabilistic functions. This is perhaps even more important than avoiding invocation of ergodicity, because a full extension of Birkhoff’s ergodic theorem to a cycloergodic theorem may not exist (William A. Gardner 2018a, 3); or, if such a theorem does exist, it may not include some important and practical stochastic process models that exhibit all the ergodic properties of practical interest.

In this paper, newly exposed advantages of adopting the FOT approach for time-series analysis instead of the stochastic-process-based model for the classes of stationary, cyclostationary, poly-cyclostationary, and almost-cyclostationary signals are highlighted. The commonly made assumptions of stationarity and/or ergodicity which, in many cases, cannot be proved or do not correspond to a physical interpretation of the phenomenon to be modeled, are shown to be unnecessary. Moreover, pitfalls arising in the adoption of the classical stochastic-process model are exposed.

Finally, the reader is informed that a lively debate on the subject of the FOT probability, in which Gardner represented the Pro side of the debate, was held in the SP Forum of the IEEE Signal Processing Magazine in April 1994, October 1994, January 1995, March 1995, and November 1995 issues. The aim of the debate was to “inform people about both sides of the controversy without trying to discredit either side.” The interested reader can go directly to the cited references.

Before proceeding, the types of applications for which the stochastic process is the model of choice are identified here. These fall into two categories: 1) Applications for which there exist many replicas of a time series of interest, or for which it at least is not inappropriate to pretend such replicas exist, and random variations from one replica to another are of interest. (However, the dichotomies exposed herein between properties of classically defined stochastic processes and individual sample paths may be important limitations of this model that should not be ignored.) 2) Nonstationary behavior that is not cyclostationary and not polycyclostationary and not almost cyclostationary, because for such processes there does not appear to be a viable probabilistic model that is alternative to the stochastic process. (Yet, the classical definition in terms of a probability measure has undesirable properties that can in some cases be avoided with stochastic models that are defined directly in terms of (a) an explicit time function that is dependent on some random variables or (b) explicit specification of the CDFs for all sets of time samples of the process, instead of explicit specification of a probability measure on the space of sample waveforms—with its attendant requirements that rule out some stochastic process models of interest).

The paper is organized as follows. In Section II, motivation for introducing FOT probability is presented. In Section III, the relative measure (the dual of the classic probability measure) and its properties are reviewed. In Section IV, the extension of the FOT-probability theory to almost-periodic phenomena is treated. In Section V, new insight into the mostly unstudied topic of cycloergodicity is provided and the problem of statistical function estimation in the FOT approach is addressed. Examples of the dichotomy existing between the properties of a stochastic process and those of its sample paths are presented in Section VI. Conclusions are drawn in Section VII.

###### 3.5.3 Motivation for Adopting Fraction-of-Time Probability

In order for the comparison between two alternative probability theories reviewed in this article to be accurate, certain technicalities cannot be swept under the rug as a means for avoiding mathematical terminology that is likely unfamiliar to some readers. Nevertheless, an attempt has been made to prevent the occasional unfamiliar concept or term from interrupting the flow of thought. In some places, brief parenthetical remarks are included to smooth the way. In addition, it is explained right here at the outset that an FOT theory of probability can be fully specified without the inclusion of any limits as averaging time approaches infinity. However, this does result in a theory for which properties like stationarity and cyclostationarity can only be approximate. This can complicate thinking about and using the theory. For this reason, only the version that is based upon a foundation defined in terms of limits as averaging time approaches infinity is reviewed here. One can think of this as analogous to accepting limits in the theory of calculus to obtain a theory that is much easier to think about and use than it would otherwise be. Also, the very brief use of a very few mathematical terms, like field and Borel subsets, can be ignored without loss of comprehension. And the term Lebesgue measure can be thought of as nothing more than a formalization of standard measures of length, area, and volume in Euclidean space. Finally, the differences between the more familiar Riemann integral and the Lebesgue and Riemann-Stieltjes integrals can be ignored, or standard definitions of these integrals can be looked up on the Web, but adequate comprehension for the non-mathematician does not require this.

The relative measure $\mu_R$ and the infinite-time average are the fraction-of-time (FOT) counterparts of the probability measure $P$ and the ensemble average, respectively (Kac and Steinhaus 1938), (Leśkow and Napolitano 2006).
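For concreteness, these two FOT quantities can be written out as follows. The notation ($\mu_R$ for the relative measure, $\mu$ for Lebesgue measure, $\langle\cdot\rangle_t$ for the infinite-time average, $F_x$ for the FOT distribution, $\xi$ for the threshold, and $u$ for the unit step) is assumed here, following common usage in the cited literature, and the limits are taken to exist:

$$
\mu_R(A) \triangleq \lim_{T\to\infty} \frac{1}{T}\,\mu\big(A \cap [-T/2,\,T/2]\big),
\qquad
\langle x(t) \rangle_t \triangleq \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} x(t)\,dt .
$$

Taking $A$ to be the set of times at which the time series does not exceed a threshold gives the FOT distribution function as an infinite-time average of an indicator:

$$
F_x(\xi) \triangleq \mu_R\big(\{t : x(t) \le \xi\}\big) = \big\langle u(\xi - x(t)) \big\rangle_t ,
\qquad
u(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0 . \end{cases}
$$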

Due to the differences between the relative measure, $\mu_R$, on the relatively measurable sets (which are a subset of the $\sigma$-field of Borel subsets of the real line) and the probability measure, $P$, on the $\sigma$-field of Borel subsets of a probability sample space, mathematical properties holding for stochastic processes do not necessarily have counterparts that hold for functions of time representing sample paths of these stochastic processes.

The key differences include:

• The class of the $P$-measurable sets is closed under union and intersection; the class of the relatively measurable sets is not.
• $P$ is a $\sigma$-additive (additivity of countably infinite numbers of terms) measure; $\mu_R$ is not.
• Expectation is $\sigma$-linear (linearity of an operator applied to a linear combination of a countably infinite number of terms); infinite-time average is not.
• Joint measurability of sample spaces is typically assumed but cannot be verified; joint relative measurability is a property of functions that can be verified.

These differences clearly show that the mathematical properties of the relative measure render it less amenable to mathematical study than do those of the probability measure $P$. This, however, does not constitute an obstacle to using the FOT approach for signal analysis but, rather, as explained in this paper, provides motivation for using this approach instead of the classical stochastic-process approach based on $P$.
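A standard counterexample, sketched here under the usual definition of the relative measure of a set $A$ as the limit of $\mu(A \cap [-T/2, T/2])/T$ (with $\mu$ denoting Lebesgue measure; the symbols are assumptions of this sketch), illustrates the failure of countable additivity. Partition the time line into the unit intervals $A_n = [n, n+1)$, $n \in \mathbb{Z}$. Each interval is bounded, so

$$
\mu_R(A_n) = \lim_{T\to\infty} \frac{1}{T}\,\mu\big(A_n \cap [-T/2,\,T/2]\big) \le \lim_{T\to\infty} \frac{1}{T} = 0 ,
$$

whereas the union of all of them is the entire time line:

$$
\mu_R\Big(\bigcup_{n\in\mathbb{Z}} A_n\Big) = \mu_R(\mathbb{R}) = 1 \ne 0 = \sum_{n\in\mathbb{Z}} \mu_R(A_n) .
$$

Additivity still holds for any finite collection of disjoint relatively measurable sets; it is only the countably infinite case that breaks down.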

Creators of the mathematical definition of a stochastic process dictated that certain properties of the mathematical entity (such as $\sigma$-additivity of the probability measure and $\sigma$-linearity of the expectation) must be exhibited so they could obtain mathematically desirable tractability. But as explained below, these dictations create a dichotomy between the abstract stochastic process properties and the properties of concrete individual sample paths of the stochastic process, which are the entities of primary interest to practitioners in many applications.

As also explained below, the creators of the mathematical definition of the FOT-probability model for functions, which can be thought of as single sample paths, did not dictate such problematic properties and therefore did not create such a dichotomy for the FOT probability approach.

The adoption of the FOT approach overcomes all problems arising from the need to check sufficient conditions for validating assumptions for ergodicity–problems which occur frequently in time-series analysis applications.

###### 3.5.4 Conclusions on Fraction-of-Time Probability

The fraction-of-time (FOT) probability approach for signal analysis is an alternative to the classical probability approach that models signals as sample paths of an ensemble which, together with a probability measure, is called a stochastic process. In the FOT approach, the unique function of time at the hand of the experimenter is not modeled as a representative of an ensemble. All the familiar probabilistic functions and parameters in this approach are constructed from this single function of time by adopting the relative measure of subsets of the real line (representing time) as a probability measure.
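As a minimal sketch of this construction, the finite-time fraction of samples at or below a threshold can be compared with the limit FOT distribution of a sinusoid, which is known in closed form (the arcsine law). The test signal, its frequency `f0`, the record length `n`, and the function names are all hypothetical choices made for illustration, and a finite record only approximates the infinite-time limit.

```python
import math

def fot_cdf(samples, xi):
    """Finite-time fraction of samples at or below the threshold xi."""
    return sum(1 for v in samples if v <= xi) / len(samples)

def arcsine_cdf(xi):
    """Limit FOT distribution of a unit-amplitude sinusoid, for -1 <= xi <= 1."""
    return 1.0 - math.acos(xi) / math.pi

# Hypothetical single record: a unit-amplitude sinusoid whose frequency
# (in cycles per sample) is irrational, so the sampled phases never repeat.
n = 50_000
f0 = math.sqrt(2) / 10
x = [math.cos(2 * math.pi * f0 * k) for k in range(n)]

for xi in (-0.5, 0.0, 0.5):
    print(xi, fot_cdf(x, xi), arcsine_cdf(xi))
```

No ensemble appears anywhere in this calculation: the distribution is obtained from time spent below each threshold along one record.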

The relative measure is not $\sigma$-additive (additivity of countably infinite numbers of terms). Moreover, when it is adopted to construct the distribution function of the values assumed over time by a persistent function of time, the corresponding expectation operator, the infinite-time average, is not $\sigma$-linear (linearity of an operator applied to a linear combination of a countably infinite number of terms). These facts make the relative measure less amenable to mathematical manipulation involving infinite linear combinations than is the classical probability measure. However, when we define a stochastic process $x(t,\omega)$ as a function of two variables $t$ and $\omega$, for which $\omega$ is the ensemble index, the results of operations (e.g., calculations of values of time-distributions or time-averages) made over the time line indexed by $t$ for some fixed $\omega$ (which represent empirical measurements) exhibit, by definition, the same properties as those of the FOT probabilistic functions and, therefore, do not have the $\sigma$-additivity and $\sigma$-linearity properties of the corresponding results of operations made over $\omega$. Hence, if only one function of time is available at the hands of the experimenter, the abstraction of introducing a hypothetical ensemble indexed by $\omega$ creates an unnecessary dichotomy between theory and measurements. Directly considering probabilistic functions constructed over $t$ avoids this dichotomy between the properties of the stochastic process and those of its individual sample paths, as illustrated in the examples presented in (Napolitano and Gardner 2021). This is the primary motivation for adopting the FOT probability approach for signal analysis.

The deep difference between the relative measure and the probability measure is a result of dictating that the measure of the hypothetical sample space $\Omega$ be finite, which enables the introduction of Axiom VI of probability. These assumed properties are at odds with the empirical quantities to be modeled. They artificially give to the probability measure properties that are not shared with the relative measure, thereby rendering it more amenable to mathematical manipulations involving infinite linear combinations.

The conceptual usefulness of the FOT approach is evident in statistical spectral analysis of time series. This is the point of view of the book (W. A. Gardner 1987b), and the early work on generalized harmonic analysis (Wiener 1930). In most problems of statistical spectral analysis, in fact, there is no need to model the available time series as a representative of an ensemble and, hence, there is no need to willfully incur a dichotomy between theory and measurements.

In Information Theory, compression of individual sequences has been shown to be a practicable and, in some cases, a convenient approach (Ziv 1978). The same is true of channel coding. For example, the construction of block codes and convolutional codes and the corresponding decoding procedure can be made without involving any stochastic concept. This fact is reflected in James L. Massey’s remark in review of the 1987 book (W. A. Gardner 1987b), “I admire the scholarship of this book and its radical departure from the stochastic process bandwagon of the past 40 years.” This is remarkable considering his prominent place in the development of coding theory and cryptography. However, in contrast to the practice of source and channel coding, in order to prove the channel coding theorem, one must adopt the concept of typical sequences whose probabilistic characterization is formulated in terms of the classical stochastic process. This is so, even though all the basic quantities used in Information Theory can be defined in terms of FOT probability.

Even if typical sequences can be characterized without introducing the stochastic process (Kac 1959, Chap. 2), the proof of the channel-coding theorem is based on the characterization of typical sequences using the classical stochastic approach and the concept of the ensemble of all possible codes for a given channel. No proof of the channel coding theorem based on FOT probability is on the horizon.

A field where the FOT approach appears methodologically more appropriate than the classical stochastic approach is that of Monte Carlo simulations. They are FOT simulations, not stochastic simulations (Dehay, Leśkow, and Napolitano 2018, Sec. 4.4). In fact, a computer program for random number generation produces a unique periodic sequence with a very long period. Calling such a routine several times (with different seeds) is equivalent to picking different time segments of the unique sequence. So, the sample space is time indexed, not ensemble indexed. (If the period is sufficiently long, the sequence can be considered aperiodic for practical purposes.)
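The point about seeds can be checked directly on a toy generator. The sketch below uses a small full-period linear congruential generator whose parameters (`a=5`, `c=3`, `m=16`) and function names are hypothetical choices made so that the whole cycle fits on one line; production generators have vastly longer periods and more elaborate seeding, so this illustrates the idea and does not model any particular library.

```python
def lcg_stream(seed, n, a=5, c=3, m=16):
    """Return n successive outputs of a toy linear congruential generator."""
    x = seed % m
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

def offset_in_cycle(cycle, segment):
    """Index at which segment starts inside the periodic cycle, or -1."""
    doubled = cycle + cycle  # lets a segment wrap around the end of the cycle
    for k in range(len(cycle)):
        if doubled[k:k + len(segment)] == segment:
            return k
    return -1

# One full period of the unique sequence, then two "independent" runs
# started from different seeds.
cycle = lcg_stream(seed=0, n=16)
run_a = lcg_stream(seed=7, n=5)
run_b = lcg_stream(seed=12, n=5)

print(cycle)
print(run_a, "starts at offset", offset_in_cycle(cycle, run_a))
print(run_b, "starts at offset", offset_in_cycle(cycle, run_b))
```

Both runs turn out to be segments of the same periodic sequence, read from different starting times.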

The extension of the FOT approach to periodic phenomena, that is, those which produce time series through the interaction of periodic mechanisms and random phenomena, is based on the nonobvious result that the almost-periodic component extraction operator is a valid expectation operator. When it is applied to the indicator of the set of values of $t$ where a time series is below a threshold $\xi$, one obtains, for every value of $t$, a valid distribution function in $\xi$. The obtained function is almost-periodic in $t$ by construction. Therefore, it is suitable for an FOT probabilistic characterization of cyclostationary, poly-cyclostationary, and almost-cyclostationary time series. In particular, a complete temporal probabilistic theory for these time series can be constructed (W. A. Gardner 1987b), (W. A. Gardner and Brown 1991), including a theory of higher-order moments and cumulants (W. A. Gardner and Spooner 1994).
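In the special case of a single known period, extracting the periodic component amounts to synchronized averaging: samples that share the same phase within the period are averaged together. The sketch below applies this to a hypothetical discrete-time record made of a sinusoidal periodic component plus pseudo-random fluctuations. The period, record length, noise level, and function name are assumptions for illustration, and a finite number of periods only approximates the limit operator.

```python
import math
import random

def synchronized_average(x, period):
    """Average the samples of x that share the same phase within the period."""
    n_periods = len(x) // period
    return [
        sum(x[n * period + p] for n in range(n_periods)) / n_periods
        for p in range(period)
    ]

period = 8                  # hypothetical known period, in samples
n_periods = 4000            # number of periods in the record
rng = random.Random(0)      # fixed seed so the run is repeatable

# Periodic component buried in fluctuations of unit standard deviation.
true_component = [math.sin(2 * math.pi * p / period) for p in range(period)]
x = [true_component[k % period] + rng.gauss(0.0, 1.0)
     for k in range(period * n_periods)]

estimate = synchronized_average(x, period)
max_error = max(abs(e - m) for e, m in zip(estimate, true_component))
print(max_error)
```

Applying the same phase-by-phase averaging to the indicator of the event that the sample is at or below a threshold, instead of to the samples themselves, gives a finite-record estimate of the periodically time-varying FOT distribution described above.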

The FOT approach for almost-cyclostationary signals can be extended to the class of the generalized almost-cyclostationary (GACS) signals (Napolitano 2012, Chap. 2), (Napolitano 2019, Chap. 12). GACS signals have multivariate statistical functions almost periodic with respect to time with (generalized) Fourier series expansions for which not only the coefficients depend on the lag parameters of the time-shifted versions of the time series, but also the frequencies. For these signals, the almost-periodic component extraction operator can be adopted as the expectation operator and a complete FOT higher-order theory is developed in (Izzo and Napolitano 1998).

Other generalizations of the class of almost-cyclostationary processes are the spectrally correlated (SC) processes (Napolitano 2012, Chap. 4), (Napolitano 2019, Chap. 13) and the oscillatory almost-cyclostationary (OACS) processes (Napolitano 2019, Chap. 14). For these classes of nonstationary stochastic processes, there apparently do not exist FOT-probability counterparts despite the relationships of these processes to almost cyclostationary processes because the nonstationarity of these generalizations is not almost periodic or otherwise of known functional form. However, time-warped almost-cyclostationary signals have been treated in (William A. Gardner 2018b) with an approach that mostly avoids stochastic processes and ergodicity.

###### 3.5.5 References on Fraction-of-Time Probability

Ashley, Steven. 2015. “Core Concept: Ergodic Theory Plays a Key Role in Multiple Fields.” Proceedings of the National Academy of Sciences 112 (7): 1914. https://doi.org/10.1073/pnas.1500429112.

Bass, J. 1959. “Suites uniformément Denses, Moyennes trigonométriques, Fonctions pseudo-aléatoires.” Bulletin de La Société Mathématique de France 87: 1–64.

———. 1963. “Espaces de Besicovitch, Fonctions presque-périodiques, Fonctions pseudo-aléatoires.” Bulletin de La Société Mathématique de France 91: 39–61.

———. 1971. Cours de Mathématique. Vol. III. Paris: Masson & Cie.

Benedetto, J. J. 1996. Harmonic Analysis and Applications. New York: CRC Press.

Bertrandias, J.-P. 1966. “Espaces de Fonctions bornées Et Continues En Moyenne Asymptotique d’ordre $p$.” Bull. Soc. Math. France (Mémoires de La S.M.F.) 5: 3–106.

Birkhoff, George D. 1931. “Proof of the Ergodic Theorem.” Proceedings of the National Academy of Sciences 17 (12): 656–60. https://doi.org/10.1073/pnas.17.12.656.

Bochner, S., and B. Jessen. 1934. “Distribution Functions and Positive-Definite Functions.” Annals of Mathematics 35 (2): 252–57.

Boyles, R., and W. A Gardner. 1983. “Cycloergodic Properties of Discrete- Parameter Nonstationary Stochastic Processes.” IEEE Transactions on Information Theory IT-29 (1): 105–14. https://doi.org/10.1109/TIT.1983.1056613.

Casinovi, G. 2007. “$L^1$-Norm Convergence Properties of Correlogram Spectral Estimates.” IEEE Transactions on Signal Processing 55 (9): 4354–65.

Dehay, D., J. Leśkow, and A. Napolitano. 2013. “Central Limit Theorem in the Functional Approach.” IEEE Transactions on Signal Processing 61 (16): 4025–37. https://doi.org/10.1109/TSP.2013.2266324.

———. 2018. “Time Average Estimation in the Fraction-of-Time Probability Framework.” Signal Processing 153: 275–90. https://doi.org/10.1016/j.sigpro.2018.07.005.

Doctor, Jason N., Peter P. Wakker, and Tong V. Wang. 2020. “Economists’ Views on the Ergodicity Problem.” Nature Physics 16 (12): 1168. https://doi.org/10.1038/s41567-020-01106-x.

Editorial. 2019. “Time to Move Beyond Average Thinking.” Nature Physics 15 (12): 1207. https://doi.org/10.1038/s41567-019-0758-3.

Einstein, A. 1987. “Method for the Determination of the Statistical Values of Observations Concerning Quantities Subject to Irregular Fluctuations.” IEEE ASSP Magazine 4 (4): 6. https://doi.org/10.1109/MASSP.1987.1165595.

Fukuyama, Katusi. 1988. “Some Limit Theorems of Almost Periodic Function Systems Under the Relative Measure.” Journal of Mathematics of Kyoto University 28 (3): 557–77. https://doi.org/10.1215/kjm/1250520403.

Furstenberg, Harry. 1960. Stationary Processes and Prediction Theory. (AM-44). Princeton University Press.

Gardner, W. A. 1985. Introduction to Random Processes with Applications to Signals and Systems. New York: Macmillan.

———. 1987a. “Introduction to Einstein’s Contribution to Time-Series Analysis.” IEEE ASSP Magazine 4 (4): 4–5. https://doi.org/10.1109/MASSP.1987.1165601.

———. 1987b. Statistical Spectral Analysis: A Nonprobabilistic Theory. Englewood Cliffs, NJ: Prentice-Hall.

———. 1994. “An Introduction to Cyclostationary Signals.” In Cyclostationarity in Communications and Signal Processing, edited by W. A. Gardner, 1–90. IEEE Press.

Gardner, W. A., and W. A. Brown. 1991. “Fraction-of-Time Probability for Time-Series That Exhibit Cyclostationarity.” Signal Processing 23: 273–92.

Gardner, W. A., and C. M. Spooner. 1994. “The Cumulant Theory of Cyclostationary Time-Series. Part I: Foundation.” IEEE Transactions on Signal Processing 42: 3387–3408.

Gardner, William A. 2018a. “Cyclostationarity.com.” 2018. https://cyclostationarity.com.

———. 2018b. “Statistically Inferred Time Warping: Extending the Cyclostationarity Paradigm from Regular to Irregular Statistical Cyclicity in Scientific Data.” EURASIP Journal on Advances in Signal Processing 2018 (1): 59. https://doi.org/10.1186/s13634-018-0564-6.

Gray, Robert M. 2009. Probability, Random Processes, and Ergodic Properties. 2nd ed. Dordrecht, NL: Springer Verlag.

Halmos, P. R. 1944. “The Foundations of Probability.” The American Mathematical Monthly 51 (9): 493–510.

———. 1956. Lectures on Ergodic Theory. Mathematical Society of Japan.

Hartman, Philip, E. R. van Kampen, and Aurel Wintner. 1939. “Asymptotic Distributions and Statistical Independence.” American Journal of Mathematics 61 (2): 477–86.

Haviland, E. K. 1933. “On Statistical Methods in the Theory of Almost-Periodic Functions.” Proceedings of the National Academy of Science of the U.S.A. 19: 549–55.

Hien, P. P. 1975. “Mesure Asymptotique définie Par Une Fonction à Valeurs Dans  Ou Dans Un Espace Vectoriel Topologique.” Annales de l’Institut Henri Poincaré–Section B XI (1): 23–107.

Hofstetter, E. M. 1964. “Random Processes.” In The Mathematics of Physics and Chemistry, edited by H. Margenau and G. M. Murphy. Vol. II. Princeton, NJ: D. Van Nostrand Co.

Hurd, H. L., and T. Koski. 2004. “The Wold Isomorphism for Cyclostationary Sequences.” Signal Processing 84 (May): 813–24.

Izzo, L., and A. Napolitano. 1998. “The Higher-Order Theory of Generalized Almost-Cyclostationary Time-Series.” IEEE Transactions on Signal Processing 46 (11): 2975–89. https://doi.org/10.1109/78.726811.

———. 2002. “Linear Time-Variant Transformations of Generalized Almost-Cyclostationary Signals. Part I: Theory and Method.” IEEE Transactions on Signal Processing 50 (12): 2947–61. https://doi.org/10.1109/TSP.2002.805499.

Jessen, B., and A. Wintner. 1935. “Distribution Functions and the Riemann Zeta Function.” Transactions of the American Mathematical Society 38 (1): 48–88.

Kac, M. 1959. Statistical Independence in Probability, Analysis and Number Theory. USA: The Mathematical Association of America.

Kac, M., and H. Steinhaus. 1938. “Sur Les Fonctions indépendantes (IV) (Intervalle infini).” Studia Mathematica 7: 1–15.

Katznelson, Yitzhak, and Benjamin Weiss. 1982. “A Simple Proof of Some Ergodic Theorems.” Israel Journal of Mathematics 42 (4): 291–96. https://doi.org/10.1007/BF02761409.

Kershner, Richard, and Aurel Wintner. 1936. “On the Asymptotic Distribution of Almost Periodic Functions with Linearly Independent Frequencies.” American Journal of Mathematics 58 (1): 91–94. https://doi.org/10.2307/2371059.

Kindermann, Farina, Andreas Dechant, Michael Hohmann, Tobias Lausch, Daniel Mayer, Felix Schmidt, Eric Lutz, and Artur Widera. 2017. “Nonergodic Diffusion of Single Atoms in a Periodic Potential.” Nature Physics 13 (2): 137–41. https://doi.org/10.1038/nphys3911.

Kolmogorov, A. N. 1933. Foundations of the Theory of Probability.

Krengel, Ulrich. 1985. Ergodic Theorems. Berlin: Walter de Gruyter.

Lee, Y. W. 1967. Statistical Theory of Communication. New York: Wiley.

Leśkow, J., and A. Napolitano. 2002. “Quantile Prediction for Time Series in the Fraction-of-Time Probability Framework.” Signal Processing 82 (11): 1727–41. https://doi.org/10.1016/S0165-1684(02)00334-1.

———. 2006. “Foundations of the Functional Approach for Signal Analysis.” Signal Processing 86 (12): 3796–3825. https://doi.org/10.1016/j.sigpro.2006.03.028.

———. 2007. “Non-Relatively Measurable Functions for Secure-Communications Signal Design.” Signal Processing 87 (11): 2765–80. https://doi.org/10.1016/j.sigpro.2007.05.005.

Marcinkiewicz, J., and A. Zygmund. 1937. “Sur Les Fonctions indépendantes.” Fundamenta Mathematicae 29: 60–90.

Mari, J. 1996. “A Counterexample in Power Signals Space.” IEEE Transactions on Automatic Control 41 (1): 115–16. https://doi.org/10.1109/9.481613.

Mauro, John C., Prabhat K. Gupta, and Roger J. Loucks. 2007. “Continuously Broken Ergodicity.” The Journal of Chemical Physics 126 (18): 184511. https://doi.org/10.1063/1.2731774.

Mäkilä, P. M. 2004. “On Chaotic and Random Sequences.” Physica D 198: 300–318.

Miao, Hongxia, Feng Zhang, and Ran Tao. 2021. “A General Fraction-of-Time Probability Framework for Chirp Cyclostationary Signals.” Signal Processing 179: 107820. https://doi.org/10.1016/j.sigpro.2020.107820.

Michaeli, T., V. Pohl, and Y. C. Eldar. 2011. “U-Invariant Sampling: Extrapolation and Causal Interpolation from Generalized Samples.” IEEE Transactions on Signal Processing 59 (5): 2085–2100. https://doi.org/10.1109/TSP.2011.2113342.

Napolitano, A. 2012. Generalizations of Cyclostationary Signal Processing: Spectral Analysis and Applications. John Wiley & Sons Ltd – IEEE Press. https://doi.org/10.1002/9781118437926.

———. 2019. Cyclostationary Processes and Time Series: Theory, Applications, and Generalizations. Elsevier. https://doi.org/10.1016/C2017-0-04240-4.

Napolitano, A., and W. A. Gardner. 2021. “Fraction-of-Time Probability: Advancing Beyond the Need for Stationarity and Ergodicity Assumptions.” Feature Article submitted to IEEE Signal Processing Magazine, Aug 15, 2021.

Nobel, A. B. 2004. “Some Stochastic Properties of Memoryless Individual Sequences.” IEEE Transactions on Information Theory 50 (7): 1497–1505. https://doi.org/10.1109/TIT.2004.830750.

Pfaffelhuber, E. 1975. “Generalized Harmonic Analysis for Distributions.” IEEE Transactions on Information Theory IT-21: 605–11.

Shields, P. 1987. “The Ergodic and Entropy Theorems Revisited.” IEEE Transactions on Information Theory 33 (2): 263–66. https://doi.org/10.1109/TIT.1987.1057287.

Urbanik, K. 1958. “Effective Processes in the Sense of H. Steinhaus.” Studia Mathematica XVII.

von Neumann, J. 1932. “Proof of the Quasi-Ergodic Hypothesis.” Proceedings of the National Academy of Sciences 18 (1): 70–82. https://doi.org/10.1073/pnas.18.1.70.

Wiener, N. 1930. “Generalized Harmonic Analysis.” Acta Mathematica 55: 117–258.

———. 1933. The Fourier Integral and Certain of Its Applications. London: Cambridge University Press.

———. 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, MA: MIT Press.

Wintner, A. 1932. “On the Asymptotic Repartition of the Values of Real Almost Periodic Functions.” American Journal of Mathematics 54 (2): 339–45.

Wold, H. O. A. 1948. “On Prediction in Stationary Time Series.” Ann. Math. Statist. 19: 558–67.

Yaglom, A. 1987. “Einstein’s 1914 Paper on the Theory of Irregularly Fluctuating Series of Observations.” IEEE ASSP Magazine 4 (4): 7–11. https://doi.org/10.1109/MASSP.1987.1165596.

Zhang, C. 2004. “New Limit Power Function Spaces.” IEEE Transactions on Automatic Control 49 (5): 763–66.

Ziv, J. 1978. “Coding Theorems for Individual Sequences.” IEEE Transactions on Information Theory 24 (4): 405–12. https://doi.org/10.1109/TIT.1978.1055911.

• 3.6 Status of the Theory of Cycloergodicity

Content in Preparation