Ensemble Statistics, Probability, Stochastic Processes, and their Temporal Counterparts

W.A. Gardner

The objective of this page 3 is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met with from those indoctrinated in the more abstract theory of Stochastic Processes, to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 7, Discussion of the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this page 3 in perspective.

Briefing Summary: Fraction-of-Time Probability vs. Stochastic Process Probability

This document reviews the concept of Fraction-of-Time (FOT) probability, an alternative to the more widely taught stochastic process (or population-based) probability. The central argument is that FOT probability, despite being highly relevant in many engineering and scientific applications, is often overlooked in academia in favor of stochastic process models. This briefing will explain the core differences, highlight the historical reasons for the current situation, and argue for a more balanced approach in education.

Main Themes

Paradigm Shift: The document argues that a slow paradigm shift is underway, moving from stochastic process models to FOT probability, especially in fields involving time series analysis. This shift, it’s claimed, is driven by the limitations of population-based models for single-function analysis.
Two Meanings of Probability: There are two fundamental ways of understanding probability:
- Stochastic Process (Population) Probability: Based on the idea of a “sample space,” a population of functions from which samples can be randomly drawn. Probability is defined as a function of subsets (events) within this space. This approach is championed by mathematicians for its mathematical amenability to theorem proving, applicability outside of time series analysis, and ability to accommodate arbitrary non-stationarity.
- Fraction-of-Time (FOT) Probability: Based on time averages of a single function. Probability is defined as the fraction of time an event occurs within that function, as the averaging time approaches infinity. This approach is championed for its elegance, practicality, and direct relevance to single empirically observed time series of data.

Historical Development

The document posits that mathematicians, from the 1930s to the 1960s, favored the stochastic process model (Kolmogorov’s approach) because it was more amenable to developing and proving theorems, not because it was necessarily more applicable to practical problems. This preference led to the exclusion of FOT probability from many academic curricula.

Limitations of Stochastic Process Models
- Absence of Population: The stochastic process approach requires a “population” of functions from which samples can be drawn. Many real-world situations (e.g., a single cosmic universe, or an individual machine’s vibrations) lack this population, making the application of stochastic processes scientifically unsound in these contexts. As the text states, “There is no scientific evidence that a population of cosmic universes exists, so to base statistical studies of astronomical time series data on an assumed population model of time series data is non-scientific.”
- Ergodicity Assumption: The traditional approach often invokes the ergodic hypothesis (equating sample-space and time averages). The document suggests that it’s often applied without proof or verification.
- Masking of Characteristics: Important characteristics of individual functions can be masked in a stochastic process model, as these are expected values over the population, and may not be present in all (or any) individual functions. For instance, whether a signal will create a sine wave under a memoryless non-linear transformation is a property of the single signal in FOT probability, not of the expected behavior of the population in a stochastic model. As the text puts it: “Because important characteristics of cyclostationarity are characteristics of individual sample functions, not of expected functions representing average behavior over the sample space, the stochastic process can mask these characteristics that are surfaced with the non-population probability calculations.”
Advantages of FOT Probability
- Direct Applicability: FOT probability directly uses time averages of the available function, which is exactly how real-world measurements are often performed.
- Simpler and Less Abstract: When applicable, FOT probability is less abstract and more intuitive than dealing with theoretical sample spaces. “Because of this crucial distinction, the alternative probability theory, when it applies, is more elegant and less abstract and, as a result, is able to be used more effectively.”
- Cyclostationarity: FOT is well-suited for analysis of cyclostationary signals (signals with periodic statistical properties). Indeed, the theory was revolutionary in its application to signals intelligence and particularly communications intelligence in the 1980s (see Gardner, 2018a, pp. 6 and 12).

Call for Change in Education

The document argues that both types of probability are essential tools, and a balanced education is crucial. “To put it succinctly, there is no good reason for university instructors in engineering and science to choose between introducing students exclusively to population probability or non-population probability.” It calls for supplementing standard stochastic process introductions with FOT probability, to enable students to choose the most appropriate model for practical problems.

Key Quotes and Supporting Arguments

- On the problem of relying solely on sample-space approaches: “The simple fact that there exists a mathematically viable theory of non-population probability (viz., FOT probability) for individual time functions and their time-average statistics that can often be more appropriate in practice than the population probability theory for ensembles of functions seems to have been studiously ignored ever since its full-fledged introduction for both stationary time series and time series exhibiting cyclostationarity in the mid-1980s.”
- On the practical advantage of FOT probability: “This concept is actually crucial in empirical work in the subfield of electrical engineering we call random signals and noise… Empirical work involving uncertain functions is commonly conducted using time averages…”
- On the distinction between the two approaches: “All the quantities encompassed by this alternative probability theory are defined in terms of measurements on a single function of time without any reference to a population of functions.”
- On the dual nature of the two theories: “Despite this fundamental conceptual difference between the traditional population probability theory and the alternative non-population probability theory of functions, the mechanics of using these theories are essentially identical!”
- Endorsements of FOT probability: The document includes endorsements from prominent scientists like Enders A. Robinson, Ronald N. Bracewell, and James Massey to further bolster the credibility and importance of the FOT model. For example, Robinson writes, “Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data.” And Bracewell states, “Professor Gardner’s book demonstrates a consistent approach from data, those things which in fact are given, and shows that analysis need not proceed from assumed probability distributions or random processes. This is a healthy approach and one that can be recommended to any reader.”
- On the importance of recognizing the limitations of stochastic models: “If we are to go beyond pure mathematical deduction and make advances in the realm of phenomena, theory should start from the data. To do otherwise risks failure to discover that which is not built into the model . . .”

Conclusion

The provided text advocates for a significant shift in how probability is taught and applied, especially within engineering and scientific fields dealing with time-series data. The argument is not to abandon stochastic process models, but to recognize their limitations and to introduce students to the more practical and conceptually appropriate alternative offered by Fraction-of-Time probability. The goal is to equip future engineers and scientists with the ability to choose the most suitable framework for the problem at hand.

History of a Paradigm Shift Between Two Alternative Meanings of Probability

The motivation for this website is the recognition that a paradigm shift in theory has been very slowly underway since the mid-1980s, encompassing both stationary time series and those exhibiting cyclostationarity. An older rudimentary form of this theory for stationary time series represents relatively common practice as a tool for research, product development, and testing in industry, especially in fields of engineering involving time series analysis. Interestingly, however, this theory, which is based on what is called Fraction-of-Time (FOT) Probability, has apparently not been integrated into curricula in academia. (There may be a few exceptions to this broad statement, but the WCM is not aware of them.) This disconnect between academia and the engineering industry is likely due to a situation that arose between the 1930s and 1960s, whereby mathematicians apparently shunned previous practice in applied work involving random functions of time, a practice that worked exclusively with time-average statistics (FOT probability), and adopted a new theory that was more amenable to developing and proving theorems (the mathematician’s bread and butter) than to elegant conceptualization of practical problems and avoidance of unnecessary abstraction.

This new theory, put forth by Kolmogorov in the early 1930s, was based on ensemble or sample-space average statistics. The problem with this rejection of the earlier practical approach is that there are many problems to be solved for which there is no counterpart in the real world to Kolmogorov’s assumed sample space, namely a population of time functions from which one can draw samples at random. The simple fact that there exists a mathematically viable theory of non-population probability (viz., FOT probability) for individual time functions and their time-average statistics that can often be more appropriate in practice than the population probability theory for ensembles of functions seems to have been studiously ignored ever since its full-fledged introduction for both stationary time series and time series exhibiting cyclostationarity in the mid-1980s. It is conceivable that this situation is a result of the fact that the population probability theory is far more widely applicable, because it accommodates arbitrarily nonstationary (time-varying) probabilities, whereas the non-population probability theory accommodates only stationary (time-invariant), cyclostationary (periodic), and almost-cyclostationary (almost periodic) probabilities. An almost periodic function can be represented as a sum of periodic functions with incommensurate periods. Nevertheless, this more restricted applicability still accommodates a very broad array of practical applications, broad enough for this shunned theory to merit being a standard tool in every communication systems engineer’s analytic toolbox, and likewise for rotating machinery engineers and other statistical analysts of time series data exhibiting statistical cyclicity, including those working in econometrics and a variety of natural sciences where rhythms arise.

In addition to the generality of arbitrary nonstationarity, this head-in-the-sand situation is supported by the fact that there are indeed many applications for statistical inference on time series data involving populations of functions—probably more than those not associated with a population. The argument presented here is that not only is population probability here to stay for good reason but also that non-population probability is equally deserving of the same status. Non-population probability theory for individual time functions is needed for all those applications for which no population of functions is available in the application of interest or, more strongly, no population can exist. Example? There is no scientific evidence that a population of cosmic universes exists, so to base statistical studies of astronomical time series data on an assumed population model of time series data is non-scientific. So, why is it being done? Perhaps the answer is as simple as the fact that mathematical statisticians have had a substantial influence over the general field of statistics, and population models of probability are generally preferred to non-population models for exclusively mathematical reasons, notably proving theorems. But this situation was admittedly artificially created by Kolmogorov (who expressed his reservations) by simply adopting stochastic process model axioms that rule out function behavior that can be less mathematically friendly. That such less friendly behavior of individual time functions can arise in the non-population probability model is simply a reflection of the practice of using time averages in empirical time series analysis [Napolitano, A., Gardner, W.A.: Fraction-of-time probability: Advancing beyond the need for stationarity and ergodicity assumptions. IEEE Access 10, 34591–34612 (2022)]. It is a reflection of reality, as ugly as some mathematicians may find that reality.

To put it succinctly, there is no good reason for university instructors in engineering and science to choose between introducing students exclusively to population probability or non-population probability. It is a simple matter to supplement standard introductory treatments of stochastic processes (population probability for functions) with a briefer introduction to the alternative random functions (non-population probability for functions). This equips students after graduation to make an intelligent choice when presented with a practical problem involving time functions and a desire to perform probabilistic or statistical analysis of function behavior. With the status quo, the vast majority of our graduates are completely unaware of non-population probability and are known to simply default to stochastic processes, believing them to be the only option available.

There is an accumulating number of endorsements in support of returning to the now-fully-developed version of the earlier nascent non-population theory of probability for single functions.

As an illustrative example, the field of Signals Intelligence, and particularly Communications Intelligence, was revolutionized by Gardner’s initial 1987 revelation of his non-population theory of cyclostationarity and his demonstration of its applicability to Communication Signal Interception (cf. Gardner, 2018a, pp. 6 and 12). (Even so, researchers typically use the alternative stochastic process version of Gardner’s theory, which he introduced simultaneously in his 1985 book [Bk3], most likely because of indoctrination in our universities and despite Gardner’s demonstration of major new insights gained from using the non-population theory in signal interception.) More recently, the nascent movement in Econometrics referred to as Ergodicity Economics is in essence a return from population-probability models and methods to their non-population probability counterpart [Peters, O.: The ergodicity problem in economics. Nature Physics 15(12), 1216–1221 (2019)]. The editorial introducing the issue of Nature Physics [Time to move beyond average thinking. Nature Physics 15(12), 1207 (2019)] containing the above article is in complete alignment with the editorial remarks on the wisdom of a return to non-population probability that the Authors have included throughout a number of their publications, such as [Napolitano, A., Gardner, W.A.: Fraction-of-time probability: Advancing beyond the need for stationarity and ergodicity assumptions. IEEE Access 10, 34591–34612 (2022)]. And this is in complete alignment with the remarks from the editor of the Journal of Sound and Vibration cited in the article Gardner (2023) [Gardner, W.A.: Transitioning away from stochastic process models. Journal of Sound and Vibration, 117871 (2023)]. Other endorsements from leaders in the field that are published in this article include the following:

Professor Enders A. Robinson, originator of the digital revolution in geophysics, and highest honored scientist in the field of geophysics, in a published review of the book [Bk2] introducing FOT-Probability theory [Signal Processing, EURASIP, and Journal of Dynamical Systems, Measurement, and Control, ASME, 1990], wrote:

“This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”

Similarly, the following quotation from Professor Ronald N. Bracewell – recipient of the IEEE’s Heinrich Hertz medal for pioneering work in antenna aperture synthesis and image reconstruction as applied to radio astronomy and to computer-assisted tomography – taken from his Foreword to the book [Bk2], introducing FOT-Probability theory, makes essentially the same point that Robinson makes:

“If we are to go beyond pure mathematical deduction and make advances in the realm of phenomena, theory should start from the data. To do otherwise risks failure to discover that which is not built into the model . . . Professor Gardner’s book demonstrates a consistent approach from data, those things which in fact are given, and shows that analysis need not proceed from assumed probability distributions or random processes. This is a healthy approach and one that can be recommended to any reader.”

Not to belabor the point, but even the information theorist, Professor James Massey – Professor of Digital Technology at ETH Zurich, IEEE Alexander Graham Bell medalist and member of the National Academy of Engineering – wrote, in a 1986 prepublication review of the book [Bk2], “I admire the scholarship of this book and its radical departure from the stochastic process bandwagon of the past 40 years.”

Clearly, evidence in support of Gardner’s 1987 proposal for a paradigm shift in time series analysis is mounting and suggests that the shift is solidly underway now. But there’s no reflection of this in university curricula! Perhaps, overwhelmed by the breakneck pace of technological advances, in electrical engineering for example, university faculty members may have simply had too many other pressing matters in keeping curricula up to date.

Narrative Explanation of Two Alternative Meanings of Probability

In order to be more concrete about these two alternative meanings of probability, this overview is concluded with a detailed narrative explanation of the exact difference between the two meanings.

The classical meaning of Probability and Random Variables is taught exclusively in terms of populations, which are called sample spaces. When an experiment is performed, the sample space is defined to be the set of all possible elementary or indecomposable experimental outcomes. For example, when a noise voltage across the terminals of an electrically resistive device is measured, the relevant sample space for this experiment can be said to contain all the electrical states of the device described by positions and velocities of charge carriers.

The sample space is often not analytically described in any detail; it is used mostly as a concept. For example, a random variable is defined to be a real-valued function on a sample space. Similarly, probability is defined to be a real-valued function on all the well-behaved subsets of a sample space. Each such subset is called an event. When an experiment is performed, various events can occur. Every event subset of the sample space that contains the one sample point that occurs upon execution of the experiment is said to have occurred. An example of four such events is Event-1 = noise voltage is positive, Event-2 = noise voltage is less than 10 volts, Event-3 = noise voltage is between 2 and 3 volts, and Event-4 = noise voltage is between -1 and 5 volts. If the measured voltage is 2.6 volts, all four of these events occur.
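The idea that every event containing the observed outcome "occurs" can be checked mechanically. The following is a minimal sketch, assuming each event is represented as a predicate on the measured voltage (which stands in for the underlying sample point); the dictionary and the function name `occurred` are illustrative and not part of the original text.

```python
# Each event is modeled as a predicate on the measured voltage (in volts).
# The names mirror Event-1 ... Event-4 in the text; the container is illustrative.
events = {
    "Event-1": lambda v: v > 0,        # noise voltage is positive
    "Event-2": lambda v: v < 10,       # noise voltage is less than 10 volts
    "Event-3": lambda v: 2 < v < 3,    # noise voltage is between 2 and 3 volts
    "Event-4": lambda v: -1 < v < 5,   # noise voltage is between -1 and 5 volts
}

def occurred(voltage):
    """Return the names of all events that contain the observed outcome."""
    return [name for name, contains in events.items() if contains(voltage)]

print(occurred(2.6))   # all four events occur
print(occurred(7.0))   # only Event-1 and Event-2 occur
```

A single measurement therefore decides many events at once, which is why probability is defined on subsets of the sample space rather than on individual outcomes.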

In order for the function of event sets to be a valid probability for a given sample space, it must satisfy three axioms:

Axiom 1: The probability of any event is a non-negative real number

Axiom 2: The probability of the event consisting of the entire sample space is 1

Axiom 3: For any countable collection of mutually exclusive events, the probability of their union is the sum of their individual probabilities

This universally accepted notion and formal definition of probability has proven itself to be of fundamental importance in analysis of matters of chance. In particular, to consider just one technical area where this notion defines the field of study, we have functions of time that involve matters of chance. Examples include information bearing signals, whose embedded information content is usefully modeled in terms of probability and random variables.

However, there are some matters of chance that arise throughout electrical engineering, for example, for which an alternative concept of probability has proven itself to also be of fundamental importance. This concept is actually crucial in empirical work in the subfield of electrical engineering we call random signals and noise. Outside of electrical engineering proper, this subfield is often referred to as statistical time series analysis, or random functions. In these descriptors, the term “random” is here intended to mean nothing more than signals or noise or other functions with uncertain behavior not necessarily having anything to do with the traditional concept of population probability described above.

Empirical work involving uncertain functions is commonly conducted using time averages of the uncertain functions and time averages of functions of these uncertain functions such as the square of the uncertain function or the indicator function which takes on the value 1 at times for which the uncertain function exhibits behavior defining an event of interest and takes on the value of 0 for all other times.
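The three kinds of time average mentioned above can be computed from a single record of data. The sketch below is only an illustration under stated assumptions: the record is synthetic (pseudo-random Gaussian samples standing in for one measured function), and the helper name `time_average` and the threshold of 1.0 are hypothetical choices.

```python
import random

def time_average(values):
    """Uniformly weighted time average of a finite record."""
    return sum(values) / len(values)

random.seed(0)
# Hypothetical record: 100,000 samples standing in for one measured noise function.
x = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Time average of the function itself.
mean_level = time_average(x)

# Time average of the square of the function (average power).
mean_power = time_average([v * v for v in x])

# Time average of the indicator of the event "x(t) exceeds 1.0":
# 1 at times when the event holds, 0 at all other times.
frac_above = time_average([1 if v > 1.0 else 0 for v in x])

print(mean_level, mean_power, frac_above)
```

No population of functions appears anywhere in this calculation; every quantity is a measurement on the one record.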

Despite the ubiquity of this work based on these time averages, it is generally unknown today that there is a mature comprehensive theory of probability based on time averages of functions, which has nothing to do with the traditional concept of probability based on sample spaces or, equivalently, populations of functions. All the quantities encompassed by this alternative probability theory are defined in terms of measurements on a single function of time without any reference to a population of functions. Such quantities give rise to the most elegant theory when the averages are taken to be the limit as averaging time approaches infinity. This is analogous to the considerable utility of the law of large numbers in traditional probability theory, for which the expected value is arrived at by taking the limit of an average over a set of random samples from a sample space, as the set size approaches infinity.

The time average of the event-indicator function produces the length of time over which the event of interest occurs divided by the total length of time averaged over. That is, it is the fraction of time (FOT) over which an event of interest occurs. This can also be referred to as the relative frequency of occurrence over time of the event. In the limit as the averaging time approaches infinity, this relative frequency defines the FOT probability of the event. In comparison, the population probability of an event is (by the law of large numbers) the limit of the relative frequency of occurrence of the event among repeated random samples taken from the sample space, which also is a relative frequency of occurrence.
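The two relative frequencies compared in this paragraph can be written side by side. The notation here is introduced only for illustration and is not taken from the source: x(t) is the single time function, A is the event of interest, 1_A is its indicator (1 when the event holds, 0 otherwise), T is the averaging time, and s_1, ..., s_N are random samples drawn from the sample space, with the second limit understood in the sense of the law of large numbers.

```latex
\[
P_{\mathrm{FOT}}(A) \;=\; \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} \mathbf{1}_A\bigl(x(t)\bigr)\, dt ,
\qquad
P(A) \;=\; \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_A(s_n) .
\]
```

The first expression averages over time along one function; the second averages over repeated draws from a population. Both are limits of relative frequencies, which is the sense in which the two theories parallel each other.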

Despite this fundamental conceptual difference between the traditional population probability theory and the alternative non-population probability theory of functions, the mechanics of using these theories are essentially identical! The mathematics of the two theories based on alternative relative frequencies can be said to be dual. The crucial difference between these theories is that one theory can only be applied scientifically if there exists an actual population associated with the experiments being studied, whereas the other does not in any way involve populations and can be applied scientifically to a single function of time. Because of this crucial distinction, the alternative probability theory, when it applies, is more elegant and less abstract and, as a result, is able to be used more effectively. This has been proven in several fields of endeavor dating back to before the turn of the century including the fields of communications systems and signals intelligence systems, particularly for communications signals. In this latter field, the subject of signal interception has been revolutionized by the non-population probability theory of cyclostationarity (much of this work is protected by governments for national security). In fact, the cyclostationarity paradigm shift that is now mature was achieved by replacing the initial stochastic process formulation (the population-probability theory of time functions) with the fraction-of-time probability formulation (the non-population probability theory of time functions). More recently, in a nascent paradigm shift in econometrics, which goes by the somewhat misleading name “ergodicity economics” and which some say is revolutionizing econometrics, the inappropriateness of sample space averages for some purposes is being recognized and replaced with time averages.

As suggested by the term ergodicity economics, this paradigm shift from population probability to non-population probability has a connection to the concept and theory of ergodicity of sample spaces and probability measures. This historical connection is somewhat of a red herring on the path toward understanding probability modeling today. It is primarily useful for seeing the extremes to which analysts were forced when laboring under the delusion that population probability is the only probability available to them for studying uncertain functions. This unpleasant aspect of history can be summarized and dispensed with as follows.

Having only a population-based probability theory in their mathematical toolbox, while being faced with practical problems involving uncertain functions for which no populations exist, and while recognizing the considerable utility of time-averaged measurements on the single function, analysts adopted the ergodic hypothesis: “Assume that the stochastic process model adopted is ergodic, meaning that sample space averages and time averages of any measurement both converge to the same expected values”.

The famous ergodic theorem of Birkhoff published in the early 1930s provided the necessary and sufficient mathematical condition on the probability measure for the sample space of a stochastic process to be ergodic. But it is a rare occasion in practice when the analyst is able to apply this theorem to a specific stochastic process model and determine whether or not that model is ergodic. So, many analysts continued with the non-scientific practice of using the ergodic hypothesis without ever determining if it is true or false for the particular model it was being applied to. Mathematical analysis was carried out based on the stochastic process model using expectation, and then time averages would be measured in experimental work with the objective of verifying the mathematical results, not knowing if there was any theoretical connection between the two results. Moreover, algorithms derived from the mathematics were often modified by replacing expected values in mathematically derived results with time averages so they could be implemented in applications for which there was no population of functions.
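To see why the ergodic hypothesis cannot simply be taken for granted, it helps to look at a model for which it fails. The following is a minimal sketch under assumptions that are entirely hypothetical: each sample path is a random level (+1 or -1, chosen once per path) plus zero-mean Gaussian noise. For this model the ensemble average at a fixed time and the time average along one path converge to different values.

```python
import random

random.seed(1)

def sample_path(n_samples):
    """One realization: a random level (fixed for the whole path) plus zero-mean noise."""
    level = random.choice([-1.0, 1.0])
    return [level + random.gauss(0.0, 1.0) for _ in range(n_samples)]

# A hypothetical ensemble of 1,000 paths, each 1,000 samples long.
paths = [sample_path(1_000) for _ in range(1_000)]

# Ensemble average at one fixed time instant (index 0): close to the expected level, 0.
ensemble_avg = sum(p[0] for p in paths) / len(paths)

# Time average along a single path: close to that path's own level, +1 or -1.
time_avg = sum(paths[0]) / len(paths[0])

print(ensemble_avg, time_avg)
```

An analyst who derived results using expectation over this model and then verified them with time averages on one record would be comparing two unrelated numbers.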

This unsavory work still goes on today. Why? Because academia has for the most part not updated its curricula to reflect the 37-year-old fact that non-population probability exists and is characterized by a complete theory that is highly analogous to the population-probability theory of stochastic processes that academia continues to teach exclusively.

To complete this overview of a paradigm shift between two alternative meanings of probability, it is important to surface two additional facts concerning these alternatives. The first fact favors the stochastic process. Non-population probability enables us to build mathematical models of functions for which the probabilities are time invariant and are called stationary. On the other hand, stochastic process models can be comprised of probabilities that vary in time in an almost arbitrarily non-stationary manner. This opens up the theory to many more applications. The only catch is that such applications must involve populations if this problem solving approach is to be considered scientific. Fortunately, there is no shortage of such applications. And for this reason, the subject of stochastic processes is very broad and has many applications.

Nevertheless, there is a way to modify how the time averages are calculated in order to generalize the definition of non-population probability so as to accommodate a very specific type of nonstationarity, which in the simplest case is called cyclostationarity and in the less simple case is called almost cyclostationarity. The modification required entails replacing the conventional uniformly weighted time average with sinusoidally weighted time averages. This requires some relatively detailed explanations and is excluded from this overview. However, it is important to mention here that the generalization to almost cyclostationary probability introduces a third distinct meaning of probability, in which the interpretation as a fraction-of-time of occurrence of an event is forfeited. Yet, there is no need to revert to sample spaces and population probability: the almost periodically time varying probabilities are constructed by combining genuine periodically time-varying FOT probabilities having incommensurate periods. Because important characteristics of cyclostationarity are characteristics of individual sample functions, not of expected functions representing average behavior over the sample space, the stochastic process can mask these characteristics that are surfaced with the non-population probability calculations. An example is the characteristic of a function of being able to produce a finite-strength additive sine wave component by being subjected to a memoryless nonlinear transformation, e.g., a simple squaring of the function. This characteristic has nothing to do with populations, expected values, and other facets of stochastic processes. The sine-wave generation property of a stochastic process is an expected property and can be present even if some sample paths are unable to generate such sine waves. In fact, a stochastic process can have a probability greater than 0 and less than 1 that its square will contain such sine waves, whereas in the non-population theory, a function either can or cannot generate such sine waves. The amplitude and phase of such a sine wave do not have probability distributions over a population.
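The sine-wave generation property can be tested on a single function with a sinusoidally weighted time average. The sketch below is an illustration under assumptions not found in the source: the test signal is a synthetic binary phase-shift-keyed carrier (a random +/-1 amplitude times a cosine), the comparison signal is pseudo-random Gaussian noise, and the function name, carrier frequency, and record length are hypothetical choices.

```python
import cmath
import math
import random

def sinusoidally_weighted_average(values, freq):
    """Time average of values[k] * exp(-j*2*pi*freq*k); freq is in cycles per sample."""
    total = sum(v * cmath.exp(-2j * math.pi * freq * k) for k, v in enumerate(values))
    return total / len(values)

random.seed(2)
f0 = 0.05                 # carrier frequency in cycles per sample (illustrative choice)
samples_per_symbol = 40   # each random +/-1 symbol spans two carrier periods
n_symbols = 2_000

# Binary phase-shift-keyed carrier: random +/-1 amplitude times a cosine.
x = []
for _ in range(n_symbols):
    a = random.choice([-1.0, 1.0])
    start = len(x)
    x.extend(a * math.cos(2 * math.pi * f0 * (start + k)) for k in range(samples_per_symbol))

# Comparison record with no cyclic structure.
noise = [random.gauss(0.0, 1.0) for _ in range(len(x))]

# The signal itself contains no sine wave at f0: the random signs average it away.
line_in_x = abs(sinusoidally_weighted_average(x, f0))

# Its square contains a sine wave at 2*f0: magnitude close to 0.25 (amplitude 0.5).
line_in_x_squared = abs(sinusoidally_weighted_average([v * v for v in x], 2 * f0))

# Squared noise contains no sine wave at 2*f0: magnitude close to 0.
line_in_noise_squared = abs(sinusoidally_weighted_average([v * v for v in noise], 2 * f0))

print(line_in_x, line_in_x_squared, line_in_noise_squared)
```

Everything here is computed from one record at a time. Whether squaring regenerates a sine wave is a yes-or-no property of that record, with no expectation over a population involved.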

Further Discussion of the Choice Between Two Alternative Meanings of Probability

Probability & Statistics and Ergodicity

What does “probability & statistics” mean? These two terms are often used together, but they are two distinct entities. Mathematical statistics is what you get when you use probability theory to model statistics. But probability exists in its own right as an abstract mathematical theory and statistics exists in its own right as a collection of empirical methods for analyzing data. The blend of probability and statistics is a whole that is bigger than the sum of its parts, but those who forget that statistics are empirical and probability is mathematical do so at their own conceptual peril.

To those who dig below the surface in the field of applied mathematical statistics involving time series of data, the following question arises: which of two alternative theories of probability should one apply to the statistics of interest in each application? The answer is that it depends on the statistics of interest. If I am designing a digital communication system and I want the bit-error rate for a received signal over time to be less than 1 bit-decision error in 100 bit-decisions, on average over time, then I want the fraction of time the bit decision is in error for this signal to be less than 1/100. This fraction is called the fraction-of-time (FOT) probability of a bit error.

On the other hand, if I am producing a large number of communication systems and I want the number of systems that make bit-decision errors at any arbitrary time to be less than 1 in 100, on average over the ensemble of systems, then I want the fraction of systems that make errors to be less than 1/100. This is the relative frequency of bit errors, and it converges as the ensemble size grows without bound to the relative-frequency (RF) probability of bit error which is, according to Kolmogorov’s Law of Large Numbers, the stochastic probability of the bit-error event. This is a purely theoretical quantity in an abstract mathematical model of an ensemble of signals (one from each system) called a stochastic process.

These two probabilities are distinct and, in general, there is no reason to expect them to equal each other.
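A toy simulation makes the distinction concrete. The sketch below is an illustration under assumed numbers, not a model taken from this page: each hypothetical system has its own fixed bit-error rate (`good_rate` or `bad_rate`), so the fraction of time any one system is in error differs from the fraction of systems in error at any one instant.

```python
import numpy as np

rng = np.random.default_rng(1)

n_systems = 1000       # size of the ensemble of systems (assumed)
n_bits = 5000          # number of bit decisions observed per system (assumed)
good_rate, bad_rate = 0.001, 0.1   # hypothetical per-system error rates
fraction_bad = 0.1                 # hypothetical share of poor systems

# Each system keeps one fixed error rate for its whole lifetime.
is_bad = rng.random(n_systems) < fraction_bad
rates = np.where(is_bad, bad_rate, good_rate)

# errors[s, k] is True when system s makes an error on bit decision k.
errors = rng.random((n_systems, n_bits)) < rates[:, None]

# FOT statistic: fraction of bit decisions in error, one value per system.
fot_per_system = errors.mean(axis=1)
# RF statistic: fraction of systems in error, one value per time instant.
rf_per_instant = errors.mean(axis=0)

print("FOT, typical good system:", fot_per_system[~is_bad].mean())
print("FOT, typical bad system :", fot_per_system[is_bad].mean())
print("RF averaged over instants:", rf_per_instant.mean())
```

With these assumed numbers, every individual system has an FOT error fraction near 0.001 or near 0.1, whereas the ensemble fraction at a given instant sits near 0.011. A designer who needs each delivered signal to meet a 1/100 requirement over time learns little from the ensemble figure alone.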

Nevertheless, to “make things nice” by having these two probabilities equal each other, the ergodic hypothesis was introduced in studies of time-series data, like communications signals. (Actually, it was borrowed from earlier studies in physics of dynamical systems of large numbers of particles.) The hypothesis is that the limit over an infinitely long period of time of the FOT probability of an event involving a time function such as a signal is equal to the limit over an infinite ensemble of time functions of the RF probability, which in turn equals the abstract stochastic probability. At about the same time that this hypothesis was beginning to become popular, Birkhoff introduced his ergodic theorem, which consists of the necessary and sufficient condition on an abstract stochastic process (a mathematical model) for these two probabilities to equal each other for that process.
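For readers who prefer symbols, the hypothesis can be written out as follows. The notation is an assumption introduced here for illustration and may differ from that used on the later pages: $x(t)$ is a single time function, $x(t, s_n)$ is the $n$-th member of an ensemble, $A$ is a set of values defining the event of interest, and $\mathbf{1}\{\cdot\}$ is the indicator function.

```latex
% Finite-time FOT statistic, computed from one record of length T
\hat{P}_T(A) = \frac{1}{T}\int_{-T/2}^{T/2} \mathbf{1}\{x(t) \in A\}\, dt

% Finite-ensemble RF statistic at a fixed time t, computed from N ensemble members
\hat{P}_N(A; t) = \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}\{x(t, s_n) \in A\}

% Ergodic hypothesis: the two limits coincide with the stochastic probability
\lim_{T \to \infty} \hat{P}_T(A)
  \;=\; \lim_{N \to \infty} \hat{P}_N(A; t)
  \;=\; P\{x(t) \in A\}
```

The first limit involves only one time function; the second and third involve the ensemble and the probability measure of the stochastic-process model. The hypothesis asserts an equality that nothing in the data alone guarantees.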

Because it is typically impossible to prove that the necessary and sufficient condition for ergodicity holds in real-world applications, in practice analysts usually simply invoke the ergodic hypothesis without making any effort to validate it.

A source of confusion for some who invoke the ergodic hypothesis is thinking it is a hypothesis about the real data they are analyzing when, in fact, it is a hypothesis about the mathematical model they have adopted. Confusion surrounding the ergodic hypothesis can be avoided in many applications by first determining what is of primary interest in the application being studied: Is it the behavior of long time averages or the behavior of large ensemble averages? If it is the former, the analyst should simply adopt FOT probability and forget all about stochastic probability and the ergodic hypothesis.

As simple and self-evident as this truth is, some experts indoctrinated in the theory of stochastic processes argue that FOT probability is an abomination that has no place in mathematical statistics. The purpose of this Page 3 is to establish once and for all how absurd this extreme position is by addressing concerns about FOT probability that have been expressed in the past and extinguishing these concerns and associated claims that there is a controversy, through careful conceptualization, mathematical modeling, and straightforward discussion. As explained on this Page 3, there is no basis for controversy; there is simply a need to make a choice between two options for modeling probability in each application of interest.

Yet, there is a wrinkle: before the limit is taken in each of the alternative types of probability, FOT and RF, these quantities are both statistics; that is, they are computed from finite amounts of empirical data. They can be interpreted as estimates of the limiting mathematical quantities, and they can exhibit some of the same properties as the mathematical quantities, but they are statistics, not probabilities. Moreover, the quantity that each converges to is just a number for a given set of statistics from any single execution of the underlying experiment. These quantities are not mathematical models. But the collection of all such numbers obtained from all possible sets of statistics from the repeated trials of the underlying experiment behaves according to a probabilistic model. The explanation given here of this wrinkle is probably confusing to those who do not already know what is so tersely stated here. Nevertheless, the purpose of pages 3.1 through 3.6, following the remainder of the narrative below, is to explain the statement here and the equal mathematical footing of the two alternative types of probability in sufficient detail to remove all ambiguity of meaning, thereby putting to rest all hypothetical challenges to the validity of what is said here.

Does it Need Fixing?

Colloquial saying: “If it ain’t broke, don’t fix it”.

Grammarian’s version: “If it isn’t broken don’t attempt to fix it.”

Regardless of how this is verbalized, the problem with this way of thinking is that it is often misapplied: “It” IS often broken relative to what could be, but users are so accustomed to it that they don’t realize it could work much better.

Consider, as an example, the technology I used for preparing my doctoral dissertation in the early 1970s. I used an IBM Selectric typewriter and Snopake correction fluid (a fast-drying fluid that is opake and as “white as the driven snow”), which enables the typist to paint over a mistake and then retype on the dried paint (beware of retyping before the paint is dry). I used this same technology for the first two books I wrote in the mid-1980s, after writing several drafts in longhand. It seemed acceptable at the time but, in comparison with the word processing technology I used to prepare this website, it is abundantly clear just how broken that technology was. Of course, adopting the superior word processing technology required the effort to first learn how to operate a personal computer. This learning “hump” that writers needed to get over resulted in many potential beneficiaries avoiding (actually only postponing) the chore of “coming up to speed” with PCs or Macs (Apple computers). The paradigm shift began for some upon the 1984 release of the first Apple Macintosh system, following the 1976 release of the first Apple computer, and for others it began with the 1989 release of the first Microsoft Word application for PCs. Others began jumping on board throughout the 1990s and by the turn of the century this paradigm shift was well on its way. Today, we have electronic research journals for which new knowledge need never be recorded on paper. Thankfully someone decided a long time ago that the IBM Selectric typewriter was indeed broken. The term word processing was actually created way back in 1950 by Ulrich Steinhilper, a German IBM typewriter sales executive with vision.

So it goes with many users of stochastic processes today: they have used this tool for years—since around 1950—and they see it as unbroken and they want no part of coming up to speed on a replacement tool that they believe isn’t needed, even though they do not yet understand this new tool. Unfortunately, ways of thinking are harder to change than is accepting new technology.

The cyclostationarity paradigm shift did not really take off until several years following the publication of the seminal 1987 book [Bk2]. It seems the same is going to be true for the FOT-Probability paradigm shift, with this website playing a role similar to that played by the 1987 book. Interestingly, that book attempted to initiate this shift as well as the shift to cyclostationarity 35 years ago. But apparently, the relearning hump for replacing stochastic processes was found to be too high for many.

An Elevator Speech

An Elevator Speech is a very concise speech about a new business concept that is intended to capture the interest of an investor during the short time he spends with the speaker in an elevator between floors in a building (e.g., on the way to a venture capital office).

I believe most people who learn how to use the stochastic process concept and associated mathematical model tentatively accept the substantial level of abstraction it represents and, as time passes, become increasingly comfortable with that abstractness, and eventually accept it as a necessity and even as reality–something that should not be challenged. It is remarkable that our minds are able to adapt to such abstractions. At the same time, there are costs associated with unquestioning minds that accept such levels of abstraction without convincing themselves that there are no more-concrete alternatives. The position taken at this website is that the effectiveness with which the stochastic process model can be used in practice is limited by its level of abstraction—the typical absence of explicit specifications of both (1) its sample space (ensemble of sample paths) and (2) its probability measure defined on the sample space—and this in turn limits progress in conceiving, designing, and analyzing methods for statistical signal processing on the basis of such signal models.

There is a little-known (today) alternative to the stochastic process, which is much less abstract and, as a consequence, exposes fundamental misconceptions regarding stochastic processes and their use. The removal of the misconceptions that result from adoption of the alternative has enabled the Inventor to make significant advances in the theory and application of cyclostationary processes and more generally in data-adaptive statistical signal processing. Despite these advances, less questioning minds continue to ignore the role that the alternative has played in these advances and continue to try to force-fit the new knowledge into the unnecessarily abstract theory of stochastic processes. The alternative—the invention—is fully specified below on Page 3.1, and its consequential advances in understanding theory and method for random signals are taught on Pages 3.2 and 3.3, where the above generalized remarks are made specific and are proven mathematically. This alternative is called Fraction-of-Time (FOT) Probability.

On Pedagogy

As explained in this section, there are various choices to be made in deciding how best to present the more mathematical details of the theory of Non-Population Probability, and the choices made here are all based on pedagogy, not economics, or technology, or current fads, or idiosyncrasies.

These pedagogical considerations are explained in the following discussion.

First Exposure to Stochastic Processes
The subject does not come easily, especially for empiricists

The macroscopic world that our five senses experience—sight, hearing, smell, taste and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such things varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these mathematical functions. For this reason, developing an intuitive real-world understanding of time-series analysis, and as an example spectral analysis of time-records of data from the physical world, requires that continuous-time models and mathematics of continua be used.

Unfortunately, this is at odds with the technology that has been developed in the form of computer applications and digital signal processing (DSP) hardware for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena or of continuous-time and continuous-amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available, that is, the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of accuracy of the data representations subjected to analysis and processing arises. Then, the number of discrete-amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. Discretization of both the data values and the time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.
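The two undesirable effects named above can be seen in a few lines. This is a minimal sketch with assumed parameters (`fs`, `f_true`, `bits`): a 9 Hz cosine sampled at 10 Hz is indistinguishable from a 1 Hz cosine, and rounding each sample to a grid with 8-bit spacing introduces an error of at most half a quantization step.

```python
import numpy as np

fs = 10.0                 # sampling rate in Hz (assumed for illustration)
n = 1000                  # number of time samples
t = np.arange(n) / fs
f_true = 9.0              # true frequency in Hz, above the Nyquist frequency fs/2

x = np.cos(2 * np.pi * f_true * t)

# Spectral aliasing: locate the strongest component in the sampled data.
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
magnitude = np.abs(np.fft.rfft(x))
f_apparent = freqs[np.argmax(magnitude)]

# Amplitude quantization: round each sample to a uniform grid whose
# spacing corresponds to `bits` bits spanning the range [-1, 1].
bits = 8
step = 2.0 / 2**bits
x_quantized = step * np.round(x / step)
max_error = np.max(np.abs(x - x_quantized))

print(f"apparent frequency: {f_apparent:.2f} Hz")
print(f"max quantization error: {max_error:.5f} (step/2 = {step / 2:.5f})")
```

Neither effect has anything to do with the principles of spectral analysis of the underlying analog waveform; both are consequences of the digital representation.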

Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website.

Certainly, for non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because of the additional layer of complexity introduced by the approximation of analog data with digital representations, which is not directly related to the principles of analog spectral analysis, the principles of spectral analysis, which are independent of the implementation technology, are more transparent and easier to grasp intuitively with the continuous-time theory.

Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract and forces a detachment of the theory from the real-world data to be analyzed or processed, and this is so even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record instead of averages over independently produced records. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In fact, commercially available random sequence generators used for Monte Carlo simulations are actually time segments from a single long sequence. Consequently, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.3, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And, when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
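The claim that time averages on a single record can stand in for Monte Carlo ensemble averages can be checked on a simple stationary model. The sketch below is only an illustration: the first-order autoregressive model, its coefficient `phi`, the record lengths, and the helper `ar1` are assumptions, not anything prescribed on this page. It estimates the lag-1 correlation both ways.

```python
import numpy as np

rng = np.random.default_rng(2)
phi = 0.8   # autoregressive coefficient (assumed for illustration)

def ar1(n_steps, n_records):
    """Return an (n_records, n_steps) array of AR(1) samples started at zero."""
    x = np.zeros((n_records, n_steps))
    w = rng.standard_normal((n_records, n_steps))
    for k in range(1, n_steps):
        x[:, k] = phi * x[:, k - 1] + w[:, k]
    return x

# Time average over ONE long record (start-up transient discarded).
single = ar1(200_000, 1)[0, 200:]
time_avg = np.mean(single[1:] * single[:-1])

# Monte Carlo ensemble average over MANY independent short records,
# evaluated at one fixed pair of adjacent time instants.
ensemble = ar1(200, 5000)
ensemble_avg = np.mean(ensemble[:, -1] * ensemble[:, -2])

# Lag-1 correlation of the assumed model, for comparison.
theory = phi / (1 - phi**2)

print(f"time average    : {time_avg:.3f}")
print(f"ensemble average: {ensemble_avg:.3f}")
print(f"model value     : {theory:.3f}")
```

For this assumed model both estimates land close to the model value, but the single-record estimate needs no ensemble at all, which is the situation an analyst faces when only one record of real data exists.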

For this reason, it is the Content Manager’s tenet that for the sake of pedagogy the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that exist and explain the other fact (that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs) are non-pedagogical. The arguments are based on economics, directly or indirectly: 1) the transition in philosophy, from truly academic education to vocational training in schools of engineering (and in other fields of study as well), that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution); 2) economic considerations in the standard degree programs in engineering and other technical fields (B.S., M.S., and Ph.D. degrees) limit the amount of course-work that can be required for each subject in a discipline; 3) economic considerations of the students studying engineering limit the numbers of courses they take that are beyond what is required for the degree they seek; motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity; 4) motivations of professors and industry instructors are affected by faculty-rating systems which are affected by university-rating systems: numbers of employable graduates produced each year reign, and industry defines “employability”. Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few years. Industry can discard those vocationally trained employees who peter out and hire a new batch.

In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.

FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from most things empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.

BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t – s) (with some abuse of notation due to the use of x to denote two distinct functions). Then the time-averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be replaced with the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which is the model adopted for the stochastic theory of statistical spectral analysis, this “trick”—which is rarely if ever mentioned in the manner it is here, in courses on the subject—is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet, it has almost no operational advantage. To clarify those operational purposes where this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.3. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim by those who are stuck in their ways that the FOT probability theory has no measure-theoretic basis.
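The identity x(t, s) = x(t – s) can be tried out on a finite record. The following is a minimal sketch under a loudly stated assumption: because a finite record cannot be shifted indefinitely, the shift is taken circularly (modulo the record length), which is a computational convenience and not part of Wold’s Isomorphism itself. The record `x` and the names `ensemble`, `t0`, and `tau` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 512
# One record: white noise smoothed by a 5-point moving average (assumed model).
x = np.convolve(rng.standard_normal(n + 4), np.ones(5) / 5, mode="valid")

# Build the ensemble from time shifts of the single record:
# ensemble[s, t] = x[(t - s) mod n]   (circular shift is an assumption).
shifts = np.arange(n)
times = np.arange(n)
ensemble = x[(times[None, :] - shifts[:, None]) % n]

t0, tau = 100, 3   # an arbitrary fixed time and lag

# Ensemble averages over the member index s at fixed time t0 ...
ensemble_mean = ensemble[:, t0].mean()
ensemble_corr = np.mean(ensemble[:, t0] * ensemble[:, (t0 + tau) % n])

# ... coincide with time averages over the single record.
time_mean = x.mean()
time_corr = np.mean(x * np.roll(x, -tau))

print(ensemble_mean, time_mean)
print(ensemble_corr, time_corr)
```

With this construction the ensemble averages at any fixed time reproduce the time averages of the one record, which is the sense in which the time-average theory acquires an expected-value interpretation.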

The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time-series for cyclostationary processes and something similar to that for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of the subsections of this Page 3. An in-depth tutorial analysis and discussion of the similarities and differences between the classical stochastic process model and the alternative mathematical model based on Wold’s ensemble for stationary processes and Gardner’s complementary ensemble for cyclostationary processes is provided on Page 3.2. Further investigation of the differences between the measure-theoretic foundations for these two alternative approaches to signal modeling is reported on, in tutorial fashion, on Page 3.3. Page 3.4 presents a perspective from the past that is startling in terms of the number of unpleasant issues that arise from stochastic process models of cyclostationarity and identifies some still unsolved problems; Page 3.5 provides a brief outline of the hierarchy, according to the level of empiricism, of statistical and probabilistic models for random signals; and Page 3.6 reproduces a published debate on the pros and cons of these two alternatives for modeling random signals. Unfortunately, as good debates go, the arguments against the FOT probability alternative are shallow, unconvincing, and in places erroneous. One can take this as an indication that opponents of FOT Probability simply do not have a strong position to argue from.

The history of the development of time-series analysis can be partitioned into the earlier empirically driven work focused on primarily methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course of primary development in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period, but has centered on primarily nonstationary processes, rather than primarily stationary processes. The development of time series analysis theory and methodology for cyclostationary and related stochastic processes and their non-stochastic time-series counterparts came along later during the latter half of the 20th century and extending to the present.

Mathematically Driven Development of Probability Spaces and Stochastic Processes as the Preferred Conceptual/Mathematical Basis for Time Series Analysis (1900-1950)

- Josiah Willard Gibbs (Ensemble Average)
- Henri Leon Lebesgue (Probability Space)
- Maryan von Smoluchowski (Brownian Motion)
- Albert Einstein (Brownian Motion)
- Norbert Wiener (Brownian Motion)
- Aleksandr Jakovlevich Khinchin (Stochastic Process)
- Herman Ole Andreas Wold (Stochastic Process)
- Andrei Nikolaevich Kolmogorov (Stochastic Process)
- Harald Cramér (Stochastic Process)
- Joseph L. Doob (Stochastic Process)

Empirically Driven Development of Time-Series Analysis Methodology (1650-1950)

- Isaac Newton (1642-1727)
- Leonhard Euler (1707-1783)
- Joseph Louis Lagrange (1736-1813)
- Christopher H. D. Buys-Ballot (1817-1890)
- George Gabriel Stokes (1819-1903)
- Sir Arthur Schuster (1851-1934)
- John Henry Poynting (1852-1914)
- Albert Abraham Michelson (1852-1931)
- George Udny Yule (1871-1951)
- Evgeny Evgenievich Slutsky (1880-1948)
- Karl Johann Stumpff (1895-1970)
- Herman Ole Andreas Wold (1908-1992)
- Charles Goutereau (18XX-19XX)
- Norbert Wiener (1894-1964)
- Percy John Daniell (1889-1946)
- Maurice Stevenson Bartlett (1910-2002)
- Ralph Beebe Blackman (1904-1990)

3.1 Fraction-of-Time Probability for Time-Series that Exhibit Cyclostationarity

The following article, FRACTION-OF-TIME PROBABILITY FOR TIME-SERIES THAT EXHIBIT CYCLOSTATIONARITY, Signal Processing, Vol. 23, No. 3, pp. 273-292, by William A Gardner and William A Brown [JP34], was published in 1991, 5 years after this novel probability theory was introduced in the book [Bk2]. Thirty years hence, this article remains the single most complete and easy-to-read accounting of this probability theory aimed at a readership of statistical time-series analysis practitioners. For this reason, it is incorporated here as part of this Page 3, as an encouragement to readers to make this their first detailed encounter with this novel probability theory. In comparison with other worthy sources on this theory, including primarily the originating book [Bk2], the 2006 survey paper [JP64], the 2006 development of a measure-theory foundation [J24], and the most recent and most comprehensive treatment of cyclostationarity in general, the 2019 book [B2], this treatment is both concise and quite complete.

The next two pages, 3.2 and 3.3, strongly complement this Page 3.1 by providing a broad perspective on the pros and cons of the classical stochastic process approach and the more recently developed fraction-of-time probability approach to conceptualizing and mathematically modeling random signals.

However, before embarking on this venture, the reader is referred to the book [Bk5], based on the first workshop on cyclostationarity held in 1992, for a brief introductory discussion of the mathematical origin, as distinct from the historical origin, of Fraction-of-Time Probability. In Section 2.3, pages 11-13 of this book, it is shown in a few simple mathematical steps that each periodic component of any function of a time series can be expressed as a probability-weighted sum (an expected value) of that function, and the probability is the cyclostationary Fraction-of-Time Probability for that time series. The same is shown for the almost periodic component, which is the sum of all periodic components with incommensurate periods, except the cyclostationary Fraction-of-Time Probability is replaced with the almost cyclostationary Fraction-of-Time Probability. No muss, no fuss; just a little elementary calculus. This establishes just how fundamental the Fraction-of-Time Probability is in the study of cyclostationarity. (Clarification of Notation on pages 12, 13 in [Bk5]: the symbol “d to the power n” multiplied by “a function of n variables” is defined to be the product of n differentials of the function, each differential being w.r.t. one of the n variables.)
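For orientation, the single-variable case of that result can be sketched as follows. The notation here is an assumption made for illustration and is not copied from [Bk5]; the limits are assumed to exist, $U(\cdot)$ denotes the unit-step function, and $T$ is the period of interest. The case of functions of several time samples follows the same pattern with a joint distribution.

```latex
% Periodic component (period T) of g(x(t)), extracted by synchronized averaging
\hat{E}^{\{T\}}\{g(x(t))\}
  \triangleq \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} g\big(x(t + nT)\big)

% Cyclostationary FOT distribution: the same average applied to an indicator
F^{\{T\}}_{x(t)}(\xi)
  \triangleq \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} U\big(\xi - x(t + nT)\big)

% Writing g(x) = \int g(\xi)\,\delta(\xi - x)\,d\xi and interchanging the
% averaging with the integration gives the expected-value form
\hat{E}^{\{T\}}\{g(x(t))\}
  = \int_{-\infty}^{\infty} g(\xi)\, dF^{\{T\}}_{x(t)}(\xi)
```

Read this way, the periodically time-varying distribution is not an added postulate: it is what synchronized averaging of the indicator function produces, and expected values with respect to it are exactly the periodic components of functions of the time series.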

Fraction-of-time Probability for Time-Series that Exhibit Cyclostationarity

This concise paper is based on the seminal 1987 book [Bk2]. Following are some quotations reflecting the reaction of leaders in the field to this book when it first came out.

The renowned researcher and author of over 20 books, Enders A. Robinson, wrote the following about the book:

“Professor Gardner has the ability to impart a fresh approach to many difficult problems. . . . His general approach is to go back to the basic foundations and lay a new framework. This gives him a way to circumvent many of the stumbling blocks confronted by other workers . . . he has discovered many avenues of approach which were either not known or neglected in the past. In this way his work more resembles some of the outstanding mathematicians and engineers of the past. . . . William’s success in the approach shows the strength of his engineering insight. He has been able to solve problems that others have left as being too difficult.”

Enders A. Robinson

National Academy of Engineering

Further to this, Robinson wrote a strongly supportive 1990 review of this book that includes the following excerpt:

“This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”

Similarly, the following quotation from Professor Ronald N. Bracewell, 1994 recipient of the Institute of Electrical and Electronics Engineers’ Heinrich Hertz medal for pioneering work in antenna aperture synthesis and image reconstruction as applied to radio astronomy and to computer-assisted tomography, taken from his Foreword to the 1987 book introducing FOT-Probability theory [Bk2], makes essentially the same point that Robinson makes:

“If we are to go beyond pure mathematical deduction and make advances in the realm of phenomena, theory should start from the data. To do otherwise risks failure to discover that which is not built into the model . . . Professor Gardner’s book demonstrates a consistent approach from data, those things which in fact are given, and shows that analysis need not proceed from assumed probability distributions or random processes. This is a healthy approach and one that can be recommended to any reader.”

Ronald Newbold Bracewell, 1921 – 2007

Foreign Associate Member of the Institute of Medicine of the U.S. National Academy of Sciences

As another example, Dr. Akiva Yaglom, Mathematician and Physicist, USSR Academy of Sciences, wrote in his review of the Book published in Theory of Probability and Its Applications:

“It is important . . . that until Gardner’s . . . book was published there was no attempt to present the modern spectral analysis of random processes consistently in language that uses only time-averaging rather than averaging over the statistical ensemble of realizations [of a stochastic process] . . . Professor Gardner’s book is a valuable addition to the literature”

Dr. Akiva Yaglom, 1921 – 2007
Mathematician and Physicist

Member of the USSR Academy of Sciences

A fourth example is the following succinct enthusiastic remark given by Professor James L. Massey, information theorist and cryptographer, Professor of Digital Technology at ETH Zurich, in a prepublication book review in 1986:

“I admire the scholarship of this book and its radical departure from the stochastic process bandwagon of the past 40 years.”

Professor James L. Massey, 1934 – 2013

Member of the National Academy of Engineering and the Royal Swedish Academy of Sciences

As a final example, Professor Thomas Kailath of Stanford University, member of the National Academy of Engineering, the US National Academy of Sciences, the American Academy of Arts and Sciences, the Indian National Academy of Engineering, and the Silicon Valley Engineering Hall of Fame, wrote the following about the 1987 book:

“It is always hard to go against the established order, but I am sure that the book will have a considerable impact. It will be a definitive text on spectral analysis.”

Professor Thomas Kailath, 1935 –
US Medal of Science, IEEE Medal of Honor, Fellow of IEEE, Fellow of Institute of Mathematical Statistics, Past President of IEEE Information Theory Group
3.2 Transitioning Away from Stochastic Process Models
For purposes of developing intuition and possibly deeper understanding regarding the FOT-Probability model introduced on Page 3.1, in contrast to the conventional stochastic process model, the reader is referred to the following article:

This article was written and submitted as a feature article for the IEEE Signal Processing Magazine in March 2022, because it is appropriate to bring this conceptual problem-solving progress to the attention of the broad readership of this magazine, which 32 years earlier published the landmark paper EXPLOITATION OF SPECTRAL REDUNDANCY IN CYCLOSTATIONARY SIGNALS, introducing this proposed paradigm shift. But it was rejected by the Editor for Feature Articles, Dr. Laure Blanc-Féraud, because she thought it would not be of interest to today’s readership of this magazine. This exemplifies the uphill battle to get people to open their minds to different ways of thinking, once they have been indoctrinated in some other particular way of thinking—a topic addressed in some detail here on Page 7. This issue is further pursued on Page 3.3, where the mathematical pros and cons of these two alternative types of models are discussed in detail.

Even though journal editors who share Blanc-Féraud’s dismissal of the importance of educating readers about FOT-Probability are common (cf. page 3.6), there are those who “get it”, as exemplified by the highly respected researcher and editor, the late P.E. Doak, as illustrated below in correspondence between us. The above article was published in June 2023. The highlights of this article are as follows:
- Comparison of two alternative generic stochastic process models for data analysis and inference
- The standard model is relatively abstract, and the new model is better suited to non-population data
- Mathematical and pragmatic pros and cons are exposed
- Continued support of a paradigm shift is urged

Phillip Ellis Doak, 1921 – 2011

Founding Editor, Journal of Sound and Vibration, and Editor in Chief for 40 years

Quotation from 8 March 1990 letter to Professor Gardner:

“In my latter years, I have become more and more convinced of the validity of his [Percy W. Bridgman, Nobel Prize Laureate] outlook. Not only can ergodic mathematical concepts put students off, indeed I now believe that for physical scientists and engineers, they are “operationally erroneous”, and dangerous to mental health. Interpreting observations through ergodic spectacles is to misinterpret what the observations really mean. Not only does it confuse the issue, but also it inhibits the development of one’s intellectual capacity to ask the right questions about what the data means. Thus, in design, development, and research it is a model of reality which is counterproductive in respect to generating concepts which can lead to real progress in the real world.”

This perspective is aligned with that of Professor Enders A. Robinson, recipient of the Society of Exploration Geophysicists’ Maurice Ewing Medal, originator of the digital revolution in geophysics, and highest honored scientist in the field of geophysics, quoted on page 3.1 but repeated (in part) here:

Enders A. Robinson, 1930 – 2022

Member, National Academy of Engineering

“This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”
3.3 Advancing beyond the Need for Signal Models Requiring Unjustified Assumptions

For manmade signals, such as those typically encountered in communications and signals intelligence systems, applied R&D in statistical signal processing is typically based on formulaic signal models specified by explicit mathematical formulas containing deterministic functions of time, and, to be brief, individual random variables, sequences of independent random variables, and standard stochastic processes, such as stationary Gaussian or uniformly distributed processes. In such cases, the user can sometimes derive mathematical properties that will be useful in mathematical analysis, such as derivations of solutions to statistical inference and decision-making optimization problems. But this is often beyond the scope of specific applications and, as a result, assumptions about the models are typically made without justification. Sometimes, very broad but unproven justifications are used. For example, the analyst may assume a specified model satisfies the axiomatic definition of a Kolmogorov stochastic process. Examples of properties of the probability measure defining a Kolmogorov process include sigma-additivity (additivity of countably infinite numbers of terms), sigma-linearity (linearity of an operator applied to a linear combination of a countably infinite number of terms), and joint-measurability of two or more processes, which is necessary for the existence of joint probability density functions. These properties of a specific model are often not verifiable, despite their being assumed to hold. While this is common practice, it does not follow the scientific method that should guide all science and engineering.

As an alternative to this expedient but unsavory practice, we consider throughout this multi-section Page 3 the alternative to the Kolmogorov Model of random signals called the Fraction-of-Time Probability Model of random signals. The remainder of this section is a copy of a recently published tutorial paper [JP66], [J44] on the mathematical pros and cons of these two alternative types of models. The objective is to promote the FOT-Probability model as a superior alternative for many applications involving statistical time-series analysis.

3.4 A Perspective from the Past

To complement the recently written articles presented on Pages 3.2 and 3.3, this page consists of a set of slides used for Section IV of the opening Plenary Lecture for the first international Workshop on Cyclostationarity. To repeat an explanation regarding the other sections of this plenary lecture given on Page 2, some readers may wonder why this is appropriate considering that this workshop was held 30 years ago! (in 1992). I consider this appropriate because I developed these slides specifically for a broad group of highly motivated students. I say they were students solely because they traveled from far and wide specifically to attend this educational program. In fact, the participants of the workshop were mostly senior researchers in academia, industry, and government laboratories. Knowing the workshop was a success and knowing all the topics covered are as important today as they were then, I have chosen this presentation as ideal for the purposes of this website. In particular, although theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity has advanced considerably since the Workshop 30 years ago, as explained on Pages 3.2 and 3.3, especially considering the progress made on measure-theoretic considerations of these two alternative theories, some of the questions raised in 1992, particularly those involving stochastic-process models and surrounding the concept of cycloergodicity, are still not fully answered.

The unavoidable absence of detail in the presentation slides for Sec. IV presented below is made up for, to the extent that progress has been achieved in the ensuing 30 years, throughout this Page 3 and the sources linked to herein.

Because the theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity summarized in this Section IV of the Plenary Lecture is a relatively technical subject, it is recommended that students consider this section to be only a concise overview and that they follow up on it with Chapter 1 in the book [Bk5]. This chapter not only describes the duality between the stochastic and nonstochastic theories of cyclostationarity, but also derives the nonstochastic FOT-probabilistic theory from an inquiry into the nature of the property of time functions that is responsible for the defining characteristic of cyclostationarity: that finite-strength sine waves can be generated from cyclostationary functions by subjecting the functions to time-invariant nonlinear transformations. This inquiry leads naturally to the definitions of cyclic probabilistic moments and cyclic probability distributions and, more generally, cyclic expectation; and, in Chapter 2 of the book [Bk5], cyclic probabilistic cumulants. This is to be contrasted with the stochastic theory of cyclostationarity in which these key probabilistic quantities are simply posited on the basis of mathematical considerations only, with not even a mention of generating sine waves, which is a key characteristic of the physical manifestation of cyclostationarity.
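
The sine-wave-generation property described above can be illustrated numerically. The following is a minimal sketch, not taken from [Bk5]. It assumes a discrete-time amplitude-modulated noise signal as the cyclostationary time series, squaring as the time-invariant nonlinear transformation, and a hypothetical helper `sine_wave_strength` that measures the magnitude of the finite-time Fourier coefficient at a given frequency. The carrier frequency `F0`, record length `N`, and random seed are arbitrary illustrative choices.

```python
import cmath
import math
import random


def sine_wave_strength(y, freq):
    """Magnitude of the finite-time Fourier coefficient of y at freq (cycles/sample)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * n) for n, v in enumerate(y))
    return abs(acc) / len(y)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
N = 20000               # record length in samples (illustrative)
F0 = 0.1                # carrier frequency in cycles/sample (illustrative)

# Zero-mean white noise multiplied by a sinusoidal carrier.
x = [rng.gauss(0.0, 1.0) * math.cos(2 * math.pi * F0 * n) for n in range(N)]

# Time-invariant nonlinear transformation: squaring.
x_squared = [v * v for v in x]

# No finite-strength sine wave is present in x itself at the carrier frequency ...
strength_linear = sine_wave_strength(x, F0)
# ... but squaring regenerates one at twice the carrier frequency.
strength_squared = sine_wave_strength(x_squared, 2 * F0)
```

Under these assumptions, `strength_linear` shrinks toward zero as `N` grows, while `strength_squared` stays near 0.25, reflecting a finite-strength sine wave at frequency `2 * F0` in the squared signal.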

The direct relevance of this discussion to the primary subject of this website is the claim herein that science and engineering were done great harm by mathematicians’ hard sell of the stochastic process model to the exclusion of the non-stochastic time-series model that came before.

With a brief look ahead at Page 7, one can surmise that this hard sell reflects inadequate Right-Brain (RB) activity which would have been required to reveal the absence of a necessity to use such unrealistic and overly abstract models. This is something that has unnecessarily burdened teachers and students alike, and of course practicing engineers and scientists, with the challenge to each and every one of them to bring to bear the considerable RB activity required to make sense of the huge conceptual gap between the reality from nature of a single time-series of measured/observed data and the mathematical fiction of a typically-infinite ensemble of hypothetical time-series together with a probability law (a mathematical creation) governing the ensemble average over all the fictitious time series. All these unsuspecting individuals were left to close this conceptual gap on their own, being armed with nothing more than a mathematical theorem, which only rarely can be applied in practice, that gives the mathematical condition on a stochastic process model under which its ensemble averages equal (in an abstract sense; i.e., with probability equal to 1) the time averages over individual time-series in the ensemble. This condition on the probability law ensures that expected values of a proposed stochastic process mathematically calculated (a Left-Brain (LB) activity) from the mathematical model equal time averages measured from a single time-series member of the ensemble, assumed to be the time series that actually exists in practice. But this equality imposes another condition, namely that we mathematically take the limit of the time average as the amount of averaging time approaches infinity. Thus, the theorem, called the Ergodic Theorem, doesn’t actually address reality, because one never has an infinitely long segment of time-series data.
Moreover, the theorem is of little-to-no operational utility because the condition on the probability law can only rarely be tested for a given specific stochastic process model. Thus, most users of stochastic process theory rely conceptually on what is called the Ergodic Hypothesis by which one simply assumes the condition of the Ergodic Theorem is satisfied for whatever stochastic process model one chooses to work with. Faith of this sort has no place in science and engineering.

In my opinion, acceptance of all this gibberish and going forward with the stochastic process concept as the only option for mathematically modeling real time-series data requires abandonment of RB thinking. There really is no way to justify this abstraction of reality as a necessary evil. The fraction-of-time probabilistic model of single time series is an alternative option that avoids departing so far from the reality of measured/observed time-series data, its empirical statistical analysis, the mathematical modeling of the time-series, and the results of the analysis. The wholesale adoption by academicians of the stochastic process model foisted upon them by mathematicians suggests these academicians, as well as the mathematicians, suffer from low-level RB activity. These general remarks are backed up by the detailed mathematical comparative analyses presented in other sections of this Page 3.
3.5 The Hierarchy of Non-Stochastic Probabilistic Models of Time Series

3.5.1 Introduction

Statistical metrics for time series such as mean, bias, variance, coefficient of variation, covariance, and correlation coefficient can be defined using finite-time averages as replacements for expected values in well-known probabilistic metrics. These statistical metrics also can be arrived at from nothing more than a little thought, without any reference to probability or expected value. In fact, many of these statistical metrics were in use long before the probabilistic theory of stochastic processes was developed.
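
As an illustration of metrics built from finite-time averages alone, the following minimal sketch computes a mean, a variance, and a lagged correlation coefficient directly from one record. The function names and the sinusoidal test record are assumptions made for this example; they are not notation from [Bk2].

```python
import math


def time_average(values):
    """Finite-time average over the available record."""
    return sum(values) / len(values)


def time_variance(x):
    """Time-averaged squared deviation about the time-averaged mean."""
    m = time_average(x)
    return time_average([(v - m) ** 2 for v in x])


def time_correlation_coefficient(x, lag):
    """Correlation coefficient between x[n] and x[n + lag] over their overlap."""
    head, tail = x[:len(x) - lag], x[lag:]
    mean_head, mean_tail = time_average(head), time_average(tail)
    cov = time_average([(a - mean_head) * (b - mean_tail)
                        for a, b in zip(head, tail)])
    return cov / math.sqrt(time_variance(head) * time_variance(tail))


# Illustrative record: eight full periods of a unit-amplitude sine wave.
record = [math.sin(2 * math.pi * n / 8) for n in range(64)]

record_mean = time_average(record)                         # close to 0
record_variance = time_variance(record)                    # close to 0.5
corr_full_period = time_correlation_coefficient(record, 8)  # close to +1
corr_half_period = time_correlation_coefficient(record, 4)  # close to -1
```

No expected value or ensemble appears anywhere in these definitions; every quantity is an average over the single record.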

In the book [Bk2], such non-probabilistic statistical metrics are used for statistical spectral analysis. The resultant theory for understanding how to perform and study statistical spectral analysis is the lowest level in a hierarchy of non-stochastic theories of statistical spectral analysis and, more generally, time-series analysis. This level is referred to as the purely empirical non-probabilistic theory. It is quite adequate for many applications.

The next level up in the hierarchy is referred to as the purely empirical FOT-probabilistic theory, where FOT stands for Fraction-of-Time. The model upon which this theory is based is introduced in the presentation below. The third and highest level in the hierarchy is referred to as the non-stochastic FOT-probabilistic theory. This theory is fully developed in the book [Bk2]. The model is an asymptote of the model for the Finite-Time Theory described below. This asymptotic model can be approached as closely as one desires with the Finite-Time Model if enough time series data is available, but it cannot be reached exactly and still be empirical.
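
To make the finite-time idea concrete, the sketch below assumes that the empirical FOT cumulative distribution is the fraction of the record during which the series does not exceed a threshold. The helper name `fot_cdf` and the sine-wave record are illustrative assumptions, not definitions taken from the presentation.

```python
import math


def fot_cdf(x, threshold):
    """Fraction of the record during which x[n] <= threshold."""
    return sum(1 for v in x if v <= threshold) / len(x)


# Illustrative record: one full period of a unit-amplitude sine wave.
N = 1000
sine = [math.sin(2 * math.pi * n / N) for n in range(N)]

cdf_at_half = fot_cdf(sine, 0.5)

# For an ideal unit-amplitude sine wave the limiting fraction of time spent
# at or below v is 1/2 + arcsin(v)/pi; the finite record approximates this.
expected_at_half = 0.5 + math.asin(0.5) / math.pi
```

With more samples per period, or more periods, the finite-time value approaches the limiting value more closely, which mirrors the asymptotic relationship between the finite-time model and the non-stochastic FOT-probabilistic model described above.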

In subsection 3.5.2, the terms purely empirical, probabilistic, and non-stochastic are defined and the three individual levels of the hierarchy are defined and illustrated. The following material was mostly presented at the 2021 On-Line Grodek Conference on Cyclostationarity, but it is an improved version of that presentation.

Page 3.5 is concluded with a brief derivation in subsection 3.5.3 illustrating how the finite-time cyclostationary and poly-cyclostationary expectation operators are defined for finite segments of data, and another brief illustration in subsection 3.5.4 of how non-probabilistic expectation operators on finite data segments are defined and an explanation of the relationship between these and signal subspace methods of statistical inference. Both concepts use projections that are more general than those which function as constant-component extraction operators and periodic component extraction operators, both of which are probabilistic expectations.
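
The two extraction operators mentioned above can be sketched for a finite data segment as follows. This is a simplified illustration under stated assumptions: the period is a known integer number of samples, the constant component is taken to be the plain time average, and the periodic component is obtained by synchronized averaging over samples that share the same phase. The function names, test signal, and seed are hypothetical and are not drawn from subsection 3.5.3.

```python
import math
import random


def constant_component(x):
    """Constant-component extraction: the time average of the segment."""
    return sum(x) / len(x)


def periodic_component(x, period):
    """Periodic-component extraction by synchronized averaging.

    Averages all samples sharing the same phase (n mod period), using only
    the whole periods contained in the segment.
    """
    n_periods = len(x) // period
    usable = n_periods * period
    return [sum(x[k:usable:period]) / n_periods for k in range(period)]


rng = random.Random(1)  # fixed seed so the illustration is repeatable
PERIOD = 10             # period in samples (assumed known)
N = 10000               # segment length in samples

# One period of the underlying periodic waveform, then a noisy record of it.
clean = [2.0 + math.cos(2 * math.pi * k / PERIOD) for k in range(PERIOD)]
x = [clean[n % PERIOD] + rng.gauss(0.0, 1.0) for n in range(N)]

dc_estimate = constant_component(x)
periodic_estimate = periodic_component(x, PERIOD)
max_error = max(abs(e - c) for e, c in zip(periodic_estimate, clean))
```

Both estimates improve as the segment lengthens, since each phase bin averages over more periods.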

3.5.2 Purely Empirical FOT-Probability Theory for Modeling and Analysis of Time Series that are Stationary (S), Cyclostationary (CS), and Poly-Cyclostationary (PCS)

The presentation slides presented immediately below address the mathematical foundation and framework, developed by the WCM, for statistical time-series analysis based on statistical functions such as correlations, higher-order moments, and cumulative probability distributions, without involving the abstract mathematical model of a stochastic process. The purpose is to facilitate conceptualization and practical application. The application addressed is statistical spectral correlation analysis.

3.5.3 Modification of the FOT-Probability Theory of CS and Poly-CS Time Series from Infinitely Long Data Records to Finite Segments

The process by which the models of CS and Poly-CS time series are modified to render them applicable to data on finite-time intervals instead of infinite-time intervals is explained immediately below and supported with mathematical definitions.

3.5.4 Subspace Signal Processing and Empirical Nonstationary FOT Expectation Operators

The topic for this page is addressed immediately below:

3.6 Published Debate: Stochastic Process vs FOT-Probability Model
The 1987 book, Statistical Spectral Analysis: A Nonprobabilistic Theory, argues for more judicious use of the modern stochastic-process model (arising from the work of mathematicians in the 1930s, such as Khinchin, Kolmogorov, and others) instead of the more realistic predecessor: the time-series model first developed mathematically by Norbert Wiener in 1930 (see also page 59 of Wiener 1949, written in 1942, regarding the historical relationship between his and Kolmogorov’s approaches), that was briefly revisited in the 1960s by engineers before it was buried by mathematicians. The brief tongue-in-cheek essay Ensembles in Wonderland, published in IEEE Signal Processing Magazine, AP Forum, 1994 and reproduced below, is an attempt at satirizing the outrage typified by narrow-minded thinkers exemplified by two outspoken skeptics, Neil Gerr and Melvin Hinich, who wrote scathing remarks and a book review characterizing this book as utter nonsense. (Page 7.6 offers an explanation for the behavior of these two naysayers in terms of weak right-brain thinking.)

But first, let us consider the parallel to the book Alice in Wonderland; the following consists of excerpts taken from https://en.wikipedia.org/wiki/Alice’s_Adventures_in_Wonderland : Martin Gardner and other scholars have shown the book Alice in Wonderland [written by Charles Lutwidge Dodgson under the pseudonym Lewis Carroll] to be filled with many parodies of Victorian popular culture. Since Carroll was a mathematician at Christ Church, it has been argued that there are many references and mathematical concepts in both this story and his later story Through the Looking Glass; examples include what have been suggested to be illustrations of the concept of a limit, number bases and positional numeral systems, the converse relation in logic, and the ring of integers modulo a specific integer. Deep abstraction of concepts, such as non-Euclidean geometry, abstract algebra, and the beginnings of mathematical logic, was taking over mathematics at the time Alice in Wonderland was being written (the 1860s). Literary scholar Melanie Bayley asserted in the magazine New Scientist that Alice in Wonderland in its final form was written as a scathing satire on new modern mathematics that was emerging in the mid-19th century.

Today, Dodgson’s satire appears to be backward looking because, after all, there are strong arguments that modern mathematics has triumphed. Coming back to the topic of interest here, stochastic processes also have triumphed in terms of being wholly adopted in mathematics and science and engineering, except for a relatively small contingent of empirically-minded scientists and engineers. Yet, recent mathematical arguments, described in tutorial fashion on pages 3.2 and 3.3 and further supported with references cited there, provide a sound logical basis for reversing this outcome, especially when the overwhelming evidence of practical, pragmatic, pedagogic, and overarching conceptual advantages provided in the 1987 book and expanded on pages 3.2 and 3.3 here, is considered. The present dominance of the more abstract and less realistic stochastic process theory might be viewed as an example of the pitfalls of what has become known as groupthink or the inertia of human nature that resists changes in thinking, which is discussed in considerable detail based on numerous historical sources on Page 7.

Before presenting the several letters comprising the debate, including the standalone article “Ensembles in Wonderland”, the final letter to SP Forum in the debate is reproduced here first to provide hindsight, especially for interpreting “Ensembles in Wonderland”. The bracketed text, e.g., [text], below was added to the published debate specifically for this book to enhance clarity.

3.6.1 Preliminary Material

July 2, 1995 (published in Nov 1995)

To the Editor:

Introduction

This is my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], and carried on by Mr. Neil Gerr through his letters to SP Forum.

In this letter, I supplement my previous remarks aimed at clarifying the precariousness of Hinich’s and Gerr’s position by explaining the link between my argument in favor of the utility of fraction-of-time (FOT) probability and the subject of a plenary lecture delivered at ICASSP ’94. In the process of discussing this link I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition–that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework–simply cannot be defended if argument is to be based on fact and logic.

David J. Thomson’s Transcontinental Waveguide Problem

To illustrate that the stochastic-process conceptual framework is often applied to physical situations where the time-average framework is a more natural choice, I have chosen an example from D. J. Thomson’s recent plenary lecture on the project that gave birth to the multiple-window method of spectral analysis [2]. The project that was initiated back in the mid-1960s was to study the feasibility of a transcontinental millimeter waveguide for a telecommunications transmission system potentially targeted for introduction in the mid-1980s. It was found that accumulated attenuation of a signal propagating along a circular waveguide was directly dependent on the spectrum of the series, indexed by distance, of the erratic diameters of the waveguide. So, the problem that Thomson tackled was that of estimating the spectrum for the more than 4,000-mile-long distance-series using a relatively small segment of this series that was broken into a number of 30-foot long subsegments. (It would take more than 700,000 such 30-foot sections to span 4,000 miles.) The spectrum had a dynamic range of over 100 dB and contained many periodic components, indicating the unusual challenge faced by Thomson.

When a signal travels down a waveguide (at the speed of light) it encounters the distance-series [consisting of the distances traveled as time progresses]. Because of the constant velocity, the distance-series is equivalent to a time-series. Similarly, the series of diameters that is measured for purposes of analysis is—due to the constant effective velocity of the measurement device—equivalent to a time-series [of measurements]. So, here we have a problem where there is one and only one long time-series of interest (which is equivalent to a distance-series)—there is no ensemble of long series over which average characteristics are of interest and, therefore, there is no obvious reason to introduce the concept of a stochastic process. That is, in the physical problem being investigated, there was no desire to build an ensemble of transcontinental waveguides. Only one (if any at all) was to be built, and it was the spectral density of distance-averaged (time-averaged) power of the single long distance-series (time-series) that was to be estimated, using a relatively short segment, not the spectral density of ensemble-averaged power. Similarly, if one wanted to analytically characterize the average behavior of the spectral density estimate (the estimator mean) it was the average of a sliding estimator over distance (time), not the average over some hypothetical ensemble, that was of interest. Likewise, to characterize the variability of the estimator, it was the distance-average squared deviation of the sliding estimator about its distance-average value (the estimator variance) that was of interest, not the variance over an ensemble. 
The only apparent reason for introducing a stochastic process model with its associated ensemble, instead of a time-series model, is that one might have been trained to think about spectral analysis of erratic data only in terms of such a conceptual artifice and might, therefore, have been unaware of the fact that one could think in terms of a more suitable alternative that is based entirely on the concept of time averaging over the single time-series. (Although it is true that the time-series segments obtained from multiple 30 ft. sections of waveguide could be thought of as independent random samples from a population, this still does not motivate the concept of an ensemble of infinitely long time-series–a stationary stochastic process. The fact remains that, physically, the 30-foot sections represent subsegments of one long time-series in the communications system concept that was being studied.) [And even if Mr. Thomson was aware of the fact that one could conceptualize the problem entirely in terms of time averages, he had good reason to fear that this approach would be off-putting to his readers all of whom were likely indoctrinated only in statistical spectral analysis theory couched in terms of stochastic processes—an unfortunate situation].

It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process (the model adopted by Thomson) except to accommodate lack of familiarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework. Somehow, he does not recognize the mental gyrations required to force this and other physical problems into the stochastic process framework.

Gerr’s Letter

Having explained the link between my argument in favor of the utility of FOT probability and Thomson’s work, let us return to Gerr’s letter. Mr. Gerr, in discussing what he refers to as “a battle of philosophies,” states that I have erred in likening skeptics to religious fanatics. But in the same paragraph we find him defensively trying to convince his readers that the “statistical/probabilistic paradigm” has not “run out of gas” when no one has even suggested that it has. No one, to my knowledge, is trying to make blanket negative statements about the value of what is obviously a conceptual tool of tremendous importance (probability) and no one is trying to denigrate statistical concepts and methods. It is only being explained that interpreting probability in terms of the fraction-of-time of occurrence of an event is a useful concept in some applications. To argue, as Mr. Gerr does again in the same paragraph, that in general this concept “has no obvious advantages” and using it is “like building a house without power tools: it can certainly be done, but to what end?” is, as I stated in my previous letter, to behave like a religious fanatic — one who believes there can be only One True Religion. This is a very untenable position in scientific research.

As I have also pointed out in my previous letter, Mr. Gerr is not at all careful in his thinking. To illustrate his lack of care, I point out that Gerr’s statement “Professor Gardner has chosen to work within the context of an alternative paradigm [fraction-of-time probability]”, and the implications of this statement in Gerr’s following remarks, completely ignore the facts that I have written entire books and many papers within the stochastic process framework, that I teach this subject to my students, and that I have always extolled its benefits where appropriate. If Mr. Gerr believes in set theory and logic, then he would see that I cannot be “within” paradigm A and also within paradigm B unless A and B are not mutually exclusive. But he insists on making them mutually exclusive, as illustrated in the statement “From my perspective, developing signal processing results using the fraction-of-time approach (and not probability/statistics) … .” (The parenthetical remark in this quotation is part of Mr. Gerr’s statement.) Why does Mr. Gerr continue to deny that the fraction-of-time approach involves both probability and statistics?

Another example of the lack of care in Mr. Gerr’s thinking is the convoluted logic that leads him to conclude “Thus, spectral smoothing of the biperiodogram is to be preferred when little is known of the signal a priori.” As I stated in my previous letter, it is mathematically proven* in [1] that the frequency smoothing and time averaging methods yield approximately the same result. Gerr has given us no basis for arguing that one is superior to the other and yet he continues to try to make such an argument. And what does this have to do with the utility of the fraction-of-time concept anyway? These are data processing methods; they do not belong to one or another conceptual framework.

To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing referred to above was first derived by using the fraction-of-time conceptual framework [1]. If there is no conceptual advantage to this framework, why wasn’t such a fundamental result derived during the half century of research based on stochastic processes that preceded [1]? The second example is taken from the first attempt to develop a theory of higher-order cyclostationarity for the conceptualization and solution of problems in communication system design. In [3], it is shown that a fundamental inquiry into the nature of communication signals subjected to nonlinear transformations led naturally to the fraction-of-time probability concept and to a derivation of the cumulant as the solution to a practically motivated problem. This is, to my knowledge, the first derivation of the cumulant. In all other work, which is based on stochastic processes (or non-fraction-of-time probability) and which dates back to the turn of the century, cumulants are defined, by analogy with moments, to be coefficients in an infinite series expansion of a transformation of the probability density function (the characteristic function), which has some useful properties. If there is no conceptual advantage to the fraction-of-time framework, why wasn’t the cumulant derived as the solution to the above-mentioned practical problem or some other practical problem using the orthodox stochastic-probability framework?

Conclusion

Since no one in the preceding year has entered the debate to indicate that they have new arguments for or against the philosophy and corresponding theory and methodology presented in [1], it seems fair to proclaim the debate closed. The readers may decide for themselves whether the resolution put forth in [1] was defeated or was upheld.

But regarding the skeptics, I sign off with a humorous anecdote:

When Mr. Fulton first showed off his new invention, the steamboat, skeptics were crowded on the bank, yelling ‘It’ll never start, it’ll never start.’

It did. It got going with a lot of clanking and groaning and, as it made its way down the river, the skeptics were quiet.

For one minute.

Then they started shouting. ‘It’ll never stop, it’ll never stop.’

— William A. Gardner

* A more detailed and tutorial proof of this fundamental equivalence is given in the article “The history and the equivalence of two methods of spectral analysis,” Signal Processing Magazine, July 1996, No.4, pp.20 – 23, which is copied into the Appendix farther down this Page.

References
1. W. A. Gardner. Statistical Spectral Analysis: A Nonprobabilistic Theory. Prentice-Hall, Englewood Cliffs, NJ, 1987.
2. D. J. Thomson. “An Overview of Multiple-window and quadratic-inverse spectrum estimation methods,” Plenary Lecture, Proceedings of 1994 International Conference on Acoustics, Speech and Signal Processing, pp. VI-185 – VI-194.
3. W. A. Gardner and C. M. Spooner. “The Cumulant Theory of Cyclostationary time-series, Part I: Foundation,” IEEE Transactions on Signal Processing, Vol. 42, December 1994, pp. 3387-3408.
Excerpts from earlier versions of above letter to the editor before it was condensed for publication:

April 15, 1995

Introduction

In this, my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], I shall begin by addressing two remarks in the opening paragraph of Mr. Neil Gerr’s last letter (in March 1995 SP Forum). In the first remark, Mr. Gerr suggests that the “bumps and bruises” he sustained by venturing into the “battle” [debate] were to be expected. But I think that such injuries could have been avoided if he had all the relevant information at hand before deciding to enter the debate. This reminds me of a story I recently heard:

Georgios and Melvin liked to hunt. Hearing about the big moose up north, they went to the wilds of Canada to hunt. They had hunted for a week, and each had bagged a huge moose. When their pilot Neil landed on the lake to take them out of the wilderness, he saw their gear and the two moose. He said, “I can’t fly out of here with you, your gear, and both moose.”

“Why not?” Georgios asked.

“Because the load will be too heavy. The plane won’t be able to take off.”

They argued for a few minutes, and then Melvin said, “I don’t understand. Last year, each of us had a moose, and the pilot loaded everything.”

“Well,” said Neil, “I guess if you did it last year, I can do it too.”

So, they loaded the plane. It moved slowly across the lake and rose toward the mountain ahead. Alas, it was too heavy and crashed into the mountain side. No one was seriously hurt and, as they crawled out of the wreckage in a daze, the bumped and bruised Neil asked, “Where are we?”

Melvin and Georgios surveyed the scene and answered, “Oh, about a mile farther than we got last year.”

If Mr. Gerr had read the book [1] and put forth an appropriate level of effort to understand what it was telling him, he would have questioned Mr. Hinich’s book review and would have seen that the course he was about to steer together with the excess baggage he was about to take on made a crash inevitable.

A friend of mine recently offered me some advice regarding my participation in this debate. “Why challenge the status quo,” he said, “when everybody seems happy with the way things are?” My feeling about this is summed up in the following anecdote:

“Many years ago, a large American shoe manufacturer sent two sales reps out to different parts of the Australian outback to see if they could drum up some business among the aborigines. Sometime later, the company received telegrams from both agents.

The first one said, ‘No business. Natives don’t wear shoes.’

The second one said, ‘Great opportunity here–natives don’t wear shoes.’”

Another friend asked, “Why spend your time on this [debate] when you could be solving important problems?” I think Albert Einstein answered that question when he wrote:

“The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science”

This underscores my belief that we are overemphasizing “engineering training” in our university curricula at the expense of “engineering science.” It is this belief that motivates my participation in this debate. Instead of plodding along in our research and teaching with the same old stochastic process model for every problem involving time-series data, we should be looking for new ways to think about time-series analysis.

In the second remark in Mr. Gerr’s opening paragraph, regarding my response to Mr. Gerr’s October 1994 SP Forum letter in sympathy with “Hinich’s gleefully vicious no-holds-barred review” of [1], Mr. Gerr says “Even by New York standards, it [my response] seemed a bit much.” Well, I guess I was thinking about what John Hancock said, on boldly signing the Declaration of Independence:

There, I guess King George will be able to read that!

Like the King of England who turned a deaf ear to the messages coming from the new world, orthodox statisticians like Messrs. Hinich and Gerr, who are mired in tradition, seem to be hard of hearing–a little shouting might be needed to get through to them.

Nevertheless, I am disappointed to see no apparent progress, on Mr. Gerr’s part, in understanding the technical issues involved in his and Hinich’s unsupportable position that the time-average framework for statistical signal processing has, and I quote Gerr’s most recent letter, “no obvious advantages.” I hasten to point out, however, that this most recent position is a giant step back from the earlier even more indefensible position taken by Hinich in his book review, reprinted in April 1994 SP Forum, where much more derogatory language was used.

In this letter, I make a final attempt to clarify the precariousness of Hinich’s and Gerr’s position by explaining links between my arguments and the subjects of two plenary lectures delivered at ICASSP ’94. In the process of discussing these links and this paper, I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition, that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework, simply cannot be defended if argument is to be based on fact and logic.

Lotfi Zadeh and Fuzzy Logic

I wish that Mr. Gerr would let go of the fantasy about “the field where the Fraction-of-Timers and Statisticians do battle.” There do not exist two mutually exclusive groups of people—one of which can think only in terms of fraction-of-time probability and the other of which call themselves Statisticians. How many times and in how many ways does this have to be said before Mr. Gerr will realize that some people are capable of using both fraction-of-time probability and stochastic process concepts, and of making choices between these alternatives by assessing the appropriateness of each for each particular application? Mr. Gerr’s “battle” of “fraction-of-time versus probability/statistics” simply does not exist. This insistence on a dichotomy of thought is strongly reminiscent of the difficulties some people have had accepting the proposition that the concept of fuzziness is a useful alternative to the concept of probability. The vehement protests against fuzziness are for most of us now almost laughable.

To quote Professor Lotfi Zadeh in his recent plenary lecture [2]

“[although fuzzy logic] offers an enhanced ability to model real-world phenomena…[and] eventually fuzzy logic will pervade most scientific theories…the successes of fuzzy logic have also generated a skeptical and sometimes hostile reaction…Most of the criticisms directed at fuzzy logic are rooted in a misunderstanding of what it is and/or a lack of familiarity with it.”

I would not suggest that the time-average approach to probabilistic modeling and statistical inference is as deep a concept, as large a departure from orthodox thinking, or as broadly applicable as is fuzzy logic, but there are some definite parallels, and Professor Zadeh’s explanation of the roots of criticism of fuzzy logic applies equally well to the roots of criticism of the time-average approach as an alternative to the ensemble-average or, more accurately, the stochastic-process approach. In the case of fuzzy logic, its proponents are not saying that one must choose either conventional logic and conventional set theory or their fuzzy counterparts as two mutually exclusive alternative truths. Each has its own place in the world. Those opponents who argue vehemently that the unorthodox alternative is worthless can be likened to religious fanatics. This kind of intolerance should have no place in science. But it is all too commonplace and it has been so down through the history of science. So surely, one cannot expect to find its absence in connection with the time-average approach to probabilistic modeling and statistical inference. Even though experimentalists in time-series analysis (including communication systems analysis and other engineered-systems analysis) have been using the time-average approach (to various extents) for more than half a century, there are those like Gerr and Hinich who “see no obvious advantages.” This seems to imply that Mr. Gerr has one and only one interpretation of a time-average measurement on time series data—namely an estimate of some random variable in an abstract stochastic process model. To claim that this mathematical model is, in all circumstances, the preferred one is just plain silly.

David J. Thomson and the Transcontinental Waveguide –addition to published discussion:

[It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process except to accommodate unfamiliarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework.] It is correct in this case that a sufficiently capable person would obtain the same result using either framework, but it is incorrect to not recognize the mental gyrations required to force this physical problem into the stochastic process framework. My claim—and the reason I wrote the book [1]—is that our students deserve to be made aware of the fact that there are two alternatives. It is pigheaded to hide this from our students and force them to go through the unnecessary and sometimes confusing mental gyrations required to force-fit the stochastic process framework to real-world problems where it is truly an unnecessary and, possibly, even inappropriate artifice.

Gerr’s Letter—addition to published letter:

To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing, whose proof is outlined in the Appendix at the end of this letter, was first derived by using the fraction-of-time conceptual framework [1].

An Illustration of Blinding Prejudice

To further illustrate the extent to which Mr. Gerr’s prejudiced approach to scientific inquiry has blinded him, I have chosen one of his research papers on the subject of cyclostationary stochastic processes. In [5], Mr. Gerr (and his coauthor) tackle the problem of detecting the presence of cyclostationarity in an observed time-series. He includes an introduction and references sprinkled throughout that tie his work to great probabilists, statisticians, and mathematicians. (We might think of these as the “Saints” in Mr. Gerr’s One True Religion.) This is strange, since his paper is nothing more than an illustration of the application of a known statistical test (and a minor variation thereof) to synthetic data. It is even more strange that he fails to properly reference work that is far more relevant to the problem of cyclostationarity detection. But I think we can see that there is no mystery here. The highly relevant work that is not cited is authored by someone who champions the value of fraction-of-time probabilistic concepts. The fact that the relevant publications (known to Gerr) actually use the stochastic process framework apparently does not remove Mr. Gerr’s blinders. All he can see–it would seem–is that the author is known to argue (elsewhere) that the stochastic process framework is not always the most appropriate one for time-series analysis, and this is enough justification for Mr. Gerr to ignore the highly relevant work by this “heretic” author (author of the book [1] that Hinich all but said should be burned).

To be specific, Mr. Gerr completely ignores the paper [6] (published 1-1/2 years prior to the submission of Gerr’s paper) and the book [7] (published 4 years prior) wherein the problem of cyclostationarity detection is tackled using maximum-likelihood [6], maximum-signal-to-noise ratio [6], [7], and other optimality criteria, all of which lead to detection statistics that involve smoothed biperiodograms (and that also identify optimal smoothing) which are treated by Gerr as if they were ad hoc. Mr. Gerr also cites a 1990 publication (which does not appear in his reference list) that purportedly shows that the integrated biperiodogram (cyclic periodogram) equals the cyclic mean square value of the data (cf. (12)); but this is a special case of the much more useful result, derived much earlier than 1990, that the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram. The argument, by example, that Gerr proffers to show that (12) (the cyclic correlogram at zero lag) is sometimes a good test statistic and sometimes a bad one is trivialized by this Fourier transform relation (cf. [1]) and the numerous mathematical models for data for which the idealized quantities (cyclic autocorrelations, and cyclic spectral densities) in this relation have been explicitly calculated (cf. [1], [7]). These models include, as special cases, the examples that Gerr discusses superficially. The results in [1], [7] show clearly when and why the choice of zero lag made by Gerr in (12) is a poor choice. As another example, consider Mr. Gerr’s offhand remark that a Mr. Robert Lund (no reference cited) “has recently shown that for the current example (an AM signal with a square wave carrier) only lines [corresponding to cycle frequencies] spaced at even multiples of d=8 [the reciprocal of the period of the carrier] will have nonzero spectral (rz) measure.” This result was established in a more general form many years earlier in his coauthor’s Ph.D. dissertation (as well as in [1]) where one need only apply the extremely well-known fact that a symmetrical square wave contains only odd harmonics.

To go on, the coherence statistic that Gerr borrows from Goodman for application to cyclostationary processes has been shown in [7] to be nothing more than the standard sample statistic for the standard coherence function (a function of a single frequency variable) for two processes obtained from the one process of interest by frequency-shifting data transformations–except for one minor modification; namely, that time-averaged values of expected values are used in place of non-averaged expected values in the definition of coherence because the processes are asymptotically mean stationary, rather than stationary. Therefore, the well-known issues regarding frequency smoothing in these cross-spectrum statistics need not be discussed further, particularly in the haphazard way this is done by Gerr, with no reliance on analysis of specific underlying stochastic process models.

Continuing, the incoherent average (13) proposed by Gerr for use with the coherence statistic is the only novel contribution of this paper, and I claim that it is a poor statistic. The examples used by Gerr show that this “incoherent statistic” outperforms the “coherent statistic,” but what he does not recognize is that he chose the wrong coherent statistic for comparison. He chose the cyclic correlogram with zero lag (12), which is known to be a poor choice for his examples. For his example in Figure 9, zero lag produces a useless statistic, whereas a lag equal to T/2 is known to be optimum, and produces a “coherent statistic” that is superior to Gerr’s incoherent statistic. Thus, previous work [1], [7] suggests that a superior alternative to Gerr’s incoherent statistic is the maximum over a set of lag-indexed coherent statistics.

Finally, Mr. Gerr’s vague remarks about choosing the frequency-smoothing window-width parameter M are like stabs in the dark by comparison with the thorough and careful mathematical analysis carried out within–guess what–the time-average conceptual framework in [1] in which the exact mathematical dependence of bias and variance of smoothed biperiodograms on the data-tapering window shape, the spectral-smoothing window shape, and the ideal spectral correlation function for the data model are derived, and in which the equivalence between spectral correlation measurement and conventional cross-spectrum measurement is exploited to show how conventional wisdom [1, chapter 5, 7] applies to spectral correlation measurement [1,chapters 11, 13, 15].

In summary, Gerr’s paper is completely trivialized by previously published work of which he was fully aware. What appears to be his choice to “stick his head in the sand” because the author of much of this earlier highly relevant work was not a member of his One True Religion exemplifies what Gerr is trying to deny. Thus, I repeat: it is indeed appropriate to liken those (including Gerr) whom Gerr would like to call skeptics to religious fanatics who are blinded by their faith.

Conclusion

In closing this letter, I would like to request that Mr. Gerr refrain from writing letters to the editor on this subject. To say, as he does in his last letter, “There are many points on which Professor Gardner and I disagree, but only two that are worthy of further discussion,” is to try to worm his way out of the debate without admitting defeat. I claim to have used careful reasoning to refute beyond all reasonable doubt every point Mr. Gerr (and Mr. Hinich) has attempted to make. Since he has shown that he cannot provide convincing arguments based on fact and logic to support his position, he should consider the debate closed. To sum up the debate:

– The resolution, cited in the introductory section of my 2 July 1995 letter to the editor, in contrapositive form, was made by myself in [1].

– The resolution was challenged by Hinich and defended by myself in April 1994 SP Forum.

– Hinich’s challenge was supported and my defense was challenged by Gerr in October 1994 SP Forum.

– Gerr’s arguments were challenged by myself in January 1995 SP Forum.

– Gerr defended his arguments in March 1995 SP Forum.

– Gerr’s presumably-final defense was challenged and the final arguments in support of the resolution are made by myself in this letter.

APPENDIX from July 2, 1995 letter to Editor (published in Nov 1995)

– Proof of Equivalence Between Time-Averaged and Frequency-Smoothed Cyclic Periodograms

History and Equivalence of Two Methods of Spectral Analysis

Published in IEEE SIGNAL PROCESSING MAGAZINE, July 1996

The purpose of this article is to present a brief history of two methods of spectral analysis and to present, in a tutorial fashion, the derivation of the deterministic relationship that exists between these two methods.

History

Two of the oldest and currently most popular methods of measuring statistical (average) power spectral densities (PSD’s) are the frequency smoothing method (FSM) and the time averaging method (TAM). The FSM was thought to have originated in 1930 with Norbert Wiener’s work on generalized harmonic analysis [1], and to have been rediscovered in 1946 by Percy John Daniell [2]. But it was discovered only a few years ago (cf. [3]) that Albert Einstein had introduced the method in 1914 [4]. The currently popular method of deriving the FSM begins by showing that adjacent frequency bins in the periodogram have approximately the same correct mean values and the same large variances, and are approximately uncorrelated with each other. Then, it is observed that averaging these bins together retains the correct mean value, while reducing the variance.

The TAM is often attributed to a 1967 paper by P.D. Welch in the IEEE Transactions on Audio and Electroacoustics [5], but in fact the earliest known proposal of the TAM was by Maurice Stevenson Bartlett in 1948 [6]. The reasoning behind the TAM is similar to that for the FSM: the periodograms on adjacent segments of a data record have approximately the same correct mean values and the same large variances, and they are approximately uncorrelated with each other. Therefore, averaging them together will retain the correct mean value, while reducing the variance. (A more detailed historical account of the FSM, TAM, and other methods is given in [7].) Essentially, every spectral analysis software package available today includes either the FSM or the TAM, or both, often in addition to others. These other methods include, for example, the Fourier transformed tapered autocorrelation method, attributed to Ralph Beebe Blackman and John Wilder Tukey [8] (but used as early as 1898 by Albert A. Michelson [9]); and various model fitting methods that grew out of pioneering work by George Udny Yule in 1927 [10] and Gilbert Walker in 1931 [11].

It is well known that both the FSM and the TAM yield PSD estimates that can be made to converge to the exact PSD in some probabilistic sense, such as in mean square, as the length of the data record processed approaches infinity. However, it is much less commonly known that these two methods are much more directly related to each other. The pioneering methods due to Michelson, Einstein, Wiener, Yule, and Walker were all introduced without knowledge of the concept of a stochastic process. But starting in the 1950s (based on the work of mathematicians such as Khinchin, Wold, Kolmogorov, and Cramér in the 1930s and 1940s), the stochastic-process point of view essentially took over. It appears as though this mathematical formalism, in which analysts focus on calculating means and variances and other probabilistic measures of performance, delayed the discovery of the deterministic relationship between the FSM and TAM for about 40 years. That is, apparently it was not until the non-stochastic approach to understanding statistical (averaged) spectral analysis was revived and more fully developed in [7] that a deterministic relationship between these two fundamental methods was derived.

The next section presents, in a tutorial fashion, the derivation of the deterministic relationship between the FSM and TAM, but generalized from frequency-smoothed and time-averaged versions of the periodogram to the same for the biperiodogram (also called the cyclic periodogram [7]). This deterministic relationship is actually an approximation of the time-averaged biperiodogram (TAB) by the frequency-smoothed biperiodogram (FSB) and, of course, vice versa. For evidence of the limited extent to which this deterministic relationship is known, the reader is referred to letters that have appeared in the SP Forum section of this magazine in the October 1994, January 1995, March 1995, and November 1995 issues.

Equivalence

Definitions

Let $a(t)$ be a data-tapering window satisfying $a(t)=0$ for $|t|>T / 2$ , let $r_{a}(\tau)$ be its autocorrelation

$r_{a}(\tau)=\int_{-T / 2}^{T / 2} a(t+\tau / 2) a(t-\tau / 2) d t$

and let $A(f)$ be its Fourier transform

$A(f)=\int_{-T / 2}^{T / 2} a(t) e^{-i 2 \pi ft} d t$

Let $X_{a}(t, f)$ be the sliding (in time $t$ ) complex spectrum of data $x(t)$ seen through window $a$

$X_{a}(t, f)=\int_{-T / 2}^{T / 2} a(w) x(t+w) e^{-i 2 \pi f(t+w)} d w$

Similarly, let $b(t)$ be a rectangular window of width $V$ , centered at the origin, and let $X_{b}(t, f)$ be the corresponding sliding complex spectrum (without tapering). Also, let $R_{a}^{\alpha}(t, \tau)$ be the sliding cyclic correlogram for the tapered data

$\begin{aligned} R_{a}^{\alpha}(t, \tau)=\int_{-(T-| \tau |) / 2}^{(T-| \tau |) / 2} a(v+\tau / 2) x(t+[v+\tau / 2]) \cdot \\ a(v-\tau / 2) x(t+[v-\tau / 2]) e^{-i 2 \pi \alpha(t+v)} d v \end{aligned}$

and let $R_{b}^{\alpha}(t, \tau)$ be the sliding cyclic correlogram without tapering

$R_{b}^{\alpha}(t, \tau)=\frac{1}{V} \int_{-(V-| \tau |) / 2}^{(V-| \tau |) / 2} x(t+[v+\tau / 2]) x(t+[v-\tau / 2]) e^{-i 2 \pi \alpha(t+v)} d v$

To complete the definitions, let $S_{a} (t ; f_{1}, f_{2})$ and $S_{b} (t ; f_{1}, f_{2})$ be the sliding biperiodograms (or cyclic periodograms) for the data $x(t)$

$S_{a} (t ; f_{1}, f_{2})=\frac{1}{T} X_{a}(t, f_{1}) X_{a}^{*}(t, f_{2})$

$S_{b} (t ; f_{1}, f_{2})=\frac{1}{V} X_{b} (t, f_{1}) X_{b}^{*} (t, f_{2})$
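The definitions above can be illustrated with a minimal discrete-time sketch. It rests on several assumptions that are not part of the original article: the data are sampled, a single segment is taken with its time origin at the first sample, frequencies are normalized to cycles per sample, and the lag product uses the asymmetric form $y[m+\tau]\,y[m]$ rather than the symmetric $\tau/2$ form, which changes the correlogram only by a phase factor. The function names and the test signal are hypothetical. The sketch computes one biperiodogram value directly from the windowed spectra and again from the cyclic correlogram, as a check on the cyclic-periodogram/cyclic-correlogram relation of [7, Chapter 11].

```python
import numpy as np

def windowed_spectrum(x, a, f):
    """Complex spectrum of one tapered segment at normalized frequency f."""
    n = np.arange(len(a))
    return np.sum(a * x * np.exp(-2j * np.pi * f * n))

def biperiodogram(x, a, f1, f2):
    """(1/T) X_a(f1) X_a*(f2) for a single segment of length T = len(a)."""
    T = len(a)
    return windowed_spectrum(x, a, f1) * np.conj(windowed_spectrum(x, a, f2)) / T

def cyclic_correlogram(x, a, alpha, lag):
    """Asymmetric-lag cyclic correlogram: sum_m y[m+lag] y[m] exp(-i 2 pi alpha m)."""
    y = a * x
    T = len(y)
    m = np.arange(0, T - lag) if lag >= 0 else np.arange(-lag, T)
    return np.sum(y[m + lag] * y[m] * np.exp(-2j * np.pi * alpha * m))

def biperiodogram_from_correlogram(x, a, f1, f2):
    """Fourier transform (over lag) of the cyclic correlogram at alpha = f1 - f2."""
    T = len(a)
    alpha = f1 - f2
    lags = range(-(T - 1), T)
    total = sum(cyclic_correlogram(x, a, alpha, lag) * np.exp(-2j * np.pi * f1 * lag)
                for lag in lags)
    return total / T

rng = np.random.default_rng(1)
T = 64
n = np.arange(T)
x = rng.standard_normal(T) * np.cos(2 * np.pi * 0.125 * n)  # illustrative AM-type data
a = np.hanning(T)                                           # data-tapering window

f1, f2 = 0.30, 0.05                  # alpha = f1 - f2 = 0.25, twice the carrier frequency
direct = biperiodogram(x, a, f1, f2)
via_lags = biperiodogram_from_correlogram(x, a, f1, f2)
psd_point = biperiodogram(x, a, f1, f1)   # alpha = 0 gives an ordinary periodogram value

print("direct        :", direct)
print("via correlogram:", via_lags)
print("periodogram   :", psd_point.real)
```

The two routes agree to rounding error because the relation is an algebraic identity for any finite segment; no averaging or probabilistic model is involved at this stage.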

Derivation

It can be shown (using $\alpha=f_{1}-f_{2}$ ) that (cf. [7, Chapter 11])

$\begin{aligned} &\frac{1}{V} \int_{-V / 2}^{V / 2} S_{a}\left(t-u ; f_{1}, f_{2}\right) d u \\ &=\frac{1}{V} \int_{-V / 2}^{V / 2} \frac{1}{T} \int_{-T}^{T} R_{a}^{\alpha}(t-u, \tau) e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau d u \\ &=\frac{1}{T} \int_{-T}^{T} \left[\frac{1}{V} \int_{-V / 2}^{V / 2} R_{a}^{\alpha}(t-u, \tau) d u\right] e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau \\ & \cong \frac{1}{T} \int_{-T}^{T} R_{b}^{\alpha}(t, \tau) r_{a}(\tau) e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau \\ &=\int_{-\infty}^{\infty} S_{b}\left(t ; f_{1}-g, f_{2}-g\right) \frac{1}{T}|A(g)|^{2} d g \end{aligned}$

The above approximation, namely

$\frac{1}{V} \int_{-V / 2}^{V / 2} R_{a}^{\alpha}(t-u, \tau) d u \cong R_{b}^{\alpha}(t, \tau) r_{a}(\tau)$

for $|\tau| \leqslant T$, becomes more accurate as the inequality $V \gg T$ grows in strength (assuming that there are no outliers in the data near the edges of the $V$-length segment, cf. exercise 1 in [7, Chapt. 3], exercise 4b in [7, Chapt. 5], and Section B in [7, Chapt. 11]). For example, if the data is bounded by $M$, $|x(t)| \leqslant M$, and $a(t) \geqslant 0$, then it can be shown that the error in this approximation is worst-case bounded by $r_{a}(\tau) M^{2} T / V$. The first and last equalities above are simply applications of the cyclic-periodogram/cyclic-correlogram relation first established in [7, Chapter 11] together with the convolution theorem (which is used in the last equality).
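One way to see where this approximation and its error bound come from is sketched here. The sketch uses only the definitions given above and is not a substitute for the treatment in [7]. Inserting the definition of $R_{a}^{\alpha}$, interchanging the order of integration, and substituting $s=v-u$ gives

$\begin{aligned} \frac{1}{V} \int_{-V / 2}^{V / 2} R_{a}^{\alpha}(t-u, \tau) d u=\int_{-(T-|\tau|) / 2}^{(T-|\tau|) / 2} a(v+\tau / 2) a(v-\tau / 2) \cdot \\ \left[\frac{1}{V} \int_{v-V / 2}^{v+V / 2} x(t+[s+\tau / 2]) x(t+[s-\tau / 2]) e^{-i 2 \pi \alpha(t+s)} d s\right] d v \end{aligned}$

The bracketed term has the same integrand as $R_{b}^{\alpha}(t, \tau)$; only the limits of integration differ, by at most $|v|+|\tau| / 2$ at each end. Since $|v| \leqslant(T-|\tau|) / 2$, the two integration intervals differ by a total length of at most $T$. If $|x(t)| \leqslant M$, the bracketed term therefore differs from $R_{b}^{\alpha}(t, \tau)$ by at most $M^{2} T / V$, and with $a(t) \geqslant 0$ the outer integral multiplies this by $r_{a}(\tau)$, which reproduces the bound $r_{a}(\tau) M^{2} T / V$.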

Interpretation

The left-most member of the above string of equalities (and an approximation) is a biperiodogram of tapered data seen through a sliding window of length $T$ and time-averaged over a window of length $V$ . If this average is discretized, then we are averaging a finite number of biperiodograms of overlapping subsegments over the $V$ -length data record. (It is fairly well known that little is gained – although nothing but computational efficiency is lost – by overlapping segments more than about 50 percent.) The right-most member of the above string is a biperiodogram of un-tapered data seen through a window of length $V$ and frequency-smoothed along the anti-diagonal $g=\left(f_{1}+f_{2}\right) / 2$ , using a smoothing window $(1 / T)|A(g)|^{2}$ , for each fixed diagonal $\alpha=f_{1}-f_{2}$ . Therefore, given a $V$ -length segment of data, one obtains approximately the same result, whether one averages biperiodograms on subsegments (TAM) or frequency smoothes one biperiodogram on the undivided segment (FSM). Given $V$ , the choice of $T$ determines both the width of the frequency smoothing windows in FSM and the length of the subsegments in TAM. Given $V$ and choosing $T \ll V$ , one can choose either of these two methods and obtain approximately the same result (barring outliers within $T$ of the edges of the data segment of length $V$ . By choosing $f_{1}=f_{2}$ (i.e., $\alpha = {0}$ ), we see the biperiodograms reduce to the more common periodograms, and the equivalence then applies to methods of estimation of power spectral densities, rather than bispectra. Bispectra are also called cyclic spectral densities and spectral correlation functions [7]. As first proved in [7], the FSM and TAM spectral correlation measurements converge to exactly the same quantity, namely, the limit spectral correlation function (when it exists), in the limit as $V \rightarrow \infty$ and $T \rightarrow \infty$ , in this order. 
Further, this limit spectral correlation function, also called the limit cyclic spectral density, is equal to the Fourier transform of the limit cyclic autocorrelation, as first proved in [7], where this relation is called the cyclic Wiener relation because it generalizes the Wiener relation between the PSD and autocorrelation from $\alpha = 0$ to $\alpha \neq 0$ :

$S_{x}^{\alpha}(f)= \int R_{x}^{\alpha}(\tau) e^{-i 2 \pi f \tau} d \tau$

where

$R_{x}^{\alpha}(\tau) \triangleq \lim _{T \rightarrow \infty} R_{a}^{\alpha}(t, \tau)$

$S_{x}^{\alpha}(f) \triangleq \lim _{T \rightarrow \infty} \lim _{V \rightarrow \infty} \frac{1}{V} \int_{-V / 2}^{V / 2} S_{a}\left(t-u ; f_{1}, f_{2}\right) d u$

with $\alpha=f_{1}-f_{2}$ .
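
As a rough numerical illustration of the relation above, the following sketch checks a finite-data, discrete-time analog: the DFT of a circular cyclic correlogram equals the cyclic periodogram. Several things here are assumptions made for the illustration and are not taken from [7]: the asymmetric-lag and circular (mod $N$) conventions, the function names `cyclic_correlogram` and `cyclic_periodogram`, and the amplitude-modulated white-noise test signal, whose cycle frequency is twice the carrier frequency.

```python
import numpy as np

def cyclic_correlogram(x, a):
    """Circular cyclic correlogram of x at cycle frequency alpha = a / N.

    R[m] = (1/N) * sum_n x[(n + m) mod N] * conj(x[n]) * exp(-i 2 pi a n / N)
    (asymmetric-lag convention, chosen here for simplicity).
    """
    N = len(x)
    n = np.arange(N)
    demod = np.conj(x) * np.exp(-2j * np.pi * a * n / N)
    # np.roll(x, -m)[n] equals x[(n + m) mod N]
    return np.array([np.sum(np.roll(x, -m) * demod) for m in range(N)]) / N

def cyclic_periodogram(x, a):
    """Cyclic periodogram I[k] = (1/N) * X[k] * conj(X[k - a]), indices mod N."""
    X = np.fft.fft(x)
    return X * np.conj(np.roll(X, a)) / len(x)

rng = np.random.default_rng(0)
N, k0 = 4096, 512                      # record length; carrier frequency k0 / N
n = np.arange(N)
x = rng.standard_normal(N) * np.cos(2 * np.pi * k0 * n / N)

a_cycle, a_other = 2 * k0, 300         # a cycle frequency and a non-cycle frequency
R_cycle = cyclic_correlogram(x, a_cycle)
R_other = cyclic_correlogram(x, a_other)

# Finite-data analog of the cyclic Wiener relation (exact up to rounding).
wiener_holds = np.allclose(np.fft.fft(R_cycle), cyclic_periodogram(x, a_cycle))
print(wiener_holds)
print(abs(R_cycle[0]), abs(R_other[0]))
```

For this test signal the zero-lag cyclic correlogram should be close to 1/4 at the cycle frequency and close to zero at the other frequency, with fluctuations that shrink as the record length grows.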

In the special circumstance where the inequality $T \ll V$ cannot be satisfied because of the degree of spectral resolution (smallness of $1 / T$ ) that is required, there is no known general and provable argument that either method is superior to the other. It has been argued [e.g., by Gerr] that, since the TAM involves time averaging, it is less appropriate than the FSM for nonstationary data. The results presented here, however, show that, for $T \ll V$ , neither the TAM nor the FSM is more appropriate than the other for nonstationary data. And, when $T \ll V$ is not satisfied, there is no known evidence that favors either method for nonstationary data.

The derivation of the approximation between the FSM and TAM presented here uses a continuous-time model. However, a completely analogous derivation of an approximation between the discrete-time FSM and TAM is easily constructed. When the spectral correlation function is being measured for many values of the frequency-separation parameter, $\alpha$ , the TAM, modified to what is called the FFT accumulation method (FAM), is much more computationally efficient than the FSM implemented with an FFT [12].
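
The discrete-time equivalence can be sketched numerically for the special case $\alpha = 0$ (ordinary periodograms). The following is an illustration under stated assumptions, not the derivation in [7]: the record length, subsegment length, Hann taper, one-sample hop, and colored-noise test data are all hypothetical choices. If the subsegments are allowed to wrap around the ends of the record, the time-averaged and frequency-smoothed results agree to rounding error; restricting to subsegments that lie inside the record leaves only an edge-effect discrepancy, which is expected to be on the order of $T / V$ .

```python
import numpy as np

rng = np.random.default_rng(1)
V, T = 1024, 32                      # record length and subsegment length, T << V
w = rng.standard_normal(V + 1)
x = w[1:] + 0.9 * w[:-1]             # colored test data (hypothetical choice)
a = np.hanning(T)                    # nonnegative data-tapering window a(t)

# TAM: periodograms of tapered T-length subsegments, hopped by one sample.
# Row u holds x[(u + n) mod V], n = 0..T-1; rows u > V - T wrap around the edge.
idx = (np.arange(V)[:, None] + np.arange(T)[None, :]) % V
seg_per = np.abs(np.fft.fft(x[idx] * a, n=V, axis=1)) ** 2 / T
tam = seg_per[: V - T + 1].mean(axis=0)      # only subsegments inside the record
tam_circular = seg_per.mean(axis=0)          # all V circular shifts

# FSM: periodogram of the whole record, smoothed with the window (1/T)|A|^2.
P = np.abs(np.fft.fft(x)) ** 2 / V
W = np.abs(np.fft.fft(a, n=V)) ** 2 / T
# Circular convolution over frequency, with frequency spacing 1/V.
fsm = np.fft.ifft(np.fft.fft(P) * np.fft.fft(W)).real / V

exact_when_circular = np.allclose(tam_circular, fsm)
rel_err = np.mean(np.abs(tam - fsm)) / np.mean(fsm)
print(exact_when_circular, rel_err)
```

Increasing `V` with `T` fixed should shrink `rel_err`, which is consistent with the approximation becoming more accurate as $V \gg T$ grows in strength.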

William A. Gardner
Professor, Department of Electrical and Computer Engineering
University of California,
Davis, CA.

References
1. Wiener, N., “Generalized harmonic analysis,” Acta Mathematica, Vol. 55, pp. 117-258, 1930.
2. Daniell, P. J., “Discussion of ‘On the theoretical specification and sampling properties of autocorrelated time-series’,” J. Royal Statist. Soc., Vol. 8B, No. 1, pp. 27-97, 1946.
3. Gardner, W. A., “Introduction to Einstein’s contribution to time-series analysis,” IEEE Signal Processing Magazine, Vol. 4, pp. 4-5, 1987.
4. Einstein, A., “Méthode pour la détermination de valeurs statistiques d’observations concernant des grandeurs soumises à des fluctuations irrégulières,” Archives des Sciences Physiques et Naturelles, Vol. 37, pp. 254-256, 1914.
5. Welch, P. D., “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, Vol. AU-15, pp. 70-73, 1967.
6. Bartlett, M. S., “Smoothing periodograms from time-series with continuous spectra,” Nature, Vol. 161, pp. 686-687, 1948.
7. Gardner, W. A., Statistical Spectral Analysis: A Nonprobabilistic Theory. Englewood Cliffs, NJ: Prentice-Hall, 1987.
8. Blackman, R. B. and J. W. Tukey, The Measurement of Power Spectra, New York: AT&T, 1958 (Also New York: Dover, 1959).
9. Michelson, A. A. and S. W. Stratton, “A new harmonic analyzer,” American Journal of Science, Vol. 5, pp. 1-13, 1898.
10. Yule, G. U., “On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers,” Phil. Trans. Royal Soc. London A, Vol. 226, pp. 267-298, 1927.
11. Walker, G., “On periodicity in series of related terms,” Proceedings of the Royal Society, Vol. 131, pp. 518-532, 1931.
12. Roberts, R. S., W. A. Brown, and H. H. Loomis, Jr., “Computationally efficient algorithms for cyclic spectral analysis,” IEEE Signal Processing Magazine, Vol. 8, pp. 38-49, 1991.
3.6.2 The debate

This section comprises the following letters to the editor of IEEE Signal Processing Magazine:

1 – Apr 1994, pp. 14, 16 (reprint in SP Magazine of Hinich’s book review in SIAM Review (1991), pp. 677-678)

2 – Apr 1994, pp. 16, 18, 20, 22, 23 (Gardner’s Comments including Ensembles in Wonderland)

3 – Oct 1994, p. 12 (Gerr’s comments)

4 – Jan 1995, pp. 12, 14 (Gardner’s comments in response to Gerr)

5 – Mar 1995, p. 16 (Gerr’s comments, 2nd try)

6 – Jul 1995, pp. 19-21 (Gardner’s final response, reproduced at beginning of page 3.4.1 above)

3.6.3 Final Words

Considering all of the evidence presented on this page 3, if this is the best the naysayers have to offer in support of their proposition, then their performance in this debate is pitiful.

No one since this debate has picked up the mantle from the opposition; rather, a deafening silence has been accompanied by business as usual: almost all contributors to journals in statistical signal processing continue to use stationary, cyclostationary, and almost cyclostationary stochastic process models as if there were no alternative. The comprehensive study of the relative advantages and disadvantages of population probability models (stochastic processes or ensembles of persistent functions of time) and non-population probability models (single persistent functions of time), comprising a series of solid peer-reviewed journal papers over a period of decades and several solid books by multiple independent and collaborating authors with impeccable records of contributions to the field, is apparently not seen, not believed, or dismissed as irrelevant to the goals of personal advancement of research-paper authors. This poor scholarship has cost both the educational and research communities by retarding progress.

I believe the silence about the debate is not because the issue is unimportant to the community’s effort to advance knowledge and understanding in the field of statistical signal processing, but rather because there is absolutely no reasonable defense of the opposition’s proposition, which is simply that non-population probability is an abomination that has no rightful place in statistics in general. Not one single argument in support of this thesis has been offered by the opposition!

That being said, academia has not moved to adopt, through curriculum changes, the proposed paradigm shift from population probability models only to a mix, determined by application, of population and non-population probability models. The reason is most likely an exemplary sign of the times: to a large extent across swaths of academia, scholarship, in the sense of study of the origins and history of ideas, is mostly dead; the pursuit of truth has become a foreign concept; and the name of the degree doctor of philosophy is often no longer relevant. Updates in curricula appear to be driven primarily if not exclusively by advances in technology affecting, for example, computational capacity. The original Greek meaning of philosophy in this title is “love of wisdom” and the word doctor comes from the Latin word for “teach.” I used to tell my students that once you have earned the degree of Doctor of Philosophy, you should know your area of expertise well enough to philosophize about it. The name of the degree today has little to do with today’s requirements for award of the degree. Typical curricula in Science, Technology, Engineering, and Mathematics today consist more of training than education. This means these curricula lean heavily toward addressing development of only the left brain, not the right brain (cf. page 7). This probably disqualifies most PhDs today from participating meaningfully in scientific debate.
3.7 When is the Use of Probability Unscientific?
Question: When is the use of probability unscientific?

Answer: When the axioms of probability are, in the particular setting of interest, unscientific.

Explanation: The probability axiom that assumes the existence of a sample space is unscientific when such a sample space does not exist in the setting of interest.

Examples:
1. In the setting of time-series data analysis in studies of astrophysics, the assumption of a sample space comprised of many statistically identical universes is unscientific, because there exists no scientific evidence that supports the existence of more than a single universe. Stated another way, the assumption of a sample space of universes is unfalsifiable and is therefore said to be pseudoscientific.
2. Even in the far more limited setting of time series analysis in studies of our planet Earth, there is no scientific evidence that there exists a sample space of planets that are statistically identical to Earth.
3. Focusing down even further, in studies of oceanographic time-series data, there is no large sample space of oceans.
4. Even in the more constrained settings of time series analysis for some manmade systems, such as a) a mechanical machine for which time series of machine vibrations are to be studied, or b) an electronic machine for which time series of electrical signals are to be studied, if there is no large (theoretically infinite) ensemble of such systems, the assumption that such an ensemble exists is merely a hypothesis generally not expected to be verifiable and is therefore an unscientific axiom.

It is clear from the above straightforward observations that the practice of probabilistic analysis is in many if not most fields of study not a scientific endeavor. It is an otherworldly mental activity. The reason that probabilistic analysis pervades so many fields of scientific study is perhaps the fact that there are indeed many settings in which populations of entities, which can reasonably be modeled as sample spaces, do exist. A prime example is sub-populations of humans, selected for specific traits, in studies of medicine. Another example is high-volume manufacturing, which produces many instances of a product, the collection of which can reasonably be modeled as a sample space. But the existence of some applications of probabilistic analysis that can be said to be scientific does not condone the present essentially universal use of probabilistic analysis throughout the many fields of study as if it were a scientific practice in all cases.

Although it is not uncommon for people to not “believe in” probability, this rejection would appear to be based mostly on intuition, not a realization of the fact that a key axiom of probability is in many settings unscientific. Speaking of common beliefs, science itself often suffers from lack of credibility among common people. It can be argued that this is largely a result of scientists’ abuse of the scientific method, greatly exacerbated by rampant stretching of the truth by marketers of unproven potentially scientific hypotheses for the sole purpose of making money, truth be damned.

Perhaps an increased awareness of the nonscientific nature of probability in many settings would serve to reduce abuses of the scientific method, which one would expect to benefit science in general and thereby benefit humanity.

Interestingly, there is an alternative concept of probability that does not suffer from an axiom that is often unscientific, and this alternative applies specifically to studies involving time-series data analysis. It is called Fraction-of-Time (FOT) probability and is the focus of this entire page 3.
