3. Ensemble Statistics, Stochastic Processes, and the Fraction-of-Time Probability Alternative

Introductory Narrative

Probability & Statistics and Ergodicity

What does “probability & statistics” mean? These two terms are often used together, but they are two distinct entities. Mathematical statistics is what you get when you use probability theory to model statistics. But probability exists in its own right as an abstract mathematical theory, and statistics exists in its own right as a collection of empirical methods for analyzing data. The blend of probability and statistics is a whole that is bigger than the sum of its parts, but those who forget that statistics are empirical and probability is mathematical do so at their own conceptual peril.

To those who dig below the surface in the field of applied mathematical statistics involving time series of data, the following question arises: which of two alternative theories of probability should one apply to the statistics of interest in each application? The answer is that it depends on the statistics of interest. If I am designing a digital communication system and I want the bit-error rate for a received signal over time to be less than 1 bit-decision error in 100 bit-decisions, on average over time, then I want the fraction of time the bit decision is in error for this signal to be less than 1/100. This fraction is called the fraction-of-time (FOT) probability of a bit error.

On the other hand, if I am producing a large number of communication systems and I want the number of systems that make bit-decision errors at any arbitrary time to be less than 1 in 100, on average over the ensemble of systems, then I want the fraction of systems that make errors to be less than 1/100. This is the relative frequency of bit errors, and it converges as the ensemble size grows without bound to the relative-frequency (RF) probability of bit error which is, according to Kolmogorov’s Law of Large Numbers, the stochastic probability of the bit-error event. This is a purely theoretical quantity in an abstract mathematical model of an ensemble of signals (one from each system) called a stochastic process.

These two probabilities are distinct and, in general, there is no reason to expect them to equal each other.
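A small simulation can make the distinction concrete. The sketch below assumes a hypothetical, deliberately non-ergodic ensemble in which half of the systems are "good" and half are "bad"; the error probabilities, the function names, and the ensemble size are all illustrative assumptions, not anything specified on this page. It computes a finite-time FOT estimate for individual systems and a finite-ensemble RF estimate at one time instant.

```python
import random

def simulate_ensemble(num_systems, num_bits, seed=0):
    """Return one error record per system (1 = bit-decision error).

    Hypothetical model: even-indexed systems err with probability 0.002,
    odd-indexed systems err with probability 0.03.
    """
    rng = random.Random(seed)
    records = []
    for k in range(num_systems):
        p_err = 0.002 if k % 2 == 0 else 0.03
        records.append([1 if rng.random() < p_err else 0 for _ in range(num_bits)])
    return records

def fot_error_rate(record):
    """Fraction of time that one system's bit decision is in error."""
    return sum(record) / len(record)

def rf_error_rate(records, t):
    """Fraction of systems whose bit decision is in error at time index t."""
    return sum(rec[t] for rec in records) / len(records)

records = simulate_ensemble(num_systems=500, num_bits=4000)
fot_good = fot_error_rate(records[0])   # time average for one "good" system
fot_bad = fot_error_rate(records[1])    # time average for one "bad" system
rf_at_t = rf_error_rate(records, t=100) # ensemble average at one instant

print("FOT estimate, good system:", fot_good)
print("FOT estimate, bad system: ", fot_bad)
print("RF estimate at t = 100:   ", rf_at_t)
```

In this assumed model, the time average for any one system settles near that system's own error rate, while the ensemble average settles near the mixture of the two rates. A designer who needs each delivered system to meet the 1/100 requirement over time cannot learn that from the ensemble figure alone.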

Nevertheless, to “make things nice” by having these two probabilities equal each other, the ergodic hypothesis was introduced in studies of time-series data, like communications signals. (Actually, it was borrowed from earlier studies in physics of dynamical systems of large numbers of particles.) The hypothesis is that the limit over an infinitely long period of time of the FOT probability of an event involving a time function such as a signal is equal to the limit over an infinite ensemble of time functions of the RF probability, which in turn equals the abstract stochastic probability. At about the same time that this hypothesis was beginning to become popular, Birkhoff introduced his ergodic theorem, which consists of the necessary and sufficient condition on an abstract stochastic process (a mathematical model) for these two probabilities to equal each other for that process.

Because it is typically impossible to prove that the necessary and sufficient condition for ergodicity holds in real-world applications, in practice analysts usually simply invoke the ergodic hypothesis without making any effort to validate it.  

A source of confusion for some who invoke the ergodic hypothesis is thinking it is a hypothesis about the real data they are analyzing when, in fact, it is a hypothesis about the mathematical model they have adopted. Confusion surrounding the ergodic hypothesis can be avoided in many applications by first determining what is of primary interest in the application being studied: Is it the behavior of long time averages or the behavior of large ensemble averages? If it is the former, the analyst should simply adopt FOT probability and forget all about stochastic probability and the ergodic hypothesis.

As simple and self-evident as this truth is, some experts indoctrinated in the theory of stochastic processes argue that FOT probability is an abomination that has no place in mathematical statistics. The purpose of this Page 3 is to establish once and for all how absurd this extreme position is by addressing concerns about FOT probability that have been expressed in the past and extinguishing these concerns and associated claims that there is a controversy, through careful conceptualization, mathematical modeling, and straightforward discussion. As explained on this Page 3, there is no basis for controversy; there is simply a need to make a choice between two options for modeling probability in each application of interest.

Yet, there is a wrinkle: before the limit is taken in each of the alternative types of probability, FOT and RF, these quantities are both statistics; they are computed from finite amounts of empirical data. They can be interpreted as estimates of the limiting mathematical quantities, and they can exhibit some of the same properties as the mathematical quantities, but they are statistics, not probabilities. Moreover, the quantity that each converges to is just a number for a given set of statistics from any single execution of the underlying experiment. These quantities are not mathematical models. But the collection of all such numbers obtained from all possible sets of statistics from the repeated trials of the underlying experiment behaves according to a probabilistic model. The explanation given here of this wrinkle is probably confusing to those who do not already know what is so tersely stated here. Nevertheless, the purpose of Pages 3.1 through 3.6, following the remainder of the narrative below, is to explain the statement here and the equal mathematical footing of the two alternative types of probability in sufficient detail to remove all ambiguity of meaning, thereby putting to rest all hypothetical challenges to the validity of what is said here.

Further Comment

Colloquial saying: “If it ain’t broke, don’t fix it”.

Grammarian’s version: “If it isn’t broken don’t attempt to fix it.”

Regardless of how this is verbalized, the problem with how this way of thinking is often misapplied is that “It” IS often broken relative to what could be, but users are so accustomed to it that they don’t realize it could work much better.

Consider, as an example, the technology I used for preparing my doctoral dissertation in the early 1970s. I used an IBM Selectric typewriter and Snopake correction fluid (a fast-drying fluid that is opaque and as “white as the driven snow”), which enables the typist to paint over a mistake and then retype on the dried paint (beware of retyping before the paint is dry). I used this same technology for the first two books I wrote in the mid-1980s, after writing several drafts in longhand. It seemed acceptable at the time but, in comparison with the word processing technology I used to prepare this website, it is abundantly clear just how broken that technology was. Of course, adopting the superior word processing technology required the effort to first learn how to operate a personal computer. This learning “hump” that writers needed to get over resulted in many potential beneficiaries avoiding (actually only postponing) the chore of “coming up to speed” with PCs. The paradigm shift began for some upon the 1984 release of the first Apple Macintosh system, following the 1976 release of the first Apple computer, and for others it began with the 1989 release of the first Microsoft Word application for PCs. Others began jumping on board throughout the 1990s and by the turn of the century this paradigm shift was well on its way. Today, we have electronic research journals for which new knowledge need never be recorded on paper. Thankfully someone decided a long time ago that the IBM Selectric typewriter was indeed broken. The term word processing was actually created way back in 1950 by Ulrich Steinhilper, a German IBM typewriter sales executive with vision.

So it goes with many users of stochastic processes today: they have used this tool for years—since around 1950—and they see it as unbroken and they want no part of coming up to speed on a replacement tool that they believe isn’t needed, even though they do not yet understand this new tool. Unfortunately, ways of thinking are harder to change than is accepting new technology.

The cyclostationarity paradigm shift did not really take off until several years following the publication of the seminal 1987 book [Bk2]. It seems the same is going to be true for the FOT-Probability paradigm shift, with this website playing a role similar to that played by the 1987 book. Interestingly, that book attempted to initiate this shift as well as the shift to cyclostationarity 35 years ago. But apparently, the relearning hump for replacing stochastic processes was found to be too high for many.

FOT Probability—An Elevator Speech1

I believe most people who learn how to use the stochastic process concept and associated mathematical model tentatively accept the substantial level of abstraction it represents and, as time passes, become increasingly comfortable with that abstractness, and eventually accept it as a necessity and even as reality, something that should not be challenged. It is remarkable that our minds are able to adapt to such abstractions. At the same time, there are costs associated with unquestioning minds that accept such levels of abstraction without convincing themselves that there are no more-concrete alternatives. The position taken at this website is that the effectiveness with which the stochastic process model can be used in practice is limited by its level of abstraction, specifically the typical absence of explicit specifications of both (1) its sample space (ensemble of sample paths) and (2) its probability measure defined on the sample space, and this in turn limits progress in conceiving, designing, and analyzing methods for statistical signal processing on the basis of such signal models.

There is a little-known (today) alternative to the stochastic process, which is much less abstract and, as a consequence, exposes fundamental misconceptions regarding stochastic processes and their use. The removal of the misconceptions that result from adoption of the alternative has enabled the Inventor to make significant advances in the theory and application of cyclostationary processes and more generally in data-adaptive statistical signal processing. Despite these advances, less questioning minds continue to ignore the role that the alternative has played in these advances and continue to try to force-fit the new knowledge into the unnecessarily abstract theory of stochastic processes. The alternative (the invention) is fully specified below on Page 3.1, and its consequential advances in understanding theory and method for random signals are taught on Pages 3.2 and 3.3, where the above generalized remarks are made specific and are proven mathematically. This alternative is called Fraction-of-Time (FOT) Probability.


1 An Elevator Speech is a very concise speech about a new business concept that is intended to capture the interest of an investor during the short time he spends with the speaker in an elevator between floors in a building (e.g., on the way to a venture capital office).

 

Theme: For stationary and cyclostationary time series, a wrong turn in their mathematical modeling was taken almost a century ago. Today, Academia should engage in remediation to overcome the detrimental influence on the teaching and practice of time-series analysis in Science and Engineering.

Universality: This same theme is beginning to play out in the field of economics, as distinct from the various fields of science and engineering. This parallel trend speaks to the universality of the relevance of the proposed paradigm shift initiated in 1987. The parallels in recommendations going forward are, in fact, remarkable, as illustrated in the two articles explaining the area of study called Ergodicity Economics: 1) and 2). However, the driving objective in Ergodicity Economics is to focus analysis on time averages of a single time series, rather than expected values over an ensemble of a non-ergodic process, whereas in the application fields that are the focus of this website, the driving objective is to focus analysis on time averages of single time series because this is a more elegant way to proceed compared with introducing ergodic process models and using expected values.

The objective of this page is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met with from those indoctrinated in the more abstract theory of Stochastic Processes, to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 7, Discussion of the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this page 3 in perspective.

First Exposure to Stochastic Processes —
The subject does not come easily, especially for empiricists

The macroscopic world that our five senses experience—sight, hearing, smell, taste and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such things varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these mathematical functions. For this reason, developing an intuitive real-world understanding of time-series analysis, and as an example spectral analysis of time-records of data from the physical world, requires that continuous-time models and mathematics of continua be used.

Unfortunately, this is at odds with the technology that has been developed in the form of computer applications and digital signal processing (DSP) hardware for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena or of continuous-time and continuous-amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available, that is, the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of accuracy of the data representations subjected to analysis and processing arises. Then, the number of discrete-amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. Discretization of both the time-series data values and the time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.
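Two of the effects just mentioned can be illustrated with a few lines of code. The sketch below is only an illustration under assumed numbers (a 100-samples-per-second rate and sinusoids at 10 Hz and 110 Hz, with hypothetical function names): it shows how the number of amplitude levels fixes the word length in bits, and how two different continuous-time sinusoids become indistinguishable after time sampling, which is spectral aliasing.

```python
import math

def bits_per_sample(num_levels):
    """Smallest number of bits that can index num_levels amplitude levels."""
    return (num_levels - 1).bit_length()

def sample_sinusoid(freq_hz, rate_hz, num_samples):
    """Time samples of cos(2*pi*freq_hz*t) taken at t = n / rate_hz."""
    return [math.cos(2.0 * math.pi * freq_hz * n / rate_hz)
            for n in range(num_samples)]

rate_hz = 100.0
low = sample_sinusoid(10.0, rate_hz, 50)
high = sample_sinusoid(110.0, rate_hz, 50)  # 110 Hz = 10 Hz + one sampling rate

# Largest sample-by-sample difference between the two sampled sinusoids.
max_diff = max(abs(a - b) for a, b in zip(low, high))

print("bits for 256 levels: ", bits_per_sample(256))
print("bits for 1000 levels:", bits_per_sample(1000))
print("max |low - high|:    ", max_diff)
```

The difference between the two sampled sequences is at the level of floating-point rounding, even though the underlying analog signals differ in frequency by a factor of eleven.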

Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website. 

Certainly, for non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because of the additional layer of complexity introduced by the approximation of analog data with digital representations, which is not directly related to the principles of analog spectral analysis, the principles of spectral analysis, which are independent of the implementation technology, are more transparent and easier to grasp intuitively with the continuous-time theory.

Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract and forces a detachment of the theory from the real-world data to be analyzed or processed, and this is so even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record instead of averages over independently produced records. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In fact, commercially available random sequence generators used for Monte Carlo simulations are actually time segments from a single long sequence. Consequently, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.3, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And, when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
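The point about Monte Carlo simulation can be sketched numerically. The example below assumes a hypothetical stationary test signal, x[n] = w[n] + 0.5 w[n-1] with w[n] uniformly distributed on (-1, 1), and estimates its average power in two ways: a time average over one long record, and an "ensemble" average at a fixed time index across records that are, as described above, simply consecutive segments of the same generator output. The signal model, seed, and record sizes are illustrative assumptions.

```python
import random

def moving_average_noise(num_samples, rng):
    """x[n] = w[n] + 0.5*w[n-1], with w[n] uniform on (-1, 1)."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(num_samples + 1)]
    return [w[n + 1] + 0.5 * w[n] for n in range(num_samples)]

rng = random.Random(12345)
num_records, record_len = 2000, 100
long_record = moving_average_noise(num_records * record_len, rng)

# Time average of x^2 over the single long record.
time_avg_power = sum(v * v for v in long_record) / len(long_record)

# Ensemble average of x^2 at one fixed time index t0, where the ensemble
# members are consecutive segments of the same long sequence.
t0 = 50
ensemble = [long_record[m * record_len:(m + 1) * record_len]
            for m in range(num_records)]
ensemble_avg_power = sum(rec[t0] ** 2 for rec in ensemble) / num_records

# Power implied by the assumed model: (1 + 0.5**2) * Var(w), Var(w) = 1/3.
theoretical_power = (1.0 + 0.25) / 3.0

print("time average:    ", time_avg_power)
print("ensemble average:", ensemble_avg_power)
print("model value:     ", theoretical_power)
```

For this assumed signal, both averages approximate the same number, but the time average uses every sample of the record whereas the ensemble average at one instant uses only one sample per segment.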

For this reason, it is the Content Manager’s tenet that for the sake of pedagogy the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that exist and explain the other fact (that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs) are non-pedagogical. The arguments are based on economics, directly or indirectly: 1) the transition in philosophy, from truly academic education to vocational training in schools of engineering (and in other fields of study as well), that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution); 2) economic considerations in the standard degree programs in engineering and other technical fields (B.S., M.S., and Ph.D. degrees) limit the amount of course-work that can be required for each subject in a discipline; 3) economic considerations of the students studying engineering limit the numbers of courses they take that are beyond what is required for the degree they seek; motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity; 4) motivations of professors and industry instructors are affected by faculty-rating systems, which are affected by university-rating systems: numbers of employable graduates produced each year reign, and industry defines “employability”. Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few years. Industry can discard those vocationally trained employees who peter out and hire a new batch.

In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.

FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from anything empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.

BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t − s) (with some abuse of notation due to the use of x to denote two distinct functions). That is, each ensemble member is a time-translated version of the single time function. Then the time averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be dropped by using the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which is the model adopted for the stochastic theory of statistical spectral analysis, this “trick” (which is rarely if ever mentioned in the manner it is here, in courses on the subject) is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet, it has almost no operational advantage. To clarify those operational purposes where this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.3. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim by those who are stuck in their ways that the FOT probability theory has no measure-theoretic basis.
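The construction of an ensemble from time translates of one record can be sketched in a few lines. The example below is a finite, discrete-time stand-in that rests on two assumptions not taken from this page: the shift is applied circularly so that a short record can play the role of an infinitely long one, and the shift is written as x(t − s) (the opposite sign convention yields the same set of translates). The function name and sample values are hypothetical.

```python
def translate(x, s):
    """Ensemble member s: the record x circularly shifted by s samples."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(x)
ensemble = [translate(x, s) for s in range(n)]  # one member per shift s

# Time averages computed from the single record.
time_mean = sum(x) / n
time_lag1 = sum(x[t] * x[(t + 1) % n] for t in range(n)) / n

# Ensemble averages computed across members at one fixed time t0.
t0 = 2
ens_mean = sum(member[t0] for member in ensemble) / n
ens_lag1 = sum(member[t0] * member[(t0 + 1) % n] for member in ensemble) / n

print("mean:          time", time_mean, " ensemble", ens_mean)
print("lag-1 product: time", time_lag1, " ensemble", ens_lag1)
```

Because every ensemble member is a translate of the same record, averaging across members at a fixed time visits exactly the same values as averaging along time in the single record, so the two kinds of average agree for the mean and for the lag product.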

The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time series for cyclostationary processes and something similar to that for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of this Page 3. An in-depth tutorial analysis and discussion of the similarities and differences between the classical stochastic process model and the alternative mathematical model based on Wold’s ensemble for stationary processes and Gardner’s complementary ensemble for cyclostationary processes is provided on Page 3.2. Further investigation of the differences between the measure-theoretic foundations for these two alternative approaches to signal modeling is reported, in tutorial fashion, on Page 3.3. Page 3.4 presents a perspective from the past and identifies some still unsolved problems, Page 3.5 provides a brief outline of the hierarchy, according to the level of empiricism, of statistical and probabilistic models for random signals, and Page 3.6 reproduces a published debate on the pros and cons of these two alternatives for modeling random signals. Unfortunately, as good debates go, the arguments against the FOT probability alternative are shallow, unconvincing, and in places erroneous. One can take this as an indication that opponents of FOT Probability simply do not have a strong position to argue from.

The history of the development of time-series analysis can be partitioned into the earlier empirically driven work focused primarily on methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period, but has centered primarily on nonstationary processes rather than stationary processes. The development of time-series analysis theory and methodology for cyclostationary and related stochastic processes and their non-stochastic time-series counterparts came along later, during the latter half of the 20th century, and extends to the present.

Mathematically Driven Development of Probability Spaces and Stochastic Processes
as the Preferred Conceptual/Mathematical Basis for Time Series Analysis (1900-1950)
    • Josiah Willard Gibbs (Ensemble Average)
    • Henri Léon Lebesgue (Probability Space)
    • Marian von Smoluchowski (Brownian Motion)
    • Albert Einstein (Brownian Motion)
    • Norbert Wiener (Brownian Motion)
    • Aleksandr Jakovlevich Khinchin (Stochastic Process)
    • Herman Ole Andreas Wold (Stochastic Process)
    • Andrei Nikolaevich Kolmogorov (Stochastic Process)
    • Harald Cramér (Stochastic Process)
    • Joseph L. Doob (Stochastic Process)
Empirically Driven Development of Time-Series Analysis Methodology (1650-1950)
    • Isaac Newton (1642-1727)
    • Leonhard Euler (1707-1783)
    • Joseph Louis Lagrange (1736-1813)
    • Christopher H. D. Buys-Ballot (1817-1890)
    • George Gabriel Stokes (1819-1903)
    • Sir Arthur Schuster (1851-1934)
    • John Henry Poynting (1852-1914)
    • Albert Abraham Michelson (1852-1931)
    • George Udny Yule (1871-1951)
    • Evgeny Evgenievich Slutsky (1880-1948)
    • Karl Johann Stumpff (1895-1970)
    • Herman Ole Andreas Wold (1908-1992) 
    • Charles Goutereau (18XX-19XX)
    • Norbert Wiener (1894-1964)
    • Percy John Daniell (1889-1946)
    • Maurice Stevenson Bartlett (1910-2002)
    • Ralph Beebe Blackman (1904-1990)
  • 3.1 Fraction-of-Time Probability for Time-Series that Exhibit Cyclostationarity

    The following article, FRACTION-OF-TIME PROBABILITY FOR TIME-SERIES THAT EXHIBIT CYCLOSTATIONARITY, Signal Processing, Vol. 23, No. 3, pp. 273-292, by William A Gardner and William A Brown [JP34], was published in 1991, 5 years after this novel probability theory was introduced in the book [Bk2]. Thirty years later, this article remains the single most complete and easy-to-read accounting of this probability theory aimed at a readership of statistical time-series analysis practitioners. For this reason, it is incorporated here as part of this Page 3, as an encouragement to readers to make this their first detailed encounter with this novel probability theory. In comparison with other worthy sources on this theory, including primarily the originating book [Bk2], the 2006 survey paper [JP64], the 2006 development of a measure-theory foundation [J24], and the most recent and most comprehensive treatment of cyclostationarity in general, the 2019 book [B2], this treatment is both concise and quite complete.

    The next two pages, 3.2 and 3.3, strongly complement this Page 3.1 by providing a broad perspective on the pros and cons of the classical stochastic process approach and the more recently developed fraction-of-time probability approach to conceptualizing and mathematically modeling random signals.

    However, before embarking on this venture, the reader is referred to the book [Bk5], based on the first workshop on cyclostationarity held in 1992, for a brief introductory discussion of the mathematical origin, as distinct from the historical origin, of Fraction-of-Time Probability. In Section 2.3, pages 11-13 of this book, it is shown in a few simple mathematical steps that each periodic component of any function of a time series can be expressed as a probability-weighted sum (an expected value) of that function, and the probability is the cyclostationary Fraction-of-Time Probability for that time series. The same is shown for the almost periodic component, which is the sum of all periodic components with incommensurate periods, except the cyclostationary Fraction-of-Time Probability is replaced with the almost cyclostationary Fraction-of-Time Probability. No muss, no fuss; just a little elementary calculus. This establishes just how fundamental the Fraction-of-Time Probability is in the study of cyclostationarity. (Clarification of Notation on pages 12, 13 in [Bk5]: the symbol “d to the power n” multiplied by “a function of n variables” is defined to be the product of n differentials of the function, each differential being w.r.t. one of the n variables.)
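The relationship between a periodic component and a probability-weighted sum can be checked numerically in a simplified setting. The sketch below is not the derivation in [Bk5]; it assumes a discrete-time, discrete-valued series with a known period, extracts the periodic component of g(x[n]) by synchronized averaging (one sample per period at a fixed phase), and compares it with the sum of g weighted by the fraction of periods in which each value occurs at that phase. The signal model (a periodic envelope times random signs), the function names, and the choice g(v) = v² are illustrative assumptions.

```python
from collections import Counter
import random

def synchronized_average(x, period, phase, g):
    """Average of g(x[n]) over n = phase, phase + period, phase + 2*period, ..."""
    samples = x[phase::period]
    return sum(g(v) for v in samples) / len(samples)

def cyclic_fot_pmf(x, period, phase):
    """Fraction of periods in which x takes each value at the given phase."""
    samples = x[phase::period]
    counts = Counter(samples)
    return {value: count / len(samples) for value, count in counts.items()}

def fot_expected_value(pmf, g):
    """Probability-weighted sum of g over the values the series takes."""
    return sum(g(value) * prob for value, prob in pmf.items())

# Hypothetical cyclostationary series: periodic envelope times random signs.
rng = random.Random(7)
period = 4
envelope = [0, 1, 2, 1]
x = [envelope[n % period] * rng.choice((-1, 1)) for n in range(1000)]

def g(v):
    return v * v

phase = 2
direct = synchronized_average(x, period, phase, g)
pmf = cyclic_fot_pmf(x, period, phase)
via_fot = fot_expected_value(pmf, g)

print("periodic component of g(x) at phase 2:", direct)
print("FOT-probability-weighted sum:         ", via_fot)
```

The two numbers agree up to floating-point rounding because grouping the samples by value before averaging only reorders the same sum, which is the elementary idea behind expressing a periodic component as an expected value.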

    Fraction-of-time Probability for Time-Series that Exhibit Cyclostationarity

    This concise paper is based on the seminal 1987 book [Bk2]. Following are some quotations reflecting the reaction of leaders in the field to this book when it first came out.

    The renowned researcher and author of over 20 books, Enders A. Robinson, wrote the following about the book:

    “Professor Gardner has the ability to impart a fresh approach to many difficult problems. . . . His general approach is to go back to the basic foundations and lay a new framework. This gives him a way to circumvent many of the stumbling blocks confronted by other workers . . . he has discovered many avenues of approach which were either not known or neglected in the past. In this way his work more resembles some of the outstanding mathematicians and engineers of the past. . . . William’s success in the approach shows the strength of his engineering insight. He has been able to solve problems that others have left as being too difficult.” 

    Enders A. Robinson

    Further to this, Robinson wrote a strongly supportive 1990 review of this book that includes the following excerpt:

    “This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”

     

    Similarly, the following quotation from Professor Ronald N. Bracewell, 1994 recipient of the Institute of Electrical and Electronics Engineers’ Heinrich Hertz medal for pioneering work in antenna aperture synthesis and image reconstruction as applied to radio astronomy and to computer-assisted tomography, taken from his Foreword to the 1987 book introducing FOT-Probability theory [Bk2], makes essentially the same point that Robinson makes:

    “If we are to go beyond pure mathematical deduction and make advances in the realm of phenomena, theory should start from the data. To do otherwise risks failure to discover that which is not built into the model . . . Professor Gardner’s book demonstrates a consistent approach from data, those things which in fact are given, and shows that analysis need not proceed from assumed probability distributions or random processes. This is a healthy approach and one that can be recommended to any reader.”

    Ronald Newbold Bracewell, 1921 – 2007
    Foreign Associate Member of the Institute of Medicine of the U.S. National Academy of Sciences

     

    As another example, Dr. Akiva Yaglom, Mathematician and Physicist, USSR Academy of Sciences, wrote in his review of the book, published in Theory of Probability and Its Applications:

    “It is important . . . that until Gardner’s . . . book was published there was no attempt to present the modern spectral analysis of random processes consistently in language that uses only time-averaging rather than averaging over the statistical ensemble of realizations [of a stochastic process] . . . Professor Gardner’s book is a valuable addition to the literature”

    Dr. Akiva Yaglom, Mathematician and Physicist

     

    A fourth example is the following succinct enthusiastic remark given by Professor James L. Massey, information theorist and cryptographer, Professor of Digital Technology at ETH Zurich, in a prepublication book review in 1986:

    “I admire the scholarship of this book and its radical departure from the stochastic process bandwagon of the past 40 years.”

    Professor James L. Massey
    Member of the National Academy of Engineering and the Royal Swedish Academy of Sciences

     

    As a final example, Professor Thomas Kailath of Stanford University, member of the National Academy of Engineering, the US National Academy of Sciences, the American Academy of Arts and Sciences, the Indian National Academy of Engineering, and the Silicon Valley Engineering Hall of Fame, wrote the following about the 1987 book:

    “It is always hard to go against the established order, but I am sure that the book will have a considerable impact. It will be a definitive text on spectral analysis.”

    Professor Thomas Kailath
    US Medal of Science, IEEE Medal of Honor, Fellow of IEEE, Fellow of Institute of Mathematical Statistics, Past President of IEEE Information Theory Group
  • 3.2 Transitioning Away from Stochastic Process Models

    For purposes of developing intuition and possibly deeper understanding regarding the FOT-Probability model introduced on Page 3.1, in contrast to the conventional stochastic process model, the reader is referred to the article here. This article was written and submitted as a feature article for the IEEE Signal Processing Magazine in March 2022, because it is appropriate to bring this conceptual problem-solving progress to the attention of the broad readership of this magazine, which 32 years earlier published the landmark paper EXPLOITATION OF SPECTRAL REDUNDANCY IN CYCLOSTATIONARY SIGNALS, introducing this proposed paradigm shift. But it was rejected by the Editor for Feature Articles, Dr. Laure Blanc-Féraud, because she thought it would not be of interest to today’s readership of this magazine. This exemplifies the uphill battle to get people to open their minds to different ways of thinking, once they have been indoctrinated in some other particular way of thinking—a topic addressed in some detail here on Page 7. This issue is further pursued on Page 3.3, where the mathematical pros and cons of these two alternative types of models are discussed in detail.

    Even though journal editors who share Blanc-Féraud’s dismissal of the importance of educating readers about FOT-Probability are common (cf. Page 3.6), there are those who “get it”, as exemplified by the highly respected researcher and editor, the late P.E. Doak, in the correspondence between us quoted below. The article here entitled “Transitioning Away from Stochastic Process Models” is in press for publication in the Journal of Sound and Vibration, Spring 2023. The highlights of this paper are as follows:

      • Comparison of two alternative generic stochastic process models for data analysis and inference
      • The standard model is relatively abstract, and the new model is better suited to empirical data
      • Mathematical and Pragmatic pros and cons are exposed
      • A paradigm shift is urged
    Phillip Ellis Doak, 1921 – 2011
    Founding Editor, Journal of Sound and Vibration, and Editor in Chief for 40 years

    Quotation from 8 March 1990 letter to Professor Gardner:

     “In my latter years, I have become more and more convinced of the validity of his [Percy W. Bridgman, Nobel Prize Laureate] outlook. Not only can ergodic mathematical concepts put students off, indeed I now believe that for physical scientists and engineers, they are “operationally erroneous”, and dangerous to mental health. Interpreting observations through ergodic spectacles is to misinterpret what the observations really mean. Not only does it confuse the issue, but also it inhibits the development of one’s intellectual capacity to ask the right questions about what the data means. Thus, in design, development, and research it is a model of reality which is counterproductive in respect to generating concepts which can lead to real progress in the real world.”

     

    This perspective is aligned with that of Professor Enders A. Robinson, recipient of the Society of Exploration Geophysicists’ Maurice Ewing Medal, originator of the digital revolution in geophysics, and highest honored scientist in the field of geophysics, quoted on page 3.1 but repeated (in part) here: 

    Enders A. Robinson
    Member, National Academy of Engineering

    “This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”

  • 3.3 Advancing beyond the Need for Signal Models Requiring Unjustified Assumptions

    For manmade signals, such as those typically encountered in communications and signals intelligence systems, applied R&D in statistical signal processing is typically based on formulaic signal models specified by explicit mathematical formulas containing deterministic functions of time, and, to be brief, individual random variables, sequences of independent random variables, and standard stochastic processes, such as stationary Gaussian or uniformly distributed processes. In such cases, the user can sometimes derive mathematical properties that will be useful in mathematical analysis, such as derivations of solutions to statistical inference and decision-making optimization problems. But this is often beyond the scope of specific applications and, as a result, assumptions about the models are typically made without justification. Sometimes, very broad but unproven justifications are used. For example, the analyst may assume a specified model satisfies the axiomatic definition of a Kolmogorov stochastic process. Examples of properties of the probability measure defining a Kolmogorov process include sigma-additivity (additivity of countably infinite numbers of terms), sigma-linearity (linearity of an operator applied to a linear combination of a countably infinite number of terms), and joint-measurability of two or more processes, which is necessary for the existence of joint probability density functions. These properties of a specific model are often not verifiable, despite their being assumed to hold. While this is common practice, it does not follow the scientific method that should guide all science and engineering.
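
    For readers who want the first two of these properties written out, a standard textbook statement is sketched below. The symbols are assumptions introduced here for illustration: P is the probability measure, the A_i are events, E{·} is the expectation operator, the X_i are random variables, and the a_i are constants. The second line holds only under suitable convergence conditions on the series, which is part of what a specific model would need to be shown to satisfy.

```latex
\begin{align}
P\Big(\bigcup_{i=1}^{\infty} A_i\Big) &= \sum_{i=1}^{\infty} P(A_i)
  && \text{for pairwise disjoint events } A_1, A_2, \ldots \quad \text{(sigma-additivity)} \\
E\Big\{\sum_{i=1}^{\infty} a_i X_i\Big\} &= \sum_{i=1}^{\infty} a_i\,E\{X_i\}
  && \text{when the series converge suitably} \quad \text{(sigma-linearity)}
\end{align}
```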

    As an alternative to this expedient but unsavory practice, we consider throughout this multi-section Page 3 the alternative to the Kolmogorov Model of random signals called the Fraction-of-Time Probability Model of random signals. The remainder of this section is a copy of a recently published tutorial paper [JP66], [J44] on the mathematical pros and cons of these two alternative types of models. The objective is to promote the FOT-Probability model as a superior alternative for many applications involving statistical time-series analysis.

  • 3.4 A Perspective from the Past

    To complement the recently written articles presented on Pages 3.2 and 3.3, this page consists of a set of slides used for Section IV of the opening Plenary Lecture for the first international Workshop on Cyclostationarity. To repeat an explanation regarding the other sections of this plenary lecture given on Page 2, some readers may wonder why this is appropriate considering that this workshop was held 30 years ago (in 1992). I consider this appropriate because I developed these slides specifically for a broad group of highly motivated students. I say they were students solely because they traveled from far and wide specifically to attend this educational program. In fact, the participants of the workshop were mostly senior researchers in academia, industry, and government laboratories. Knowing the workshop was a success and knowing all the topics covered are as important today as they were then, I have chosen this presentation as ideal for the purposes of this website. In particular, theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity has advanced considerably since the Workshop 30 years ago, as explained on Pages 3.2 and 3.3, especially in the measure-theoretic treatment of these two alternative theories. Nevertheless, some of the questions raised in 1992, particularly those involving stochastic-process models and surrounding the concept of cycloergodicity, are still not fully answered.

    The unavoidable absence of detail in the presentation slides for Sec. IV presented below is made up for, to the extent that progress has been achieved in the ensuing 30 years, throughout this Page 3 and the sources linked to herein.


    Because the theoretical comparison of stochastic process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity summarized in this Section IV of the Plenary Lecture is a relatively technical subject, it is recommended that students consider this section to be only a concise overview and that they follow up on it with Chapter 1 in the book [Bk5]. This chapter not only describes the duality between the stochastic and nonstochastic theories of cyclostationarity, but also derives the nonstochastic FOT-probabilistic theory from an inquiry into the nature of the property of time functions that is responsible for the defining characteristic of cyclostationarity: that finite-strength sine waves can be generated from cyclostationary functions by subjecting the functions to time-invariant nonlinear transformations. This inquiry leads naturally to the definitions of cyclic probabilistic moments and cyclic probability distributions and, more generally, cyclic expectation and, in Chapter 2 of the book [Bk5], cyclic probabilistic cumulants. This is to be contrasted with the stochastic theory of cyclostationarity in which these key probabilistic quantities are simply posited on the basis of mathematical considerations only, with not even a mention of generating sine waves, which is a key characteristic of the physical manifestation of cyclostationarity.
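
    The sine-wave-generation property can be checked numerically. The sketch below is a toy example assumed here for illustration (it is not taken from [Bk5]): white Gaussian noise amplitude-modulated by a cosine of frequency f0. The signal itself contains no finite-strength sine wave, but its square, a time-invariant nonlinear transformation, contains one at frequency 2·f0. The helper name sine_wave_strength and all parameter values are hypothetical.

```python
import cmath
import math
import random

def sine_wave_strength(x, freq):
    """Magnitude of the finite-time Fourier coefficient of x at `freq` (cycles/sample)."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * math.pi * freq * k) for k, v in enumerate(x)) / n
    return abs(coeff)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
f0 = 0.1                 # assumed carrier frequency, cycles per sample
n = 20000                # assumed record length, samples

# Toy cyclostationary signal: unit-variance white noise times a cosine carrier.
x = [rng.gauss(0.0, 1.0) * math.cos(2 * math.pi * f0 * k) for k in range(n)]

# Time-invariant nonlinear transformation: squaring.
x_squared = [v * v for v in x]

line_in_x = sine_wave_strength(x, 2 * f0)           # small; shrinks as n grows
line_in_x2 = sine_wave_strength(x_squared, 2 * f0)  # settles near 1/4 for this model

print(f"strength at 2*f0 in x   : {line_in_x:.4f}")
print(f"strength at 2*f0 in x^2 : {line_in_x2:.4f}")
```

    For this particular model the squared signal equals a²(t)[1 + cos(4π·f0·t)]/2, so the coefficient at 2·f0 tends to one quarter of the noise variance, while the coefficient measured in x itself decays toward zero as the record length grows.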

    The direct relevance of this discussion to the primary subject of this website is the claim herein that science and engineering were done great harm by mathematicians’ hard sell of the stochastic process model to the exclusion of the non-stochastic time-series model that came before.

    With a brief look ahead at Page 7, one can surmise that this hard sell reflects inadequate Right-Brain (RB) activity, which would have been required to reveal the absence of a necessity to use such unrealistic and overly abstract models. This has unnecessarily burdened teachers and students alike, and of course practicing engineers and scientists, with the challenge to each and every one of them to bring to bear the considerable RB activity required to make sense of the huge conceptual gap between the reality from nature of a single time-series of measured/observed data and the mathematical fiction of a typically-infinite ensemble of hypothetical time-series together with a probability law (a mathematical creation) governing the ensemble average over all the fictitious time series. All these unsuspecting individuals were left to close this conceptual gap on their own, being armed with nothing more than a mathematical theorem, which only rarely can be applied in practice, that gives the mathematical condition on a stochastic process model under which its ensemble averages equal (in an abstract sense; i.e., with probability equal to 1) the time averages over individual time-series in the ensemble. This condition on the probability law ensures that expected values of a proposed stochastic process mathematically calculated (a Left-Brain (LB) activity) from the mathematical model equal time averages measured from a single time-series member of the ensemble, assumed to be the time series that actually exists in practice. But this equality imposes another condition, namely that we mathematically take the limit of the time average as the amount of averaging time approaches infinity. Thus, the theorem, called the Ergodic Theorem, doesn’t actually address reality, because one never has an infinitely long segment of time-series data.

    Moreover, the theorem is of little-to-no operational utility because the condition on the probability law can only rarely be tested for a given specific stochastic process model. Thus, most users of stochastic process theory rely conceptually on what is called the Ergodic Hypothesis, by which one simply assumes the condition of the Ergodic Theorem is satisfied for whatever stochastic process model one chooses to work with. Faith of this sort has no place in science and engineering.
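
    The distinction between ensemble averages and time averages can be made concrete with a small simulation. The sketch below uses a toy model assumed here purely for illustration: each hypothetical ensemble member is a level plus unit-variance Gaussian noise. When the level is 0 for every member, each member's time average agrees with the ensemble average; when the level is drawn once at random per member, the ensemble average is still near 0 but no single member's time average reveals it. All names and sizes are hypothetical.

```python
import random
import statistics

def make_realization(rng, n_samples, random_level):
    """One hypothetical ensemble member: a level plus unit-variance Gaussian noise.

    If random_level is True, the level is drawn once per member and then held
    fixed for all time; otherwise the level is 0 for every member.
    """
    level = rng.gauss(0.0, 1.0) if random_level else 0.0
    return [level + rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def summarize(rng, n_members, n_samples, random_level):
    """Return (ensemble average at time 0, spread of the members' time averages)."""
    ensemble = [make_realization(rng, n_samples, random_level) for _ in range(n_members)]
    time_averages = [sum(x) / len(x) for x in ensemble]
    ensemble_average = sum(x[0] for x in ensemble) / n_members
    return ensemble_average, statistics.pstdev(time_averages)

rng = random.Random(1)            # fixed seed so the sketch is repeatable
n_members, n_samples = 200, 1000  # assumed sizes, chosen only to keep this fast

ens_avg_fixed, spread_fixed = summarize(rng, n_members, n_samples, False)
ens_avg_random, spread_random = summarize(rng, n_members, n_samples, True)

print(f"fixed level : ensemble avg {ens_avg_fixed:+.3f}, spread of time avgs {spread_fixed:.3f}")
print(f"random level: ensemble avg {ens_avg_random:+.3f}, spread of time avgs {spread_random:.3f}")
```

    In the second case the spread of the time averages stays near 1 no matter how long each record is, which is the situation in which an ensemble average says little about the one time series actually observed.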

    In my opinion, acceptance of all this gibberish and going forward with the stochastic process concept as the only option for mathematically modeling real time-series data requires abandonment of RB thinking. There really is no way to justify this abstraction of reality as a necessary evil. The fraction-of-time probabilistic model of a single time series is an alternative option that avoids departing so far from the reality of measured/observed time-series data, its empirical statistical analysis, the mathematical modeling of the time-series, and the results of the analysis. The wholesale adoption by academicians of the stochastic process foisted upon them by mathematicians suggests these academicians, as well as the mathematicians, suffer from low-level RB activity. These general remarks are backed up by the detailed mathematical comparative analyses presented in other sections of this Page 3.

  • 3.5 The Hierarchy of Non-Stochastic Probabilistic Models of Time Series
    3.5.1 Introduction

    Statistical metrics for time series such as mean, bias, variance, coefficient of variation, covariance, and correlation coefficient can be defined using finite-time averages as replacements for expected values in well-known probabilistic metrics. These statistical metrics also can be arrived at from nothing more than a little thought, without any reference to probability or expected value. In fact, many of these statistical metrics were in use long before the probabilistic theory of stochastic processes was developed.
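
    As a minimal sketch of this idea, the functions below compute several of these metrics directly as finite-time averages over a data segment, with no probability model involved. The function names and the test data are hypothetical, chosen only for illustration.

```python
import math

def time_mean(x):
    """Finite-time average of the data segment x."""
    return sum(x) / len(x)

def time_variance(x):
    """Finite-time average of the squared deviation about the time mean."""
    m = time_mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def time_covariance(x, y):
    """Finite-time average of the product of deviations of x and y."""
    mx, my = time_mean(x), time_mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation_coefficient(x, y):
    """Time covariance normalized by the two time standard deviations."""
    return time_covariance(x, y) / math.sqrt(time_variance(x) * time_variance(y))

def coefficient_of_variation(x):
    """Time standard deviation divided by the magnitude of the time mean."""
    return math.sqrt(time_variance(x)) / abs(time_mean(x))

# Hypothetical data: an offset sine wave over 20 whole periods, and a
# linearly related second series.
n = 1000
x = [2.0 + math.sin(2 * math.pi * k / 50) for k in range(n)]
y = [3.0 * v - 1.0 for v in x]

print(time_mean(x), time_variance(x), coefficient_of_variation(x))
print(correlation_coefficient(x, y))
```

    Each function is the familiar probabilistic formula with the expected value replaced by an average over the available time samples.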

    In the book [Bk2], such non-probabilistic statistical metrics are used for statistical spectral analysis. The resultant theory for understanding how to perform and study statistical spectral analysis is the lowest level in a hierarchy of non-stochastic theories of statistical spectral analysis and, more generally, time-series analysis. This level is referred to as the purely empirical non-probabilistic theory. It is quite adequate for many applications.

    The next level up in the hierarchy is referred to as the purely empirical FOT-probabilistic theory, where FOT stands for Fraction-of-Time. The model upon which this theory is based is introduced in the presentation below. The third and highest level in the hierarchy is referred to as the non-stochastic FOT-probabilistic theory. This theory is fully developed in the book [Bk2]. The model is an asymptote of the model for the Finite-Time Theory described below. This asymptotic model can be approached as closely as one desires with the Finite-Time Model if enough time series data is available, but it cannot be reached exactly and still be empirical.
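
    To illustrate what an empirical FOT probability looks like on a finite record, the sketch below computes the fraction of time a sampled signal spends at or below a threshold, and an FOT expected value as a plain time average. It is an illustrative assumption, not the formulation in the presentation or in [Bk2]; the function names are hypothetical. For a sine wave the limiting FOT distribution is known in closed form, F(ξ) = 1/2 + arcsin(ξ)/π for |ξ| ≤ 1, which gives a reference value to compare against.

```python
import math

def fot_cdf(x, threshold):
    """Fraction of the observed time for which x is at or below `threshold`."""
    return sum(1 for v in x if v <= threshold) / len(x)

def fot_expectation(x, g):
    """Time average of g(x), i.e. the expected value of g under the FOT distribution."""
    return sum(g(v) for v in x) / len(x)

# Hypothetical record: one full period of a unit-amplitude sine wave.
n = 10000
x = [math.sin(2 * math.pi * k / n) for k in range(n)]

empirical = fot_cdf(x, 0.5)
closed_form = 0.5 + math.asin(0.5) / math.pi  # arcsine law for a sine wave, equals 2/3

print(f"empirical FOT probability that x <= 0.5: {empirical:.4f}")
print(f"closed-form limiting value             : {closed_form:.4f}")
print(f"FOT expected value of x^2              : {fot_expectation(x, lambda v: v * v):.4f}")
```

    With a record that does not span a whole number of periods, the empirical value differs from the limiting one by an amount that shrinks as the record grows, which is the sense in which the finite-time model approaches the asymptotic model.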

    In subsection 3.5.2, the terms purely empirical, probabilistic, and non-stochastic are defined and the three individual levels of the hierarchy are defined and illustrated. The following material was mostly presented at the 2021 On-Line Grodek Conference on Cyclostationarity, but it is an improved version of that presentation.

    Page 3.5 is concluded with a brief derivation in subsection 3.5.3 illustrating how the finite-time cyclostationary and poly-cyclostationary expectation operators are defined for finite segments of data, and another brief illustration in subsection 3.5.4 of how non-probabilistic expectation operators on finite data segments are defined and an explanation of the relationship between these and signal subspace methods of statistical inference. Both concepts use projections that are more general than those which function as constant-component extraction operators and periodic component extraction operators, both of which are probabilistic expectations.
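
    As a point of reference for the periodic-component extraction operator mentioned above, the following sketch implements the familiar synchronized average over a finite data segment. It is only an assumed, simplest instance for a known integer period; the definitions actually used in subsections 3.5.3 and 3.5.4 are given in the linked material and may differ. The function name and the test signal are hypothetical.

```python
import math
import random

def periodic_component(x, period):
    """Finite-time synchronized average.

    For each phase within one period, average x[phase + k*period] over all
    whole periods k contained in the data segment.
    """
    n_periods = len(x) // period
    return [
        sum(x[phase + k * period] for k in range(n_periods)) / n_periods
        for phase in range(period)
    ]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
period, n_periods = 20, 500     # assumed period (samples) and number of periods

# Hypothetical data: a periodic waveform buried in unit-variance noise.
clean = [math.cos(2 * math.pi * n / period) for n in range(period)]
x = [clean[n % period] + rng.gauss(0.0, 1.0) for n in range(period * n_periods)]

estimate = periodic_component(x, period)
max_error = max(abs(e - c) for e, c in zip(estimate, clean))
print(f"largest deviation from the true periodic component: {max_error:.3f}")
```

    The constant-component extraction operator is the special case in which the average is taken over every sample rather than over samples one period apart.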

    3.5.2 Purely Empirical FOT-Probability Theory for Modeling and Analysis of Time Series that are Stationary (S), Cyclostationary (CS), and Poly-Cyclostationary (PCS)

    The presentation slides presented here address the mathematical foundation and framework, developed by the WCM, for statistical time-series analysis based on statistical functions such as correlations, higher-order moments, and cumulative probability distributions, without involving the abstract mathematical model of a stochastic process. The purpose is to facilitate conceptualization and practical application. The application addressed is statistical spectral correlation analysis.

    3.5.3 Modification of the FOT-Probability Theory of CS and Poly-CS Time Series from Infinitely Long Data Records to Finite Segments

    The process by which the models of CS and Poly-CS time series are modified to render them applicable to data on finite-time intervals instead of infinite-time intervals is explained here and supported with mathematical definitions.

    3.5.4 Subspace Signal Processing and Empirical Nonstationary FOT Expectation Operators

    The topic for this page is addressed here.

  • 3.6 Published Debate: Stochastic Process vs FOT-Probability Model

    The 1987 book, Statistical Spectral Analysis: A Nonprobabilistic Theory, argues that the modern stochastic-process model (arising from the work of mathematicians in the 1930s, such as Khinchin, Kolmogorov, and others) should be used more judiciously, rather than indiscriminately in place of its more realistic predecessor: the time-series model first developed mathematically by Norbert Wiener in 1930 (see also page 59 of Wiener 1949, written in 1942, regarding the historical relationship between his and Kolmogorov’s approaches), which was briefly revisited in the 1960s by engineers before it was buried by mathematicians. The brief tongue-in-cheek essay Ensembles in Wonderland, published in IEEE Signal Processing Magazine, SP Forum, 1994 and reproduced below, is an attempt at satirizing the outrage of narrow-minded thinkers, exemplified by two outspoken skeptics, Neil Gerr and Melvin Hinich, who wrote scathing remarks and a book review characterizing this book as utter nonsense. (Page 7.6 offers an explanation for the behavior of these two naysayers in terms of weak right-brain thinking.)

    But first, let us consider the parallel to the book Alice in Wonderland; the following consists of excerpts taken from https://en.wikipedia.org/wiki/Alice’s_Adventures_in_Wonderland : Martin Gardner and other scholars have shown the book Alice in Wonderland [written by Charles Lutwidge Dodgson under the pseudonym Lewis Carroll] to be filled with many parodies of Victorian popular culture. Since Carroll was a mathematician at Christ Church, it has been argued that there are many references and mathematical concepts in both this story and his later story Through the Looking Glass; examples include what have been suggested to be illustrations of the concept of a limit, number bases and positional numeral systems, the converse relation in logic, and the ring of integers modulo a specific integer. Deep abstraction of concepts, such as non-Euclidean geometry, abstract algebra, and the beginnings of mathematical logic, was taking over mathematics at the time Alice in Wonderland was being written (the 1860s). Literary scholar Melanie Bayley asserted in the magazine New Scientist that Alice in Wonderland in its final form was written as a scathing satire on the new modern mathematics that was emerging in the mid-19th century.

    Today, Dodgson’s satire appears to be backward looking because, after all, there are strong arguments that modern mathematics has triumphed. Coming back to the topic of interest here, stochastic processes also have triumphed in terms of being wholly adopted in mathematics and science and engineering, except for a relatively small contingent of empirically-minded scientists and engineers. Yet, recent mathematical arguments, described in tutorial fashion on Pages 3.2 and 3.3 and further supported with references cited there, provide a sound logical basis for reversing this outcome, especially when one considers the overwhelming evidence of practical, pragmatic, pedagogic, and overarching conceptual advantages provided in the 1987 book and expanded on Pages 3.2 and 3.3 here. The present dominance of the more abstract and less realistic stochastic process theory might be viewed as an example of the pitfalls of what has become known as groupthink or the inertia of human nature that resists changes in thinking, which is discussed in considerable detail based on numerous historical sources on Page 7.

    Before presenting the several letters comprising the debate, including the standalone article “Ensembles in Wonderland”, the final letter to SP Forum in the debate is reproduced here first to provide hindsight, especially for interpreting “Ensembles in Wonderland”. The bracketed text, e.g., [text], below was added at the time this material was posted on this website to enhance clarity.

    3.6.1 Preliminary Material

    July 2, 1995 (published in Nov 1995)

    To the Editor:

    Introduction

    This is my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], and carried on by Mr. Neil Gerr through his letters to SP Forum.

    In this letter, I supplement my previous remarks aimed at clarifying the precariousness of Hinich’s and Gerr’s position by explaining the link between my argument in favor of the utility of fraction-of-time (FOT) probability and the subject of a plenary lecture delivered at ICASSP ’94. In the process of discussing this link I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition–that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework–simply cannot be defended if argument is to be based on fact and logic.

    David J. Thomson’s Transcontinental Waveguide Problem

    To illustrate that the stochastic-process conceptual framework is often applied to physical situations where the time-average framework is a more natural choice, I have chosen an example from D. J. Thomson’s recent plenary lecture on the project that gave birth to the multiple-window method of spectral analysis [2]. The project that was initiated back in the mid-1960s was to study the feasibility of a transcontinental millimeter waveguide for a telecommunications transmission system potentially targeted for introduction in the mid-1980s. It was found that accumulated attenuation of a signal propagating along a circular waveguide was directly dependent on the spectrum of the series, indexed by distance, of the erratic diameters of the waveguide. So, the problem that Thomson tackled was that of estimating the spectrum for the more than 4,000-mile-long distance-series using a relatively small segment of this series that was broken into a number of 30-foot long subsegments. (It would take more than 700,000 such 30-foot sections to span 4,000 miles.) The spectrum had a dynamic range of over 100 dB and contained many periodic components, indicating the unusual challenge faced by Thomson.

    When a signal travels down a waveguide (at the speed of light) it encounters the distance-series [consisting of the distances traveled as time progresses]. Because of the constant velocity, the distance-series is equivalent to a time-series. Similarly, the series of diameters that is measured for purposes of analysis is—due to the constant effective velocity of the measurement device—equivalent to a time-series [of measurements]. So, here we have a problem where there is one and only one long time-series of interest (which is equivalent to a distance-series)—there is no ensemble of long series over which average characteristics are of interest and, therefore, there is no obvious reason to introduce the concept of a stochastic process. That is, in the physical problem being investigated, there was no desire to build an ensemble of transcontinental waveguides. Only one (if any at all) was to be built, and it was the spectral density of distance-averaged (time-averaged) power of the single long distance-series (time-series) that was to be estimated, using a relatively short segment, not the spectral density of ensemble-averaged power. Similarly, if one wanted to analytically characterize the average behavior of the spectral density estimate (the estimator mean) it was the average of a sliding estimator over distance (time), not the average over some hypothetical ensemble, that was of interest. Likewise, to characterize the variability of the estimator, it was the distance-average squared deviation of the sliding estimator about its distance-average value (the estimator variance) that was of interest, not the variance over an ensemble. 
    The only apparent reason for introducing a stochastic process model with its associated ensemble, instead of a time-series model, is that one might have been trained to think about spectral analysis of erratic data only in terms of such a conceptual artifice and might, therefore, have been unaware of the fact that one could think in terms of a more suitable alternative that is based entirely on the concept of time averaging over the single time-series. (Although it is true that the time-series segments obtained from multiple 30 ft. sections of waveguide could be thought of as independent random samples from a population, this still does not motivate the concept of an ensemble of infinitely long time-series, i.e., a stationary stochastic process. The fact remains that, physically, the 30-foot sections represent subsegments of one long time-series in the communications system concept that was being studied.) [And even if Mr. Thomson was aware of the fact that one could conceptualize the problem entirely in terms of time averages, he had good reason to fear that this approach would be off-putting to his readers, all of whom were likely indoctrinated only in statistical spectral analysis theory couched in terms of stochastic processes, which is an unfortunate situation.]

    It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process (the model adopted by Thomson) except to accommodate lack of familiarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework. Somehow, he does not recognize the mental gyrations required to force this and other physical problems into the stochastic process framework.

    Gerr’s Letter

    Having explained the link between my argument in favor of the utility of FOT probability and Thomson’s work, let us return to Gerr’s letter. Mr. Gerr, in discussing what he refers to as “a battle of philosophies,” states that I have erred in likening skeptics to religious fanatics. But in the same paragraph we find him defensively trying to convince his readers that the “statistical/probabilistic paradigm” has not “run out of gas” when no one has even suggested that it has. No one, to my knowledge, is trying to make blanket negative statements about the value of what is obviously a conceptual tool of tremendous importance (probability) and no one is trying to denigrate statistical concepts and methods. It is only being explained that interpreting probability in terms of the fraction-of-time of occurrence of an event is a useful concept in some applications. To argue, as Mr. Gerr does again in the same paragraph, that in general this concept “has no obvious advantages” and using it is “like building a house without power tools: it can certainly be done, but to what end?” is, as I stated in my previous letter, to behave like a religious fanatic — one who believes there can be only One True Religion. This is a very untenable position in scientific research.

    As I have also pointed out in my previous letter, Mr. Gerr is not at all careful in his thinking. To illustrate his lack of care, I point out that Gerr’s statement “Professor Gardner has chosen to work within the context of an alternative paradigm [fraction-of-time probability]”, and the implications of this statement in Gerr’s following remarks, completely ignore the facts that I have written entire books and many papers within the stochastic process framework, that I teach this subject to my students, and that I have always extolled its benefits where appropriate. If Mr. Gerr believes in set theory and logic, then he would see that I cannot be “within” paradigm A and also within paradigm B unless A and B are not mutually exclusive. But he insists on making them mutually exclusive, as illustrated in the statement “From my perspective, developing signal processing results using the fraction-of-time approach (and not probability/statistics) … .” (The parenthetical remark in this quotation is part of Mr. Gerr’s statement.) Why does Mr. Gerr continue to deny that the fraction-of-time approach involves both probability and statistics?

    Another example of the lack of care in Mr. Gerr’s thinking is the convoluted logic that leads him to conclude “Thus, spectral smoothing of the biperiodogram is to be preferred when little is known of the signal a priori.” As I stated in my previous letter, it is mathematically proven* in [1] that the frequency smoothing and time averaging methods yield approximately the same result. Gerr has given us no basis for arguing that one is superior to the other and yet he continues to try to make such an argument. And what does this have to do with the utility of the fraction-of-time concept anyway? These are data processing methods; they do not belong to one or another conceptual framework.

    To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing referred to above was first derived by using the fraction-of-time conceptual framework [1]. If there is no conceptual advantage to this framework, why wasn’t such a fundamental result derived during the half century of research based on stochastic processes that preceded [1]? The second example is taken from the first attempt to develop a theory of higher-order cyclostationarity for the conceptualization and solution of problems in communication system design. In [3], it is shown that a fundamental inquiry into the nature of communication signals subjected to nonlinear transformations led naturally to the fraction-of-time probability concept and to a derivation of the cumulant as the solution to a practically motivated problem. This is, to my knowledge, the first derivation of the cumulant. In all other work, which is based on stochastic processes (or non-fraction-of-time probability) and which dates back to the turn of the century, cumulants are defined, by analogy with moments, to be coefficients in an infinite series expansion of a transformation of the probability density function (the characteristic function), which has some useful properties. If there is no conceptual advantage to the fraction-of-time framework, why wasn’t the cumulant derived as the solution to the above-mentioned practical problem or some other practical problem using the orthodox stochastic-probability framework?

    Conclusion

    Since no one in the preceding year has entered the debate to indicate that they have new arguments for or against the philosophy and corresponding theory and methodology presented in [1], it seems fair to proclaim the debate closed. The readers may decide for themselves whether the resolution put forth in [1] was defeated or was upheld.

    But regarding the skeptics, I sign off with a humorous anecdote:

    When Mr. Fulton first showed off his new invention, the steamboat, skeptics were crowded on the bank, yelling ‘It’ll never start, it’ll never start.’

    It did. It got going with a lot of clanking and groaning and, as it made its way down the river, the skeptics were quiet.

    For one minute.

    Then they started shouting. ‘It’ll never stop, it’ll never stop.’

    — William A. Gardner

    * A more detailed and tutorial proof of this fundamental equivalence is given in the article “The history and the equivalence of two methods of spectral analysis,” Signal Processing Magazine, July 1996, No.4, pp.20 – 23, which is copied into the Appendix farther down this Page.

    References

    1. W. A. Gardner. Statistical Spectral Analysis: A Nonprobabilistic Theory. Prentice-Hall, Englewood Cliffs, NJ, 1987.
    2. D. J. Thomson. “An Overview of Multiple-window and quadratic-inverse spectrum estimation methods,” Plenary Lecture, Proceedings of 1994 International Conference on Acoustics, Speech and Signal Processing, pp. VI-185 – VI-194.
    3. W. A. Gardner and C. M. Spooner. “The Cumulant Theory of Cyclostationary time-series, Part I: Foundation,” IEEE Transactions on Signal Processing, Vol. 42, December 1994, pp. 3387-3408.

     

    Excerpts from earlier versions of above letter to the editor before it was condensed for publication:

    April 15, 1995

    Introduction

    In this, my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], I shall begin by addressing two remarks in the opening paragraph of Mr. Neil Gerr’s last letter (in March 1995 SP Forum). In the first remark, Mr. Gerr suggests that the “bumps and bruises” he sustained by venturing into the “battle” [debate] were to be expected. But I think that such injuries could have been avoided if he had all the relevant information at hand before deciding to enter the debate. This reminds me of a story I recently heard:

    Georgios and Melvin liked to hunt. Hearing about the big moose up north, they went to the wilds of Canada to hunt. They had hunted for a week, and each had bagged a huge moose. When their pilot Neil landed on the lake to take them out of the wilderness, he saw their gear and the two moose. He said, “I can’t fly out of here with you, your gear, and both moose.”

    “Why not?” Georgios asked.

    “Because the load will be too heavy. The plane won’t be able to take off.”

    They argued for a few minutes, and then Melvin said, “I don’t understand. Last year, each of us had a moose, and the pilot loaded everything.”

    “Well,” said Neil, “I guess if you did it last year, I can do it too.”

    So, they loaded the plane. It moved slowly across the lake and rose toward the mountain ahead. Alas, it was too heavy and crashed into the mountain side. No one was seriously hurt and, as they crawled out of the wreckage in a daze, the bumped and bruised Neil asked, “Where are we?”

    Melvin and Georgios surveyed the scene and answered, “Oh, about a mile farther than we got last year.”

    If Mr. Gerr had read the book [1] and put forth an appropriate level of effort to understand what it was telling him, he would have questioned Mr. Hinich’s book review and would have seen that the course he was about to steer together with the excess baggage he was about to take on made a crash inevitable.

    A friend of mine recently offered me some advice regarding my participation in this debate. “Why challenge the status quo,” he said, “when everybody seems happy with the way things are?” My feeling about this is summed up in the following anecdote:

    “Many years ago, a large American shoe manufacturer sent two sales reps out to different parts of the Australian outback to see if they could drum up some business among the aborigines. Sometime later, the company received telegrams from both agents.

    The first one said, ‘No business. Natives don’t wear shoes.’

    The second one said, ‘Great opportunity here–natives don’t wear shoes.'”

    Another friend asked, “Why spend your time on this [debate] when you could be solving important problems?” I think Albert Einstein answered that question when he wrote:

    “The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science”

    This underscores my belief that we are overemphasizing “engineering training” in our university curricula at the expense of “engineering science.” It is this belief that motivates my participation in this debate. Instead of plodding along in our research and teaching with the same old stochastic process model for every problem involving time-series data, we should be looking for new ways to think about time-series analysis.

    In the second remark in Mr. Gerr’s opening paragraph, regarding my response to Mr. Gerr’s October 1994 SP Forum letter in sympathy with “Hinich’s gleefully vicious no-holds-barred review” of [1], Mr. Gerr says “Even by New York standards, it [my response] seemed a bit much.” Well, I guess I was thinking about what John Hancock said, on boldly signing the Declaration of Independence:

    There, I guess King George will be able to read that!

    Like the King of England, who turned a deaf ear to the messages coming from the New World, orthodox statisticians like Messrs. Hinich and Gerr, who are mired in tradition, seem to be hard of hearing; a little shouting might be needed to get through to them.

    Nevertheless, I am disappointed to see no apparent progress, on Mr. Gerr’s part, in understanding the technical issues involved in his and Hinich’s unsupportable position that the time-average framework for statistical signal processing has, and I quote Gerr’s most recent letter, “no obvious advantages.” I hasten to point out, however, that this most recent position is a giant step back from the earlier even more indefensible position taken by Hinich in his book review, reprinted in April 1994 SP Forum, where much more derogatory language was used.

    In this letter, I make a final attempt to clarify the precariousness of Hinich’s and Gerr’s position by explaining links between my arguments and the subjects of two plenary lectures delivered at ICASSP ’94. In the process of discussing these links and this paper, I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition, that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework, simply cannot be defended if argument is to be based on fact and logic.

    Lotfi Zadeh and Fuzzy Logic

    I wish that Mr. Gerr would let go of the fantasy about “the field where the Fraction-of-Timers and Statisticians do battle.” There do not exist two mutually exclusive groups of people—one of which can think only in terms of fraction-of-time probability and the other of which call themselves Statisticians. How many times and in how many ways does this have to be said before Mr. Gerr will realize that some people are capable of using both fraction-of-time probability and stochastic process concepts, and of making choices between these alternatives by assessing the appropriateness of each for each particular application? Mr. Gerr’s “battle” of “fraction-of-time versus probability/statistics” simply does not exist. This insistence on a dichotomy of thought is strongly reminiscent of the difficulties some people have had accepting the proposition that the concept of fuzziness is a useful alternative to the concept of probability. The vehement protests against fuzziness are for most of us now almost laughable.

    To quote Professor Lotfi Zadeh in his recent plenary lecture [2]:

    “[although fuzzy logic] offers an enhanced ability to model real-world phenomena…[and] eventually fuzzy logic will pervade most scientific theories…the successes of fuzzy logic have also generated a skeptical and sometimes hostile reaction…Most of the criticisms directed at fuzzy logic are rooted in a misunderstanding of what it is and/or a lack of familiarity with it.”

    I would not suggest that the time-average approach to probabilistic modeling and statistical inference is as deep a concept, as large a departure from orthodox thinking, or as broadly applicable as is fuzzy logic, but there are some definite parallels, and Professor Zadeh’s explanation of the roots of criticism of fuzzy logic applies equally well to the roots of criticism of the time-average approach as an alternative to the ensemble-average or, more accurately, the stochastic-process approach. In the case of fuzzy logic, its proponents are not saying that one must choose either conventional logic and conventional set theory or their fuzzy counterparts as two mutually exclusive alternative truths. Each has its own place in the world. Those opponents who argue vehemently that the unorthodox alternative is worthless can be likened to religious fanatics. This kind of intolerance should have no place in science. But it is all too commonplace and it has been so down through the history of science. So surely, one cannot expect to find its absence in connection with the time-average approach to probabilistic modeling and statistical inference. Even though experimentalists in time-series analysis (including communication systems analysis and other engineered-systems analysis) have been using the time-average approach (to various extents) for more than half a century, there are those like Gerr and Hinich who “see no obvious advantages.” This seems to imply that Mr. Gerr has one and only one interpretation of a time-average measurement on time series data—namely an estimate of some random variable in an abstract stochastic process model. To claim that this mathematical model is, in all circumstances, the preferred one is just plain silly.

    David J. Thomson and the Transcontinental Waveguide –addition to published discussion:

    [It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process except to accommodate unfamiliarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework.] It is correct in this case that a sufficiently capable person would obtain the same result using either framework, but it is incorrect to not recognize the mental gyrations required to force this physical problem into the stochastic process framework. My claim—and the reason I wrote the book [1]—is that our students deserve to be made aware of the fact that there are two alternatives. It is pigheaded to hide this from our students and force them to go through the unnecessary and sometimes confusing mental gyrations required to force-fit the stochastic process framework to real-world problems where it is truly an unnecessary and, possibly, even inappropriate artifice.

    Gerr’s Letter—addition to published letter:

    To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing, whose proof is outlined in the Appendix at the end of this letter, was first derived by using the fraction-of-time conceptual framework [1].

    An Illustration of Blinding Prejudice

    To further illustrate the extent to which Mr. Gerr’s prejudiced approach to scientific inquiry has blinded him, I have chosen one of his research papers on the subject of cyclostationary stochastic processes. In [5], Mr. Gerr (and his coauthor) tackle the problem of detecting the presence of cyclostationarity in an observed time-series. He includes an introduction and references sprinkled throughout that tie his work to great probabilists, statisticians, and mathematicians. (We might think of these as the “Saints” in Mr. Gerr’s One True Religion.) This is strange, since his paper is nothing more than an illustration of the application of a known statistical test (and a minor variation thereof) to synthetic data. It is even more strange that he fails to properly reference work that is far more relevant to the problem of cyclostationarity detection. But I think we can see that there is no mystery here. The highly relevant work that is not cited is authored by someone who champions the value of fraction-of-time probabilistic concepts. The fact that the relevant publications (known to Gerr) actually use the stochastic process framework apparently does not remove Mr. Gerr’s blinders. All he can see–it would seem–is that the author is known to argue (elsewhere) that the stochastic process framework is not always the most appropriate one for time-series analysis, and this is enough justification for Mr. Gerr to ignore the highly relevant work by this “heretic” author (author of the book [1] that Hinich all but said should be burned).

    To be specific, Mr. Gerr completely ignores the paper [6] (published 1-1/2 years prior to the submission of Gerr’s paper) and the book [7] (published 4 years prior) wherein the problem of cyclostationarity detection is tackled using maximum-likelihood [6], maximum-signal-to-noise ratio [6], [7], and other optimality criteria, all of which lead to detection statistics that involve smoothed biperiodograms (and that also identify optimal smoothing) which are treated by Gerr as if they were ad hoc. Mr. Gerr also cites a 1990 publication (which does not appear in his reference list) that purportedly shows that the integrated biperiodogram (cyclic periodogram) equals the cyclic mean square value of the data (cf. (12)); but this is a special case of the much more useful result, derived much earlier than 1990, that the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram. The argument, by example, that Gerr proffers to show that (12) (the cyclic correlogram at zero lag) is sometimes a good test statistic and sometimes a bad one is trivialized by this Fourier transform relation (cf. [1]) and the numerous mathematical models for data for which the idealized quantities (cyclic autocorrelations, and cyclic spectral densities) in this relation have been explicitly calculated (cf. [1], [7]). These models include, as special cases, the examples that Gerr discusses superficially. The results in [1], [7] show clearly when and why the choice of zero lag made by Gerr in (12) is a poor choice. As another example, consider Mr. Gerr’s offhand remark that a Mr. Robert Lund (no reference cited) “has recently shown that for the current example (an AM signal with a square wave carrier) only lines [corresponding to cycle frequencies] spaced at even multiples of d=8 [the reciprocal of the period of the carrier] will have nonzero spectral (rz) measure.” This result was established in a more general form many years earlier in his coauthor’s Ph.D. dissertation (as well as in [1]) where one need only apply the extremely well-known fact that a symmetrical square wave contains only odd harmonics.

    To go on, the coherence statistic that Gerr borrows from Goodman for application to cyclostationary processes has been shown in [7] to be nothing more than the standard sample statistic for the standard coherence function (a function of a single frequency variable) for two processes obtained from the one process of interest by frequency-shifting data transformations–except for one minor modification; namely, that time-averaged values of expected values are used in place of non-averaged expected values in the definition of coherence because the processes are asymptotically mean stationary, rather than stationary. Therefore, the well-known issues regarding frequency smoothing in these cross-spectrum statistics need not be discussed further, particularly in the haphazard way this is done by Gerr, with no reliance on analysis of specific underlying stochastic process models.

    Continuing, the incoherent average (13) proposed by Gerr for use with the coherence statistic is the only novel contribution of this paper, and I claim that it is a poor statistic. The examples used by Gerr show that this “incoherent statistic” outperforms the “coherent statistic,” but what he does not recognize is that he chose the wrong coherent statistic for comparison. He chose the cyclic correlogram with zero lag (12), which is known to be a poor choice for his examples. For his example in Figure 9, zero lag produces a useless statistic, whereas a lag equal to T/2 is known to be optimum, and produces a “coherent statistic” that is superior to Gerr’s incoherent statistic. Thus, previous work [1], [7] suggests that a superior alternative to Gerr’s incoherent statistic is the maximum over a set of lag-indexed coherent statistics.

    Finally, Mr. Gerr’s vague remarks about choosing the frequency-smoothing window-width parameter M are like stabs in the dark by comparison with the thorough and careful mathematical analysis carried out within–guess what–the time-average conceptual framework in [1] in which the exact mathematical dependence of bias and variance of smoothed biperiodograms on the data-tapering window shape, the spectral-smoothing window shape, and the ideal spectral correlation function for the data model are derived, and in which the equivalence between spectral correlation measurement and conventional cross-spectrum measurement is exploited to show how conventional wisdom [1, chapter 5, 7] applies to spectral correlation measurement [1,chapters 11, 13, 15].

    In summary, Gerr’s paper is completely trivialized by previously published work of which he was fully aware. What appears to be his choice to “stick his head in the sand” because the author of much of this earlier highly relevant work was not a member of his One True Religion exemplifies what Gerr is trying to deny. Thus, I repeat: it is indeed appropriate to liken those (including Gerr) whom Gerr would like to call skeptics to religious fanatics who are blinded by their faith.

    Conclusion

    In closing this letter, I would like to request that Mr. Gerr refrain from writing letters to the editor on this subject. To say, as he does in his last letter, “There are many points on which Professor Gardner and I disagree, but only two that are worthy of further discussion,” is to try to worm his way out of the debate without admitting defeat. I claim to have used careful reasoning to refute beyond all reasonable doubt every point Mr. Gerr (and Mr. Hinich) has attempted to make. Since he has shown that he cannot provide convincing arguments based on fact and logic to support his position, he should consider the debate closed. To sum up the debate:

    – The resolution, cited in the introductory section of my 2 July 1995 letter to the editor, in contrapositive form, was made by myself in [1].

    – The resolution was challenged by Hinich and defended by myself in April 1994 SP Forum.

    – Hinich’s challenge was supported and my defense was challenged by Gerr in October 1994 SP Forum.

    – Gerr’s arguments were challenged by myself in January 1995 SP Forum.

    – Gerr defended his arguments in March 1995 SP Forum.

    – Gerr’s presumably-final defense was challenged and the final arguments in support of the resolution are made by myself in this letter.

    APPENDIX from July 2, 1995 letter to Editor (published in Nov 1995)

     – Proof of Equivalence Between Time-Averaged and Frequency-Smoothed Cyclic Periodograms

    History and Equivalence of Two Methods of Spectral Analysis

    Published in IEEE SIGNAL PROCESSING MAGAZINE, July 1996

    The purpose of this article is to present a brief history of two methods of spectral analysis and to present, in a tutorial fashion, the derivation of the deterministic relationship that exists between these two methods.

    History

    Two of the oldest and currently most popular methods of measuring statistical (average) power spectral densities (PSD’s) are the frequency smoothing method (FSM) and the time averaging method (TAM). The FSM was thought to have originated in 1930 with Norbert Wiener’s work on generalized harmonic analysis [1], and to have been rediscovered in 1946 by Percy John Daniell [2]. But it was discovered only a few years ago (cf. [3]) that Albert Einstein had introduced the method in 1914 [4]. The currently popular method of deriving the FSM begins by showing that adjacent frequency bins in the periodogram have approximately the same correct mean values and the same large variances, and are approximately uncorrelated with each other. Then, it is observed that averaging these bins together retains the correct mean value, while reducing the variance.

    The TAM is often attributed to a 1967 paper by P.D. Welch in the IEEE Transactions on Audio and Electroacoustics [5], but in fact the earliest known proposal of the TAM was by Maurice Stevenson Bartlett in 1948 [6]. The reasoning behind the TAM is similar to that for the FSM: the periodograms on adjacent segments of a data record have approximately the same correct mean values and the same large variances, and they are approximately uncorrelated with each other. Therefore, averaging them together will retain the correct mean value, while reducing the variance. (A more detailed historical account of the FSM, TAM, and other methods is given in [7].) Essentially, every spectral analysis software package available today includes either the FSM or the TAM, or both, often in addition to others. These other methods include, for example, the Fourier transformed tapered autocorrelation method, attributed to Ralph Beebe Blackman and John Wilder Tukey [8] (but used as early as 1898 by Albert A. Michelson [9]); and various model fitting methods that grew out of pioneering work by George Udny Yule in 1927 [10] and Gilbert Walker in 1931 [11].

    It is well known that both the FSM and the TAM yield PSD estimates that can be made to converge to the exact PSD in some probabilistic sense, such as in mean square, as the length of the data record processed approaches infinity. However, it is much less commonly known that these two methods are much more directly related to each other. The pioneering methods due to Michelson, Einstein, Wiener, Yule, and Walker were all introduced without knowledge of the concept of a stochastic process. But starting in the 1950s (based on the work of mathematicians such as Khinchin, Wold, Kolmogorov, and Cramér in the 1930s and 1940s), the stochastic-process point of view essentially took over. It appears as though this mathematical formalism, in which analysts focus on calculating means and variances and other probabilistic measures of performance, delayed the discovery of the deterministic relationship between the FSM and TAM for about 40 years. That is, apparently it was not until the non-stochastic approach to understanding statistical (averaged) spectral analysis was revived and more fully developed in [7] that a deterministic relationship between these two fundamental methods was derived.

    The next section presents, in a tutorial fashion, the derivation of the deterministic relationship between the FSM and TAM, but generalized from frequency-smoothed and time-averaged versions of the periodogram to the same for the biperiodogram (also called the cyclic periodogram [7]). This deterministic relationship is actually an approximation of the time-averaged biperiodogram (TAB) by the frequency-smoothed biperiodogram (FSB) and, of course, vice versa. For evidence of the limited extent to which this deterministic relationship is known, the reader is referred to letters that have appeared in the SP Forum section of this magazine in the October 1994, January 1995, March 1995, and November 1995 issues.

    Equivalence

    Definitions

    Let a(t) be a data-tapering window satisfying a(t)=0 for |t|>T / 2, let r_{a}(\tau) be its autocorrelation

        \[ r_{a}(\tau)=\int_{-T / 2}^{T / 2} a(t+\tau / 2) a(t-\tau / 2) d t \]

    and let A(f) be its Fourier transform

        \[ A(f)=\int_{-T / 2}^{T / 2} a(t) e^{-i 2 \pi ft} d t \]

    Let X_{a}(t, f) be the sliding (in time t) complex spectrum of data x(t) seen through window a

        \[ X_{a}(t, f)=\int_{-T / 2}^{T / 2} a(w) x(t+w) e^{-i 2 \pi f(t+w)} d w \]

    Similarly, let b(t) be a rectangular window of width V, centered at the origin, and let X_{b}(t, f) be the corresponding sliding complex spectrum (without tapering). Also, let R_{a}^{\alpha}(t, \tau) be the sliding cyclic correlogram for the tapered data

        \[ \begin{aligned} R_{a}^{\alpha}(t, \tau)=\int_{-(T-| \tau |) / 2}^{(T-| \tau |) / 2} a(v+\tau / 2) x(t+[v+\tau / 2]) \cdot \\ a(v-\tau / 2) x(t+[v-\tau / 2]) e^{-i 2 \pi \alpha(t+v)} d v \end{aligned} \]

    and let R_{b}^{\alpha}(t, \tau) be the sliding cyclic correlogram without tapering

        \[ R_{b}^{\alpha}(t, \tau)=\frac{1}{V} \cdot \int_{-(V-| \tau |) / 2}^{(V-| \tau |) / 2} x(t+[v+\tau / 2]) x(t+[v-\tau / 2]) \cdot e^{-i 2 \pi \alpha(t+v)} d v \]

    To complete the definitions, let S_{a} (t ; f_{1}, f_{2}) and S_{b} (t ; f_{1}, f_{2}) be the sliding biperiodograms (or cyclic periodograms) for the data x(t)

        \[ S_{a} (t ; f_{1}, f_{2})=\frac{1}{T} X_{a}(t, f_{1}) X_{a}^{*}(t, f_{2}) \]

        \[ S_{b} (t ; f_{1}, f_{2})=\frac{1}{V} X_{b} (t, f_{1}) X_{b}^{*} (t, f_{2}) \]

    Derivation

    It can be shown (using \alpha=f_{1}-f_{2} ) that (cf. [7, Chapter 11])

        \[ \begin{aligned} &\frac{1}{V} \int_{-V / 2}^{V / 2} S_{a}\left(t-u ; f_{1}, f_{2}\right) d u \\ &=\frac{1}{V} \int_{-V / 2}^{V / 2} \frac{1}{T} \int_{-T}^{T} R_{a}^{\alpha}(t-u, \tau) e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau d u \\ &=\frac{1}{T} \int_{-T}^{T} \frac{1}{V} \int_{-V / 2}^{V / 2} R_{a}^{\alpha}(t-u, \tau) d u e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau \\ & \cong \frac{1}{T} \int_{-T}^{T} R_{b}^{\alpha}(t, \tau) r_{a}(\tau) e^{-i \pi\left(f_{1}+f_{2}\right) \tau} d \tau \\ &=\int_{-\infty}^{\infty} S_{b}\left(t ; f_{1}-g, f_{2}-g\right) \frac{1}{T}|A(g)|^{2} d g \end{aligned} \]

    The above approximation, namely

        \[ \frac{1}{V} \int_{-V / 2}^{V / 2} R_{a}^{\alpha}(t-u, \tau) d u \cong R_{b}^{\alpha}(t, \tau) r_{a}(\tau) \]

    for |\tau| \leqslant T, becomes more accurate as the inequality V \gg T grows in strength (assuming that there are no outliers in the data near the edges of the V-length segment, cf. exercise 1 in [7, Chapt. 3], exercise 4b in [7, Chapt. 5], and Section B in [7, Chapt. 11]). For example, if the data is bounded by M, |x(t)| \leqslant M, and a(t) \geqslant 0, then it can be shown that the error in this approximation is worst-case bounded by r_{a}(\tau) M^{2} T / V. The first and last equalities above are simply applications of the cyclic-periodogram/cyclic-correlogram relation first established in [7, Chapter 11] together with the convolution theorem (which is used in the last equality).
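
    The approximate equality derived above can be checked numerically for the special case f_{1}=f_{2} (ordinary power spectra). The following is a minimal discrete-time sketch under assumptions that are not part of the original letter: a Hann taper, a test signal made by passing white Gaussian noise through a short FIR filter, a hop of one sample between subsegments, and hypothetical function names. One function averages tapered periodograms of T-sample subsegments over the V-sample record. The other smooths a single untapered periodogram of the whole record with the window (1/T)|A(g)|^{2}, implemented as a circular convolution on a zero-padded frequency grid.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def time_averaged_psd(x, taper, nfft):
    """Average the tapered periodograms of every T-sample subsegment
    (hop of one sample) of the record x."""
    T = len(taper)
    segments = sliding_window_view(x, T) * taper      # shape (V-T+1, T)
    X = np.fft.fft(segments, n=nfft, axis=1)
    return np.mean(np.abs(X) ** 2, axis=0) / T

def frequency_smoothed_psd(x, taper, nfft_long):
    """Smooth one untapered periodogram of the whole record with the
    window (1/T)|A(g)|^2 via circular convolution on a padded grid."""
    V, T = len(x), len(taper)
    # padding keeps correlation lags |tau| < T free of circular aliasing
    assert nfft_long >= V + T - 1
    P_b = np.abs(np.fft.fft(x, n=nfft_long)) ** 2 / V
    W = np.abs(np.fft.fft(taper, n=nfft_long)) ** 2 / T
    smoothed = np.fft.ifft(np.fft.fft(P_b) * np.fft.fft(W)).real
    return smoothed / nfft_long       # 1/nfft_long plays the role of dg

# Illustrative parameters (assumptions): T << V
rng = np.random.default_rng(0)
V, T = 4096, 32
white = rng.standard_normal(V + 2)
x = np.convolve(white, [1.0, 0.9, 0.5], mode="valid")   # V samples, colored noise
taper = np.hanning(T)
nfft, nfft_long = 64, 8192

tam = time_averaged_psd(x, taper, nfft)
# pick the fine-grid bins that coincide with the coarse grid k/nfft
fsm = frequency_smoothed_psd(x, taper, nfft_long)[:: nfft_long // nfft]
rel_err = np.max(np.abs(tam - fsm)) / np.max(fsm)
print(f"T/V = {T / V:.4f}, max relative difference = {rel_err:.4f}")
```

    The printed difference reflects only the edge effects discussed above and should shrink as V grows relative to T. Both estimates share the same statistical variability because they are, to this approximation, the same estimate.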

    Interpretation 

    The left-most member of the above string of equalities (and an approximation) is a biperiodogram of tapered data seen through a sliding window of length T and time-averaged over a window of length V. If this average is discretized, then we are averaging a finite number of biperiodograms of overlapping subsegments over the V-length data record. (It is fairly well known that little is gained, although nothing but computational efficiency is lost, by overlapping segments more than about 50 percent.) The right-most member of the above string is a biperiodogram of un-tapered data seen through a window of length V and frequency-smoothed along the anti-diagonal g=\left(f_{1}+f_{2}\right) / 2, using a smoothing window (1 / T)|A(g)|^{2}, for each fixed diagonal \alpha=f_{1}-f_{2}.

    Therefore, given a V-length segment of data, one obtains approximately the same result whether one averages biperiodograms on subsegments (TAM) or frequency-smooths one biperiodogram on the undivided segment (FSM). Given V, the choice of T determines both the width of the frequency-smoothing window in FSM and the length of the subsegments in TAM. Given V and choosing T \ll V, one can choose either of these two methods and obtain approximately the same result (barring outliers within T of the edges of the data segment of length V). By choosing f_{1}=f_{2} (i.e., \alpha = 0), we see the biperiodograms reduce to the more common periodograms, and the equivalence then applies to methods of estimation of power spectral densities, rather than bispectra. Bispectra are also called cyclic spectral densities and spectral correlation functions [7].

    As first proved in [7], the FSM and TAM spectral correlation measurements converge to exactly the same quantity, namely, the limit spectral correlation function (when it exists), in the limit as V \rightarrow \infty and T \rightarrow \infty, in this order. Further, this limit spectral correlation function, also called the limit cyclic spectral density, is equal to the Fourier transform of the limit cyclic autocorrelation, as first proved in [7], where this relation is called the cyclic Wiener relation because it generalizes the Wiener relation between the PSD and autocorrelation from \alpha = 0 to \alpha \neq 0:

        \[ S_{x}^{\alpha}(f)= \int R_{x}^{\alpha}(\tau) e^{-i 2 \pi f \tau} d \tau \]

    where

        \[ R_{x}^{\alpha}(\tau) \triangleq \lim _{T \rightarrow \infty} R_{a}^{\alpha}(t, \tau) \]

        \[ S_{x}^{\alpha}(f) \triangleq \lim _{T \rightarrow \infty} \lim _{V \rightarrow \infty} \frac{1}{V} \int_{-V / 2}^{V / 2} S_{a}\left(t-u ; f_{1}, f_{2}\right) d u \]

    with \alpha=f_{1}-f_{2} and f=\left(f_{1}+f_{2}\right) / 2.
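
    As a simple illustration of the cyclic Wiener relation (a worked example added here, assuming a unit-amplitude sinusoid x(t)=\cos (2 \pi f_{0} t) with f_{0} \neq 0, and assuming the limits are taken as plain time averages, i.e., a rectangular unit-height a(t) with 1/T normalization), the lag product is

        \[ x(t+\tau / 2) x(t-\tau / 2)=\frac{1}{2} \cos (2 \pi f_{0} \tau)+\frac{1}{2} \cos (4 \pi f_{0} t) \]

    so the limit cyclic autocorrelation is nonzero only for \alpha=0 and \alpha=\pm 2 f_{0}

        \[ R_{x}^{0}(\tau)=\frac{1}{2} \cos (2 \pi f_{0} \tau), \qquad R_{x}^{\pm 2 f_{0}}(\tau)=\frac{1}{4} \]

    and Fourier transformation in \tau gives

        \[ S_{x}^{0}(f)=\frac{1}{4} \delta(f-f_{0})+\frac{1}{4} \delta(f+f_{0}), \qquad S_{x}^{\pm 2 f_{0}}(f)=\frac{1}{4} \delta(f) \]

    The \alpha=0 result is the ordinary PSD of the sinusoid. The spectral line at f=0 for \alpha=\pm 2 f_{0} reflects the perfect correlation between the spectral components at f+\alpha / 2=\pm f_{0} and f-\alpha / 2=\mp f_{0}.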

    In the special circumstance where the inequality T \ll V cannot be satisfied because of the degree of spectral resolution (smallness of 1 / T) that is required, there is no known general and provable argument that either method is superior to the other. It has been argued [e.g., by Gerr] that, since the TAM involves time averaging, it is less appropriate than the FSM for nonstationary data. The results presented here, however, show that, for T \ll V, neither the TAM nor the FSM is more appropriate than the other for nonstationary data. And, when T \ll V is not satisfied, there is no known evidence that favors either method for nonstationary data.

    The derivation of the approximation between the FSM and TAM presented here uses a continuous-time model. However, a completely analogous derivation of an approximation between the discrete-time FSM and TAM is easily constructed. When the spectral correlation function is being measured for many values of the frequency-separation parameter, \alpha, the TAM, modified to what is called the FFT accumulation method (FAM), is much more computationally efficient than the FSM implemented with an FFT [12].

    William A. Gardner
    Professor, Department of Electrical and Computer Engineering
    University of California,
    Davis, CA.

    References

    1. Wiener, N., “Generalized harmonic analysis,” Acta Mathematica, Vol. 55, pp. 117-258, 1930.
    2. Daniell, P. J., “Discussion of ‘On the theoretical specification and sampling properties of autocorrelated time-series’,” J. Royal Statist. Soc., Vol. 8B, No. 1, pp. 27-97, 1946.
    3. Gardner, W. A., “Introduction to Einstein’s contribution to time-series analysis,” IEEE Signal Processing Magazine, Vol. 4, pp. 4-5, 1987.
    4. Einstein, A., “Méthode pour la détermination de valeurs statistiques d’observations concernant des grandeurs soumises à des fluctuations irrégulières,” Archives des Sciences Physiques et Naturelles, Vol. 37, pp. 254-256, 1914.
    5. Welch, P. D., “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, Vol. AU-15, pp. 70-73, 1967.
    6. Bartlett, M. S., “Smoothing periodograms from time-series with continuous spectra,” Nature, Vol. 161, pp. 686-687, 1948.
    7. Gardner, W. A., Statistical Spectral Analysis: A Nonprobabilistic Theory. Englewood Cliffs, NJ: Prentice-Hall, 1987.
    8. Blackman, R. B. and J. W. Tukey, The Measurement of Power Spectra, New York: AT&T, 1958 (Also New York: Dover, 1959).
    9. Michelson, A. A. and S. W. Stratton, “A new harmonic analyzer,” American Journal of Science, Vol. 5, pp. 1-13, 1898.
    10. Yule, G. U., “On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers,” Phil. Trans. Royal Soc. London A, Vol. 226, pp. 267-298, 1927.
    11. Walker, G., “On periodicity in series of related terms,” Proceedings of the Royal Society, Vol. 131, pp. 518-532, 1931.
    12. Roberts, R. S., W. A. Brown, and H. H. Loomis, Jr., “Computationally efficient algorithms for cyclic spectral analysis,” IEEE Signal Processing Magazine, Vol. 8, pp. 38-49, 1991.
    3.6.2 The debate

    This section consists of the following letters to the editor of IEEE Signal Processing Magazine:

    1 – Apr 1994, pp. 14, 16 (reprint in SP Magazine of Hinich’s book review in SIAM Review (1991), pp. 677-678)

    2 – Apr 1994, pp. 16, 18, 20, 22, 23 (Gardner’s Comments including Ensembles in Wonderland)

    3 – Oct 1994, p. 12 (Gerr’s comments)

    4 – Jan 1995, pp. 12, 14 (Gardner’s comments in response to Gerr)

    5 – Mar 1995, p. 16 (Gerr’s comments—2nd try)

    6 – Jul 1995, pp. 19-21 (Gardner’s final response, reproduced at beginning of page 3.4.1 above)