Theme: For stationary and cyclostationary time series, a wrong turn in their mathematical modeling was taken almost a century ago. Today, Academia should engage in remediation to overcome the detrimental influence on the teaching and practice of time-series analysis in Science and Engineering.
The objective of this page is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met from those indoctrinated in the theory of stochastic processes to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The reader is therefore referred to Page 7, Notes on the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this Page 3 in perspective.
The macroscopic world that our five senses experience—sight, hearing, smell, taste, and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such quantities varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these functions. For this reason, developing an intuitive real-world understanding of time-series analysis, for example spectral analysis of time-records of data from the physical world, requires that continuous-time models and the mathematics of continua be used.
Unfortunately, this is at odds with the technology that has been developed, in the form of computer applications and digital signal processing (DSP) hardware, for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena, or of continuous-time and continuous-amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available—the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of the accuracy of the data representations subjected to analysis and processing arises. Then the number of discrete amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. The discretization of both time-series data values and time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.
Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website.
Certainly, for a non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for an even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because the approximation of analog data with digital representations introduces an additional layer of complexity that is not directly related to the principles of analog spectral analysis, the principles of spectral analysis, which are independent of the implementation technology, are more transparent and easier to grasp in the continuous-time setting.
Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract, and it forces a detachment of the theory from the real-world data to be analyzed or processed—even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In such cases, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.4, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
For this reason, it is the Content Manager’s tenet that, for the sake of pedagogy, the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that do exist, and that explain the other fact—that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs—are non-pedagogical. The arguments are based on economics, directly or indirectly: 1) the transition in philosophy that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution)—from truly academic education to vocational training in schools of engineering (and in other fields of study as well); 2) economic considerations in the standard degree programs in engineering (and other technical fields)—B.S., M.S., and Ph.D.
degrees—limit the amount of course-work that can be required for each subject in a discipline; 3) economic considerations of the students studying engineering limit the number of courses they take beyond what is required for the degree they seek; the motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity; 4) the motivations of professors and industry instructors are affected by faculty-rating systems, which are in turn affected by university-rating systems: the number of employable graduates produced each year reigns, and industry defines “employability”. Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few. Industry can discard those vocationally trained employees whose skills peter out and hire a new batch.
In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.
FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from anything empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.
BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t – s). Then the time averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be replaced with the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which constitute the model adopted for the stochastic theory of statistical spectral analysis, this “trick”—which is rarely if ever mentioned in courses on the subject in the manner it is here—is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet it has almost no operational advantage. To clarify those operational purposes for which this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.4. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim, by those who are stuck in their ways, that the FOT probability theory has no measure-theoretic basis.
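For readers who want to see Wold's construction in action, the following sketch (in Python with NumPy; the test signal and all parameter values are hypothetical choices, not taken from the cited books) builds the shift ensemble x(t, s) = x(t – s) from a single record and checks numerically that the ensemble-averaged power at a fixed time instant matches the time-averaged power of that record.

```python
import numpy as np

# A single "persistent" time series: a noisy sinusoid on a long grid.
rng = np.random.default_rng(0)
N = 200_000
t = np.arange(N)
x = np.cos(2 * np.pi * t / 50) + rng.standard_normal(N)

# Time average of x^2 over the single record (the non-stochastic statistic).
time_avg_power = np.mean(x**2)

# Wold-style ensemble: members are time shifts of the same record,
# x(t, s) = x(t - s). The ensemble average at a fixed time t over many
# shifts s sweeps over the values of the single record.
shifts = rng.integers(0, N, size=5000)
ensemble_avg_power = np.mean(x[(1000 - shifts) % N] ** 2)  # fixed t = 1000

print(time_avg_power, ensemble_avg_power)
```

Because every ensemble member is just a time shift of the same record, averaging across the ensemble at any fixed instant sweeps over the values of the single time series, which is exactly why the two statistics coincide.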
The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time series for cyclostationary processes, and something similar for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of this Page 3. On Page 3.4 it is shown that the differences referred to here are in some cases advantageous for the FOT-probabilistic model and in other cases disadvantageous.
The history of the development of time-series analysis can be partitioned into the earlier empirically driven work, focused primarily on methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period, but it has centered primarily on nonstationary processes rather than on stationary processes. The development of time-series analysis theory and methodology for cyclostationary and related stochastic processes, and their non-stochastic time-series counterparts, came along later, during the latter half of the 20th century, and extends to the present.
Statistical metrics for time series such as mean, bias, variance, coefficient of variation, covariance, and correlation coefficient can be defined using finite-time averages as replacements for expected values in well-known probabilistic metrics. These statistical metrics also can be arrived at from nothing more than a little thought, without any reference to probability or expected value. In fact, all of these statistical metrics were in use long before the probabilistic theory of stochastic processes was developed.
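As a minimal illustration of that point, the following Python/NumPy sketch (with hypothetical synthetic data) computes several of these metrics using nothing but finite-time averages; no expected values, ensembles, or probability models appear anywhere.

```python
import numpy as np

# Finite-time statistical metrics defined purely as time averages over one
# record -- no ensembles or expected values involved.
rng = np.random.default_rng(1)
N = 100_000
x = 3.0 + rng.standard_normal(N)         # one time series
y = 0.5 * x + rng.standard_normal(N)     # a second, correlated series

mean_x = np.mean(x)                                  # finite-time mean
var_x = np.mean((x - mean_x) ** 2)                   # finite-time variance
coeff_var_x = np.sqrt(var_x) / mean_x                # coefficient of variation
cov_xy = np.mean((x - mean_x) * (y - np.mean(y)))    # finite-time covariance
corr_xy = cov_xy / np.sqrt(var_x * np.mean((y - np.mean(y)) ** 2))

print(mean_x, var_x, coeff_var_x, corr_xy)
```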
In the book [Bk2], such non-probabilistic statistical metrics are used for statistical spectral analysis. The resultant theory for understanding how to perform and study statistical spectral analysis is the lowest level in a hierarchy of non-stochastic theories of statistical spectral analysis and, more generally, time-series analysis. This level is referred to as the purely empirical non-probabilistic theory.
The next level up in the hierarchy is referred to as the purely empirical FOT-probabilistic theory, where, as explained elsewhere, FOT stands for Fraction-of-Time. This theory is defined in the presentation below. The third and highest level in the hierarchy is referred to as the non-stochastic FOT-probabilistic theory. This theory is fully developed in the book [Bk2].
In this section, the terms purely empirical, probabilistic, and stochastic are defined and the three individual levels of the hierarchy are defined and illustrated. The following material was prepared for presentation at the 2021 On-Line Grodek Conference on Cyclostationarity.
Preliminary Definitions
Def: PURELY EMPIRICAL (THEORY/METHOD)
– Excludes ensembles of outcomes of hypothetical experiments
– Excludes mathematical limits as some parameter approaches infinity, such as averaging time
– Excludes all quantities (e.g., Expected Values) that are not identifiable as, or cannot be calculated/computed from recorded physical measurements or observations
– Consequence: As applied to Statistical Spectral Analysis, the mathematical descriptions of calculations consist primarily of integral calculus, including Fourier transform theory and/or its discrete-time counterpart
Def: STATISTICAL SPECTRUM
– A Statistical Spectrum is an empirically-averaged spectrum
– A Probabilistic Spectrum (e.g., the standard Power Spectral Density of a stochastic process) is a mathematical quantity
– Ex: The Statistical Probabilistic Theory of Communications. (The traditional name used is statistical, but the theory is mostly probabilistic)
In order of Ease of Mathematization (proving existence of key quantities; level of mathematical sophistication required):
1. The Purely Empirical NON-probabilistic theory (finite time) for approximately S (stationary), CS (cyclostationary), and PCS (poly-cyclostationary) time series–introduced in 1987 for statistical spectral analysis
2. NEW: The Purely Empirical FOT-probabilistic theory (finite time) for approximately S, CS, and PCS time series–formally introduced in 2021
3. The Nonstochastic FOT-probabilistic theory (infinite time) for exactly S, CS, PCS, and almost cyclostationary (ACS) time series–introduced in 1987 for statistical spectral analysis
– Existing theory for stationary and cyclostationary time series is not purely empirical because the key quantities in the theory are based on infinite limits of time averages and evaluation of these limits requires an analytical model of a time series, not just empirical measurements represented by mathematical symbols.
– The required property of joint relative measurability in the 2006 Leskow-Napolitano theory cannot be verified empirically, because it requires analytical calculation based on an analytical model of an infinitely long time series.
– Strictly speaking, no time series can be said to be “at hand” if it is infinitely long. “At hand” is a term we have used since my 1987 SSA book to refer to a single time series as distinguished from a hypothetical ensemble of time series; the term is used loosely when applied to infinitely long time series.
– Similarly, Almost CS time series cannot be distinguished from PolyCS time series in a purely empirical theory (PolyCS means the series exhibits at most a finite number of harmonically unrelated cycle frequencies)
– The motivation for using infinitely-long time averages is that it enables exact quantification, not just approximation (analogous to expected values).
– But all averages in a purely empirical theory must be based on finite-time averages. Finite-length time series can indeed be “at hand”.
– My 1987 SSA book presents an analytical nonprobabilistic theory of statistical (time-averaged) spectral analysis that approximately quantifies temporal and spectral resolution and reliability (repeatability over time) for finite-length time series. [see parts of Chapters 2, 3, 7, 11, and slide 5 here]
– But this theory does not use the concept of probability—the closest thing to it that is used is the calculated finite-time temporal coefficient of variation of time-dependent measurements, such as spectral density and spectral correlation density.
– When the concept of probability is introduced in the 1987 SSA book, it is done in terms of infinite limits of time averages and it, therefore, forfeits the empiricism of the finite-time nonprobabilistic theory.
– We can easily construct a Purely Empirical FOT-Probabilistic theory as long as we accept approximate quantification of some of the relationships.
– Example 1: We can show that a statistical spectrum is approximately normal for sufficiently large averaging time, but we cannot prove it is asymptotically exactly normal, because infinite limits are outside the scope of the calculations allowed by an empirical theory
– Example 2: We can show that the difference between time-averaged and frequency-smoothed statistical spectral correlation measurements can be made small when the temporal/spectral resolution product is large, but we cannot prove it is asymptotically zero, because infinite limits are outside the scope . . .
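A numerical illustration of the claim in Example 2, as a sketch under hypothetical parameter choices (Python with NumPy): for white noise, a time-averaged spectral estimate (an average of segment periodograms) and a frequency-smoothed full-record periodogram with the same temporal/spectral resolution product agree closely, even though no limit is ever taken.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2**16
x = rng.standard_normal(N)     # white noise; true PSD = 1 at all frequencies

# Time-averaged estimate: average the periodograms of L consecutive segments.
L = 256
M = N // L                     # samples per segment (also 256 here)
segs = x.reshape(L, M)
per_t = np.mean(np.abs(np.fft.fft(segs, axis=1)) ** 2 / M, axis=0)

# Frequency-smoothed estimate: smooth the full-record periodogram with a
# moving average of width L bins (same temporal/spectral resolution product),
# then sample it on the same coarse frequency grid as the estimate above.
per_full = np.abs(np.fft.fft(x)) ** 2 / N
per_f = np.convolve(np.concatenate([per_full, per_full[:L]]),
                    np.ones(L) / L, mode='valid')[:N]
per_f_dec = per_f[::L]

# The two finite-time estimates track each other closely.
print(np.mean(np.abs(per_t - per_f_dec)))
```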
– Definitions of finite-time FOT Cumulative Distributions for approximately S, CS, and PolyCS time series
– Same as those for exactly S, CS, and PolyCS time series, but without the limits
– These FOT Cumulative Distributions can be computed from empirical data
– Fundamental Theorems of Averaging and Sine-Wave Component Extraction: same
Same as the existing infinite-time CD definitions, but without taking the limit as the averaging time approaches infinity
– Approximately Stationary FOT CD
– Approximately Cyclostationary FOT CD
– Approximate -Cyclic FOT CD
– Approximate -Periodic FOT CD
– Exact Relationship (see slide 8 for proof)
– Approximate Polyperiodic FOT CD
Notes:
1) The shift by is introduced in order to exclude at .
2) Because is discontinuous, the use of the Dirac Deltas as shown above requires special justification: because the approximately -periodic CD is defined for all time in an interval of length , the undefined finite values of time samples that occur at time points of discontinuity can be ignored if they occur only for times confined to a possibly-disjoint set of measure zero. Conditions on that guarantee this are being sought.
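For the simplest (approximately stationary) case, a finite-time FOT cumulative distribution is just the fraction of time, within the finite record, that the series lies at or below a given level. The following Python/NumPy sketch (with a hypothetical Gaussian surrogate series) computes it directly from data, with no limit taken:

```python
import numpy as np
from math import erf, sqrt

# Surrogate "approximately stationary" series of finite length.
rng = np.random.default_rng(3)
x = rng.standard_normal(50_000)

def fot_cdf(x, xi):
    """Finite-time FOT CD: fraction of time samples of x that are <= xi."""
    return np.mean(x <= xi)

# For a Gaussian-like series, the empirical FOT CD tracks the Gaussian CDF.
for xi in (-1.0, 0.0, 1.0):
    print(xi, fot_cdf(x, xi), 0.5 * (1 + erf(xi / sqrt(2))))
```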
Application To Cyclic Moments
Estimation accuracy increases as the number of periods averaged over increases:
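The behavior referred to above, that finite-time cyclic-moment estimates improve as the number of periods averaged over grows, can be illustrated with a Python/NumPy sketch (the signal and all parameters are hypothetical): the finite-time cyclic mean of a sinusoid in noise approaches its ideal value A/2 as more periods are included.

```python
import numpy as np

# Finite-time estimate of a cyclic mean: the average of x(t) exp(-i 2 pi t / T0)
# over K whole periods of period T0. For x(t) = A cos(2 pi t / T0) + noise,
# the ideal (infinite-time) value is A/2.
rng = np.random.default_rng(4)
T0, A = 100, 2.0

def cyclic_mean(K):
    t = np.arange(K * T0)
    x = A * np.cos(2 * np.pi * t / T0) + rng.standard_normal(K * T0)
    return np.mean(x * np.exp(-2j * np.pi * t / T0))

# Error of the finite-time estimate for few vs. many periods averaged.
errors = [abs(cyclic_mean(K) - A / 2) for K in (10, 1000)]
print(errors)
```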
– It has been shown that there exists an entirely empirical FOT-probabilistic theory of approximately stationary, cyclostationary, and poly-cyclostationary time series.
– All quantities occurring in the theory can be calculated from physically measured/observed time series data on finite intervals
– This theory should appeal to practitioners who actually analyze or otherwise process empirical data.
– Relative to the idealized non-empirical FOT-probabilistic theory of exactly cyclostationary, poly-cyclostationary, and almost cyclostationary time series, this new theory probably has some drawbacks, even if its users are restricted to empiricists. Approximate relationships can become messy relative to exact relationships. But, in such situations, one can always temporarily resort to the exact theory based on limits.
This page consists of a compilation of unpublished essays, workshop presentations, published articles, brief technical notes, communications between collaborators, etc. on the pros and cons of a paradigm shift from the stochastic process theory of cyclostationarity to the Fraction-of-Time probabilistic theory of cyclostationarity.
The first item is the set of slides used for Section IV of the opening Plenary Lecture for the first international Workshop on Cyclostationarity. To repeat an explanation given on Page 2, some readers may wonder why this is appropriate considering that this workshop was held 30 years ago (in 1992). I consider it appropriate because I developed these slides specifically for a broad group of highly motivated students. I say they were students solely because they traveled from far and wide specifically to attend this educational program; in fact, the participants of the workshop were mostly senior researchers in academia, industry, and government laboratories. Knowing the workshop was a success and that all the topics covered are as important today as they were then, I have chosen this presentation as ideal for the purposes of this website. In particular, the theoretical comparison of stochastic-process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity is in about the same state it was in 30 years ago, with the important exception of progress on measure-theoretic considerations of these two alternative theories that is reported on Page 3.4. That is, many of the questions raised in 1992 remain unanswered, though a few have been addressed in published journal papers that are cited throughout Page 3.
The unavoidable absence of detail in the presentation slides for Sec. IV presented below is made up for, to the extent that progress has been achieved in the ensuing 30 years, throughout this Page 3 and the sources linked to herein.
Because the theoretical comparison of stochastic-process modeling and non-stochastic FOT-probabilistic modeling of cyclostationarity summarized in this Section IV of the Plenary is a relatively technical subject, it is recommended that students consider this section to be only a concise overview and that they follow up on it with Chapter 1 in the book [Bk5]. This chapter not only describes the duality between the stochastic and nonstochastic theories of cyclostationarity, but also derives the nonstochastic FOT-probabilistic theory from an inquiry into the nature of the property of time functions that is responsible for the defining characteristic of cyclostationarity: that finite-strength sine waves can be generated from cyclostationary functions by subjecting the functions to time-invariant nonlinear transformations. This inquiry leads naturally to the definitions of cyclic probabilistic moments and cyclic probability distributions and, more generally, cyclic expectation; and, in Chapter 2 of the book [Bk5], cyclic probabilistic cumulants. This is to be contrasted with the stochastic theory of cyclostationarity, in which these key probabilistic quantities are simply posited on the basis of mathematical considerations only, with not even a mention of generating sine waves, which is a key characteristic of the physical manifestation of cyclostationarity.
The direct relevance of this discussion to the primary subject of this website is the claim herein that science and engineering were done great harm by mathematicians’ hard sell of the stochastic process model to the exclusion of the non-stochastic time-series model that came before.
With a brief look ahead at Page 7, one can surmise that this hard sell reflects inadequate Right-Brain (RB) activity, which would have been required to reveal that there is no necessity to use such unrealistic and overly abstract models. This has unnecessarily burdened teachers and students alike, and of course practicing engineers and scientists, with the challenge of bringing to bear the considerable RB activity required to make sense of the huge conceptual gap between the reality of a single time series of measured/observed data from nature and the mathematical fiction of a typically infinite ensemble of hypothetical time series together with a probability law (a mathematical creation) governing the ensemble average over all the fictitious time series. All these poor unsuspecting individuals were left to close this conceptual gap on their own, armed with nothing more than a mathematical theorem, only rarely applicable in practice, that gives the mathematical condition on a stochastic-process model under which its ensemble averages equal (in an abstract sense; i.e., with probability equal to 1) the time averages over individual time series in the ensemble. This condition on the probability law ensures that the expected values of a proposed stochastic process, mathematically calculated (a Left-Brain (LB) activity) from the mathematical model, equal time averages measured from a single time-series member of the ensemble, assumed to be the time series that actually exists in practice. But this equality imposes another condition, namely that we mathematically take the limit of the time average as the amount of averaging time approaches infinity. Thus the theorem—called the Ergodic Theorem—doesn’t actually address reality, because one never has an infinitely long segment of time-series data.
Moreover, the theorem is of little-to-no operational utility because the condition on the probability law can only rarely be tested for a given specific stochastic process model. Thus, most users of stochastic process theory rely conceptually on what is called the Ergodic Hypothesis by which one simply assumes the condition of the Ergodic Theorem is satisfied for whatever stochastic process model one chooses to work with. Faith of this sort has no place in science and engineering.
In my opinion, acceptance of all this gibberish and going forward with the stochastic-process concept as the only option for mathematically modeling real time-series data requires abandonment of RB thinking. There really is no way to justify this abstraction of reality as a necessary evil. The fraction-of-time probabilistic model of a single time series is an alternative option that avoids departing so far from the reality of measured/observed time-series data, its empirical statistical analysis, the mathematical modeling of the time series, and the results of the analysis. The wholesale adoption by academicians of the stochastic process foisted upon them by mathematicians suggests these academicians, as well as the mathematicians, suffer from low-level RB activity.
Discussion to be continued . . .
The 1987 book, Statistical Spectral Analysis: A Nonprobabilistic Theory, argues for more judicious use of the modern stochastic-process model (arising from the work of mathematicians in the 1930s, such as Khinchin, Kolmogorov, and others) and for renewed use of the more realistic predecessor: the time-series model first developed mathematically by Norbert Wiener in 1930 (see also page 59 of Wiener 1949, written in 1942, regarding the historical relationship between his and Kolmogorov’s approaches), which was briefly revisited in the 1960s by engineers before it was buried by mathematicians. The brief tongue-in-cheek essay Ensembles in Wonderland, published in IEEE Signal Processing Magazine, SP Forum, 1994, and reproduced below, is an attempt at satirizing the outrage typified by narrow-minded thinkers exemplified by two outspoken skeptics, Neil Gerr and Melvin Hinich, who wrote scathing remarks and a book review characterizing this book as utter nonsense.
Consider the parallel to the book Alice in Wonderland; the following is comprised of excerpts taken from https://en.wikipedia.org/wiki/Alice’s_Adventures_in_Wonderland: Martin Gardner and other scholars have shown the book Alice in Wonderland [written by Charles Lutwidge Dodgson under the pseudonym Lewis Carroll] to be filled with many parodies of Victorian popular culture. Since Carroll was a mathematician at Christ Church, it has been argued that there are many references and mathematical concepts in both this story and his later story Through the Looking Glass; examples include what have been suggested to be illustrations of the concept of a limit, number bases and positional numeral systems, the converse relation in logic, and the ring of integers modulo a specific integer. Deep abstraction of concepts, such as non-Euclidean geometry, abstract algebra, and the beginnings of mathematical logic, was taking over mathematics at the time Alice in Wonderland was being written (the 1860s). Literary scholar Melanie Bayley asserted in the magazine New Scientist that Alice in Wonderland in its final form was written as a scathing satire on the new modern mathematics that was emerging in the mid-19th century.
Today, Dodgson’s satire appears to be backward looking because, after all, there are strong arguments that modern mathematics has triumphed. Coming back to the topic of interest here, stochastic processes have triumphed in terms of being wholly adopted in mathematics and in science and engineering, except for a relatively small contingent of empirically minded scientists and engineers. Yet recent mathematical arguments, summarized in [Bk2], provide a sound mathematical basis for reversing this outcome, especially when the overwhelming evidence of practical, pragmatic, pedagogic, and overarching conceptual advantages provided in the 1987 book is considered. The present dominance of the more abstract and less realistic stochastic-process theory might be viewed as an example of the pitfalls of what has become known as groupthink, or the inertia of human nature that resists changes in thinking, which is exemplified on Page 7.
Before presenting the article “Ensembles in Wonderland”, the final letter to SP Forum in the debate is reproduced first for the sake of hindsight.
The debate:
July 2, 1995 (published in Nov 1995)
To the Editor:
Introduction
This is my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], and carried on by Mr. Neil Gerr through his letters to SP Forum.
In this letter, I supplement my previous remarks aimed at clarifying the precariousness of Hinich’s and Gerr’s position by explaining the link between my argument in favor of the utility of fraction-of-time (FOT) probability and the subject of a plenary lecture delivered at ICASSP ’94. In the process of discussing this link I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition–that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework–simply cannot be defended if argument is to be based on fact and logic.
David J. Thomson’s Transcontinental Waveguide Problem
To illustrate that the stochastic-process conceptual framework is often applied to physical situations where the time-average framework is a more natural choice, I have chosen an example from D. J. Thomson’s recent plenary lecture on the project that gave birth to the multiple-window method of spectral analysis [2]. The project that was initiated back in the mid-1960s was to study the feasibility of a transcontinental millimeter waveguide for a telecommunications transmission system potentially targeted for introduction in the mid-1980s. It was found that accumulated attenuation of a signal propagating along a circular waveguide was directly dependent on the spectrum of the series, indexed by distance, of the erratic diameters of the waveguide. So, the problem that Thomson tackled was that of estimating the spectrum for the more than 4,000-mile-long distance-series using a relatively small segment of this series that was broken into a number of 30-foot long subsegments. (It would take more than 700,000 such 30-foot sections to span 4,000 miles.) The spectrum had a dynamic range of over 100 dB and contained many periodic components, indicating the unusual challenge faced by Thomson.
When a signal travels down a waveguide (at the speed of light) it encounters the distance-series of erratic waveguide-diameters. Because of the constant velocity, the distance-series is equivalent to a time-series. Similarly, the series of diameters that is measured for purposes of analysis is—due to the constant effective velocity of the measurement device—equivalent to a time-series. So, here we have a problem where there is one and only one long time-series of interest (which is equivalent to a distance-series); there is no ensemble of long series over which average characteristics are of interest and, therefore, there is no obvious reason to introduce the concept of a stochastic process. That is, in the physical problem being investigated, there was no desire to build an ensemble of transcontinental waveguides. Only one (if any at all) was to be built, and it was the spectral density of distance-averaged (time-averaged) power of the single long distance-series (time-series) that was to be estimated, using a relatively short segment, not the spectral density of ensemble-averaged power. Similarly, if one wanted to analytically characterize the average behavior of the spectral density estimate (the estimator mean) it was the average of a sliding estimator over distance (time), not the average over some hypothetical ensemble, that was of interest. Likewise, to characterize the variability of the estimator, it was the distance-average squared deviation of the sliding estimator about its distance-average value (the estimator variance) that was of interest, not the variance over an ensemble.
The only apparent reason for introducing a stochastic process model with its associated ensemble, instead of a time-series model, is that one might have been trained to think about spectral analysis of erratic data only in terms of such a conceptual artifice and might, therefore, have been unaware of the fact that one could think in terms of a more suitable alternative that is based entirely on the concept of time averaging over the single time-series. (Although it is true that the time-series segments obtained from multiple 30 ft. sections of waveguide could be thought of as independent random samples from a population, this still does not motivate the concept of an ensemble of infinitely long time-series–a stationary stochastic process. The fact remains that, physically, the 30-foot sections represent subsegments of one long time-series in the communications system concept that was being studied.)
It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process (the model adopted by Thomson) except to accommodate lack of familiarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework. Somehow, he does not recognize the mental gyrations required to force this and other physical problems into the stochastic process framework.
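The time-average characterization of the estimator described above (its mean and its variance computed as averages of a sliding estimate over the single record, with no ensemble anywhere) can be sketched numerically. The following is a minimal illustration; the synthetic signal stands in for a long "distance-series" such as the diameter data, and all names and parameter choices are hypothetical, not Thomson's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one long "distance-series" (e.g., erratic diameters
# sampled along a waveguide): an AR(1)-type colored sequence.
N, L = 1 << 15, 256             # record length and sliding-window length
e = rng.standard_normal(N)
x = np.empty(N)
x[0] = e[0]
for t in range(1, N):
    x[t] = 0.9 * x[t - 1] + e[t]

# Sliding (short-window) periodogram estimator of the PSD at one frequency bin.
k = 10                          # frequency bin of the length-L window (f = k/L)
starts = range(0, N - L + 1, L)
est = np.array([np.abs(np.fft.rfft(x[s:s + L])[k]) ** 2 / L for s in starts])

# Time-average characterization of the estimator itself: its average over the
# single record (the "estimator mean") and its average squared deviation about
# that value (the "estimator variance") -- no hypothetical ensemble involved.
mean_t = est.mean()
var_t = ((est - mean_t) ** 2).mean()
print(f"time-average mean = {mean_t:.2f}, time-average variance = {var_t:.2f}")
```

The point of the sketch is only that both characterizations are ordinary time averages computed from the one record at hand.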
Gerr’s Letter
Having explained the link between my argument in favor of the utility of FOT probability and Thomson’s work, let us return to Gerr’s letter. Mr. Gerr, in discussing what he refers to as “a battle of philosophies,” states that I have erred in likening skeptics to religious fanatics. But in the same paragraph we find him defensively trying to convince his readers that the “statistical/probabilistic paradigm” has not “run out of gas” when no one has even suggested that it has. No one, to my knowledge, is trying to make blanket negative statements about the value of what is obviously a conceptual tool of tremendous importance (probability) and no one is trying to denigrate statistical concepts and methods. It is only being explained that interpreting probability in terms of the fraction-of-time of occurrence of an event is a useful concept in some applications. To argue, as Mr. Gerr does again in the same paragraph, that in general this concept “has no obvious advantages” and using it is “like building a house without power tools: it can certainly be done, but to what end?” is, as I stated in my previous letter, to behave like a religious fanatic — one who believes there can be only One True Religion. This is a very untenable position in scientific research.
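For concreteness, the fraction-of-time interpretation mentioned above (the probability of an event is the fraction of the record's duration over which the event occurs) can be sketched in a few lines. The signal model and the function name here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)

# One long record; no ensemble anywhere in sight.
t = np.arange(200_000)
x = np.sin(2 * np.pi * 0.01 * t) + 0.1 * rng.standard_normal(t.size)

def fot_probability(x, event):
    """Fraction-of-time probability: the fraction of the record's
    duration on which the event is true."""
    return np.mean(event(x))

# FOT "distribution function": fraction of time the signal lies below a level.
levels = np.linspace(-1.5, 1.5, 7)
F = [fot_probability(x, lambda v, u=u: v <= u) for u in levels]
for u, p in zip(levels, F):
    print(f"P_FOT(x(t) <= {u:+.1f}) = {p:.3f}")
```

For this noisy sinusoid the resulting distribution is close to the arcsine law of a sine wave, obtained without any appeal to a stochastic-process model.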
As I have also pointed out in my previous letter, Mr. Gerr is not at all careful in his thinking. To illustrate his lack of care, I point out that Gerr’s statement “Professor Gardner has chosen to work within the context of an alternative paradigm [fraction-of-time probability]”, and the implications of this statement in Gerr’s following remarks, completely ignore the facts that I have written entire books and many papers within the stochastic process framework, that I teach this subject to my students, and that I have always extolled its benefits where appropriate. If Mr. Gerr believes in set theory and logic, then he would see that I cannot be “within” paradigm A and also within paradigm B unless A and B are not mutually exclusive. But he insists on making them mutually exclusive, as illustrated in the statement “From my perspective, developing signal processing results using the fraction-of-time approach (and not probability/statistics) … .” (The parenthetical remark in this quotation is part of Mr. Gerr’s statement.) Why does Mr. Gerr continue to deny that the fraction-of-time approach involves both probability and statistics?
Another example of the lack of care in Mr. Gerr’s thinking is the convoluted logic that leads him to conclude “Thus, spectral smoothing of the biperiodogram is to be preferred when little is known of the signal a priori.” As I stated in my previous letter, it is mathematically proven* in [1] that the frequency smoothing and time averaging methods yield approximately the same result. Gerr has given us no basis for arguing that one is superior to the other and yet he continues to try to make such an argument. And what does this have to do with the utility of the fraction-of-time concept anyway? These are data processing methods; they do not belong to one or another conceptual framework.
To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing referred to above was first derived by using the fraction-of-time conceptual framework [1]. If there is no conceptual advantage to this framework, why wasn’t such a fundamental result derived during the half century of research based on stochastic processes that preceded [1]? The second example is taken from the first attempt to develop a theory of higher-order cyclostationarity for the conceptualization and solution of problems in communication system design. In [3], it is shown that a fundamental inquiry into the nature of communication signals subjected to nonlinear transformations led naturally to the fraction-of-time probability concept and to a derivation of the cumulant as the solution to a practically motivated problem. This is, to my knowledge, the first derivation of the cumulant. In all other work, which is based on stochastic processes (or non-fraction-of-time probability) and which dates back to the turn of the century, cumulants are defined, by analogy with moments, to be coefficients in an infinite series expansion of a transformation of the probability density function (the characteristic function), which has some useful properties. If there is no conceptual advantage to the fraction-of-time framework, why wasn’t the cumulant derived as the solution to the above-mentioned practical problem or some other practical problem using the orthodox stochastic-probability framework?
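As a minimal illustration of the time-average route to higher-order statistics (illustrative only, not the derivation in [3]): for a zero-mean real record, the fourth-order cumulant at zero lag can be computed purely from time averages as the time average of x^4 minus three times the squared time average of x^2. It vanishes for Gaussian data, so it isolates the non-Gaussian part of the fourth moment:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000

def fot_cumulant4(x):
    """Fourth-order cumulant (zero lag) from time averages of a single
    zero-mean record: <x^4> - 3<x^2>^2.  For a Gaussian record this is
    approximately zero."""
    x = x - x.mean()
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

gaussian = rng.standard_normal(N)
two_level = rng.choice([-1.0, 1.0], size=N)     # binary waveform, variance 1

print(f"Gaussian record : c4 = {fot_cumulant4(gaussian):+.3f}")   # ~ 0
print(f"two-level record: c4 = {fot_cumulant4(two_level):+.3f}")  # ~ -2
```

The two-level waveform has fourth moment 1 and second moment 1, so its cumulant is 1 - 3 = -2, exactly the non-Gaussian residue the cumulant is designed to expose.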
Conclusion
Since no one in the preceding year has entered the debate to indicate that they have new arguments for or against the philosophy and corresponding theory and methodology presented in [1], it seems fair to proclaim the debate closed. The readers may decide for themselves whether the resolution put forth in [1] was defeated or was upheld.
* A more detailed and tutorial proof of this fundamental equivalence is given in the article “The history and the equivalence of two methods of spectral analysis,” IEEE Signal Processing Magazine, July 1996, no. 4, pp. 20–23, which is copied into the Appendix farther down this page.
But regarding the skeptics, I sign off with a humorous anecdote:
When Mr. Fulton first showed off his new invention, the steamboat, skeptics were crowded on the bank, yelling ‘It’ll never start, it’ll never start.’
It did. It got going with a lot of clanking and groaning and, as it made its way down the river, the skeptics were quiet.
For one minute.
Then they started shouting. ‘It’ll never stop, it’ll never stop.’
— William A. Gardner
References
Excerpts from earlier versions of above letter to the editor before it was condensed for publication:
April 15, 1995
Introduction
In this, my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], I shall begin by addressing two remarks in the opening paragraph of Mr. Neil Gerr’s last letter (in March 1995 SP Forum). In the first remark, Mr. Gerr suggests that the “bumps and bruises” he sustained by venturing into the “battle” [debate] were to be expected. But I think that such injuries could have been avoided if he had all the relevant information at hand before deciding to enter the debate. This reminds me of a story I recently heard:
Georgios and Melvin liked to hunt. Hearing about the big moose up north, they went to the wilds of Canada to hunt. They had hunted for a week, and each had bagged a huge moose. When their pilot Neil landed on the lake to take them out of the wilderness, he saw their gear and the two moose. He said, “I can’t fly out of here with you, your gear, and both moose.”
“Why not?” Georgios asked.
“Because the load will be too heavy. The plane won’t be able to take off.”
They argued for a few minutes, and then Melvin said, “I don’t understand. Last year, each of us had a moose, and the pilot loaded everything.”
“Well,” said Neil, “I guess if you did it last year, I can do it too.”
So, they loaded the plane. It moved slowly across the lake and rose toward the mountain ahead. Alas, it was too heavy and crashed into the mountain side. No one was seriously hurt and, as they crawled out of the wreckage in a daze, the bumped and bruised Neil asked, “Where are we?”
Melvin and Georgios surveyed the scene and answered, “Oh, about a mile farther than we got last year.”
If Mr. Gerr had read the book [1] and put forth an appropriate level of effort to understand what it was telling him, he would have questioned Mr. Hinich’s book review and would have seen that the course he was about to steer together with the excess baggage he was about to take on made a crash inevitable.
A friend of mine recently offered me some advice regarding my participation in this debate. “Why challenge the status quo,” he said, “when everybody seems happy with the way things are?” My feeling about this is summed up in the following anecdote:
“Many years ago, a large American shoe manufacturer sent two sales reps out to different parts of the Australian outback to see if they could drum up some business among the aborigines. Sometime later, the company received telegrams from both agents.
The first one said, ‘No business. Natives don’t wear shoes.’
The second one said, ‘Great opportunity here–natives don’t wear shoes.'”
Another friend asked, “Why spend your time on this [debate] when you could be solving important problems?” I think Albert Einstein answered that question when he wrote:
“The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science.”
This underscores my belief that we are overemphasizing “engineering training” in our university curricula at the expense of “engineering science.” It is this belief that motivates my participation in this debate. Instead of plodding along in our research and teaching with the same old stochastic process model for every problem involving time-series data, we should be looking for new ways to think about time-series analysis.
In the second remark in Mr. Gerr’s opening paragraph, regarding my response to Mr. Gerr’s October 1994 SP Forum letter in sympathy with “Hinich’s gleefully vicious no-holds-barred review” of [1], Mr. Gerr says “Even by New York standards, it [my response] seemed a bit much.” Well, I guess I was thinking about what John Hancock said, on boldly signing the Declaration of Independence:
There, I guess King George will be able to read that!
Like the King of England, who turned a deaf ear to the messages coming from the New World, orthodox statisticians like Messrs. Hinich and Gerr, who are mired in tradition, seem to be hard of hearing–a little shouting might be needed to get through to them.
Nevertheless, I am disappointed to see no apparent progress, on Mr. Gerr’s part, in understanding the technical issues involved in his and Hinich’s unsupportable position that the time-average framework for statistical signal processing has, and I quote Gerr’s most recent letter, “no obvious advantages.” I hasten to point out, however, that this most recent position is a giant step back from the earlier even more indefensible position taken by Hinich in his book review, reprinted in April 1994 SP Forum, where much more derogatory language was used.
In this letter, I make a final attempt to clarify the precariousness of Hinich’s and Gerr’s position by explaining links between my arguments and the subjects of two plenary lectures delivered at ICASSP ’94. In the process of discussing these links and this paper, I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition, that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework, simply cannot be defended if argument is to be based on fact and logic.
Lotfi Zadeh and Fuzzy Logic
I wish that Mr. Gerr would let go of the fantasy about “the field where the Fraction-of-Timers and Statisticians do battle.” There do not exist two mutually exclusive groups of people—one of which can think only in terms of fraction-of-time probability and the other of which call themselves Statisticians. How many times and in how many ways does this have to be said before Mr. Gerr will realize that some people are capable of using both fraction-of-time probability and stochastic process concepts, and of making choices between these alternatives by assessing the appropriateness of each for each particular application? Mr. Gerr’s “battle” of “fraction-of-time versus probability/statistics” simply does not exist. This insistence on a dichotomy of thought is strongly reminiscent of the difficulties some people have had accepting the proposition that the concept of fuzziness is a useful alternative to the concept of probability. The vehement protests against fuzziness are for most of us now almost laughable.
To quote Professor Lotfi Zadeh in his recent plenary lecture [2]:
“[although fuzzy logic] offers an enhanced ability to model real-world phenomena…[and] eventually fuzzy logic will pervade most scientific theories…the successes of fuzzy logic have also generated a skeptical and sometimes hostile reaction…Most of the criticisms directed at fuzzy logic are rooted in a misunderstanding of what it is and/or a lack of familiarity with it.”
I would not suggest that the time-average approach to probabilistic modeling and statistical inference is as deep a concept, as large a departure from orthodox thinking, or as broadly applicable as is fuzzy logic, but there are some definite parallels, and Professor Zadeh’s explanation of the roots of criticism of fuzzy logic applies equally well to the roots of criticism of the time-average approach as an alternative to the ensemble-average or, more accurately, the stochastic-process approach. In the case of fuzzy logic, its proponents are not saying that one must choose either conventional logic and conventional set theory or their fuzzy counterparts as two mutually exclusive alternative truths. Each has its own place in the world. Those opponents who argue vehemently that the unorthodox alternative is worthless can be likened to religious fanatics. This kind of intolerance should have no place in science. But it is all too commonplace and it has been so down through the history of science. So surely, one cannot expect to find its absence in connection with the time-average approach to probabilistic modeling and statistical inference. Even though experimentalists in time-series analysis (including communication systems analysis and other engineered-systems analysis) have been using the time-average approach (to various extents) for more than half a century, there are those like Gerr and Hinich who “see no obvious advantages.” This seems to imply that Mr. Gerr has one and only one interpretation of a time-average measurement on time series data—namely an estimate of some random variable in an abstract stochastic process model. To claim that this mathematical model is, in all circumstances, the preferred one is just plain silly.
David J. Thomson and the Transcontinental Waveguide –addition to published discussion:
[It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process except to accommodate unfamiliarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework.] It is correct in this case that a sufficiently capable person would obtain the same result using either framework, but it is incorrect to not recognize the mental gyrations required to force this physical problem into the stochastic process framework. My claim—and the reason I wrote the book [1]—is that our students deserve to be made aware of the fact that there are two alternatives. It is pigheaded to hide this from our students and force them to go through the unnecessary and sometimes confusing mental gyrations required to force-fit the stochastic process framework to real-world problems where it is truly an unnecessary and, possibly, even inappropriate artifice.
Gerr’s Letter—addition to published letter:
To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing, whose proof is outlined in the Appendix at the end of this letter, was first derived by using the fraction-of-time conceptual framework [1].
An Illustration of Blinding Prejudice
To further illustrate the extent to which Mr. Gerr’s prejudiced approach to scientific inquiry has blinded him, I have chosen one of his research papers on the subject of cyclostationary stochastic processes. In [5], Mr. Gerr (and his coauthor) tackle the problem of detecting the presence of cyclostationarity in an observed time-series. He includes an introduction and references sprinkled throughout that tie his work to great probabilists, statisticians, and mathematicians. (We might think of these as the “Saints” in Mr. Gerr’s One True Religion.) This is strange, since his paper is nothing more than an illustration of the application of a known statistical test (and a minor variation thereof) to synthetic data. It is even more strange that he fails to properly reference work that is far more relevant to the problem of cyclostationarity detection. But I think we can see that there is no mystery here. The highly relevant work that is not cited is authored by someone who champions the value of fraction-of-time probabilistic concepts. The fact that the relevant publications (known to Gerr) actually use the stochastic process framework apparently does not remove Mr. Gerr’s blinders. All he can see–it would seem–is that the author is known to argue (elsewhere) that the stochastic process framework is not always the most appropriate one for time-series analysis, and this is enough justification for Mr. Gerr to ignore the highly relevant work by this “heretic” author (author of the book [1] that Hinich all but said should be burned).
To be specific, Mr. Gerr completely ignores the paper [6] (published 1-1/2 years prior to the submission of Gerr’s paper) and the book [7] (published 4 years prior) wherein the problem of cyclostationarity detection is tackled using maximum-likelihood [6], maximum-signal-to-noise ratio [6], [7], and other optimality criteria, all of which lead to detection statistics that involve smoothed biperiodograms (and that also identify optimal smoothing) which are treated by Gerr as if they were ad hoc. Mr. Gerr also cites a 1990 publication (which does not appear in his reference list) that purportedly shows that the integrated biperiodogram (cyclic periodogram) equals the cyclic mean square value of the data (cf. (12)); but this is a special case of the much more useful result, derived much earlier than 1990, that the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram. The argument, by example, that Gerr proffers to show that (12) (the cyclic correlogram at zero lag) is sometimes a good test statistic and sometimes a bad one is trivialized by this Fourier transform relation (cf. [1]) and the numerous mathematical models for data for which the idealized quantities (cyclic autocorrelations and cyclic spectral densities) in this relation have been explicitly calculated (cf. [1], [7]). These models include, as special cases, the examples that Gerr discusses superficially. The results in [1], [7] show clearly when and why the choice of zero lag made by Gerr in (12) is a poor choice. As another example, consider Mr. Gerr’s offhand remark that a Mr. Robert Lund (no reference cited) “has recently shown that for the current example (an AM signal with a square wave carrier) only lines [corresponding to cycle frequencies] spaced at even multiples of d=8 [the reciprocal of the period of the carrier] will have nonzero spectral (rz) measure.” This result was established in a more general form many years earlier in his coauthor’s Ph.D. dissertation (as well as in [1]), where one need only apply the extremely well-known fact that a symmetrical square wave contains only odd harmonics.
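The Fourier relation invoked above (the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram) is easy to verify numerically in its discrete, circular form. The conventions below are one common discrete choice, not necessarily the exact ones used in [1]:

```python
import numpy as np

rng = np.random.default_rng(4)
N, m = 1024, 8                  # record length; cycle-frequency bin (alpha = m/N)
n = np.arange(N)
# Amplitude-modulated noise, so the record exhibits cyclostationarity at m/N.
x = rng.standard_normal(N) * (1 + 0.5 * np.cos(2 * np.pi * m * n / N))

# Cyclic periodogram at cycle frequency alpha = m/N (one discrete convention):
# S[k] = X[k] * conj(X[k - m]) / N, with circular indexing in k.
X = np.fft.fft(x)
S = X * np.conj(np.roll(X, m)) / N

# Cyclic correlogram computed directly as a circular lag-product time average.
R = np.array([
    np.mean(x * np.conj(np.roll(x, tau)) * np.exp(-2j * np.pi * m * (n - tau) / N))
    for tau in range(N)
])

# The inverse DFT of the cyclic periodogram recovers the cyclic correlogram.
print(np.allclose(np.fft.ifft(S), R))   # True
```

Note that the zero-lag value R[0] is exactly the cyclic mean square value of the data (the quantity (12) in Gerr's paper), which is why the integrated-biperiodogram result is a special case of this transform relation.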
To go on, the coherence statistic that Gerr borrows from Goodman for application to cyclostationary processes has been shown in [7] to be nothing more than the standard sample statistic for the standard coherence function (a function of a single frequency variable) for two processes obtained from the one process of interest by frequency-shifting data transformations–except for one minor modification; namely, that time-averaged values of expected values are used in place of non-averaged expected values in the definition of coherence because the processes are asymptotically mean stationary, rather than stationary. Therefore, the well-known issues regarding frequency smoothing in these cross-spectrum statistics need not be discussed further, particularly in the haphazard way this is done by Gerr, with no reliance on analysis of specific underlying stochastic process models.
Continuing, the incoherent average (13) proposed by Gerr for use with the coherence statistic is the only novel contribution of this paper, and I claim that it is a poor statistic. The examples used by Gerr show that this “incoherent statistic” outperforms the “coherent statistic,” but what he does not recognize is that he chose the wrong coherent statistic for comparison. He chose the cyclic correlogram with zero lag (12), which is known to be a poor choice for his examples. For his example in Figure 9, zero lag produces a useless statistic, whereas a lag equal to T/2 is known to be optimum, and produces a “coherent statistic” that is superior to Gerr’s incoherent statistic. Thus, previous work [1], [7] suggests that a superior alternative to Gerr’s incoherent statistic is the maximum over a set of lag-indexed coherent statistics.
Finally, Mr. Gerr’s vague remarks about choosing the frequency-smoothing window-width parameter M are like stabs in the dark by comparison with the thorough and careful mathematical analysis carried out within–guess what–the time-average conceptual framework in [1], in which the exact mathematical dependence of the bias and variance of smoothed biperiodograms on the data-tapering window shape, the spectral-smoothing window shape, and the ideal spectral correlation function for the data model is derived, and in which the equivalence between spectral correlation measurement and conventional cross-spectrum measurement is exploited to show how conventional wisdom [1, Chapters 5 and 7] applies to spectral correlation measurement [1, Chapters 11, 13, and 15].
In summary, Gerr’s paper is completely trivialized by previously published work of which he was fully aware. What appears to be his choice to “stick his head in the sand” because the author of much of this earlier highly relevant work was not a member of his One True Religion exemplifies what Gerr is trying to deny. Thus, I repeat it is indeed appropriate to liken those (including Gerr) who Gerr would like to call skeptics to religious fanatics who are blinded by their faith.
Conclusion
In closing this letter, I would like to request that Mr. Gerr refrain from writing letters to the editor on this subject. To say, as he does in his last letter, “There are many points on which Professor Gardner and I disagree, but only two that are worthy of further discussion,” is to try to worm his way out of the debate without admitting defeat. I claim to have used careful reasoning to refute beyond all reasonable doubt every point Mr. Gerr (and Mr. Hinich) has attempted to make. Since he has shown that he cannot provide convincing arguments based on fact and logic to support his position, he should consider the debate closed. To sum up the debate:
– The resolution, cited in the introductory section of my 2 July 1995 letter to the editor, in contrapositive form, was made by myself in [1].
– The resolution was challenged by Hinich and defended by myself in April 1994 SP Forum.
– Hinich’s challenge was supported and my defense was challenged by Gerr in October 1994 SP Forum.
– Gerr’s arguments were challenged by myself in January 1995 SP Forum.
– Gerr defended his arguments in March 1995 SP Forum.
– Gerr’s presumably final defense was challenged and the final arguments in support of the resolution are made by myself in this letter.
APPENDIX – Proof of Equivalence Between Time-Averaged and Frequency-Smoothed Cyclic Periodograms
History and Equivalence of Two Methods of Spectral Analysis
Published in IEEE SIGNAL PROCESSING MAGAZINE, July 1996
The purpose of this article is to present a brief history of two methods of spectral analysis and to present, in a tutorial fashion, the derivation of the deterministic relationship that exists between these two methods.
History
Two of the oldest and currently most popular methods of measuring statistical (average) power spectral densities (PSD’s) are the frequency smoothing method (FSM) and the time averaging method (TAM). The FSM was thought to have originated in 1930 with Norbert Wiener’s work on generalized harmonic analysis [1], and to have been rediscovered in 1946 by Percy John Daniell [2]. But it was discovered only a few years ago (cf. [3]) that Albert Einstein had introduced the method in 1914 [4]. The currently popular method of deriving the FSM begins by showing that adjacent frequency bins in the periodogram have approximately the same correct mean values and the same large variances, and are approximately uncorrelated with each other. Then, it is observed that averaging these bins together retains the correct mean value, while reducing the variance.
The TAM is often attributed to a 1967 paper by P.D. Welch in the IEEE Transactions on Audio and Electroacoustics [5], but in fact the earliest known proposal of the TAM was by Maurice Stevenson Bartlett in 1948 [6]. The reasoning behind the TAM is similar to that for the FSM: the periodograms on adjacent segments of a data record have approximately the same correct mean values and the same large variances, and they are approximately uncorrelated with each other. Therefore, averaging them together will retain the correct mean value, while reducing the variance. (A more detailed historical account of the FSM, TAM, and other methods is given in [7].) Essentially, every spectral analysis software package available today includes either the FSM or the TAM, or both, often in addition to others. These other methods include, for example, the Fourier transformed tapered autocorrelation method, attributed to Ralph Beebe Blackman and John Wilder Tukey [8] (but used as early as 1898 by Albert A. Michelson [9]); and various model fitting methods that grew out of pioneering work by George Udny Yule in 1927 [10] and Gilbert Walker in 1931 [11].
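The two methods are easy to state in code. The sketch below implements a bare-bones FSM (smooth one long periodogram over adjacent frequency bins) and a Bartlett-style TAM (average the periodograms of non-overlapping segments), with the smoothing width and segment length matched so the two estimates have comparable spectral resolution; the signal and parameter choices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single long record: white noise through a short moving-average filter,
# so the true PSD is smooth and both estimators should agree closely.
N = 1 << 16
x = np.convolve(rng.standard_normal(N + 8), np.ones(8) / 8, mode="valid")[:N]

# --- Frequency-smoothing method (FSM): one long periodogram, then a moving
# average over adjacent frequency bins.
X = np.fft.rfft(x)
periodogram = np.abs(X) ** 2 / N
width = 64                                   # number of adjacent bins averaged
psd_fsm = np.convolve(periodogram, np.ones(width) / width, mode="same")

# --- Time-averaging method (TAM, Bartlett's version: non-overlapping
# segments, no taper): average the periodograms of short segments.
L = N // width          # segment length chosen so both methods have ~equal resolution
segs = x[: width * L].reshape(width, L)
psd_tam = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2 / L, axis=0)

# Compare on a common frequency grid: the TAM bin at frequency k/L lines up
# with every (N/L)-th bin of the smoothed long periodogram.
common = psd_fsm[:: N // L][: len(psd_tam)]
rel_err = np.median(np.abs(common - psd_tam) / psd_tam)
print(f"median relative difference: {rel_err:.3f}")
```

With the effective bandwidths matched this way the two estimates agree closely bin for bin, which is the elementary face of the deterministic relationship discussed next.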
It is well known that both the FSM and the TAM yield PSD estimates that can be made to converge to the exact PSD in some probabilistic sense, such as in mean square, as the length of the data record processed approaches infinity. However, it is much less commonly known that these two methods are much more directly related to each other. The pioneering methods due to Michelson, Einstein, Wiener, Yule, and Walker were all introduced without knowledge of the concept of a stochastic process. But starting in the 1950s (based on the work of mathematicians such as Khinchin, Wold, Kolmogorov, and Cramér in the 1930s and 1940s), the stochastic-process point of view essentially took over. It appears as though this mathematical formalism, in which analysts focus on calculating means and variances and other probabilistic measures of performance, delayed the discovery of the deterministic relationship between the FSM and TAM for about 40 years. That is, apparently it was not until the non-stochastic approach to understanding statistical (averaged) spectral analysis was revived and more fully developed in [7] that a deterministic relationship between these two fundamental methods was derived.
The next section presents, in a tutorial fashion, the derivation of the deterministic relationship between the FSM and TAM, but generalized from frequency-smoothed and time-averaged versions of the periodogram to the corresponding versions of the biperiodogram (also called the cyclic periodogram [7]). This deterministic relationship is actually an approximation of the time-averaged biperiodogram (TAB) by the frequency-smoothed biperiodogram (FSB) and, of course, vice versa. For evidence of the limited extent to which this deterministic relationship is known, the reader is referred to letters that have appeared in the SP Forum section of this magazine in the October 1994, January 1995, March 1995, and November 1995 issues.
Equivalence
Definitions
Let be a data-tapering window satisfying for , let be its autocorrelation
and let be its Fourier transform
Let be the sliding (in time ) complex spectrum of data seen through window
Similarly, let be a rectangular window of width , centered at the origin, and let be the corresponding sliding complex spectrum (without tapering). Also, let be the sliding cyclic correlogram for the tapered data
and let be the sliding cyclic correlogram without tapering
To complete the definitions, let and be the sliding biperiodograms (or cyclic periodograms) for the data
Derivation
It can be shown (using ) that (cf. [7, Chapter 11])
The above approximation, namely
for , becomes more accurate as the inequality grows in strength (assuming that there are no outliers in the data near the edges of the -length segment; cf. exercise 1 in [7, Chapt. 3], exercise 4b in [7, Chapt. 5], and Section B in [7, Chapt. 11]). For example, if the data is bounded by , , and , then it can be shown that the error in this approximation is worst-case bounded by . The first and last equalities above are simply applications of the cyclic-periodogram/cyclic-correlogram relation first established in [7, Chapter 11], together with the convolution theorem (which is used in the last equality).
Interpretation
The left-most member of the above string of equalities (and an approximation) is a biperiodogram of tapered data seen through a sliding window of length and time-averaged over a window of length . If this average is discretized, then we are averaging a finite number of biperiodograms of overlapping subsegments over the -length data record. (It is fairly well known that little is gained – although nothing but computational efficiency is lost – by overlapping segments more than about 50 percent.) The right-most member of the above string is a biperiodogram of un-tapered data seen through a window of length and frequency-smoothed along the anti-diagonal , using a smoothing window , for each fixed diagonal . Therefore, given a -length segment of data, one obtains approximately the same result whether one averages biperiodograms on subsegments (TAM) or frequency-smooths one biperiodogram on the undivided segment (FSM). Given , the choice of determines both the width of the frequency-smoothing windows in the FSM and the length of the subsegments in the TAM. Given and choosing , one can choose either of these two methods and obtain approximately the same result (barring outliers within of the edges of the data segment of length ). By choosing (i.e., ), we see that the biperiodograms reduce to the more common periodograms, and the equivalence then applies to methods of estimation of power spectral densities, rather than bispectra. Bispectra are also called cyclic spectral densities and spectral correlation functions [7]. As first proved in [7], the FSM and TAM spectral correlation measurements converge to exactly the same quantity, namely, the limit spectral correlation function (when it exists), in the limit as and , in this order.
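A discretized version of this TAB/FSB equivalence can be sketched as follows (a Python illustration under my own choice of test signal and parameters, with rectangular windows standing in for the tapering and smoothing windows of the derivation). The test signal is white noise amplitude-modulated with period P, whose spectral correlation at cycle frequency α = 2/P equals 1/4 at all spectrum frequencies:

```python
import numpy as np

rng = np.random.default_rng(1)

# Cyclostationary test signal (my own choice): white noise amplitude-
# modulated with period P samples, giving spectral correlation at the
# cycle frequency alpha = 2/P (in cycles per sample).
N, P, L = 1 << 14, 64, 256
t = np.arange(N)
x = rng.standard_normal(N) * np.cos(2 * np.pi * t / P)

m = 2 * L // P                    # alpha expressed in segment-FFT bins
# TAB: time-average the biperiodogram over N/L contiguous subsegments.
# Because segments start at multiples of L and alpha is a multiple of 1/L,
# the segment biperiodograms add in phase with no explicit compensation.
S = np.fft.fft(x.reshape(-1, L), axis=1)
k = np.arange(L - m)
tab = np.mean(S[:, k + m] * np.conj(S[:, k]), axis=0) / L

# FSB: frequency-smooth one full-record biperiodogram along f at fixed alpha
X = np.fft.fft(x)
M, W = 2 * N // P, N // L         # alpha and smoothing width in full-record bins
bp = X[M:] * np.conj(X[:-M]) / N
fsb = np.convolve(bp, np.ones(W) / W, mode="same")[::W][: L - m]

rel_err = np.linalg.norm(tab - fsb) / np.linalg.norm(tab)
print(rel_err, np.mean(tab).real)   # small error; mean near 1/4 for this signal
```

In general, when segment start times and the cycle frequency are not matched this way, each segment's biperiodogram must be phase-compensated by a factor tied to its start time before averaging.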
Further, this limit spectral correlation function, also called the limit cyclic spectral density, is equal to the Fourier transform of the limit cyclic autocorrelation, as first proved in [7], where this relation is called the cyclic Wiener relation because it generalizes the Wiener relation between the PSD and autocorrelation from to
where
with .
In the special circumstance where the inequality cannot be satisfied because of the degree of spectral resolution (smallness of the smoothing-window width) that is required, there is no known general and provable argument that either method is superior to the other. It has been argued that, since the TAM involves time averaging, it is less appropriate than the FSM for nonstationary data. The results presented here, however, show that, for , neither the TAM nor the FSM is more appropriate than the other for nonstationary data. And, when is not satisfied, there is no known evidence that favors either method for nonstationary data.
The derivation of the approximation between the FSM and TAM presented here uses a continuous-time model. However, a completely analogous derivation of an approximation between the discrete-time FSM and TAM is easily constructed. When the spectral correlation function is being measured for many values of the frequency-separation parameter, , the TAM, modified to what is called the FFT accumulation method (FAM), is much more computationally efficient than the FSM implemented with an FFT [12].
William A. Gardner
Professor, Department of Electrical and Computer Engineering
University of California,
Davis, CA.
References
The debate preceding the above final argument:
To appear at a later date: the missing parts of the chronological sequence of contributions to the debate from both sides, including Hinich’s review.
1 – April 1994, reprint of Hinich in SP Mag
2 – April 1994, Author’s Comments including Ensembles in Wonderland
3 – Oct 1994, Gerr’s comments
4 – Jan 1995, My comments. These comments have been posted below the article “Ensembles in Wonderland”
5 – March 1995, Gerr’s second try
6 – July 1995, my final response (inserted at the beginning above)
Jan 1995, My comments
It is hard for me to decide whether or not Mr. Gerr’s letter in the Forum section of the October 1994 issue of this magazine deserves a response. He does not seem to address the basic issue of whether or not fraction-of-time probability is a useful concept. This is the issue being debated, isn’t it? In fact, I cannot find one technical point in his letter that is both valid and clearly stated. But, because Mr. Gerr has clearly stated in his letter that, regarding philosophical issues in science and engineering, he prefers “New York”-style vicious attacks like Hinich’s to carefully worded, slyly mocking replies like mine, it has occurred to me that I might get through a little better to the Mr. Gerrs out there if I tried my hand at being just a little vicious. I hope the readers will understand that I am new at this; I give them my apologies now in case I fail to overcome my propensity for writing carefully and, when appropriate, slyly.
Mr. Gerr’s letter reveals a lot of misunderstanding and this provides us with some insight into what may motivate vicious attacks on attempts to educate people about alternative ways to conceptualize problem solving. It is hard for me to imagine how Mr. Gerr could have missed the main point of my response to Hinich’s review. This point, which is clearly stated in both the book [1] under attack and the unappreciated response to this attack, is that, and I quote from my response,
“… there is really no basis for controversy. The only real issue is one of judgement—judgement in choosing for each particular time-series analysis problem the most appropriate of two alternative approaches.”
To argue against this point is to be a zealot in the truest sense of the word, fanatically fighting for the One True Religion in statistics.
Sociologists and psychologists tell us that vicious behavior is often the result of paranoia born out of ignorance. In the example before us, both Hinich and Gerr demonstrate substantial ignorance regarding nonstochastic statistical concepts and methods, including fraction-of-time (FOT) probability. This case has already been made for Hinich in the Forum section of the April 1994 issue of this magazine. So, let us consider Gerr’s letter. First off, Gerr admits to the kind of behavior that is supposed to have no place in science and engineering, by identifying himself as a “partisan spectator”. Webster’s Ninth New Collegiate Dictionary defines partisan as “a firm adherent to a party, faction, cause, or person, especially one exhibiting blind, prejudiced, and unreasoning allegiance.” On the basis of this admission alone one has to wonder whether to continue reading Gerr’s letter or flip the page. (It’s interesting that Gerr is into partisanship and Hinich’s university appointment is in the Government Department.) But what the heck, let’s see if we can find some technical content in his letter.
Mr. Gerr’s first of three technical remarks is quoted here:
“For me, the statistical approach to signal analysis begins with a probabilistic model (e.g., ARMA) for the signal. The signal time series is viewed as a single realization and as data arising from the model. The time series data is used in conjunction with statistical techniques (e.g., maximum likelihood) to infer parameters, order, appropriateness, etc. of the model. The abstract notion of an infinite population plays no role.”
Not too surprisingly, it is difficult to tell what point Mr. Gerr is trying to make here. He starts with a probabilistic model and ends with a denial of the notion of a population. Would Mr. Gerr care to tell us how he interprets “probability” in “probabilistic model” if he denies the notion of population? My guess is that his thinking does not go this deep. But let’s try to extract some meaning by reading between the lines. In spite of his sympathy with Hinich, Mr. Gerr seems to be agreeing that the problem-solving machinery of probability theory (e.g., ARMA modeling and maximum likelihood estimation) can be used regardless of whether one conceptualizes its use in terms of stochastic probability (with its associated ensembles or populations) or in terms of fraction-of-time (FOT) probability. This is the point that is made by the book [1] under attack: This book does include ARMA models and the maximum likelihood method as parts of the nonstochastic theory. True to the “blind allegiance” definition of partisanship, Mr. Gerr is apparently agreeing with the book while sympathizing with the attack on the book. Either Mr. Gerr has not read the book at all, or he may simply not have thought hard enough and long enough about these things. This is important to point out because I suspect it is the primary reason that there is any controversy at all.
Mr. Gerr then goes on to admit that the FOT approach may be required for chaotic time series. But again, true to form, he then makes a remark that is difficult to interpret:
“…the fraction-of-time approach may be required, though not necessarily: in [1], it is shown that statistical model-fitting techniques developed for stochastic time series models can also be useful in fitting chaotic time series models.”
This sounds like Mr. Gerr is again confused about the fact that many probabilistic models can be interpreted or conceptualized in terms of either stochastic probability or FOT probability. Thus, regardless of the fact that a model was originally derived in the stochastic probability framework, it can—depending on the particular model—still be used (and/or rederived) in the FOT framework. In fact, AR models were originally derived within the FOT framework, not the stochastic framework [2] – [3]. This will probably surprise Mr. Gerr. And if he is not confused about this, then he is again agreeing with the book [1] whose attack he supports.
Because it was assumed that people working with stochastic processes would understand the subject well enough to compare it with the nonstochastic theory presented in [1], this comparison was not made very explicit in [1]. Responses to [1], such as those of Messrs. Hinich and Gerr, suggest that this assumption is false more often than it is true. To make up for this, an explicit comparison and contrast between the theories of stochastic processes and nonstochastic time-series is made in Chapter 1 of [4].
Mr. Gerr concludes his letter by considering transient time-series and erroneously concluding that time averaging a biperiodogram over successive blocks of data (which he identifies with FOT methodology) is inappropriate, whereas spectrally smoothing a biperiodogram is appropriate. Obviously, he does not realize that the infamous book [1] that proposes FOT concepts and methods shows that when the data block, over which spectral smoothing of the biperiodogram is performed, is partitioned into subblocks over which time averaging of the biperiodogram is performed instead, the results from these two methods can closely approximate each other if the subblock length and window shape are chosen properly. In other words, it is very clearly explained in [1] that the FOT framework for spectral analysis includes frequency smoothing as well as time-averaging methods. This again brings up the question, did Mr. Gerr read the book [1], and if so, did he comprehend anything?
It is my recommendation to Mr. Gerr, and others who would entertain joining this discussion of the merit of considering alternatives to stochastic thinking, that the book [1] that started the furor so nicely exemplified by Hinich’s review, and Chapter 1 of [4], be read carefully, the way they were written. This should be a prerequisite to criticism, vicious or otherwise.
Before closing this letter, I should point out that the so-called controversy that statisticians like Hinich and Gerr are promoting is about as productive as the statisticians’ endless debate between the “Bayesians” and the “frequentists” over whether or not prior probabilities (“prior” meaning “before data collection”) should be included in the One True Religion of statistics [5]. The debate is endless, because it is based on the faulty premise that there is One True Religion. In fact, the subject of our “controversy” is not unrelated to the Bayesian/frequentist debate. This debate dates back to the 1920s, and involves many well-known statisticians, some 40 of whom are referenced in [5] for their contributions to this debate. The conclusion in [5], published just last month, is, I am happy to report:
“The Bayesians have been right all along! And so have the frequentists! Both schools are correct (and better than the other) under specific (and complementary) circumstances . . . Neither approach will uniformly dominate the other . . . knowing when to [use] one or the other remains a tricky question. It is nonetheless helpful to know that neither approach can be ignored”
This is very encouraging! These pragmatic statisticians are attempting to dispel belief in the One True Religion.
I conclude this reply with a little dialogue that I find both amusing and supportive of my response to vicious attacks:
Can old dogs be taught new tricks?
Maybe, but the teacher might get barked at for trying.
Should the teacher accept the barking graciously?
Maybe, but if the old dogs band together into a pack, the teacher better bark back.
— William A. Gardner
REFERENCES
As discussed on Page 3.1, the third and highest member in the hierarchical family of non-stochastic theories for statistical spectral analysis and, more generally, time-series analysis is the Non-Stochastic FOT-Probabilistic theory, which applies to and actually defines exactly stationary, cyclostationary, and almost cyclostationary (which includes Poly-Cyclostationary) Time Series. This is not a Purely Empirical theory, but it is considerably closer to empirical methodology for these three classes of time-series than is the classical Stochastic-Process Theory. The content of this Page 3.4 has been contributed to this website by Professor Antonio Napolitano, whose work is heavily cited on many pages of this website. His objectives in this article are to 1) present the state of the art, as of the end of 2020, of developing a rigorous mathematical foundation for FOT Probability and 2) compare and contrast the mathematical characteristics of this foundation, and their practical ramifications in applications, with those of the more abstract Stochastic-Process Probability foundation.
Professor Napolitano has been the leading contributor to FOT-Probability Theory, and to Cyclostationarity more generally based on both stochastic-process models and FOT-probabilistic models, for the last 25 years, following the seminal work of the WCM between the mid-1980s and mid-1990s. An especially valuable and revealing feature of this article is its mathematical demonstration that the foundations of the FOT-Probability theory are more closely tied to practice than are the foundations of the stochastic-process theory.
A. Napolitano
December 21, 2020
The relative measure μ_R is the fraction-of-time (FOT) counterpart of the probability measure P. Due to the differences between the relative measure μ_R, defined on the relatively measurable sets (which are a subset of the σ-field of Borel subsets of the real line), and the probability measure P, defined on the σ-field of Borel subsets of a probability sample space, mathematical properties holding for stochastic processes do not necessarily have counterparts that hold for functions of time representing sample paths of these stochastic processes.
The key differences include:
– The class of the P-measurable sets is closed under union and intersection; the class of the relatively measurable sets is not.
– P is a σ-additive measure; μ_R is not.
– Expectation is σ-linear; the infinite-time average is not.
– Joint measurability of sample spaces is typically assumed but cannot be verified; joint relative measurability is a property of functions that can be verified.
These differences clearly show that the mathematical properties of the relative measure μ_R render it less amenable to mathematical study than do those of the probability measure P. This, however, does not constitute an obstacle to using the FOT approach for signal analysis but, rather, provides motivation for using this approach instead of the classical stochastic-process approach based on P.
Creators of the mathematical definition of a stochastic process dictated that certain properties of the mathematical entity (such as σ-additivity of the probability measure and σ-linearity of the expectation) must be exhibited so they could obtain mathematically desirable tractability. But as explained below, these dictates create a dichotomy between the properties of the abstract stochastic process and the properties of concrete individual sample paths of the stochastic process, the entities of primary interest to practitioners in many applications.
As also explained below, the creators of the mathematical definition of the FOT-probability model for functions, which can be thought of as single sample paths, did not dictate such problematic properties and therefore did not create such a dichotomy for the FOT probability approach.
2.1 Relative Measure
Let us consider a set A ∈ ℬ, where ℬ is the σ-field of the Borel subsets of the real line ℝ, and let μ be the Lebesgue measure on ℝ. The relative measure of A is defined by [19]

μ_R(A) ≜ lim_{T→∞} μ(A ∩ [t₀ − T/2, t₀ + T/2]) / T    (1)

provided that the limit exists. In such a case, the limit does not depend on t₀ and the set A is said to be relatively measurable (RM).
For example, the set

A = ∪_{n≥0} { t ∈ ℝ : 2n ≤ |t| < 2n + 1 }    (2)

is RM and μ_R(A) = 1/2. The set

A = ∪_{k≥0} { t ∈ ℝ : (2k)! ≤ |t| < (2k+1)! }    (3)

is not RM since μ(A ∩ [−T/2, T/2]) / T oscillates between 0 and 1 as T → ∞.
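These definitions are easy to explore numerically. The sketch below (my own illustration; the particular sets are my own choices, and densities are measured one-sidedly over [0, T] for simplicity) shows a union of unit intervals whose density converges, so the relative measure exists, and a union of factorial-gapped blocks whose density keeps swinging, so no limit exists and the set is not relatively measurable:

```python
import math

def rel_measure_ratio(blocks, T):
    """Lebesgue measure of (union of [a, b) blocks) ∩ [0, T], divided by T."""
    return sum(max(0, min(b, T) - a) for a, b in blocks if a < T) / T

# Union of [2n, 2n+1): the density settles at 1/2, so the limit exists.
A = [(2 * n, 2 * n + 1) for n in range(500_000)]
print([rel_measure_ratio(A, T) for T in (10**2, 10**4, 10**6)])   # each 0.5

# Blocks with factorial gaps: the density keeps swinging toward 0 and 1
# as T grows, so there is no limit.
B = [(math.factorial(2 * k), math.factorial(2 * k + 1)) for k in range(8)]
print([round(rel_measure_ratio(B, math.factorial(j)), 3) for j in range(3, 11)])
```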
The relative measure is the Lebesgue measure normalized so that the relative measure of the real line is equal to 1, that is, μ_R(ℝ) = 1. Note that such a normalization is obtained by a limit operation (as T → ∞) since the Lebesgue measure of the real line is infinite. Therefore, only Lebesgue-measurable sets with infinite Lebesgue measure can have a nonzero relative measure.
The normalization makes the relative measure of subsets of the real line a counterpart of the probability measure P defined on sets belonging to the σ-field of the (abstract) sample space. For the probability measure, however, the normalization is obtained without a limit operation. That is, the sample space, before the measure normalization, is assumed to have finite measure. Thus, the normalized probability measure of a set is obtained as the ratio of the original un-normalized measure of the set to the finite measure of the whole sample space.
Such a subtle property of the sample space of the classical probability measure has been noted by few authors (see, e.g., Halmos [13] and [14, p. 31]), even though the criticality of considering an infinite sample space was already addressed by Kolmogorov in his fundamental work on the theory of probability, where he discussed the necessity of introducing Axiom VI (the Axiom of Continuity), which cannot be derived from Axioms I-V. Citing [20, p. 15]: “For infinite fields, on the other hand, the Axiom of Continuity, VI, [is] proved to be independent of Axioms I-V. Since the new axiom is essential for infinite fields of probability only, it is almost impossible to elucidate its empirical meaning, as has been done, for example, in the case of Axioms I-V in par. 2 of the first chapter. For, in describing any observable random process, we can obtain only finite fields of probability. Infinite fields of probability occur only as idealized models of real random processes. We limit ourselves, arbitrarily, to only those models that satisfy Axiom VI. This limitation has been found expedient in researches of the most different sort.”
The normalization of the relative measure obtained by a limit operation results in μ_R having mathematical properties that render it less amenable to mathematical analysis than do those of the probability measure P, as explained subsequently.
The class of the RM sets is not closed under union and intersection. That is, there exist RM sets whose union is not RM [22, Fact 2.1], and there exist RM sets whose intersection is not RM [22, Fact 2.2]. As an immediate consequence of [22, Fact 2.1], we have that not all Lebesgue-measurable sets (with infinite measure) are relatively measurable. In addition, non-RM sets are not so rare or exotic (see (3)) as non-Lebesgue-measurable sets [21, Par. 27, problem 7].
Since the Lebesgue measure is additive, the relative measure is also additive. That is, if A and B are disjoint RM sets whose union is RM, then μ_R(A ∪ B) = μ_R(A) + μ_R(B) [22, Fact 2.4]. However, the relative measure is not σ-additive. In fact, following the Kac example [18, p. 46], if A_n = [n, n + 1/2) for n ∈ ℤ, then μ_R(∪_n A_n) = 1/2 but Σ_n μ_R(A_n) = 0 since μ_R(A_n) = 0 for every n.
The absence of the σ-additivity property makes μ_R less attractive from a mathematical point of view than the probability measure P, whose σ-additivity is a consequence of Axiom VI [20, p. 15]. The criticality of that axiom (and its consequences) has already been discussed in previous paragraphs.
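The failure of σ-additivity can be seen directly in a Kac-style numerical sketch (my own one-sided illustration, with densities computed over [0, T]): every individual interval has relative measure 0, yet their countable union has relative measure 1/2.

```python
def density(blocks, T):
    """Lebesgue measure of (union of [a, b) blocks) ∩ [0, T], divided by T."""
    return sum(max(0.0, min(b, T) - a) for a, b in blocks if a < T) / T

halves = [(n, n + 0.5) for n in range(200_000)]
T = 200_000.0

# Each interval [n, n+1/2) alone has finite measure, hence relative measure 0
print(density([halves[7]], T))   # negligible, -> 0 as T grows

# Yet the union occupies half of every long interval, so its relative
# measure is 1/2: the countable sum of the parts (0) misses the whole
print(density(halves, T))
```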
From the above considerations, it is clear that a probabilistic model built for a persistent single function of time starting from the relative measure μ_R will have mathematical properties less amenable to mathematical analysis than will the classical probabilistic model of a stochastic process, which is built starting from the probability measure P. As explained below, this fact, which could initially appear to be a weakness of μ_R in comparison to P, constitutes, instead, a motivation to adopt the FOT approach for signal analysis instead of the classical stochastic-process approach.
2.2 Relatively Measurable Functions
Let x(t) be a Lebesgue measurable function. The function x(t) is said to be relatively measurable if the set {t ∈ ℝ : x(t) ≤ ξ} is RM for every ξ ∈ ℝ \ Ξ₀, where Ξ₀ is at most a countable set of points. Each RM function generates a function

F_x(ξ) ≜ μ_R({t ∈ ℝ : x(t) ≤ ξ}) = lim_{T→∞} (1/T) μ({t ∈ [t₀ − T/2, t₀ + T/2] : x(t) ≤ ξ}) = lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} u(ξ − x(t)) dt    (4)

in all points ξ where the limit exists. In (4), u(·) denotes the unit step function, that is, u(s) = 1 for s ≥ 0 and u(s) = 0 for s < 0.
The function F_x(ξ) has all the properties of a valid distribution function, except for the right-continuity property (at the discontinuity points). It represents the fraction of time (FOT) that the function x(t) is at or below the threshold ξ [8], [9], [15]. For this reason, F_x(ξ) is referred to as the FOT distribution of the function x(t).
Since the relative measure of finite sets is zero, every finite-energy or transient function has the trivial distribution function F_x(ξ) = u(ξ). Only finite-average-power or persistent functions can have a non-trivial FOT distribution.
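As a concrete illustration (my own, with x(t) = sin(t)), the FOT distribution of a persistent sinusoid is the arcsine law, and the time-averaged indicator converges to it over a long record:

```python
import numpy as np

# Empirical FOT distribution of x(t) = sin(t): the fraction of time that
# x(t) <= xi over a long record, compared with the arcsine law
# F(xi) = 1/2 + arcsin(xi)/pi that the time average converges to.
t = np.linspace(0.0, 20_000.0, 2_000_001)
x = np.sin(t)

xi = np.linspace(-0.95, 0.95, 39)
F_emp = np.array([np.mean(x <= v) for v in xi])   # time average of the indicator
F_arc = 0.5 + np.arcsin(xi) / np.pi
print(np.max(np.abs(F_emp - F_arc)))              # shrinks as the record grows
```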
Let 1_A(t) be the indicator function of the set A, that is, 1_A(t) = 1 if t ∈ A and 1_A(t) = 0 if t ∉ A. The function 1_A(t) is RM if and only if A is a RM set, and it follows that

lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} 1_A(t) dt = μ_R(A)    (5)
Therefore, since non-RM sets are not rare or exotic (see (3)), non-RM functions also can easily be constructed. This fact has been exploited in [23] to design modulation formats for which statistical functions cannot be measured by time averages, for the purpose of obtaining secure communications.
Let x(t) be a relatively measurable, not necessarily bounded, function and let f(·) be continuous, bounded, and such that, for any c, the equation f(ξ) = c admits at most a finite number of solutions ξ in any finite interval. The following Fundamental Theorem of Expectation [22, Theorem 3.2] holds:

lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} f(x(t)) dt = ∫_ℝ f(ξ) dF_x(ξ)    (6)

where the first integral is in the Lebesgue sense and does not depend on t₀, and the second one is in the Riemann-Stieltjes sense.
From (6) it follows that the infinite-time average is the expectation operator for the FOT distribution, and for every bounded x(t) we have

⟨x(t)⟩ ≜ lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} x(t) dt = ∫_ℝ ξ dF_x(ξ)    (7)
The duality of the FOT approach with the classical stochastic-process approach [8, Sec. 8.6], [10] is evident. For a 1st-order strict-sense stationary process X(t) with distribution F_X(ξ), the stochastic counterpart of the third equality in (4) is

F_X(ξ) = E{u(ξ − X(t))}    (8)

where E{·} is the ensemble average, and the stochastic counterpart of (7) is

E{X(t)} = ∫_ℝ ξ dF_X(ξ)    (9)
A necessary and sufficient condition for the relative measurability of a function is not known. However, if x(t) is a bounded function, the existence of the time average

lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} [x(t)]^n dt

for every positive integer n is a necessary condition for the relative measurability of x(t). In addition, accounting for the Fundamental Theorem of Expectation, if x(t) is continuous and bounded and the left-hand side of

lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} [x(t)]^n dt = ∫_ℝ ξ^n dF_x(ξ)    (10)

exists for any positive integer n, then x(t) is relatively measurable, and equality (10) holds [34].
Finally, note that the absence of right-continuity of the FOT distribution is not important in applications. In the stochastic approach, right-continuity is a consequence of the σ-additivity of the probability measure P.
2.3 Jointly Relatively Measurable Functions
By building counterexamples, it can easily be shown that the class of the RM functions is not closed under addition and multiplication [22, Theorem 3.5]. Thus, the class of the RM functions is not a function space. Such a result is in accordance with the fact that the class of finite-average-power functions is not a linear vector space [1], [24], [25].
Note that, in contrast, the linear combination of two stochastic processes is still a stochastic process, provided that the two sample spaces are assumed to be jointly measurable. This, however, is not an innocuous assumption. Consequently, even if there exists a duality of results between the FOT and stochastic approaches (compare (7) and (9)), the stochastic process model for a single realization at hand should be used carefully, since properties of the stochastic process do not necessarily correspond to analogous properties of the function of time at hand. Such a deep difference between properties of stochastic processes and properties of functions constitutes a strong motivation for adopting the FOT approach for signal analysis.
The joint characterization of two (or more) functions is made by introducing the concept of joint relative measurability of functions. In particular, it can be shown that the sum and the product of jointly RM functions are in turn RM functions. Thus, for such functions, a FOT probabilistic model can be constructed. Joint relative measurability is an analytical property of functions and, hence, easier to verify than the analogous property in the stochastic-process framework, that is, the joint measurability of sample spaces. The latter property, in fact, cannot easily be verified in practice since, generally, the sample spaces are not specified.
Two Lebesgue measurable functions x(t) and y(t) are said to be jointly RM if the limit

F_{xy}(ξ, η) ≜ lim_{T→∞} (1/T) μ({t ∈ [t₀ − T/2, t₀ + T/2] : x(t) ≤ ξ, y(t) ≤ η})    (11)

exists for all (ξ, η) ∈ ℝ² \ Ξ, where Ξ is at most a countable set of straight lines of ℝ². The function F_{xy}(ξ, η) has all the properties of a bi-variate joint distribution function with the exception of right continuity at the discontinuity points. Such a definition extends naturally to more than two functions.
Let x(t) and y(t) be jointly RM functions. Then each function is RM [22, Theorem 4.1]. In addition, the sum x(t) + y(t) and the product x(t) y(t) are RM, provided that at least one of the functions is bounded [22, Theorem 4.2].
An extension of the fundamental theorem of expectation (see (6)) to the multivariate case can be derived [22, Theorem 4.5]. In particular, if x(t) and y(t) are bounded functions and x(t + τ) and y(t) are jointly RM for every τ, then the temporal cross-correlation function [33] of x(t) and y(t) is given by

R_{xy}(τ) ≜ lim_{T→∞} (1/T) ∫_{t₀−T/2}^{t₀+T/2} x(t + τ) y(t) dt = ∫_{ℝ²} ξ η dF_{x_τ y}(ξ, η)    (12)

where

F_{x_τ y}(ξ, η) ≜ μ_R({t ∈ ℝ : x(t + τ) ≤ ξ, y(t) ≤ η})    (13)
2.4 Absence of σ-Linearity of the Infinite-Time Average
As a consequence of the absence of σ-additivity of the relative measure μ_R, the corresponding expectation operator, although linear, is not σ-linear.
The infinite-time average of the linear combination of a finite number of jointly RM (not necessarily bounded) functions of time is equal to the linear combination of the time averages [5, Theorem 2.7]. However, this result cannot be extended to the case of a countable infinity of functions of time. For example, accounting for the identity

Σ_{n∈ℤ} 1_{[n, n+1/2)}(t) = 1_A(t),  with A ≜ ∪_{n∈ℤ} [n, n + 1/2)    (14)

we have

⟨1_A(t)⟩ = μ_R(A) = 1/2 ≠ Σ_{n∈ℤ} ⟨1_{[n, n+1/2)}(t)⟩ = 0    (15)

since μ_R([n, n + 1/2)) = 0 for every n, where ⟨·⟩ denotes the infinite-time average.
This result is different from the corresponding one in the stochastic approach, where the expectation operator is σ-linear, provided that the underlying infinite series of random variables is absolutely convergent [20].
As with the absence of σ-additivity of the relative measure μ_R, the absence of σ-linearity of the corresponding expectation operator, the infinite-time average, could initially appear to be a weakness of μ_R in comparison with P; however, it actually constitutes a motivation to adopt the FOT approach for signal analysis instead of the classical stochastic-process approach. In fact, the realizations of a stochastic process do not necessarily exhibit properties analogous to those of the stochastic process.
The absence of σ-linearity of the expectation operator in the FOT approach is also illustrated, with reference to the most general such operator, the almost-periodic component extraction operator, in Section 3.5.
2.5 Wold’s Isomorphism
In [35], Wold builds a discrete-time stochastic process whose sample paths are time-shifted versions of a single time series, and he thereby establishes an isomorphism whose continuous-time counterpart is

X(t, s) ≜ x(t + s),  s ∈ ℝ    (16)

In such a case, the sample space is the set of all time shifts, Ω = ℝ. Thus, in order to have P(Ω) = 1, the measure normalization must be performed by a limit operation, as in the definition (1) of the relative measure. This is an example of a probability space for which measure normalization involves a limit operation. This fact, however, is not in agreement with the fundamental assumption that the probability space has finite measure, which assumption is made in the classical construction of a probability space [13, Sec. 5]. Therefore, Wold’s isomorphism (for discrete time or extended to continuous time) is not compatible with the classical definition of a stochastic process.
WARNING on Lemma in Section 2 of [35]: The “inner product” (notation of [35]) is well defined only if the two functions involved are jointly RM (see (12)). The space of finite-power signals is not a vector space [25].
Wold’s isomorphism is extended in [9], [11] to cyclostationary time series. In such a case, a stochastic process is built whose sample paths are time-shifted versions of a single time series, with time shifts that are integer multiples of the period of cyclostationarity T₀. That is,

X(t, n) ≜ x(t + nT₀),  n ∈ ℤ    (17)
In [16], the details of Wold’s isomorphism between cyclostationary stochastic sequences and cyclostationary numerical sequences are presented. It is shown how Hilbert-space representations of cyclostationary stochastic sequences are interpreted in the case of numerical cyclostationary sequences.
The cycloergodic theory, which extends and generalizes existing ergodic theory, is developed in [4]. There it is shown that periodic components of time-varying probabilistic parameters can be consistently estimated from time averages on one sample path.
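A minimal numerical illustration of this cycloergodic property (my own sketch, not code from [4]): synchronized averaging of one sample path over shifts by integer multiples of the period recovers the periodic mean from that single path.

```python
import numpy as np

rng = np.random.default_rng(2)

# One sample path of a cyclostationary sequence: a periodic mean buried in
# white noise (the period P and amplitudes are my own illustrative choices)
P, n_periods = 32, 4000
t = np.arange(P * n_periods)
x = np.cos(2 * np.pi * t / P) + rng.standard_normal(t.size)

# Synchronized averaging: time-average the path over shifts by whole
# multiples of the period; this consistently estimates the periodic mean
est = x.reshape(n_periods, P).mean(axis=0)
err = np.max(np.abs(est - np.cos(2 * np.pi * np.arange(P) / P)))
print(err)   # on the order of 1/sqrt(n_periods)
```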
2.6 Conditional Relative Measurability and Independence
Let and be Lebesgue measurable sets and be an arbitrary increasing sequence of Lebesgue measurable subsets of with The conditional relative measure of the set given is defined as [22, Def. 5.1]
(18)
provided that the limit exists. In such a case, it is independent of the particular choice of the set sequence [22, Theorem 5.1].
Let the sets and be such that exists and is . The sets and are said to be independent if and only if [22, Def. 5.3].
Consider the definitions with and , with Lebesgue measurable. Assume that , where is at most a countable set of lines, exists. The functions and are defined to be independent if and only if .
Let and be jointly RM. In [22, Theorem 5.2] it is proved that the functions and are independent if and only if, except at most a countable set of straight lines, we have the equality
(19)
As an example, two sine waves with incommensurate periods are independent [18],[19]. If and are independent, then, for every and
(20)
That is, the normalization of to obtain a relative measure can be made by considering either subsets of the set built from or subsets . In other words, the function from which the normalizing sets are constructed has no influence on the relative measure, and such a relative measure equals . This result is in agreement with the intuitive concept of independence of two functions or signals in the sense that they have no link with each other.
The intuitive interpretation of the definition of independence in the FOT probability framework has no counterpart in the stochastic-process framework, where independence of processes is defined as the factorization of the joint distribution function into the product of the marginal ones [3],[7],[20]. In contrast, in the FOT probability framework such a factorization is proved to be true as a consequence of the intuitive definition in terms of conditional relative measurability [22, Theorem 5.2].
The concept of independent functions has been considered in [18], [19], [30], and [31], where equation (19) is taken to be the definition of independence and, consequently, no link with an intuitive concept of independence is established.
3.1 Time Average Estimation
In the stochastic approach, the estimator of the expected value of a wide-sense stationary stochastic process is the time average of one sample path of the process over the finite time interval with center and width . Similarly, in the FOT approach, the estimator of the infinite-time average of a time series is the finite-time average of this time series over .
In the stochastic approach, is fixed (and typically assumed to be 0 or ). The estimate is a random variable, that is, it depends on the sample point which determines the sample path or realization of the process. The variability of the estimate reflects its dependence on the sample path used for the estimation. Under appropriate mixing and stationarity assumptions, the estimate converges in some probabilistic sense as to the expected value of the process. In the FOT approach, the variability of the estimate reflects its dependence on the central point of the observation interval, when ranges over a wider temporal interval, say , with [26, Sec. 6.3.5].
Asymptotic characterization of the convergence of the estimator is expressed in terms of a double limit as and , provided that . The limit in produces the average-over-t behavior of the estimate, such as the average error (bias) and the average squared deviation of the estimate about its average value (the variance), where is the central point of the observation interval. The limit in produces an estimate that uses the whole time series for estimation.
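A minimal sketch of this estimator and of the role of the two limits (window width T and the range of the window center) follows; the signal, with infinite-time average 1.0, and all numerical parameters are illustrative assumptions:

```python
import math

def finite_avg(t0, T, dt=0.01):
    """Finite-time average of x(t) = 1.0 + cos(t) over [t0 - T/2, t0 + T/2]."""
    n = int(T / dt)
    return sum(1.0 + math.cos(t0 - T / 2 + i * dt) for i in range(n)) / n

def spread(T):
    """Variability of the estimate as the window center ranges over [0, 100)."""
    ests = [finite_avg(t0, T) for t0 in range(0, 100, 5)]
    return max(ests) - min(ests)

# widening the window shrinks the variability of the estimate over the center t0
print(round(spread(10.0), 3), round(spread(200.0), 3))
```

The spread of the estimates over the window centers is the FOT counterpart of the variance of the estimator, and it decreases as T grows.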
Let be an RM time series obtained as a frequency-shifted second- or higher-order lag product of another time series . Second- and higher-order cyclic moments of are infinite-time averages of [12], [29]. The finite-time average
(21)
is the estimator of the infinite-time average
(22)
For sufficiently large (much greater than the longest period of cyclostationarity of ), the function is approximately (asymptotically exactly) wide-sense stationary in the FOT sense. That is, and its homogeneous nonlinear transformations do not contain any finite-strength sine-wave component with nonzero frequency. Thus, the FOT expectation operator of interest for asymptotic properties of the estimator is the infinite-time average. The performance of the estimator is expressed in terms of FOT bias and variance
(23)
(24)
where the two approximations become exact equalities in the limit as .
Assuming summability of second- and fourth-order temporal cumulants of , the estimator is mean-square consistent in the FOT sense, that is,
(25)
In addition, under further cumulant summability assumptions, the function
(26)
has a normal FOT distribution as [6].
3.2 Central Limit Theorem
If a sequence of independent zero-mean time series satisfies some mild regularity assumptions, then the FOT distributions of the functions
(27)
in this sequence approach a zero-mean normal distribution as the number of time series approaches infinity [5, Theorem 3.5]. That is, we have the FOT Central Limit Theorem (CLT)
(28)
with equal to the average-over- value of the FOT variances of the functions .
The proof of this FOT CLT is based on the Taylor-series expansion of the characteristic functions, similarly to the proof of the CLT in the classical stochastic approach. The proof in the FOT approach, however, is more challenging due to the presence of the limit operation in the definition of the relative measure , which limit is not present in the definition of probability measure .
The FOT CLT allows one to overcome some difficulties that arise in the classical stochastic-process approach in the derivation of a widely adopted model for a communication channel [5, Section VI]. In the stochastic approach, a multipath Doppler channel is shown to introduce normally distributed gains under mild assumptions on the input signal and the channel characteristics [28, Chap. 14-1, pp. 759-762], [32, Chap. 2.4, pp. 34-37]. The justification, however, is only heuristic. In fact, in all the justifications, the input/output relationship of the channel is described in terms of deterministic signals and systems. Then, a stochastic model, whose statistical behavior should reproduce the time behavior of the deterministic model, is heuristically constructed.
In contrast, in the FOT approach in [5, Section VI], it is shown that the multipath Doppler channel introduces a time-varying gain which is RM with normal FOT distribution when the length of the observation interval approaches infinity and the number of paths approaches infinity. The order of these two limit operations cannot be interchanged. Moreover, even for a moderate number of paths, the distribution is almost normal, provided that the observation interval is sufficiently large.
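The FOT CLT can be illustrated numerically. In the sketch below, the independent time series are sine waves with incommensurate frequencies (square roots of distinct primes), each with FOT variance 1; this signal model and all parameters are ad hoc choices for illustration, not taken from [5]. The fraction of time the normalized sum spends in [-1, 1] should be close to the standard normal value 0.683:

```python
import math

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
          31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
freqs = [math.sqrt(p) for p in primes]   # pairwise incommensurate frequencies
N = len(freqs)

def y(t):
    # normalized sum of N independent time series, each with FOT variance 1
    return sum(math.sqrt(2) * math.sin(w * t) for w in freqs) / math.sqrt(N)

dt, T = 0.05, 5000.0
n = int(T / dt)
within_1 = sum(abs(y(i * dt)) <= 1.0 for i in range(n)) / n
print(round(within_1, 3))   # close to the standard normal value 0.683
```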
3.3 Almost-Periodic Time-Variant FOT Distribution
For every time series such that the sinusoidally weighted time average
(29)
exists, the decomposition
(30)
can be considered, where is the countable set of frequencies such that and the residual term does not contain any finite-strength additive sine-wave component
(31)
Let denote the almost-periodic (AP) component extraction operator, that is, the operator that extracts all the finite-strength additive sine-wave components of the function in its argument. From decomposition (30) we have
(32)
Under mild assumptions on the time series , it is shown in [11] that for every fixed the function of
(33)
is a valid cumulative distribution function except for the right-continuity property (at the discontinuity points). Moreover, for a well-behaved function the following fundamental theorem of expectation can be proved [11]
(34)
From (33) and (34) it follows that is the expectation operator corresponding to the almost-periodically time-variant distribution . It is proposed in [9, Part II], [11], [12] to build the extension of the FOT approach for time series that exhibit cyclostationarity, that is, with periodically and almost-periodically time-variant probabilistic functions.
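The sinusoidally weighted time average (29) that underlies the AP component extraction operator can be sketched as follows; the test signal (a sine-wave component of strength 1 at frequency 0.5 plus a residual at the incommensurate frequency √2) and the averaging parameters are illustrative assumptions:

```python
import cmath
import math

def x(t):
    # finite-strength sine wave at alpha = 0.5 plus a residual at sqrt(2)
    return 2.0 * math.cos(2 * math.pi * 0.5 * t) + math.cos(2 * math.pi * math.sqrt(2) * t)

def sine_weighted_avg(alpha, T=2000.0, dt=0.01):
    """Finite-time version of (29): (1/T) times the integral of x(t) e^{-i 2 pi alpha t} dt."""
    n = int(T / dt)
    return sum(x(i * dt) * cmath.exp(-2j * math.pi * alpha * i * dt) for i in range(n)) / n

c_on = sine_weighted_avg(0.5)    # on a sine-wave frequency of x
c_off = sine_weighted_avg(0.3)   # off every sine-wave frequency of x
print(round(abs(c_on), 3), round(abs(c_off), 3))   # near 1.0 and near 0.0
```

The weighted average recovers the coefficient of the finite-strength sine-wave component at the chosen frequency and vanishes (as T grows) at every other frequency, which is how the countable set of frequencies in (30) is identified.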
3.4 Signal Decomposition into FOT-Deterministic and FOT-Purely-Random Components
For an AP function , we have
(35)
That is, the AP component extraction operator, applied to an AP function, produces the AP function itself.
Since the AP component extraction operator is the expectation operator in the AP FOT probability framework, it follows from (35) that the AP functions are the deterministic signals in the AP FOT probability framework. All other signals are random signals.
Note that the term “random” here is not intended to be synonymous with “stochastic”. In fact, the adjective stochastic is adopted, as usual, when an ensemble of realizations or sample paths exists, whereas the adjective random is used in reference to a single function of time.
In other words, decomposition (30) can be interpreted as the decomposition of a generic random signal into its deterministic (that is, AP) component and a residual component
(36)
Decomposition (36) is the FOT counterpart of the classical decomposition of a stochastic process into its mean value and a zero-mean term.
3.5 Almost-Periodic Component of a PAM Time Series
Let us consider a pulse-amplitude modulated (PAM) stochastic process
(37)
where is a sequence of random variables and the pulse . Under the assumption
(38)
the σ-linearity of the statistical expectation operator validates the interchange of the infinite summation and expectation operations, and the expected value of is given by
(39)
Such an interchange of expectation operator and infinite summation is not allowed in the FOT approach.
Let us consider the PAM time series
(40)
with a discrete-time time series. Even if
(41)
for the time series in general we have
(42)
(43)
since . Such a result is in accordance with the fact that the infinite-time average and the almost-periodic component extraction operator are not -linear.
In [27, Sec. 2.4.1], it is shown that by properly executing the almost-periodic component extraction operation, the FOT expectation of , which is its almost-periodic component, can be shown to be given by
(44)
where denotes the discrete-time almost-periodic component extraction operator.
Comparison of the procedures used to obtain the expected value (39) of a PAM process and its FOT counterpart (44) shows the required difference in ways of executing the expectation operator in the stochastic and FOT approaches. In fact, even if a summability condition for the discrete-time sequence is satisfied, the FOT expectation operator cannot be freely interchanged with the infinite-summation operation. Thus, the result (39) valid for the stochastic process cannot be immediately extended to its sample paths. In contrast, the result derived in the FOT approach is valid by default for the unique time series in the hands of the experimenter.
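A numerical sketch of the FOT expectation of a PAM time series, implemented here as synchronized averaging over shifts by the symbol period, follows. The pulse, the symbol model, and all parameters are illustrative assumptions, not the general procedure of [27]: the symbols are a constant mean 0.7 plus the residual sequence cos(2π√2 n), whose FOT mean is zero, so the extracted AP component is the constant 0.7.

```python
import math

T0 = 1.0                               # symbol period (illustrative)

def a(n):
    # symbol sequence: constant mean 0.7 + zero-FOT-mean residual
    return 0.7 + math.cos(2 * math.pi * math.sqrt(2) * n)

def x(t):
    # PAM time series with a full-width rectangular pulse of width T0
    return a(math.floor(t / T0))

def fot_mean(t, K=50000):
    # synchronized average over K shifts by T0: the AP component at time t
    return sum(x(t + k * T0) for k in range(K)) / K

print(round(fot_mean(0.25), 3), round(fot_mean(0.8), 3))   # both near 0.7
```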
3.6 Linear System Decomposition into FOT-Deterministic and FOT-Random Components
In this section, linear dynamical systems are decomposed into the parallel connection of a deterministic component and a random component.
In the FOT probability framework for time series that exhibit cyclostationarity, a deterministic system is defined as a possibly complex (and not necessarily linear) system that for every deterministic (i.e., almost periodic) input time-series delivers a deterministic output time-series [17], [26, Sec. 4.3.1]. Therefore, for the sinusoidal input time-series
(45)
a FOT deterministic system delivers the almost-periodic output time-series
(46)
where is a countable set and and are frequencies and Fourier coefficients, both depending on the input frequency , of the almost-periodic output time series.
If the FOT deterministic system is linear, that is
(47)
the impulse-response function can be expressed as
(48)
where are the inverse functions of (that can always be chosen invertible) and , with denoting the first-order derivative of . The output signal is given by
(49)
where denotes convolution and
(50)
with the Fourier transform of . That is, the output of a FOT deterministic linear system is the superposition of frequency-warped and filtered versions of the input signal [26, Fig. 4.4].
The class of FOT deterministic linear time-variant (LTV) systems includes that of the linear almost-periodically time-variant (LAPTV) systems, which in turn includes, as special cases, both linear periodically time-variant and linear time-invariant (LTI) systems. Other examples are the systems performing a time-scale change. Decimators and interpolators are FOT deterministic discrete-time systems. The parallel and cascade concatenations of FOT deterministic LTV systems are still FOT deterministic LTV systems.
LTV systems that are not FOT deterministic systems include chirp modulators, modulators whose carrier is a pseudo-noise sequence, channels introducing a non-linear time-varying delay, and systems performing time windowing. In fact, these systems do not deliver an almost-periodic function when the input is a sine wave.
On the basis of the above introduced system classification, a deterministic system in the stochastic-process framework can be classified as either deterministic or random in the FOT sense, depending on the behavior of the impulse-response function. Moreover, in the stochastic-process framework, a random system can have an impulse-response function whose sample paths are either FOT deterministic or FOT random.
In the stochastic-process framework, the impulse-response function of a stochastic linear system can be decomposed into the sum of a deterministic component and a random component [2]. As for a stochastic process, the deterministic component is the expected value of the impulse-response function and the random component is the residual part. In the stochastic-process framework, the statistical expectation operator can be applied in the same way to stochastic processes and stochastic-system impulse-response functions. In fact, the randomness of both processes and systems is due to the dependence on the variable belonging to the sample space, which is not linked to the time variables. In contrast, in the FOT probability framework, the AP component extraction operator cannot provide the deterministic component of systems in the same way since, in this framework, the randomness is a consequence of the time behavior of the functions.
As an example, let us consider the impulse-response function of a system performing a time-scale change. This system is FOT deterministic since an input sine wave with frequency is transformed into an output sine wave with frequency . By applying the almost-periodic component extraction operator to one would obtain the incongruous result that the deterministic component of should be identically zero, unless the system were .
A useful and congruous decomposition for LTV systems is obtained as follows [17]:
(51)
In (51), denotes the impulse-response function of the subsystem that for any AP input delivers an AP output. Its analytic expression is given by (48a), (48b), and it is referred to as the impulse-response function of the FOT deterministic component of the LTV system. The function is the impulse-response function of the subsystem that for any AP input delivers an output signal not containing any finite-strength additive sine-wave component. Such a subsystem is referred to as the FOT residual component of the LTV system.
For FOT deterministic LTV systems we have
(52)
Systems for which
(53)
are referred to as FOT purely random systems. By using both decompositions (36) and (51), the AP component of the output signal in (47) can be expressed as [17]
(54)
Note that and are the expected values (in the AP FOT sense) of the input and output signals, respectively. Therefore, by analogy with the stochastic-process framework, can be interpreted as the expected value (in the AP FOT sense) of the impulse-response function .
The meanings of FOT deterministic, FOT random, and FOT residual component are summarized for both signals and systems in the table below.
| | FOT deterministic | FOT random | FOT residual |
|---|---|---|---|
| signals | almost-periodic (AP) functions | not FOT deterministic | not containing any finite-strength additive sine-wave component |
| systems | transform AP functions into AP functions | not FOT deterministic | transform AP functions into signals not containing finite-strength additive sine-wave components |
3.7 Monte Carlo Simulations
Monte Carlo simulations are FOT simulations, not “stochastic” simulations [6, Section 4.4]. Computer random number generators produce unique periodic sequences with very long periods. Calling such a routine several times (with different seeds) is equivalent to picking different time segments of the unique sequence. If the period is sufficiently long, the sequence can be considered aperiodic for practical purposes.
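The point can be made concrete with a toy full-period linear congruential generator (the parameters below are illustrative and satisfy the Hull-Dobell full-period conditions; real generators have far longer periods but behave the same way): the generator's state sequence is one periodic orbit, and choosing a different seed merely selects a different starting point, i.e., a different time segment of the same unique sequence.

```python
M, A, C = 2**16, 77, 12345   # toy LCG: full period since C is odd, 2 divides A-1, and 4 divides A-1

def step(s):
    return (A * s + C) % M

# walk the entire orbit from seed 1, recording the position of each state
pos, s = {}, 1
for i in range(M):
    pos[s] = i
    s = step(s)

print(len(pos) == M)    # True: every state is visited exactly once (a single orbit)
print(pos[42])          # seeding with 42 just starts at this offset on the same orbit
```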
References
[1] J. Bass, “Suites uniformément denses, moyennes trigonométriques, fonctions pseudo-aléatoires,” Bulletin de la Société Mathématique de France, vol. 87, pp. 1-64, 1959.
[2] P. A. Bello, “Characterization of randomly time-variant channels,” IEEE Transactions on Communications Systems, vol. CS-11, pp. 360-393, December 1963.
[3] P. Billingsley, Convergence of Probability Measures. New York: Wiley, 1968.
[4] R. A. Boyles and W. A. Gardner, “Cycloergodic properties of discrete- parameter nonstationary stochastic processes,” IEEE Transactions on Information Theory, vol. IT-29, no. 1, pp. 105-114, January 1983.
[5] D. Dehay, J. Leśkow, and A. Napolitano, “Central limit theorem in the functional approach,” IEEE Transactions on Signal Processing, vol. 61, no. 16, pp. 4025-4037, August 2013.
[6] —, “Time average estimation in the fraction-of-time probability framework,” Signal Processing, vol. 153, pp. 275-290, December 2018.
[7] J. L. Doob, Stochastic Processes. New York: John Wiley \& Sons, Inc., 1953.
[8] W. A. Gardner, Introduction to Random Processes with Applications to Signals and Systems. New York: Macmillan, 1985; 2nd ed., New York: McGraw-Hill, 1990.
[9] —, Statistical Spectral Analysis: A Nonprobabilistic Theory. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[10] —, “Two alternative philosophies for estimation of the parameters of time-series,” IEEE Transactions on Information Theory, vol. 37, pp. 216-218, January 1991.
[11] W. A. Gardner and W. A. Brown, “Fraction-of-time probability for time-series that exhibit cyclostationarity,” Signal Processing, vol. 23, pp. 273-292, June 1991.
[12] W. A. Gardner and C. M. Spooner, “The cumulant theory of cyclostationary time-series. Part I: Foundation,” IEEE Transactions on Signal Processing, vol. 42, pp. 3387-3408, December 1994.
[13] P. R. Halmos, “The foundations of probability,” The American Mathematical Monthly, vol. 51, no. 9, pp. 493-510, November 1944.
[14] —, Lectures on Ergodic Theory. Mathematical Society of Japan, and American Mathematical Society (AMS) Chelsea Publishing, New York, 2006.
[15] E. M. Hofstetter, “Random processes,” in The Mathematics of Physics and Chemistry, H. Margenau and G. M. Murphy, Eds. Princeton, NJ: D. Van Nostrand Co., 1964, vol. II, ch. 3.
[16] H. L. Hurd and T. Koski, “The Wold isomorphism for cyclostationary sequences,” Signal Processing, vol. 84, pp. 813-824, May 2004.
[17] L. Izzo and A. Napolitano, “Linear time-variant transformations of generalized almost-cyclostationary signals. Part I: Theory and method,” IEEE Transactions on Signal Processing, vol. 50, no. 12, pp. 2947 – 2961, December 2002.
[18] M. Kac, Statistical Independence in Probability, Analysis and Number Theory. USA: The Mathematical Association of America, 1959.
[19] M. Kac and H. Steinhaus, “Sur les fonctions indépendantes (IV) (Intervalle infini),” Studia Mathematica, vol. 7, pp. 1-15,1938.
[20] A. N. Kolmogorov, Foundations of the Theory of Probability, 1933; English translation, New York: Chelsea, 1956.
[21] A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis. Englewood Cliffs, NJ: Prentice Hall, 1970, and New York: Dover, 1975.
[22] J. Leśkow and A. Napolitano, “Foundations of the functional approach for signal analysis,” Signal Processing, vol. 86, no. 12, pp. 3796-3825, December 2006.
[23] —, “Non-relatively measurable functions for secure-communications signal design,” Signal Processing, vol. 87, no. 11, pp. 2765-2780, November 2007.
[24] P. M. Mäkilä, J. R. Partington, and T. Norlander, “Bounded power signal spaces for robust control and modeling,” SIAM Journal on Control and Optimization, vol. 37, no. 1, pp. 92-117, 1998.
[25] J. Mari, “A counterexample in power signals space,” IEEE Transactions on Automatic Control, vol. 41, no. 1, pp. 115-116, January 1996.
[26] A. Napolitano, Generalizations of Cyclostationary Signal Processing: Spectral Analysis and Applications. John Wiley & Sons Ltd – IEEE Press, 2012.
[27] —, Cyclostationary Processes and Time Series: Theory, Applications, and Generalizations. Elsevier, 2019.
[28] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw-Hill, 1995.
[29] C. M. Spooner and W. A. Gardner, “The cumulant theory of cyclostationary time-series. Part II: Development and applications,” IEEE Transactions on Signal Processing, vol. 42, pp. 3409-3429, December 1994.
[30] H. Steinhaus, “Sur les fonctions indépendantes (VI) (Équipartition),” Studia Mathematica, vol. 9, pp. 121-132, 1940.
[31] —, “Sur les fonctions indépendantes (VII),” Studia Mathematica, vol. 10, pp. 1-20, 1948.
[32] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, UK: Cambridge University Press, 2005.
[33] N. Wiener, “Generalized harmonic analysis,” Acta Mathematica, vol. 55, pp. 117-258, 1930.
[34] A. Wintner, “Remarks on the ergodic theorem of Birkhoff,” Proceedings of the National Academy of Science of the U.S.A., vol. 18, pp. 248-251, 1932.
[35] H. O. A. Wold, “On prediction in stationary time series,” Ann. Math. Statist., vol. 19, pp. 558-567, 1948.
Much about avoiding the issue of cycloergodicity—and ergodicity as well—is said in earlier sections of this Page 3; and there is more in the paper “Fraction-of-Time Probability: Advancing Beyond the Need for Stationarity and Ergodicity Assumptions” by A. Napolitano and W. A. Gardner, submitted in March 2021 for publication in the journal Nature. It has even been argued that cycloergodicity issues—namely, the difficulties in determining whether a specified stochastic-process model that exhibits one of the various forms of cyclostationarity is or is not cycloergodic at some or all cycle frequencies—can be ignored by simply forgetting about stochastic-process models and using instead FOT-probabilistic models for single time series. This generally makes sense because, if one wants to work with time averages, then there is no reason to adopt a stochastic-process model in the first place.
Nevertheless, it is conceivable that there could be some meaningful situations for which it is indeed important to determine if a specified stochastic process exhibits cycloergodic properties and, if so, which specific properties: cycloergodicity, with probability (equal to) one, at some specific cycle frequencies or at all cycle frequencies; or mean-square cycloergodicity of some moments or all moments and at some or all cycle frequencies; absence of cycloergodicity at some specific cycle frequencies, etc.
These types of questions were addressed in a 1983 paper [JP11] and apparently have not been further addressed since. This is unfortunate because the progress made in this 38-year-old paper (as of 2021) applies only to discrete-time processes, and even for these processes the progress is disappointing. Specifically, that paper establishes conditions on the time-variant (or translation-variant) probability measure of a classically defined stochastic process that are necessary and sufficient for cycloergodicity with probability one. But it then shows that processes as simple and straightforward as a Bernoulli process (an infinite sequence of statistically independent binary-valued random variables, the two values typically referred to as “success” and “failure”), generalized to accommodate a probability of success that varies almost periodically with the discrete time index, are obviously cycloergodic with probability one, yet do not satisfy the derived conditions for cycloergodicity. Furthermore, the derived conditions for cycloergodicity with probability one appear to be appropriate and to be the most straightforward generalizations of the well-established conditions for ergodicity with probability one. Thus, it appears to be unknown whether or not a satisfactory theory of cycloergodicity with probability one can be developed.
On the other hand, the conditions for mean-square cycloergodicity also appear to be appropriate and to be straightforward generalizations of the well-known mixing conditions for mean-square ergodicity, and in this case—unlike the case of cycloergodicity with probability one—no examples that violate these conditions yet exhibit mean-square cycloergodicity have been found.
It is conceivable that this apparent issue with the concept of cycloergodicity with probability one is yet another disadvantage of classically defined stochastic processes as viable models for characterizing the behavior of single time series, i.e., single sample paths of the process; cf. the paper “Fraction-of-Time Probability: Advancing Beyond the Need for Stationarity and Ergodicity Assumptions” by A. Napolitano and W. A. Gardner, submitted to the journal Nature in March 2021.
If further progress on cycloergodicity with probability one is made, it is the WCM’s intention to describe such progress on this page.