Theme: A wrong turn in the mathematical modeling of time-series was taken almost a century ago. Today, Academia should engage in remediation to overcome the detrimental influence on the teaching and practice of time-series analysis in Science and Engineering.
The objective of this page is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance this proposed paradigm shift has met from those indoctrinated in the theory of stochastic processes, to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 11, Notes on the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this page 4 in perspective.
The 1987 book, Statistical Spectral Analysis: A Nonprobabilistic Theory, argues for more judicious use of the modern stochastic-process model (arising from the work of mathematicians in the 1930s, such as Khinchin and Kolmogorov) and for use, where appropriate, of its more realistic predecessor: the time-series model first developed mathematically by Norbert Wiener in 1930 (see also page 59 of Wiener 1949, written in 1942, regarding the historical relationship between his and Kolmogorov’s approaches), which was briefly revisited by engineers in the 1960s before it was buried by mathematicians. The brief tongue-in-cheek essay Ensembles in Wonderland, published in IEEE Signal Processing Magazine, SP Forum, 1994 and reproduced below, satirizes the outrage of narrow-minded thinkers, exemplified by two outspoken skeptics, Neil Gerr and Melvin Hinich, who wrote scathing remarks and a book review characterizing this book as utter nonsense.
Consider the parallel to the book Alice in Wonderland; the following consists of excerpts taken from https://en.wikipedia.org/wiki/Alice’s_Adventures_in_Wonderland: Martin Gardner and other scholars have shown the book Alice in Wonderland [written by Charles Lutwidge Dodgson under the pseudonym Lewis Carroll] to be filled with many parodies of Victorian popular culture. Since Carroll was a mathematician at Christ Church, it has been argued that there are many references and mathematical concepts in both this story and his later story Through the Looking Glass; examples include what have been suggested to be illustrations of the concept of a limit, number bases and positional numeral systems, the converse relation in logic, and the ring of integers modulo a specific integer. Deep abstraction of concepts, such as non-Euclidean geometry, abstract algebra, and the beginnings of mathematical logic, was taking over mathematics at the time Alice in Wonderland was being written (the 1860s). Literary scholar Melanie Bayley asserted in the magazine New Scientist that Alice in Wonderland in its final form was written as a scathing satire on the new modern mathematics that was emerging in the mid-19th century.
Today, Dodgson’s satire appears to be backward looking because, after all, there are strong arguments that modern mathematics has triumphed. Coming back to the topic of interest here, stochastic processes have triumphed in the sense of being wholly adopted in mathematics, science, and engineering, except by a relatively small contingent of empirically minded scientists and engineers. Yet, recent mathematical arguments, summarized in [B2], provide a sound mathematical basis for reversing this outcome, especially when the overwhelming evidence of practical, pragmatic, pedagogic, and overarching conceptual advantages provided in the 1987 book is considered. The present dominance of the more abstract and less realistic stochastic process theory might be viewed as an example of the pitfalls of what has become known as groupthink, or of the inertia of human nature that resists changes in thinking, which is exemplified on Page 11.
Before presenting the article “Ensembles in Wonderland”, the final letter to SP Forum in the debate is reproduced first for the sake of hindsight.
The debate:
July 2, 1995 (published in Nov 1995)
To the Editor:
Introduction
This is my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], and carried on by Mr. Neil Gerr through his letters to SP Forum.
In this letter, I supplement my previous remarks aimed at clarifying the precariousness of Hinich’s and Gerr’s position by explaining the link between my argument in favor of the utility of fraction-of-time (FOT) probability and the subject of a plenary lecture delivered at ICASSP ’94. In the process of discussing this link I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition–that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework–simply cannot be defended if argument is to be based on fact and logic.
David J. Thomson’s Transcontinental Waveguide Problem
To illustrate that the stochastic-process conceptual framework is often applied to physical situations where the time-average framework is a more natural choice, I have chosen an example from D. J. Thomson’s recent plenary lecture on the project that gave birth to the multiple-window method of spectral analysis [2]. The project that was initiated back in the mid-1960s was to study the feasibility of a transcontinental millimeter waveguide for a telecommunications transmission system potentially targeted for introduction in the mid-1980s. It was found that accumulated attenuation of a signal propagating along a circular waveguide was directly dependent on the spectrum of the series, indexed by distance, of the erratic diameters of the waveguide. So, the problem that Thomson tackled was that of estimating the spectrum for the more than 4,000-mile-long distance-series using a relatively small segment of this series that was broken into a number of 30-foot long subsegments. (It would take more than 700,000 such 30-foot sections to span 4,000 miles.) The spectrum had a dynamic range of over 100 dB and contained many periodic components, indicating the unusual challenge faced by Thomson.
When a signal travels down a waveguide (at the speed of light) it encounters the distance-series of erratic waveguide-diameters. Because of the constant velocity, the distance-series is equivalent to a time-series. Similarly, the series of diameters that is measured for purposes of analysis is, due to the constant effective velocity of the measurement device, equivalent to a time-series. So, here we have a problem where there is one and only one long time-series of interest (which is equivalent to a distance-series); there is no ensemble of long series over which average characteristics are of interest and, therefore, there is no obvious reason to introduce the concept of a stochastic process. That is, in the physical problem being investigated, there was no desire to build an ensemble of transcontinental waveguides. Only one (if any at all) was to be built, and it was the spectral density of distance-averaged (time-averaged) power of the single long distance-series (time-series) that was to be estimated, using a relatively short segment, not the spectral density of ensemble-averaged power. Similarly, if one wanted to analytically characterize the average behavior of the spectral density estimate (the estimator mean) it was the average of a sliding estimator over distance (time), not the average over some hypothetical ensemble, that was of interest. Likewise, to characterize the variability of the estimator, it was the distance-average squared deviation of the sliding estimator about its distance-average value (the estimator variance) that was of interest, not the variance over an ensemble.
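The idea of an estimator mean and variance defined by averaging over position along a single record can be illustrated with a small numerical sketch. It assumes a synthetic white-noise record as a stand-in for the single long distance-series (the actual waveguide data are not reproduced here). The function name sliding_periodogram, the segment length, the hop, and the frequency bin are hypothetical choices made only for illustration; they are not taken from [1] or [2].

```python
import numpy as np

def sliding_periodogram(x, seg_len, freq_bin, hop=1):
    """Periodogram value at one frequency bin for each sliding segment of x.

    Hypothetical helper for illustration only.
    """
    starts = np.arange(0, len(x) - seg_len + 1, hop)
    k = np.arange(seg_len)
    kernel = np.exp(-2j * np.pi * freq_bin * k / seg_len)
    values = np.empty(len(starts))
    for i, s in enumerate(starts):
        values[i] = np.abs(np.dot(x[s:s + seg_len], kernel)) ** 2 / seg_len
    return values

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)   # ONE long record (synthetic, assumed)

# Slide the estimator along the single record.
est = sliding_periodogram(x, seg_len=64, freq_bin=8, hop=16)

# Estimator mean: average of the sliding estimate over position (time).
fot_mean = est.mean()
# Estimator variance: position-averaged squared deviation about that mean.
fot_var = np.mean((est - fot_mean) ** 2)

print(f"time-average mean     = {fot_mean:.3f}")
print(f"time-average variance = {fot_var:.3f}")
```

No ensemble of records appears anywhere in this computation; both summary numbers are averages over the one record.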
The only apparent reason for introducing a stochastic process model with its associated ensemble, instead of a time-series model, is that one might have been trained to think about spectral analysis of erratic data only in terms of such a conceptual artifice and might, therefore, have been unaware of the fact that one could think in terms of a more suitable alternative that is based entirely on the concept of time averaging over the single time-series. (Although it is true that the time-series segments obtained from multiple 30 ft. sections of waveguide could be thought of as independent random samples from a population, this still does not motivate the concept of an ensemble of infinitely long time-series–a stationary stochastic process. The fact remains that, physically, the 30-foot sections represent subsegments of one long time-series in the communications system concept that was being studied.)
It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process (the model adopted by Thomson) except to accommodate lack of familiarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework. Somehow, he does not recognize the mental gyrations required to force this and other physical problems into the stochastic process framework.
Gerr’s Letter
Having explained the link between my argument in favor of the utility of FOT probability and Thomson’s work, let us return to Gerr’s letter. Mr. Gerr, in discussing what he refers to as “a battle of philosophies,” states that I have erred in likening skeptics to religious fanatics. But in the same paragraph we find him defensively trying to convince his readers that the “statistical/probabilistic paradigm” has not “run out of gas” when no one has even suggested that it has. No one, to my knowledge, is trying to make blanket negative statements about the value of what is obviously a conceptual tool of tremendous importance (probability) and no one is trying to denigrate statistical concepts and methods. It is only being explained that interpreting probability in terms of the fraction-of-time of occurrence of an event is a useful concept in some applications. To argue, as Mr. Gerr does again in the same paragraph, that in general this concept “has no obvious advantages” and using it is “like building a house without power tools: it can certainly be done, but to what end?” is, as I stated in my previous letter, to behave like a religious fanatic — one who believes there can be only One True Religion. This is a very untenable position in scientific research.
As I have also pointed out in my previous letter, Mr. Gerr is not at all careful in his thinking. To illustrate his lack of care, I point out that Gerr’s statement “Professor Gardner has chosen to work within the context of an alternative paradigm [fraction-of-time probability]”, and the implications of this statement in Gerr’s following remarks, completely ignore the facts that I have written entire books and many papers within the stochastic process framework, that I teach this subject to my students, and that I have always extolled its benefits where appropriate. If Mr. Gerr believes in set theory and logic, then he would see that I cannot be “within” paradigm A and also within paradigm B unless A and B are not mutually exclusive. But he insists on making them mutually exclusive, as illustrated in the statement “From my perspective, developing signal processing results using the fraction-of-time approach (and not probability/statistics) … .” (The parenthetical remark in this quotation is part of Mr. Gerr’s statement.) Why does Mr. Gerr continue to deny that the fraction-of-time approach involves both probability and statistics?
Another example of the lack of care in Mr. Gerr’s thinking is the convoluted logic that leads him to conclude “Thus, spectral smoothing of the biperiodogram is to be preferred when little is known of the signal a priori.” As I stated in my previous letter, it is mathematically proven* in [1] that the frequency smoothing and time averaging methods yield approximately the same result. Gerr has given us no basis for arguing that one is superior to the other and yet he continues to try to make such an argument. And what does this have to do with the utility of the fraction-of-time concept anyway? These are data processing methods; they do not belong to one or another conceptual framework.
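The approximate agreement of the two methods can be checked numerically. The sketch below is not the proof given in [1]; it assumes rectangular data windows, an ordinary (non-cyclic) periodogram, a synthetic coloured-noise record, and a particular smoothing kernel (the squared magnitude of the transform of a length-T rectangle, which has width of roughly 1/T). All variable names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 16384, 64                  # record length and segment length (assumed values)
w = rng.standard_normal(N + 1)
x = w[1:] + 0.5 * w[:-1]          # one record of mildly coloured noise

# Method 1: time-average the periodograms of all sliding length-T segments.
idx = np.arange(N - T + 1)[:, None] + np.arange(T)[None, :]
time_avg = np.mean(np.abs(np.fft.fft(x[idx], axis=1)) ** 2, axis=0) / T

# Method 2: frequency-smooth the periodogram of the whole record.
L = 2 * N                         # zero-padded FFT length
full_pgram = np.abs(np.fft.fft(x, L)) ** 2 / N
# Smoothing kernel sampled on the same frequency grid; it averages to 1.
kernel = np.abs(np.fft.fft(np.ones(T), L)) ** 2 / T
step = L // T                     # grid points between frequencies k/T
freq_smooth = np.array(
    [np.sum(full_pgram * np.roll(kernel, k * step)) / L for k in range(T)]
)

# Compare the two results at the frequencies k/T, k = 0, ..., T-1.
max_gap = np.max(np.abs(time_avg - freq_smooth)) / np.max(time_avg)
print(f"largest gap relative to the spectral peak: {max_gap:.4f}")
```

In this setup the two results differ only through end-of-record effects, which shrink as the ratio T/N decreases.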
To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing referred to above was first derived by using the fraction-of-time conceptual framework [1]. If there is no conceptual advantage to this framework, why wasn’t such a fundamental result derived during the half century of research based on stochastic processes that preceded [1]? The second example is taken from the first attempt to develop a theory of higher-order cyclostationarity for the conceptualization and solution of problems in communication system design. In [3], it is shown that a fundamental inquiry into the nature of communication signals subjected to nonlinear transformations led naturally to the fraction-of-time probability concept and to a derivation of the cumulant as the solution to a practically motivated problem. This is, to my knowledge, the first derivation of the cumulant. In all other work, which is based on stochastic processes (or non-fraction-of-time probability) and which dates back to the turn of the century, cumulants are defined, by analogy with moments, to be coefficients in an infinite series expansion of a transformation of the probability density function (the characteristic function), which has some useful properties. If there is no conceptual advantage to the fraction-of-time framework, why wasn’t the cumulant derived as the solution to the above-mentioned practical problem or some other practical problem using the orthodox stochastic-probability framework?
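For reference, the conventional series definition mentioned above can be written out as follows. The notation is assumed here for illustration and is not taken from [3]; the expansions hold when the moments involved exist. Moments are the coefficients in the expansion of the characteristic function, and cumulants are the coefficients in the expansion of its natural logarithm.

```latex
% Conventional (orthodox) definitions; symbols m_n, \kappa_n, \Phi_X are assumed notation.
\Phi_X(\omega) = E\{ e^{i\omega X} \}
             = \sum_{n=0}^{\infty} m_n \, \frac{(i\omega)^n}{n!},
\qquad
\ln \Phi_X(\omega) = \sum_{n=1}^{\infty} \kappa_n \, \frac{(i\omega)^n}{n!},
\qquad
\kappa_1 = m_1, \quad \kappa_2 = m_2 - m_1^2 .
```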
Conclusion
Since no one in the preceding year has entered the debate to indicate that they have new arguments for or against the philosophy and corresponding theory and methodology presented in [1], it seems fair to proclaim the debate closed. The readers may decide for themselves whether the resolution put forth in [1] was defeated or was upheld.
* A more detailed and tutorial proof of this fundamental equivalence is given in the article “The history and the equivalence of two methods of spectral analysis,” Signal Processing Magazine, July 1996, No. 4, pp. 20–23, which is copied into the Appendix farther down this page.
But regarding the skeptics, I sign off with a humorous anecdote:
When Mr. Fulton first showed off his new invention, the steamboat, skeptics were crowded on the bank, yelling ‘It’ll never start, it’ll never start.’
It did. It got going with a lot of clanking and groaning and, as it made its way down the river, the skeptics were quiet.
For one minute.
Then they started shouting. ‘It’ll never stop, it’ll never stop.’
— William A. Gardner
References
Excerpts from earlier versions of above letter to the editor before it was condensed for publication:
April 15, 1995
Introduction
In this, my final letter to SP Forum in the debate initiated by Mr. Melvin Hinich’s challenge to the resolution made in the book [1], I shall begin by addressing two remarks in the opening paragraph of Mr. Neil Gerr’s last letter (in March 1995 SP Forum). In the first remark, Mr. Gerr suggests that the “bumps and bruises” he sustained by venturing into the “battle” [debate] were to be expected. But I think that such injuries could have been avoided if he had all the relevant information at hand before deciding to enter the debate. This reminds me of a story I recently heard:
Georgios and Melvin liked to hunt. Hearing about the big moose up north, they went to the wilds of Canada to hunt. They had hunted for a week, and each had bagged a huge moose. When their pilot Neil landed on the lake to take them out of the wilderness, he saw their gear and the two moose. He said, “I can’t fly out of here with you, your gear, and both moose.”
“Why not?” Georgios asked.
“Because the load will be too heavy. The plane won’t be able to take off.”
They argued for a few minutes, and then Melvin said, “I don’t understand. Last year, each of us had a moose, and the pilot loaded everything.”
“Well,” said Neil, “I guess if you did it last year, I can do it too.”
So, they loaded the plane. It moved slowly across the lake and rose toward the mountain ahead. Alas, it was too heavy and crashed into the mountain side. No one was seriously hurt and, as they crawled out of the wreckage in a daze, the bumped and bruised Neil asked, “Where are we?”
Melvin and Georgios surveyed the scene and answered, “Oh, about a mile farther than we got last year.”
If Mr. Gerr had read the book [1] and put forth an appropriate level of effort to understand what it was telling him, he would have questioned Mr. Hinich’s book review and would have seen that the course he was about to steer together with the excess baggage he was about to take on made a crash inevitable.
A friend of mine recently offered me some advice regarding my participation in this debate. “Why challenge the status quo”, he said, “when everybody seems happy with the way things are.” My feeling about this is summed up in the following anecdote:
“Many years ago, a large American shoe manufacturer sent two sales reps out to different parts of the Australian outback to see if they could drum up some business among the aborigines. Sometime later, the company received telegrams from both agents.
The first one said, ‘No business. Natives don’t wear shoes.’
The second one said, ‘Great opportunity here–natives don’t wear shoes.’”
Another friend asked “why spend your time on this [debate] when you could be solving important problems.” I think Albert Einstein answered that question when he wrote:
“The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science.”
This underscores my belief that we are overemphasizing “engineering training” in our university curricula at the expense of “engineering science.” It is this belief that motivates my participation in this debate. Instead of plodding along in our research and teaching with the same old stochastic process model for every problem involving time-series data, we should be looking for new ways to think about time-series analysis.
In the second remark in Mr. Gerr’s opening paragraph, regarding my response to Mr. Gerr’s October 1994 SP Forum letter in sympathy with “Hinich’s gleefully vicious no-holds-barred review” of [1], Mr. Gerr says “Even by New York standards, it [my response] seemed a bit much.” Well, I guess I was thinking about what John Hancock said, on boldly signing the Declaration of Independence:
There, I guess King George will be able to read that!
Like the King of England who turned a deaf ear to the messages coming from the new world, orthodox statisticians like Messrs. Hinich and Gerr, who are mired in tradition, seem to be hard of hearing–a little shouting might be needed to get through to them.
Nevertheless, I am disappointed to see no apparent progress, on Mr. Gerr’s part, in understanding the technical issues involved in his and Hinich’s unsupportable position that the time-average framework for statistical signal processing has, and I quote Gerr’s most recent letter, “no obvious advantages.” I hasten to point out, however, that this most recent position is a giant step back from the earlier even more indefensible position taken by Hinich in his book review, reprinted in April 1994 SP Forum, where much more derogatory language was used.
In this letter, I make a final attempt to clarify the precariousness of Hinich’s and Gerr’s position by explaining links between my arguments and the subjects of two plenary lectures delivered at ICASSP ’94. In the process of discussing these links and this paper, I hope to continue the progress made in my previous two letters in discrediting the naysayers and thereby moving toward broader acceptance of the resolution that was made and argued for in [1] and is currently being challenged. My continuing approach is to show that the position taken by the opposition, that the fraction-of-time probability concept and the corresponding time-average framework for statistical signal processing theory and method have nothing to offer in addition to the concept of probability associated with ensembles and the corresponding stochastic process framework, simply cannot be defended if argument is to be based on fact and logic.
Lotfi Zadeh and Fuzzy Logic
I wish that Mr. Gerr would let go of the fantasy about “the field where the Fraction-of-Timers and Statisticians do battle.” There do not exist two mutually exclusive groups of people—one of which can think only in terms of fraction-of-time probability and the other of which call themselves Statisticians. How many times and in how many ways does this have to be said before Mr. Gerr will realize that some people are capable of using both fraction-of-time probability and stochastic process concepts, and of making choices between these alternatives by assessing the appropriateness of each for each particular application? Mr. Gerr’s “battle” of “fraction-of-time versus probability/statistics” simply does not exist. This insistence on a dichotomy of thought is strongly reminiscent of the difficulties some people have had accepting the proposition that the concept of fuzziness is a useful alternative to the concept of probability. The vehement protests against fuzziness are for most of us now almost laughable.
To quote Professor Lotfi Zadeh in his recent plenary lecture [2]:
“[although fuzzy logic] offers an enhanced ability to model real-world phenomena…[and] eventually fuzzy logic will pervade most scientific theories…the successes of fuzzy logic have also generated a skeptical and sometimes hostile reaction…Most of the criticisms directed at fuzzy logic are rooted in a misunderstanding of what it is and/or a lack of familiarity with it.”
I would not suggest that the time-average approach to probabilistic modeling and statistical inference is as deep a concept, as large a departure from orthodox thinking, or as broadly applicable as is fuzzy logic, but there are some definite parallels, and Professor Zadeh’s explanation of the roots of criticism of fuzzy logic applies equally well to the roots of criticism of the time-average approach as an alternative to the ensemble-average or, more accurately, the stochastic-process approach. In the case of fuzzy logic, its proponents are not saying that one must choose either conventional logic and conventional set theory or their fuzzy counterparts as two mutually exclusive alternative truths. Each has its own place in the world. Those opponents who argue vehemently that the unorthodox alternative is worthless can be likened to religious fanatics. This kind of intolerance should have no place in science. But it is all too commonplace and it has been so down through the history of science. So surely, one cannot expect to find its absence in connection with the time-average approach to probabilistic modeling and statistical inference. Even though experimentalists in time-series analysis (including communication systems analysis and other engineered-systems analysis) have been using the time-average approach (to various extents) for more than half a century, there are those like Gerr and Hinich who “see no obvious advantages.” This seems to imply that Mr. Gerr has one and only one interpretation of a time-average measurement on time series data—namely an estimate of some random variable in an abstract stochastic process model. To claim that this mathematical model is, in all circumstances, the preferred one is just plain silly.
David J. Thomson and the Transcontinental Waveguide – addition to published discussion:
[It is obvious in this example that there is no advantage to introducing the irrelevant abstraction of a stochastic process except to accommodate unfamiliarity with alternatives. Yet Gerr turns this around and says there is no obvious advantage to using the time-average framework.] It is correct in this case that a sufficiently capable person would obtain the same result using either framework, but it is incorrect to not recognize the mental gyrations required to force this physical problem into the stochastic process framework. My claim—and the reason I wrote the book [1]—is that our students deserve to be made aware of the fact that there are two alternatives. It is pigheaded to hide this from our students and force them to go through the unnecessary and sometimes confusing mental gyrations required to force-fit the stochastic process framework to real-world problems where it is truly an unnecessary and, possibly, even inappropriate artifice.
Gerr’s Letter—addition to published letter:
To further demonstrate the indefensibility of Gerr’s claim that the fraction-of-time probability concept has “no obvious advantages,” I cite two more examples to supplement the advantage of avoiding “unnecessary mental gyrations” that was illustrated using Thomson’s waveguide problem. The first example stems from the fact that the fundamental equivalence between time averaging and frequency smoothing, whose proof is outlined in the Appendix at the end of this letter, was first derived by using the fraction-of-time conceptual framework [1].
An Illustration of Blinding Prejudice
To further illustrate the extent to which Mr. Gerr’s prejudiced approach to scientific inquiry has blinded him, I have chosen one of his research papers on the subject of cyclostationary stochastic processes. In [5], Mr. Gerr (and his coauthor) tackle the problem of detecting the presence of cyclostationarity in an observed time-series. He includes an introduction and references sprinkled throughout that tie his work to great probabilists, statisticians, and mathematicians. (We might think of these as the “Saints” in Mr. Gerr’s One True Religion.) This is strange, since his paper is nothing more than an illustration of the application of a known statistical test (and a minor variation thereof) to synthetic data. It is even more strange that he fails to properly reference work that is far more relevant to the problem of cyclostationarity detection. But I think we can see that there is no mystery here. The highly relevant work that is not cited is authored by someone who champions the value of fraction-of-time probabilistic concepts. The fact that the relevant publications (known to Gerr) actually use the stochastic process framework apparently does not remove Mr. Gerr’s blinders. All he can see–it would seem–is that the author is known to argue (elsewhere) that the stochastic process framework is not always the most appropriate one for time-series analysis, and this is enough justification for Mr. Gerr to ignore the highly relevant work by this “heretic” author (author of the book [1] that Hinich all but said should be burned).
To be specific, Mr. Gerr completely ignores the paper [6] (published 1-1/2 years prior to the submission of Gerr’s paper) and the book [7] (published 4 years prior) wherein the problem of cyclostationarity detection is tackled using maximum-likelihood [6], maximum-signal-to-noise ratio [6], [7], and other optimality criteria, all of which lead to detection statistics that involve smoothed biperiodograms (and that also identify optimal smoothing) which are treated by Gerr as if they were ad hoc. Mr. Gerr also cites a 1990 publication (which does not appear in his reference list) that purportedly shows that the integrated biperiodogram (cyclic periodogram) equals the cyclic mean square value of the data (cf. (12)); but this is a special case of the much more useful result, derived much earlier than 1990, that the inverse Fourier transform of the cyclic periodogram equals the cyclic correlogram. The argument, by example, that Gerr proffers to show that (12) (the cyclic correlogram at zero lag) is sometimes a good test statistic and sometimes a bad one is trivialized by this Fourier transform relation (cf. [1]) and the numerous mathematical models for data for which the idealized quantities (cyclic autocorrelations, and cyclic spectral densities) in this relation have been explicitly calculated (cf. [1], [7]). These models include, as special cases, the examples that Gerr discusses superficially. The results in [1], [7] show clearly when and why the choice of zero lag made by Gerr in (12) is a poor choice. As another example, consider Mr. Gerr’s offhand remark that a Mr. Robert Lund (no reference cited) “has recently shown that for the current example (an AM signal with a square wave carrier) only lines [corresponding to cycle frequencies] spaced at even multiples of d=8 [the reciprocal of the period of the carrier] will have nonzero spectral (rz) measure.” This result was established in a more general form many years earlier in his coauthor’s Ph.D. dissertation (as well as in [1]) where one need only apply the extremely well-known fact that a symmetrical square wave contains only odd harmonics.
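The well-known fact about the symmetrical square wave is easy to confirm numerically. The following minimal sketch assumes an illustrative period of 64 samples and a record of eight full periods; these values and the variable names are hypothetical and are not taken from Gerr's example.

```python
import numpy as np

period = 64                        # samples per carrier period (assumed)
n_periods = 8                      # whole periods in the record (assumed)
n = np.arange(n_periods * period)

# Symmetrical square wave: +1 for the first half of each period, -1 for the second.
square = np.where((n % period) < period // 2, 1.0, -1.0)

# Magnitude of the Fourier-series coefficients; with n_periods whole periods
# in the record, harmonic h of the carrier falls in DFT bin h * n_periods.
spectrum = np.abs(np.fft.rfft(square)) / len(square)
harmonic = np.arange(1, 16)
amplitude = spectrum[harmonic * n_periods]

for h, a in zip(harmonic, amplitude):
    print(f"harmonic {h:2d}: {a:.4f}")
```

The even-numbered harmonics come out as zero to within rounding error, while the odd-numbered ones do not.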
To go on, the coherence statistic that Gerr borrows from Goodman for application to cyclostationary processes has been shown in [7] to be nothing more than the standard sample statistic for the standard coherence function (a function of a single frequency variable) for two processes obtained from the one process of interest by frequency-shifting data transformations–except for one minor modification; namely, that time-averaged values of expected values are used in place of non-averaged expected values in the definition of coherence because the processes are asymptotically mean stationary, rather than stationary. Therefore, the well-known issues regarding frequency smoothing in these cross-spectrum statistics need not be discussed further, particularly in the haphazard way this is done by Gerr, with no reliance on analysis of specific underlying stochastic process models.
Continuing, the incoherent average (13) proposed by Gerr for use with the coherence statistic is the only novel contribution of this paper, and I claim that it is a poor statistic. The examples used by Gerr show that this “incoherent statistic” outperforms the “coherent statistic,” but what he does not recognize is that he chose the wrong coherent statistic for comparison. He chose the cyclic correlogram with zero lag (12), which is known to be a poor choice for his examples. For his example in Figure 9, zero lag produces a useless statistic, whereas a lag equal to T/2 is known to be optimum, and produces a “coherent statistic” that is superior to Gerr’s incoherent statistic. Thus, previous work [1], [7] suggests that a superior alternative to Gerr’s incoherent statistic is the maximum over a set of lag-indexed coherent statistics.
Finally, Mr. Gerr’s vague remarks about choosing the frequency-smoothing window-width parameter M are like stabs in the dark by comparison with the thorough and careful mathematical analysis carried out within–guess what–the time-average conceptual framework in [1] in which the exact mathematical dependence of bias and variance of smoothed biperiodograms on the data-tapering window shape, the spectral-smoothing window shape, and the ideal spectral correlation function for the data model are derived, and in which the equivalence between spectral correlation measurement and conventional cross-spectrum measurement is exploited to show how conventional wisdom [1, chapters 5, 7] applies to spectral correlation measurement [1, chapters 11, 13, 15].
In summary, Gerr’s paper is completely trivialized by previously published work of which he was fully aware. What appears to be his choice to “stick his head in the sand” because the author of much of this earlier highly relevant work was not a member of his One True Religion exemplifies what Gerr is trying to deny. Thus, I repeat it is indeed appropriate to liken those (including Gerr) who Gerr would like to call skeptics to religious fanatics who are blinded by their faith.
Conclusion
In closing this letter, I would like to request that Mr. Gerr refrain from writing letters to the editor on this subject. To say, as he does in his last letter, “There are many points on which Professor Gardner and I disagree, but only two that are worthy of further discussion,” is to try to worm his way out of the debate without admitting defeat. I claim to have used careful reasoning to refute beyond all reasonable doubt every point Mr. Gerr (and Mr. Hinich) has attempted to make. Since he has shown that he cannot provide convincing arguments based on fact and logic to support his position, he should consider the debate closed. To sum up the debate:
– The resolution, cited in the introductory section of my 2 July 1995 letter to the editor, in contrapositive form, was made by myself in [1].
– The resolution was challenged by Hinich and defended by myself in April 1994 SP Forum.
– Hinich’s challenge was supported and my defense was challenged by Gerr in October 1994 SP Forum.
– Gerr’s arguments were challenged by myself in January 1995 SP Forum.
– Gerr defended his arguments in March 1995 SP Forum.
– Gerr’s presumably-final defense was challenged and the final arguments in support of the resolution are made by myself in this letter.
APPENDIX – Proof of Equivalence Between Time-Averaged and Frequency-Smoothed Cyclic Periodograms
History and Equivalence of Two Methods of Spectral Analysis
Published in IEEE SIGNAL PROCESSING MAGAZINE, July 1996
The purpose of this article is to present a brief history of two methods of spectral analysis and to present, in a tutorial fashion, the derivation of the deterministic relationship that exists between these two methods.
History
Two of the oldest and currently most popular methods of measuring statistical (average) power spectral densities (PSD’s) are the frequency smoothing method (FSM) and the time averaging method (TAM). The FSM was thought to have originated in 1930 with Norbert Wiener’s work on generalized harmonic analysis [1], and to have been rediscovered in 1946 by Percy John Daniell [2]. But it was discovered only a few years ago (cf. [3]) that Albert Einstein had introduced the method in 1914 [4]. The currently popular method of deriving the FSM begins by showing that adjacent frequency bins in the periodogram have approximately the same correct mean values and the same large variances, and are approximately uncorrelated with each other. Then, it is observed that averaging these bins together retains the correct mean value, while reducing the variance.
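The bin-averaging argument for the FSM can be sketched in a few lines of Python. This is a minimal illustration under assumed choices (a boxcar smoothing window applied circularly, white Gaussian test data, and the hypothetical function name `fsm_psd`), not a reproduction of any of the cited historical procedures.

```python
import numpy as np

def fsm_psd(x, width):
    """Return the raw periodogram and a version smoothed over `width` adjacent bins."""
    n = len(x)
    periodogram = np.abs(np.fft.fft(x)) ** 2 / n
    kernel = np.zeros(n)
    kernel[:width] = 1.0 / width              # boxcar with unit sum
    kernel = np.roll(kernel, -(width // 2))   # centre the boxcar on bin 0
    # Circular convolution of the periodogram with the boxcar, done via the FFT
    smoothed = np.fft.ifft(np.fft.fft(periodogram) * np.fft.fft(kernel)).real
    return periodogram, smoothed

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                 # white noise with a flat PSD
raw, smooth = fsm_psd(x, 33)
print(raw.mean(), smooth.mean())              # the mean level is retained
print(raw.std(), smooth.std())                # the scatter across bins is reduced
```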
The TAM is often attributed to a 1967 paper by P.D. Welch in the IEEE Transactions on Audio and Electroacoustics [5], but in fact the earliest known proposal of the TAM was by Maurice Stevenson Bartlett in 1948 [6]. The reasoning behind the TAM is similar to that for the FSM: the periodograms on adjacent segments of a data record have approximately the same correct mean values and the same large variances, and they are approximately uncorrelated with each other. Therefore, averaging them together will retain the correct mean value, while reducing the variance. (A more detailed historical account of the FSM, TAM, and other methods is given in [7].) Essentially, every spectral analysis software package available today includes either the FSM or the TAM, or both, often in addition to others. These other methods include, for example, the Fourier transformed tapered autocorrelation method, attributed to Ralph Beebe Blackman and John Wilder Tukey [8] (but used as early as 1898 by Albert A. Michelson [9]); and various model fitting methods that grew out of pioneering work by George Udny Yule in 1927 [10] and Gilbert Walker in 1931 [11].
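The segment-averaging argument for the TAM can be sketched in the same way. The Python fragment below assumes non-overlapping, untapered segments and white Gaussian test data. The function name `tam_psd` is hypothetical.

```python
import numpy as np

def tam_psd(x, seg_len):
    """Average the periodograms of adjacent non-overlapping segments of x."""
    k = len(x) // seg_len
    segments = x[:k * seg_len].reshape(k, seg_len)
    periodograms = np.abs(np.fft.fft(segments, axis=1)) ** 2 / seg_len
    return periodograms.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)      # white noise, so the PSD is flat
psd = tam_psd(x, 128)              # 32 segments of 128 samples each
print(psd.mean(), psd.std())       # mean stays near the PSD level, scatter is small
```

Averaging 32 segments should reduce the scatter of the estimate by roughly the square root of 32 relative to a single periodogram, at the cost of coarser frequency resolution.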
It is well known that both the FSM and the TAM yield PSD estimates that can be made to converge to the exact PSD in some probabilistic sense, such as in mean square, as the length of the data record processed approaches infinity. However, it is much less commonly known that these two methods are much more directly related to each other. The pioneering methods due to Michelson, Einstein, Wiener, Yule, and Walker were all introduced without knowledge of the concept of a stochastic process. But starting in the 1950s (based on the work of mathematicians such as Khinchin, Wold, Kolmogorov, and Cramér in the 1930s and 1940s), the stochastic-process point of view essentially took over. It appears as though this mathematical formalism, in which analysts focus on calculating means and variances and other probabilistic measures of performance, delayed the discovery of the deterministic relationship between the FSM and TAM for about 40 years. That is, apparently it was not until the non-stochastic approach to understanding statistical (averaged) spectral analysis was revived and more fully developed in [7] that a deterministic relationship between these two fundamental methods was derived.
The next section presents, in a tutorial fashion, the derivation of the deterministic relationship between the FSM and TAM, but generalized from frequency-smoothed and time-averaged versions of the periodogram to the same for the biperiodogram (also called the cyclic periodogram [7]). This deterministic relationship is actually an approximation of the time-averaged biperiodogram (TAB) by the frequency-smoothed biperiodogram (FSB) and, of course, vice versa. For evidence of the limited extent to which this deterministic relationship is known, the reader is referred to letters that have appeared in the SP Forum section of this magazine in the October 1994, January 1995, March 1995, and November 1995 issues.
Equivalence
Definitions
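Let $a(t)$ be a data-tapering window satisfying $a(t) = 0$ for $|t| > 1/(2\Delta f)$, let $r_a(\tau)$ be its autocorrelation
$$r_a(\tau) = \int_{-\infty}^{\infty} a(t+\tau/2)\, a^*(t-\tau/2)\, dt,$$
and let $A(f)$ be its Fourier transform
$$A(f) = \int_{-\infty}^{\infty} a(t)\, e^{-i2\pi f t}\, dt.$$
Let $X_a(t,f)$ be the sliding (in time $t$) complex spectrum of data $x(t)$ seen through window $a$
$$X_a(t,f) = \int_{-\infty}^{\infty} a(w-t)\, x(w)\, e^{-i2\pi f w}\, dw.$$
Similarly, let $b(t)$ be a rectangular window of unit height and width $\Delta t$, centered at the origin, and let $X_b(t,f)$ be the corresponding sliding complex spectrum (without tapering). Also, let $R_a^{\alpha}(t,\tau)$ be the sliding cyclic correlogram for the tapered data
$$R_a^{\alpha}(t,\tau) = \int_{-\infty}^{\infty} a(w-t+\tau/2)\, x(w+\tau/2)\, a^*(w-t-\tau/2)\, x^*(w-\tau/2)\, e^{-i2\pi\alpha w}\, dw,$$
and let $R_b^{\alpha}(t,\tau)$ be the sliding cyclic correlogram without tapering
$$R_b^{\alpha}(t,\tau) = \frac{1}{\Delta t}\int_{-\infty}^{\infty} b(w-t+\tau/2)\, x(w+\tau/2)\, b(w-t-\tau/2)\, x^*(w-\tau/2)\, e^{-i2\pi\alpha w}\, dw.$$
To complete the definitions, let $S_a^{\alpha}(t,f)$ and $S_b^{\alpha}(t,f)$ be the sliding biperiodograms (or cyclic periodograms) for the data
$$S_a^{\alpha}(t,f) = X_a(t, f+\alpha/2)\, X_a^*(t, f-\alpha/2), \qquad S_b^{\alpha}(t,f) = \frac{1}{\Delta t}\, X_b(t, f+\alpha/2)\, X_b^*(t, f-\alpha/2).$$
Derivation
It can be shown (using $\Delta t\,\Delta f \gg 1$) that (cf. [7, Chapter 11])
$$\frac{1}{\Delta t}\int_{-\Delta t/2}^{\Delta t/2} S_a^{\alpha}(t+u,f)\, du = \int_{-\infty}^{\infty}\left[\frac{1}{\Delta t}\int_{-\Delta t/2}^{\Delta t/2} R_a^{\alpha}(t+u,\tau)\, du\right] e^{-i2\pi f\tau}\, d\tau \approx \int_{-\infty}^{\infty} r_a(\tau)\, R_b^{\alpha}(t,\tau)\, e^{-i2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty} |A(f-\nu)|^2\, S_b^{\alpha}(t,\nu)\, d\nu.$$
The above approximation, namely
$$\frac{1}{\Delta t}\int_{-\Delta t/2}^{\Delta t/2} R_a^{\alpha}(t+u,\tau)\, du \approx r_a(\tau)\, R_b^{\alpha}(t,\tau)$$
for $|\tau| \le 1/\Delta f$, becomes more accurate as the inequality $\Delta t\,\Delta f \gg 1$ grows in strength (assuming that there are no outliers in the data near the edges of the $\Delta t$-length segment, cf. exercise 1 in [7, Chapt. 3], exercise 4b in [7, Chapt. 5], and Section B in [7, Chapt. 11]). For example, if the data and the tapering window are bounded, then it can be shown that the error in this approximation is worst-case bounded by a quantity that shrinks as $\Delta t\,\Delta f$ grows. The first and last equalities above are simply applications of the cyclic-periodogram/cyclic-correlogram relation first established in [7, Chapter 11] together with the convolution theorem (which is used in the last equality).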
Interpretation
The left-most member of the above string of equalities (and an approximation) is a biperiodogram of tapered data seen through a sliding window of length $1/\Delta f$ and time-averaged over a window of length $\Delta t$. If this average is discretized, then we are averaging a finite number of biperiodograms of overlapping subsegments over the $\Delta t$-length data record. (It is fairly well known that little is gained – although nothing but computational efficiency is lost – by overlapping segments more than about 50 percent.) The right-most member of the above string is a biperiodogram of un-tapered data seen through a window of length $\Delta t$ and frequency-smoothed along the anti-diagonal $f$, using a smoothing window $|A(f)|^2$, for each fixed diagonal $\alpha$. Therefore, given a $\Delta t$-length segment of data, one obtains approximately the same result, whether one averages biperiodograms on subsegments (TAM) or frequency smooths one biperiodogram on the undivided segment (FSM). Given $\Delta t$, the choice of $\Delta f$ determines both the width of the frequency smoothing windows in FSM and the length of the subsegments in TAM. Given $\Delta t$ and choosing $\Delta f \gg 1/\Delta t$, one can choose either of these two methods and obtain approximately the same result (barring outliers within $1/\Delta f$ of the edges of the data segment of length $\Delta t$). By choosing $\alpha = 0$ (i.e., $f + \alpha/2 = f - \alpha/2$), we see the biperiodograms reduce to the more common periodograms, and the equivalence then applies to methods of estimation of power spectral densities, rather than bispectra. Bispectra are also called cyclic spectral densities and spectral correlation functions [7]. As first proved in [7], the FSM and TAM spectral correlation measurements converge to exactly the same quantity, namely, the limit spectral correlation function (when it exists), in the limit as $\Delta t \to \infty$ and $\Delta f \to 0$, in this order.
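Further, this limit spectral correlation function, also called the limit cyclic spectral density, is equal to the Fourier transform of the limit cyclic autocorrelation, as first proved in [7], where this relation is called the cyclic Wiener relation because it generalizes the Wiener relation between the PSD and autocorrelation from $\alpha = 0$ to $\alpha \neq 0$
$$S_x^{\alpha}(f) = \int_{-\infty}^{\infty} R_x^{\alpha}(\tau)\, e^{-i2\pi f\tau}\, d\tau,$$
where
$$R_x^{\alpha}(\tau) = \lim_{\Delta t \to \infty} \frac{1}{\Delta t}\int_{-\Delta t/2}^{\Delta t/2} x(t+\tau/2)\, x^*(t-\tau/2)\, e^{-i2\pi\alpha t}\, dt$$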
with .
In the special circumstance where the inequality $\Delta t\,\Delta f \gg 1$ cannot be satisfied because of the degree of spectral resolution (smallness of $\Delta f$) that is required, there is no known general and provable argument that either method is superior to the other. It has been argued that, since the TAM involves time averaging, it is less appropriate than the FSM for nonstationary data. The results presented here, however, show that, for $\Delta t\,\Delta f \gg 1$, neither the TAM nor the FSM is more appropriate than the other for nonstationary data. And, when $\Delta t\,\Delta f \gg 1$ is not satisfied, there is no known evidence that favors either method for nonstationary data.
The derivation of the approximation between the FSM and TAM presented here uses a continuous-time model. However, a completely analogous derivation of an approximation between the discrete-time FSM and TAM is easily constructed. When the spectral correlation function is being measured for many values of the frequency-separation parameter, $\alpha$, the TAM, modified to what is called the FFT accumulation method (FAM), is much more computationally efficient than the FSM implemented with an FFT [12].
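A discrete-time check of the approximation can be sketched numerically for the ordinary PSD case (zero frequency separation). The Python fragment below is an illustration under assumed choices, not the derivation of [7]: the function names, the Hann taper, the one-sample hop between subsegments, the circular frequency smoothing, and the coloured-noise test record are all assumptions made here.

```python
import numpy as np

def tam_estimate(x, taper):
    """Average of tapered periodograms over every sliding subsegment (hop of one
    sample), evaluated on the len(x)-point frequency grid by zero-padding."""
    n, seg = len(x), len(taper)
    acc = np.zeros(n)
    for start in range(n - seg + 1):
        acc += np.abs(np.fft.fft(taper * x[start:start + seg], n)) ** 2
    return acc / (n - seg + 1)

def fsm_estimate(x, taper):
    """Periodogram of the undivided record, circularly smoothed in frequency by
    the squared magnitude of the taper's transform (scaled to unit sum)."""
    n = len(x)
    periodogram = np.abs(np.fft.fft(x)) ** 2 / n
    kernel = np.abs(np.fft.fft(taper, n)) ** 2 / n
    return np.fft.ifft(np.fft.fft(periodogram) * np.fft.fft(kernel)).real

rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(2048), [1.0, 0.9], mode="same")  # coloured noise
taper = np.hanning(64)
taper /= np.sqrt(np.sum(taper ** 2))          # unit-energy taper

tam = tam_estimate(x, taper)
fsm = fsm_estimate(x, taper)
rel_err = np.linalg.norm(tam - fsm) / np.linalg.norm(fsm)
print(rel_err)
```

With a record 32 times longer than the subsegments, the two estimates should differ only by a small edge effect. If the sliding subsegments were also allowed to wrap around the ends of the record, the two computations would agree exactly, so the residual here comes entirely from the subsegments near the record edges.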
William A. Gardner
Professor, Department of Electrical and Computer Engineering
University of California,
Davis, CA.
References
The debate preceding the above final argument:
To appear at a later date: the missing parts of the chronological sequence of contributions to the debate from both sides, including Hinich’s review.
1 – April 1994, reprint of Hinich in SP Mag
2 – April 1994, Author’s Comments including Ensembles in Wonderland
3 – Oct 1994, Gerr’s comments
4 – Jan 1995, My comments. These comments have been posted below the article “Ensembles in Wonderland”
5 – March 1995, Gerr’s second try
6 – July 1995, my final response (inserted at the beginning above)
Jan 1995, My comments
It is hard for me to decide whether or not Mr. Gerr’s letter in the Forum section of the October 1994 issue of this magazine deserves a response. He does not seem to address the basic issue of whether or not fraction-of-time probability is a useful concept. This is the issue being debated, isn’t it? In fact, I cannot find one technical point in his letter that is both valid and clearly stated. But, because Mr. Gerr has clearly stated in his letter that, regarding philosophical issues in science and engineering, he prefers “New York” style vicious attacks like Hinich’s to carefully worded slyly mocking replies, like mine, it has occurred to me that I might get through a little better to the Mr. Gerrs out there if I tried my hand at being just a little vicious. I hope the readers will understand that I am new at this; I give them my apologies now in case I fail to overcome my propensity for writing carefully and, when appropriate, slyly.
Mr. Gerr’s letter reveals a lot of misunderstanding and this provides us with some insight into what may motivate vicious attacks on attempts to educate people about alternative ways to conceptualize problem solving. It is hard for me to imagine how Mr. Gerr could have missed the main point of my response to Hinich’s review. This point, which is clearly stated in both the book [1] under attack and the unappreciated response to this attack, is that, and I quote from my response,
“… there is really no basis for controversy. The only real issue is one of judgement—judgement in choosing for each particular time-series analysis problem the most appropriate of two alternative approaches.”
To argue against this point is to be a zealot in the truest sense of the word, fanatically fighting for the One True Religion in statistics.
Sociologists and psychologists tell us that vicious behavior is often the result of paranoia born out of ignorance. In the example before us, both Hinich and Gerr demonstrate substantial ignorance regarding nonstochastic statistical concepts and methods, including fraction-of-time (FOT) probability. This case has already been made for Hinich in the Forum section of the April 1994 issue of this magazine. So, let us consider Gerr’s letter. First off, Gerr admits to the kind of behavior that is supposed to have no place in science and engineering, by identifying himself as a “partisan spectator”. Webster’s Ninth New Collegiate Dictionary defines partisan as “a firm adherent to a party, faction, or cause, or person, especially one exhibiting blind, prejudiced, and unreasoning allegiance.” On the basis of this admission alone one has to wonder whether to continue reading Gerr’s letter or flip the page. (It’s interesting that Gerr is into partisanship and Hinich’s university appointment is in the Government Department.) But what the heck, let’s see if we can find some technical content in his letter.
Mr. Gerr’s first of three technical remarks is quoted here:
“For me, the statistical approach to signal analysis begins with a probabilistic model (e.g., ARMA) for the signal. The signal time series is viewed as a single realization and as data arising from the model. The time series data is used in conjunction with statistical techniques (e.g., maximum likelihood) to infer parameters, order, appropriateness, etc. of the model. The abstract notion of an infinite population plays no role.”
Not too surprisingly, it is difficult to tell what point Mr. Gerr is trying to make here. He starts with a probabilistic model and ends with a denial of the notion of a population. Would Mr. Gerr care to tell us how he interprets “probability” in “probabilistic model” if he denies the notion of population? My guess is that his thinking does not go this deep. But let’s try to extract some meaning by reading between the lines. In spite of his sympathy with Hinich, Mr. Gerr seems to be agreeing that the problem-solving machinery of probability theory (e.g., ARMA modeling and maximum likelihood estimation) can be used regardless of whether one conceptualizes its use in terms of stochastic probability (with its associated ensembles or populations) or in terms of fraction-of-time (FOT) probability. This is the point that is made by the book [1] under attack: This book does include ARMA models and the maximum likelihood method as parts of the nonstochastic theory. True to the “blind allegiance” definition of partisanship, Mr. Gerr is apparently agreeing with the book while sympathizing with the attack on the book. Either Mr. Gerr has not read the book at all, or he may simply not have thought hard enough and long enough about these things. This is important to point out because I suspect it is the primary reason that there is any controversy at all.
Mr. Gerr then goes on to admit that the FOT approach may be required for chaotic time series. But again, true to form, he then makes a remark that is difficult to interpret:
“…the fraction-of-time approach may be required, though not necessarily: in [1], it is shown that statistical model-fitting techniques developed for stochastic time series models can also be useful in fitting chaotic time series models.”
This sounds like Mr. Gerr is again confused about the fact that many probabilistic models can be interpreted or conceptualized in terms of either stochastic probability or FOT probability. Thus, regardless of the fact that a model was originally derived in the stochastic probability framework, it can—depending on the particular model—still be used (and/or rederived) in the FOT framework. In fact, AR models were originally derived within the FOT framework, not the stochastic framework [2] – [3]. This will probably surprise Mr. Gerr. And if he is not confused about this, then he is again agreeing with the book [1] whose attack he supports.
On the assumption that people working with stochastic processes would have enough of an understanding of the subject to compare it with the nonstochastic theory presented in [1], this comparison was not made very explicit in [1]. Responses to [1], such as those of Messrs. Hinich and Gerr, suggest that this assumption is false more often than it is true. To make up for this, an explicit comparison and contrast between the theories of stochastic processes and nonstochastic time-series is made in Chapter 1 of [4].
Mr. Gerr concludes his letter by considering transient time-series and erroneously concluding that time averaging a biperiodogram over successive blocks of data (which he identifies with FOT methodology) is inappropriate, whereas spectrally smoothing a biperiodogram is appropriate. Obviously, he does not realize that the infamous book [1] that proposes FOT concepts and methods shows that when the data block, over which spectral smoothing of the biperiodogram is performed, is partitioned into subblocks over which time averaging of the biperiodogram is performed instead, the results from these two methods can closely approximate each other if the subblock length and window shape are chosen properly. In other words, it is very clearly explained in [1] that the FOT framework for spectral analysis includes frequency smoothing as well as time-averaging methods. This again brings up the question, did Mr. Gerr read the book [1], and if so, did he comprehend anything?
It is my recommendation to Mr. Gerr, and others who would entertain joining this discussion of the merit of considering alternatives to stochastic thinking, that the book [1] that started the furor so nicely exemplified by Hinich’s review, and Chapter 1 of [4], be read carefully, the way they were written. This should be a prerequisite to criticism, vicious or otherwise.
Before closing this letter, I should point out that the so-called controversy that statisticians like Hinich and Gerr are promoting is about as productive as the statisticians’ endless debate between the “Bayesians” and the “frequentists” over whether or not prior probabilities (“prior” meaning “before data collection”) should be included in the One True Religion of statistics [5]. The debate is endless, because it is based on the faulty premise that there is One True Religion. In fact, the subject of our “controversy” is not unrelated to the Bayesian/frequentist debate. This debate dates back to the 1920s, and involves many well-known statisticians, some 40 of whom are referenced in [5] for their contributions to this debate. The conclusion in [5], published just last month, is, I am happy to report:
“The Bayesians have been right all along! And so have the frequentists! Both schools are correct (and better than the other) under specific (and complementary) circumstances . . . Neither approach will uniformly dominate the other . . . knowing when to [use] one or the other remains a tricky question. It is nonetheless helpful to know that neither approach can be ignored”
This is very encouraging! These pragmatic statisticians are attempting to dispel belief in the One True Religion.
I conclude this reply with a little dialogue that I find both amusing and supportive of my response to vicious attacks:
Can old dogs be taught new tricks?
Maybe, but the teacher might get barked at for trying.
Should the teacher accept the barking graciously?
Maybe, but if the old dogs band together into a pack, the teacher better bark back.
— William A. Gardner
REFERENCES
Content in preparation, 1 June 2020