What does “probability & statistics” mean? These two terms are often used together, but they are two distinct entities. Mathematical statistics is what you get when you use probability theory to model statistics. But probability exists in its own right as an abstract mathematical theory, and statistics exists in its own right as a collection of empirical methods for analyzing data. The blend of probability and statistics is a whole that is bigger than the sum of its parts, but those who forget that statistics are empirical and probability is mathematical do so at their own conceptual peril.
To those who dig below the surface in the field of applied mathematical statistics involving time-series data, the following question arises: which of two alternative theories of probability should one apply to the statistics of interest in each application? The answer is that it depends on the statistics of interest. If I am designing a digital communication system and I want the bit-error rate for a received signal over time to be less than 1 bit-decision error in 100 bit-decisions, on average over time, then I want the fraction of time the bit decision is in error for this signal to be less than 1/100, which is called the fraction-of-time (FOT) probability of a bit error.
On the other hand, if I am producing a large number of communication systems and I want the number of systems that make bit-decision errors at any arbitrary time to be less than 1 in 100, on average over the ensemble of systems, then I want the fraction of systems that make errors to be less than 1/100. This is the relative frequency of bit errors, and it converges, as the ensemble size grows without bound, to the relative-frequency (RF) probability of bit error, which is, according to Kolmogorov’s Law of Large Numbers, the stochastic probability of the bit-error event. This is a purely theoretical quantity in an abstract mathematical model of an ensemble of signals (one from each system) called a stochastic process.
These two probabilities are distinct and, in general, there is no reason to expect them to equal each other.
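As a concrete, purely illustrative numerical sketch of these two fractions, the following Python snippet simulates a toy ensemble of systems making independent bit-decision errors; the error probability, ensemble size, and record length are arbitrary assumptions, not quantities taken from the discussion above. It computes the fraction-of-time statistic for one system and the relative-frequency statistic across the ensemble at one instant; in this artificial model the two happen to agree, but, as just noted, nothing forces such agreement in general.

```python
import numpy as np

rng = np.random.default_rng(0)
n_systems, n_bits = 500, 10_000
p_error = 0.008  # assumed per-decision error probability for this toy model

# errors[s, t] == True when system s makes a bit-decision error at bit index t
errors = rng.random((n_systems, n_bits)) < p_error

# Fraction-of-time (FOT) statistic: average over time for a single system
fot_fraction = errors[0, :].mean()

# Relative-frequency (RF) statistic: average over the ensemble at a single time
rf_fraction = errors[:, 0].mean()

print(f"FOT fraction (one system, all bit decisions): {fot_fraction:.4f}")
print(f"RF fraction  (all systems, one bit decision): {rf_fraction:.4f}")
```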
Nevertheless, to “make things nice” by having these two probabilities equal each other, the ergodic hypothesis was introduced in studies of time-series data, such as communications signals. (Actually, it was borrowed from earlier studies in physics of dynamical systems comprising large numbers of particles.) The hypothesis is that the limit over an infinitely long period of time of the FOT probability of an event involving a time function, such as a signal, is equal to the limit over an infinite ensemble of time functions of the RF probability, which in turn equals the abstract stochastic probability. At about the same time that this hypothesis was beginning to become popular, Birkhoff introduced his ergodic theorem, which provides the necessary and sufficient condition on an abstract stochastic process—a mathematical model—for these two probabilities to equal each other for that process.
Because it is typically impossible to prove that the necessary and sufficient condition for ergodicity holds in real-world applications, in practice analysts usually simply invoke the ergodic hypothesis without making any effort to validate it.
A source of confusion for some who invoke the ergodic hypothesis is thinking it is a hypothesis about the real data they are analyzing when, in fact, it is a hypothesis about the mathematical model they have adopted. Confusion surrounding the ergodic hypothesis can be avoided in many applications by first determining what is of primary interest in the application being studied: Is it the behavior of long time averages or the behavior of large ensemble averages? If it is the former, the analyst should simply adopt FOT probability and forget all about stochastic probability and the ergodic hypothesis.
As simple and self-evident as this truth is, some experts indoctrinated in the theory of stochastic processes argue that FOT probability is an abomination that has no place in mathematical statistics. The purpose of this Page 3 is to establish once and for all how absurd this extreme position is by addressing concerns about FOT probability that have been expressed in the past and extinguishing these concerns and associated claims that there is a controversy, through careful conceptualization, mathematical modeling, and straightforward discussion. As explained on this Page 3, there is no basis for controversy; there is simply a need to make a choice between two options for modeling probability in each application of interest.
Yet, there is a wrinkle: before the limit is taken in each of the alternative types of probability, FOT and RF, these quantities are both statistics—they are computed from finite amounts of empirical data. They can be interpreted as estimates of the limiting mathematical quantities, and they can exhibit some of the same properties as the mathematical quantities, but they are statistics, not probabilities. Moreover, the quantity that each converges to is just a number for a given set of statistics from any single execution of the underlying experiment. These quantities are not mathematical models. But the collection of all such numbers obtained from all possible sets of statistics from the repeated trials of the underlying experiment behaves according to a probabilistic model. The explanation given here of this wrinkle is probably confusing to those who do not already know what is so tersely stated here. Nevertheless, the purpose of Pages 3.1 through 3.6, following the remainder of the narrative below, is to explain the statement here and the equal mathematical footing of the two alternative types of probability in sufficient detail to remove all ambiguity of meaning, thereby putting to rest all hypothetical challenges to the validity of what is said here.
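To make the “wrinkle” concrete, here is a hedged sketch (a toy bit-error model with assumed parameters, not material from the narrative above) showing that for finite record lengths the FOT fraction is a fluctuating statistic, while the number it converges to as the record length grows is fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
p_error = 0.01  # the single number the finite-record statistics converge to

for n_bits in (100, 10_000, 1_000_000):
    # five independent repetitions of the finite-length FOT statistic
    trials = [(rng.random(n_bits) < p_error).mean() for _ in range(5)]
    print(n_bits, ["%.4f" % v for v in trials])

# Short records yield widely scattered statistics; long records cluster tightly
# around 0.01, the fixed number that the statistic converges to in the limit.
```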
Colloquial saying: “If it ain’t broke, don’t fix it”.
Grammarian’s version: “If it isn’t broken, don’t attempt to fix it.”
Regardless of how this is verbalized, the problem is that this way of thinking is often misapplied: “it” IS often broken relative to what it could be, but users are so accustomed to it that they don’t realize it could work much better.
Consider, as an example, the technology I used for preparing my doctoral dissertation in the early 1970s. I used an IBM Selectric typewriter and Snopake correction fluid (a fast-drying fluid that is opaque and as “white as the driven snow”), which enables the typist to paint over a mistake and then retype on the dried paint (beware of retyping before the paint is dry). I used this same technology for the first two books I wrote in the mid-1980s, after writing several drafts in longhand. It seemed acceptable at the time but, in comparison with the word processing technology I used to prepare this website, it is abundantly clear just how broken that technology was. Of course, adopting the superior word processing technology required the effort to first learn how to operate a personal computer. This learning “hump” that writers needed to get over resulted in many potential beneficiaries avoiding (actually only postponing) the chore of “coming up to speed” with PCs. The paradigm shift began for some upon the 1984 release of the first Apple Macintosh system, following the 1976 release of the first Apple computer, and for others it began with the 1989 release of the first Microsoft Word application for PCs. Others began jumping on board throughout the 1990s, and by the turn of the century this paradigm shift was well on its way. Today, we have electronic research journals for which new knowledge need never be recorded on paper. Thankfully, someone decided a long time ago that the IBM Selectric typewriter was indeed broken. The term word processing was actually coined way back in 1950 by Ulrich Steinhilper, a German IBM typewriter sales executive with vision.
So it goes with many users of stochastic processes today: they have used this tool for years—since around 1950—and they see it as unbroken and they want no part of coming up to speed on a replacement tool that they believe isn’t needed, even though they do not yet understand this new tool. Unfortunately, ways of thinking are harder to change than is accepting new technology.
The cyclostationarity paradigm shift did not really take off until several years following the publication of the seminal 1987 book [Bk2]. It seems the same is going to be true for the FOT-Probability paradigm shift, with this website playing a role similar to that played by the 1987 book. Interestingly, that book attempted to initiate this shift as well as the shift to cyclostationarity 35 years ago. But apparently, the relearning hump for replacing stochastic processes was found to be too high for many.
I believe most people who learn how to use the stochastic process concept and associated mathematical model tentatively accept the substantial level of abstraction it represents and, as time passes, become increasingly comfortable with that abstractness, and eventually accept it as a necessity and even as reality–something that should not be challenged. It is remarkable that our minds are able to adapt to such abstractions. At the same time, there are costs associated with unquestioning minds that accept such levels of abstraction without convincing themselves that there are no more-concrete alternatives. The position taken at this website is that the effectiveness with which the stochastic process model can be used in practice is limited by its level of abstraction—the typical absence of explicit specifications of both (1) its sample space (ensemble of sample paths) and (2) its probability measure defined on the sample space—and this in turn limits progress in conceiving, designing, and analyzing methods for statistical signal processing on the basis of such signal models.
There is a little-known (today) alternative to the stochastic process, which is much less abstract and, as a consequence, exposes fundamental misconceptions regarding stochastic processes and their use. The removal of the misconceptions that result from adoption of the alternative has enabled the Inventor to make significant advances in the theory and application of cyclostationary processes and, more generally, in data-adaptive statistical signal processing. Despite these advances, less questioning minds continue to ignore the role that the alternative has played in these advances and continue to try to force-fit the new knowledge into the unnecessarily abstract theory of stochastic processes. The alternative—the invention—is fully specified below on Page 3.1, and its consequential advances in understanding theory and method for random signals are taught on Pages 3.2 and 3.3, where the above generalized remarks are made specific and are proven mathematically. This alternative is called Fraction-of-Time (FOT) Probability.
1 An Elevator Speech is a very concise speech about a new business concept that is intended to capture the interest of an investor during the short time he spends with the speaker in an elevator between floors in a building (e.g., on the way to a venture capital office).
Theme: For stationary and cyclostationary time series, a wrong turn in their mathematical modeling was taken almost a century ago. Today, Academia should engage in remediation to overcome the detrimental influence on the teaching and practice of time-series analysis in Science and Engineering.
Universality: This same theme is beginning to play out in the field of economics, as distinct from the various fields of science and engineering. This parallel trend speaks to the universality of the relevance of the proposed paradigm shift initiated in 1987. The parallels in recommendations going forward are, in fact, remarkable, as illustrated in the two articles explaining the area of study called Ergodicity Economics: 1) and 2). However, the driving objective in Ergodicity Economics is to focus analysis on time averages of a single time series rather than on expected values over the ensemble of a non-ergodic process, whereas in the application fields focused on in this website the driving objective is the same focus on time averages of single time series, but for a different reason: it is a more elegant way to proceed than introducing ergodic process models and using expected values.
The objective of this page is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met with from those indoctrinated in the more abstract theory of Stochastic Processes, to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 7, Discussion of the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this page 3 in perspective.
Before continuing, the point of Ergodicity Economics is explained in these excerpts from item 1) above.
“In the real world, through the pages of scientific journals, in blog posts and in spirited Twitter exchanges, the set of ideas now called ‘Ergodicity Economics’ is overturning a fundamental concept at the heart of economics, with radical implications for the way we approach uncertainty and cooperation. The economics group at London Mathematical Laboratory is attempting to redevelop economic theory from scratch, starting with the axiom that individuals optimise what happens to them over time, not what happens to them on average in a collection of parallel worlds.”
“Expected utility theory has become so familiar to experts in economics, finance and risk-management in general that most see it as the obvious method of reasoning. Many see no alternatives. But that’s a mistake. This inspired London Mathematical Laboratory efforts to rewrite the foundations of economic theory, avoiding the lure of averaging over possible outcomes, and instead averaging over outcomes in time, with one thing happening after another, as in the real world. Many people – including most economists – naively believe that these two ways of thinking should give identical results, but they don’t. And the differences have big consequences, not only for people trying to do their best when facing uncertainty, but for the basic orientation of all of economic theory, and its prescriptions for how economic life might best be organised.
“The upshot is that a subtle and mostly forgotten centuries-old choice in mathematical thinking has sent economics hurtling down a strange path. Only now are we beginning to learn how it might have been otherwise – and how a more realistic approach could help re-align economic orthodoxy with reality, to the benefit of all.”
Paraphrased by the WCM: the adoption of the probability theory of population statistics for economic studies was a wrong turn taken at the birth of the field of economics, and only now, in the second decade of the 21st century, do we see that the fraction-of-time probability theory of non-population statistics is the correct path into the future.
The macroscopic world that our five senses experience—sight, hearing, smell, taste and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such things varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these mathematical functions. For this reason, developing an intuitive real-world understanding of time-series analysis, and as an example spectral analysis of time-records of data from the physical world, requires that continuous-time models and mathematics of continua be used.
Unfortunately, this is at odds with the technology that has been developed in the form of computer applications and digital signal processing (DSP) hardware for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena or of continuous-time and -amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available—the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of accuracy of the data representations subjected to analysis and processing arises. Then, the number of discrete amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. The discretization of both the time-series data values and the time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.
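As a small, self-contained illustration of the spectral aliasing just mentioned (the sampling rate and tone frequencies are assumed values chosen for the example, not parameters from the text), sampling a 9 Hz sinusoid at 10 samples per second produces exactly the same discrete data as sampling a 1 Hz sinusoid:

```python
import numpy as np

fs = 10.0                      # assumed sampling rate, samples per second
t = np.arange(0, 2, 1 / fs)    # two seconds of sample times

x_9hz = np.cos(2 * np.pi * 9.0 * t)  # samples of a continuous-time 9 Hz tone
x_1hz = np.cos(2 * np.pi * 1.0 * t)  # samples of a continuous-time 1 Hz tone

# With fs = 10 Hz, the 9 Hz tone aliases to |9 - 10| = 1 Hz: the two sampled
# sequences are numerically identical, so the discrete data cannot tell them apart.
print(np.allclose(x_9hz, x_1hz))  # True
```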
Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website.
Certainly, for non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because of the additional layer of complexity introduced by the approximation of analog data with digital representations, which is not directly related to the principles of analog spectral analysis, an intuitive comprehension of the principles of spectral analysis, which are independent of the implementation technology, is more transparent and easier to grasp with the continuous-time theory.
Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract and forces a detachment of the theory from the real-world data to be analyzed or processed, and this is so even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record instead of averages over independently produced records. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In fact, commercially available random sequence generators used for Monte Carlo simulations are actually time segments from a single long sequence. Consequently, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.3, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And, when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
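The point that ensemble (Monte Carlo) averages can often be replaced by time averages over a single record can be illustrated with a hedged sketch such as the following (a toy white-noise model with assumed segment lengths and counts): averaging periodograms over many independently generated records and averaging periodograms over consecutive segments of one long record yield essentially the same statistical spectrum for this stationary model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_seg, seg_len = 200, 256

def avg_periodogram(segments):
    # Average of magnitude-squared FFTs over a collection of equal-length segments
    return np.mean([np.abs(np.fft.rfft(s))**2 / len(s) for s in segments], axis=0)

# Monte Carlo style: many independently generated records of a white-noise model
ensemble = [rng.standard_normal(seg_len) for _ in range(n_seg)]
psd_ensemble = avg_periodogram(ensemble)

# Single-record style: one long record cut into consecutive segments
one_record = rng.standard_normal(n_seg * seg_len)
psd_time = avg_periodogram(one_record.reshape(n_seg, seg_len))

# For this stationary model the two estimates agree closely (both near 1.0)
print(psd_ensemble.mean(), psd_time.mean())
```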
For this reason, it is the Content Manager’s tenet that, for the sake of pedagogy, the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that exist and explain the other fact—that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs—are non-pedagogical. The arguments are based on economics, directly or indirectly: 1) the transition in philosophy that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution), from truly academic education to vocational training in schools of engineering (and in other fields of study as well); 2) economic considerations in the standard degree programs in engineering (and other technical fields)—B.S., M.S., and Ph.D. degrees—which limit the amount of course work that can be required for each subject in a discipline; 3) economic considerations of the students studying engineering, which limit the number of courses they take beyond what is required for the degree they seek; the motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity; 4) the motivations of professors and industry instructors, which are affected by faculty-rating systems, which in turn are affected by university-rating systems: numbers of employable graduates produced each year reign, and industry defines “employability”. Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few years. Industry can discard those vocationally trained employees whose skills peter out and hire a new batch.
In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.
FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from anything empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.
BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t – s) (with some abuse of notation due to the use of x to denote two distinct functions). Then the time averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be replaced with the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which constitute the class of models adopted for the stochastic theory of statistical spectral analysis, this “trick”—which is rarely if ever mentioned in the manner it is here, in courses on the subject—is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet, it has almost no operational advantage. To clarify those operational purposes where this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.3. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim, made by those who are stuck in their ways, that FOT probability theory has no measure-theoretic basis.
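A minimal numerical sketch of Wold’s construction follows (illustrative only: the record, its assumed mean value, and the finite random sample of shifts standing in for the full shift ensemble are all choices made for the example, and circular wrap-around stands in for a doubly infinite record). Each ensemble member is the time-shifted record x(t − s), and the ensemble average at a fixed time approximates the time average of the single record.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
x = rng.standard_normal(N) + 1.0   # one long record; assumed mean value of 1.0

# Wold-style ensemble: member s is the time-shifted record x(t - s).
# Only the value of each member at the fixed time t0 is needed here, so the
# shifted copies are indexed directly; the modulo provides circular wrap-around
# in place of a doubly infinite record.
t0 = 0
shifts = rng.integers(0, N, size=5_000)        # finite sample of ensemble indices s
ensemble_values_at_t0 = x[(t0 - shifts) % N]   # x(t0 - s) for each member s

ensemble_average = ensemble_values_at_t0.mean()  # average over Wold's ensemble at t0
time_average = x.mean()                          # average over time of the one record

print(time_average, ensemble_average)  # the two agree to within sampling error
```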
The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time series for cyclostationary processes, and something similar to that for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of this Page 3. An in-depth tutorial analysis and discussion of the similarities and differences between the classical stochastic process model and the alternative mathematical model based on Wold’s ensemble for stationary processes and Gardner’s complementary ensemble for cyclostationary processes is provided on Page 3.2. Further investigation of the differences between the measure-theoretic foundations for these two alternative approaches to signal modeling is reported on, in tutorial fashion, on Page 3.3. Page 3.4 presents a perspective from the past and identifies some still unsolved problems, Page 3.5 provides a brief outline of the hierarchy, according to the level of empiricism, of statistical and probabilistic models for random signals, and Page 3.6 reproduces a published debate on the pros and cons of these two alternatives for modeling random signals. Unfortunately—as good debates go—the arguments against the FOT probability alternative are shallow, unconvincing, and in places erroneous. One can take this as an indication that opponents of FOT Probability simply do not have a strong position to argue from.
The history of the development of time-series analysis can be partitioned into the earlier empirically driven work, focused primarily on methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period but has centered primarily on nonstationary processes rather than on stationary processes. The development of time-series analysis theory and methodology for cyclostationary and related stochastic processes and their non-stochastic time-series counterparts came along later, during the latter half of the 20th century, and extends to the present.