The objective of this Page 3 is to discuss the proper place in science and engineering of the fraction-of-time (FOT) probability model for time-series data, and to expose the resistance that this proposed paradigm shift has met from those indoctrinated in the more abstract theory of stochastic processes to the exclusion of the alternative FOT-probability theory. It is helpful to first consider the broader history of resistance to paradigm shifts in science and engineering. The viewer is therefore referred to Page 7, Discussion of the Detrimental Influence of Human Nature on Scientific Progress, as a prerequisite for putting this Page 3 in perspective.
The motivation for this website is the recognition that a paradigm shift in theory has been very slowly underway since the mid-1980s, encompassing both stationary time series and those exhibiting cyclostationarity. An older, rudimentary form of this theory for stationary time series is relatively common practice as a tool for research and for product development and testing in industry, especially in fields of engineering involving time-series analysis. Interestingly, however, this theory, which is based on what is called Fraction-of-Time (FOT) Probability, has apparently not been integrated into curricula in academia. (There may be a few exceptions to this broad statement, but the WCM is not aware of them.) This disconnect between academia and engineering industry is likely due to a situation that arose between the 1930s and 1960s, whereby mathematicians apparently shunned previous practice in applied work involving random functions of time, which worked exclusively with time-average statistics (FOT probability), and adopted a new theory that was more amenable to developing and proving theorems—the mathematician’s bread and butter—than to elegant conceptualization of practical problems to be solved and avoidance of unnecessary abstraction.
This new theory, put forth by Kolmogorov in the early 1930s, was based on ensemble or sample-space average statistics. The problem with this rejection of the earlier practical approach is that there are many problems to be solved for which there is no counterpart in the real world to Kolmogorov’s assumed sample space, namely a population of time functions from which one can draw samples at random. The simple fact that there exists a mathematically viable theory of non-population probability (viz., FOT probability) for individual time functions and their time-average statistics, one that can often be more appropriate in practice than the population-probability theory for ensembles of functions, seems to have been studiously ignored ever since its full-fledged introduction for both stationary time series and time series exhibiting cyclostationarity in the mid-1980s. It is conceivable that this situation is a result of the fact that the population-probability theory is far more widely applicable, because it accommodates arbitrarily nonstationary (time-varying) probabilities, whereas the non-population probability theory accommodates only stationary (time-invariant), cyclostationary (periodic), and almost-cyclostationary (almost periodic) probabilities. (An almost periodic function can be represented as a sum of periodic functions with incommensurate periods.) Nevertheless, this more restricted applicability still accommodates a very broad array of practical applications—broad enough for this shunned theory to merit being a standard tool in every communication systems engineer’s analytic toolbox, and likewise for rotating-machinery engineers and other statistical analysts of time-series data exhibiting statistical cyclicity, including those working in econometrics and a variety of natural sciences where rhythms arise.
In addition to the generality of arbitrary nonstationarity, this head-in-the-sand situation is supported by the fact that there are indeed many applications for statistical inference on time-series data that do involve populations of functions—probably more than those not associated with a population. The argument presented here is not only that population probability is here to stay for good reason but also that non-population probability is equally deserving of the same status. Non-population probability theory for individual time functions is needed for all those applications in which no population of functions is available or, more strongly, in which no population can exist. Example? There is no scientific evidence that a population of cosmic universes exists, so to base statistical studies of astronomical time-series data on an assumed population model of time-series data is non-scientific. So, why is it being done? Perhaps the answer is as simple as the fact that mathematical statisticians have had a substantial influence over the general field of statistics, and population models of probability are generally preferred to non-population models for exclusively mathematical reasons, notably proving theorems. But this situation was admittedly artificially created by Kolmogorov (who expressed his reservations) by simply adopting stochastic-process model axioms that rule out function behavior that is less mathematically friendly. That such less friendly behavior of individual time functions can arise in the non-population probability model is simply a reflection of the practice of using time averages in empirical time-series analysis [Napolitano, A., Gardner, W.A.: Fraction-of-time probability: Advancing beyond the need for stationarity and ergodicity assumptions. IEEE Access 10, 34591–34612 (2022)]. It is a reflection of reality, as ugly as some mathematicians may find that reality.
To put it succinctly, there is no good reason for university instructors in engineering and science to choose between introducing students exclusively to population probability or exclusively to non-population probability. It is a simple matter to supplement standard introductory treatments of stochastic processes (population probability for functions) with a briefer introduction to the alternative random-functions approach (non-population probability for individual functions). This equips students after graduation to make an intelligent choice when presented with a practical problem involving time functions and a desire to perform probabilistic or statistical analysis of function behavior. With the status quo, the vast majority of our graduates are completely unaware of non-population probability and are known to simply default to stochastic processes, believing that model to be the only option available to them.
There is an accumulating number of endorsements in support of returning to the now-fully-developed version of the earlier nascent non-population theory of probability for single functions.
As an illustrative example, the field of Signals Intelligence, and particularly Communications Intelligence, was revolutionized by Gardner’s initial 1987 revelation of his non-population theory of cyclostationarity and his demonstration of its applicability to Communication Signal Interception (cf. Gardner, 2018a, pp. 6 and 12). (Despite this key fact, researchers typically use the alternative stochastic-process version of Gardner’s theory, which he introduced simultaneously in his 1985 book [Bk3], most likely because of indoctrination in our universities, and despite Gardner’s demonstration of major new insights gained from using the non-population theory in signal interception.) More recently, the nascent movement in Econometrics referred to as Ergodicity Economics is in essence a return from population-probability models and methods to their non-population probability counterparts [Peters, O.: The ergodicity problem in economics. Nature Physics 15(12), 1216–1221 (2019)]. The editorial introducing the issue of Nature Physics containing the above article [Time to move beyond average thinking. Nature Physics 15(12), 1207 (2019)] is in complete alignment with the editorial remarks on the wisdom of a return to non-population probability that the Authors have included throughout a number of their publications, such as [Napolitano, A., Gardner, W.A.: Fraction-of-time probability: Advancing beyond the need for stationarity and ergodicity assumptions. IEEE Access 10, 34591–34612 (2022)]. And this is in complete alignment with the remarks from the editor of the Journal of Sound and Vibration cited in the article Gardner (2023) [Gardner, W.A.: Transitioning away from stochastic process models. Journal of Sound and Vibration, 117871 (2023)]. Other endorsements from leaders in the field that are published in this article include the following:
Professor Enders A. Robinson, originator of the digital revolution in geophysics and the most highly honored scientist in the field of geophysics, wrote the following in a published review of the book [Bk2] introducing FOT-Probability theory [Signal Processing, EURASIP, and Journal of Dynamical Systems, Measurement, and Control, ASME, 1990]:
“This book can be highly recommended to the engineering profession. Instead of struggling with many unnecessary concepts from abstract probability theory, most engineers would prefer to use methods that are based upon the available data. This highly readable book gives a consistent approach for carrying out this task. In this work Professor Gardner has made a significant contribution to statistical spectral analysis, one that would please the early pioneers of spectral theory and especially Norbert Wiener.”
Similarly, the following quotation from Professor Ronald N. Bracewell – recipient of the IEEE’s Heinrich Hertz medal for pioneering work in antenna aperture synthesis and image reconstruction as applied to radio astronomy and to computer-assisted tomography – taken from his Foreword to the book [Bk2], introducing FOT-Probability theory, makes essentially the same point that Robinson makes:
“If we are to go beyond pure mathematical deduction and make advances in the realm of phenomena, theory should start from the data. To do otherwise risks failure to discover that which is not built into the model . . . Professor Gardner’s book demonstrates a consistent approach from data, those things which in fact are given, and shows that analysis need not proceed from assumed probability distributions or random processes. This is a healthy approach and one that can be recommended to any reader.”
Not to belabor the point, but even the information theorist, Professor James Massey – Professor of Digital Technology at ETH Zurich, IEEE Alexander Graham Bell medalist and member of the National Academy of Engineering – wrote, in a 1986 prepublication review of the book [Bk2], “I admire the scholarship of this book and its radical departure from the stochastic process bandwagon of the past 40 years.”
Clearly, evidence in support of Gardner’s 1987 proposal for a paradigm shift in time series analysis is mounting and suggests that the shift is solidly underway now. But there is no reflection of this in university curricula! Perhaps, overwhelmed by the breakneck pace of technological advances in electrical engineering, for example, university faculty members have simply had too many other pressing matters to attend to in keeping curricula up to date.
In order to be more concrete about these two alternative meanings of probability, this overview is concluded with a detailed narrative explanation of the exact difference between the two meanings.
The classical meaning of Probability and Random Variables is taught exclusively in terms of populations, which are called sample spaces. When an experiment is performed, the sample space is defined to be the set of all possible elementary, or indecomposable, experimental outcomes. For example, when a noise voltage across the terminals of an electrically resistive device is measured, the relevant sample space for this experiment can be said to contain all the electrical states of the device described by positions and velocities of charge carriers.
The sample space is often not analytically described in any detail; it is used mostly as a concept. For example, a random variable is defined to be a real-valued function on a sample space. Similarly, probability is defined to be a real-valued function on all the well-behaved subsets of a sample space. Each such subset is called an event. When an experiment is performed, various events can occur: every event subset of the sample space that contains the one sample point that occurs upon execution of the experiment is said to have occurred. An example of four such events is Event-1 = the noise voltage is positive, Event-2 = the noise voltage is less than 10 volts, Event-3 = the noise voltage is between 2 and 3 volts, and Event-4 = the noise voltage is between -1 and 5 volts. If the measured voltage is 2.6 volts, all four of these events occur.
In order for the function of event sets to be a valid probability for a given sample space, it must satisfy three axioms:
Axiom 1: The probability of any event is a non-negative real number
Axiom 2: The probability of the event consisting of the entire sample space is 1
Axiom 3: For any collection of mutually exclusive events, the probability of their union is the sum of their individual probabilities
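Expressed symbolically (a minimal formalization using standard notation that is not introduced in the text above), with S denoting the sample space, A an event, and P the probability function, these axioms read:

$$P(A) \ge 0, \qquad P(S) = 1, \qquad P\Big(\bigcup_{i} A_i\Big) = \sum_{i} P(A_i) \ \text{ for mutually exclusive events } A_1, A_2, \ldots$$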
This universally accepted notion and formal definition of probability has proven itself to be of fundamental importance in the analysis of matters of chance. In particular, one technical area where this notion defines the field of study is that of functions of time involving matters of chance. Examples include information-bearing signals, whose embedded information content is usefully modeled in terms of probability and random variables.
However, there are some matters of chance arising throughout electrical engineering, for example, for which an alternative concept of probability has also proven itself to be of fundamental importance. This concept is actually crucial in empirical work in the subfield of electrical engineering we call random signals and noise. Outside of electrical engineering proper, this subfield is often referred to as statistical time series analysis, or random functions. In these descriptors, the term “random” is intended to mean nothing more than signals or noise or other functions with uncertain behavior, not necessarily having anything to do with the traditional concept of population probability described above.
Empirical work involving uncertain functions is commonly conducted using time averages of the uncertain functions and time averages of functions of these uncertain functions, such as the square of the uncertain function or the indicator function, which takes on the value 1 at times for which the uncertain function exhibits behavior defining an event of interest and takes on the value 0 at all other times.
Despite the ubiquity of this work based on these time averages, it is generally unknown today that there is a mature, comprehensive theory of probability based on time averages of functions, which has nothing to do with the traditional concept of probability based on sample spaces or, equivalently, populations of functions. All the quantities encompassed by this alternative probability theory are defined in terms of measurements on a single function of time, without any reference to a population of functions. Such quantities give rise to the most elegant theory when the averages are taken in the limit as the averaging time approaches infinity. This is analogous to the considerable utility of the law of large numbers in traditional probability theory, for which the expected value is arrived at by taking the limit of an average over a set of random samples from a sample space as the set size approaches infinity.
The time average of the event-indicator function produces the length of time over which the event of interest occurs divided by the total length of time averaged over. That is, it is the fraction of time (FOT) over which an event of interest occurs. This can also be referred to as the relative frequency of occurrence over time of the event. In the limit as the averaging time approaches infinity, this relative frequency defines the FOT probability of the event. In comparison, the population probability of an event is (by the law of large numbers) the limit of the relative frequency of occurrence of the event among repeated random samples taken from the sample space, which also is a relative frequency of occurrence.
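In symbols (a sketch using notation introduced here only for illustration), if x(t) is the single function under study and A is the event of interest, the FOT probability of A is the limiting time average of the event-indicator function:

$$P_{\mathrm{FOT}}\{x \in A\} \;\triangleq\; \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} \mathbf{1}_A\big(x(t)\big)\, dt, \qquad \mathbf{1}_A(v) = \begin{cases} 1, & v \in A \\ 0, & v \notin A. \end{cases}$$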
Despite this fundamental conceptual difference between the traditional population probability theory and the alternative non-population probability theory of functions, the mechanics of using these theories are essentially identical! The mathematics of the two theories based on alternative relative frequencies can be said to be dual. The crucial difference between these theories is that one theory can only be applied scientifically if there exists an actual population associated with the experiments being studied, whereas the other does not in any way involve populations and can be applied scientifically to a single function of time. Because of this crucial distinction, the alternative probability theory, when it applies, is more elegant and less abstract and, as a result, is able to be used more effectively. This has been proven in several fields of endeavor dating back to before the turn of the century including the fields of communications systems and signals intelligence systems, particularly for communications signals. In this latter field, the subject of signal interception has been revolutionized by the non-population probability theory of cyclostationarity (much of this work is protected by governments for national security). In fact, the cyclostationarity paradigm shift that is now mature was achieved by replacing the initial stochastic process formulation (the population-probability theory of time functions) with the fraction-of-time probability formulation (the non-population probability theory of time functions). More recently, in a nascent paradigm shift in econometrics, which goes by the somewhat misleading name “ergodicity economics” and which some say is revolutionizing econometrics, the inappropriateness of sample space averages for some purposes is being recognized and replaced with time averages.
As suggested by the term ergodicity economics, this paradigm shift from population probability to non-population probability has a connection to the concept and theory of ergodicity of sample spaces and probability measures. This historical connection is somewhat of a red herring on the path toward understanding probability modeling today. It is primarily useful for seeing the extremes to which analysts were forced when laboring under the delusion that population probability is the only probability available to them for studying uncertain functions. This unpleasant aspect of history can be summarized and dispensed with as follows.
Having only a population-based probability theory in their mathematical toolbox, while being faced with practical problems involving uncertain functions for which no populations exist, and while recognizing the considerable utility of time-averaged measurements on the single function, analysts adopted the ergodic hypothesis: “Assume that the stochastic process model adopted is ergodic, meaning that sample-space averages and time averages of any measurement both converge to the same expected value.”
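Stated symbolically (again only a sketch, with x(t, s) denoting a sample path of the stochastic process indexed by the ensemble member s), the ergodic hypothesis asserts that, for any measurement g(·) on a stationary process,

$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g\big(x(t, s)\big)\, dt \;=\; E\big\{ g\big(x(t, \cdot)\big) \big\} \quad \text{for almost every ensemble member } s.$$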
The famous ergodic theorem of Birkhoff published in the early 1930s provided the necessary and sufficient mathematical condition on the probability measure for the sample space of a stochastic process to be ergodic. But it is a rare occasion in practice when the analyst is able to apply this theorem to a specific stochastic process model and determine whether or not that model is ergodic. So, many analysts continued with the non-scientific practice of using the ergodic hypothesis without ever determining if it is true or false for the particular model it was being applied to. Mathematical analysis was carried out based on the stochastic process model using expectation, and then time averages would be measured in experimental work with the objective of verifying the mathematical results, not knowing if there was any theoretical connection between the two results. Moreover, algorithms derived from the mathematics were often modified by replacing expected values in mathematically derived results with time averages so they could be implemented in applications for which there was no population of functions.
This unsavory work still goes on today. Why? Because academia has for the most part not updated its curricula to reflect the 37-year-old fact that non-population probability exists and is characterized by a complete theory that is highly analogous to the population-probability theory of stochastic processes that academia continues to teach exclusively.
To complete this overview of a paradigm shift between two alternative meanings of probability, it is important to surface two additional facts concerning these alternatives. The first fact favors the stochastic process. Non-population probability enables us to build mathematical models of functions for which the probabilities are time invariant and are called stationary. On the other hand, stochastic process models can be comprised of probabilities that vary in time in an almost arbitrarily non-stationary manner. This opens up the theory to many more applications. The only catch is that such applications must involve populations if this problem solving approach is to be considered scientific. Fortunately, there is no shortage of such applications. And for this reason, the subject of stochastic processes is very broad and has many applications.
Nevertheless, there is a way to modify how the time averages are calculated in order to generalize the definition of non-population probability so as to accommodate a very specific type of nonstationarity, which in the simplest case is called cyclostationarity and in the less simple case is called almost cyclostationarity. The modification required entails replacing the conventional uniformly weighted time average with sinusoidally weighted time averages. This requires some relatively detailed explanation and is excluded from this overview. However, it is important to mention here that the generalization to almost cyclostationary probability introduces a third distinct meaning of probability, in which the interpretation as a fraction of time of occurrence of an event is forfeited. Yet, there is no need to revert to sample spaces and population probability: the almost periodically time-varying probabilities are constructed by combining genuine periodically time-varying FOT probabilities having incommensurate periods. Because important characteristics of cyclostationarity are characteristics of individual sample functions, not of expected functions representing average behavior over the sample space, the stochastic process can mask characteristics that are surfaced by the non-population probability calculations. An example is the characteristic of a function of being able to produce a finite-strength additive sine-wave component when subjected to a memoryless nonlinear transformation, e.g., a simple squaring of the function. This characteristic has nothing to do with populations, expected values, and other facets of stochastic processes. The sine-wave generation property of a stochastic process is an expected property and can be present even if some sample paths are unable to generate such sine waves. In fact, a stochastic process can have a probability greater than 0 and less than 1 that its square will contain such sine waves, whereas in the non-population theory, a function either can or cannot generate such sine waves. The amplitude and phase of such a sine wave do not have probability distributions over a population.
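As an illustration of this last point, here is a minimal Python sketch (the BPSK-like signal model and all parameter values are illustrative assumptions, not taken from the text): squaring a single cyclostationary record regenerates a finite-strength additive sine wave at twice the carrier frequency, and the strength of that sine wave can be measured with a sinusoidally weighted time average of the record.

```python
import numpy as np

# Sketch under stated assumptions: a BPSK-like record x(t) contains no additive sine
# wave, but its square does. The strength of a sine-wave component at frequency f0 is
# measured as the magnitude of a finite-time sinusoidally weighted time average.

rng = np.random.default_rng(0)
fs, fc, baud, T = 10_000.0, 1_000.0, 100, 10.0          # sample rate, carrier, symbol rate, record length
t = np.arange(0.0, T, 1.0 / fs)

symbols = rng.choice([-1.0, 1.0], size=int(T * baud))   # random binary symbols
b = np.repeat(symbols, int(fs / baud))[: t.size]        # rectangular-pulse baseband waveform
x = b * np.cos(2 * np.pi * fc * t)                      # one cyclostationary record, no sine wave present

def sine_strength(y, f0):
    """Finite-time sinusoidally weighted time average of record y at frequency f0."""
    return np.abs(np.mean(y * np.exp(-2j * np.pi * f0 * t)))

print("sine-wave strength of x   at fc :", sine_strength(x, fc))        # small, vanishing as T grows
print("sine-wave strength of x^2 at 2fc:", sine_strength(x**2, 2 * fc)) # finite-strength regenerated line
```

Here sine_strength is just a finite-record approximation of the limiting sinusoidally weighted time average referred to above.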
What does “probability & statistics” mean? These two terms are often used together, but they are two distinct entities. Mathematical statistics is what you get when you use probability theory to model statistics. But probability exists in its own right as an abstract mathematical theory and statistics exists in its own right as a collection of empirical methods for analyzing data. The blend of probability and statistics is a whole that is bigger than the sum of its parts, but those who forget that statistics are empirical and probability is mathematical do so at their own conceptual peril.
To those who dig below the surface in the field of applied mathematical statistics involving time series of data, the following question arises: which of the two alternative theories of probability should one apply to the statistics of interest in each application? The answer is that it depends on the statistics of interest. If I am designing a digital communication system and I want the bit-error rate for a received signal over time to be less than 1 bit-decision error in 100 bit-decisions, on average over time, then I want the fraction of time the bit decision is in error for this signal to be less than 1/100, which is called the fraction-of-time (FOT) probability of a bit error.
On the other hand, if I am producing a large number of communication systems and I want the number of systems that make bit-decision errors at any arbitrary time to be less than 1 in 100, on average over the ensemble of systems, then I want the fraction of systems that make errors to be less than 1/100. This is the relative frequency of bit errors, and it converges, as the ensemble size grows without bound, to the relative-frequency (RF) probability of bit error, which is, according to Kolmogorov’s Law of Large Numbers, the stochastic probability of the bit-error event. This is a purely theoretical quantity in an abstract mathematical model of an ensemble of signals (one from each system) called a stochastic process.
These two probabilities are distinct and, in general, there is no reason to expect them to equal each other.
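The distinction can be made concrete with a small numerical sketch. Everything in the following Python snippet (the error probability, the counts, and the independence assumption) is an illustrative assumption rather than anything taken from the text; it simply computes the two relative frequencies described above.

```python
import numpy as np

rng = np.random.default_rng(0)
p_err = 0.01            # assumed per-decision error probability (illustrative)
n_decisions = 100_000   # many bit decisions made by ONE system over time
n_systems = 100_000     # many systems observed at ONE arbitrary time instant

# FOT statistic: fraction of time (of bit decisions) that one system is in error
fot_ber = (rng.random(n_decisions) < p_err).mean()

# Ensemble statistic: fraction of systems in error at a single time instant
ensemble_ber = (rng.random(n_systems) < p_err).mean()

print(f"fraction-of-time bit-error rate (one system):  {fot_ber:.4f}")
print(f"fraction-of-systems bit-error rate (one time): {ensemble_ber:.4f}")
# The two numbers agree here only because the errors were simulated as i.i.d.;
# as stated above, there is in general no reason to expect them to be equal.
```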
Nevertheless, to “make things nice” by having these two probabilities equal each other, the ergodic hypothesis was introduced in studies of time-series data, like communications signals. (Actually, it was borrowed from earlier studies in Physics of dynamical systems of large numbers of particles). The hypothesis is that the limit over an infinitely long period of time of the FOT probability of an event involving a time function such as a signal is equal to the limit over an infinite ensemble of time functions of the RF probability, which in turn equals the abstract stochastic probability. At about the same time that this hypothesis was beginning to become popular, Birkhoff introduced his ergodic theorem which consists of the necessary and sufficient condition on an abstract stochastic process—a mathematical model—for these two probabilities to equal each other for that process.
Because it is typically impossible to prove that the necessary and sufficient condition for ergodicity holds in real-world applications, in practice analysts usually simply invoke the ergodic hypothesis without making any effort to validate it.
A source of confusion for some who invoke the ergodic hypothesis is thinking it is a hypothesis about the real data they are analyzing when, in fact, it is a hypothesis about the mathematical model they have adopted. Confusion surrounding the ergodic hypothesis can be avoided in many applications by first determining what is of primary interest in the application being studied: Is it the behavior of long time averages or the behavior of large ensemble averages? If it is the former, the analyst should simply adopt FOT probability and forget all about stochastic probability and the ergodic hypothesis.
As simple and self-evident as this truth is, some experts indoctrinated in the theory of stochastic processes argue that FOT probability is an abomination that has no place in mathematical statistics. The purpose of this Page 3 is to establish once and for all how absurd this extreme position is by addressing concerns about FOT probability that have been expressed in the past and extinguishing these concerns and associated claims that there is a controversy, through careful conceptualization, mathematical modeling, and straightforward discussion. As explained on this Page 3, there is no basis for controversy; there is simply a need to make a choice between two options for modeling probability in each application of interest.
Yet, there is a wrinkle: before the limit is taken in each of the alternative types of probability, FOT and RF, these quantities are both statistics—they are computed from finite amounts of empirical data. They can be interpreted as estimates of the limiting mathematical quantities, and they can exhibit some of the same properties as the mathematical quantities, but they are statistics, not probabilities. Moreover, the quantity that each converges to is just a number for a given set of statistics from any single execution of the underlying experiment. These quantities are not mathematical models. But the collection of all such numbers obtained from all possible sets of statistics from the repeated trials of the underlying experiment behave according to a probabilistic model. The explanation given here of this wrinkle is probably confusing to those who do not already know what is so tersely stated here. Nevertheless, the purpose of pages 3.1 through 3.6, following the remainder of the narrative below, is to explain the statement here and the equal mathematical footing of the two alternative types of probability in sufficient detail to remove all ambiguity of meaning, thereby putting to rest all hypothetical challenges to the validity of what is said here.
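As a small illustration of the point that the finite-averaging-time quantity is a statistic rather than a probability, the following sketch (an illustrative record and illustrative parameters, not anything from the text) computes the FOT statistic of the event {x(t) > 0} from one record for several averaging times; only its limit as the averaging time grows without bound is the FOT probability.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1_000.0                                   # samples per second (illustrative)
t = np.arange(0.0, 200.0, 1.0 / fs)            # one 200-second record
x = np.cos(2 * np.pi * 5.0 * t) + 0.5 * rng.standard_normal(t.size)

event = (x > 0.0).astype(float)                # indicator of the event {x(t) > 0}
for T in (1.0, 10.0, 100.0, 200.0):            # increasing averaging times (seconds)
    n = int(T * fs)
    print(f"T = {T:6.1f} s   FOT statistic = {event[:n].mean():.4f}")
# Each printed value is a statistic computed from finite data; the FOT probability
# is the limit of these values as T grows without bound (here roughly 0.5).
```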
Colloquial saying: “If it ain’t broke, don’t fix it”.
Grammarian’s version: “If it isn’t broken, don’t attempt to fix it.”
Regardless of how this is verbalized, the problem is that this way of thinking is often misapplied: “It” often IS broken relative to what could be, but users are so accustomed to it that they do not realize it could work much better.
Consider, as an example, the technology I used for preparing my doctoral dissertation in the early 1970s. I used an IBM Selectric typewriter and Snopake correction fluid (a fast-drying fluid that is opaque and as “white as the driven snow”), which enables the typist to paint over a mistake and then retype on the dried paint (beware of retyping before the paint is dry). I used this same technology for the first two books I wrote in the mid-1980s, after writing several drafts in longhand. It seemed acceptable at the time but, in comparison with the word processing technology I used to prepare this website, it is abundantly clear just how broken that technology was. Of course, adopting the superior word processing technology required the effort to first learn how to operate a personal computer. This learning “hump” that writers needed to get over resulted in many potential beneficiaries avoiding (actually only postponing) the chore of “coming up to speed” with PCs or Macs (Apple computers). The paradigm shift began for some upon the 1984 release of the first Apple Macintosh system, following the 1976 release of the first Apple computer, and for others it began with the 1989 release of the first Microsoft Word application for PCs. Others began jumping on board throughout the 1990s, and by the turn of the century this paradigm shift was well on its way. Today, we have electronic research journals for which new knowledge need never be recorded on paper. Thankfully, someone decided a long time ago that the IBM Selectric typewriter was indeed broken. The term word processing was actually coined way back in 1950 by Ulrich Steinhilper, a German IBM typewriter sales executive with vision.
So it goes with many users of stochastic processes today: they have used this tool for years—since around 1950—and they see it as unbroken and they want no part of coming up to speed on a replacement tool that they believe isn’t needed, even though they do not yet understand this new tool. Unfortunately, ways of thinking are harder to change than is accepting new technology.
The cyclostationarity paradigm shift did not really take off until several years following the publication of the seminal 1987 book [Bk2]. It seems the same is going to be true for the FOT-Probability paradigm shift, with this website playing a role similar to that played by the 1987 book. Interestingly, that book attempted to initiate this shift as well as the shift to cyclostationarity 35 years ago. But apparently, the relearning hump for replacing stochastic processes was found to be too high for many.
An Elevator Speech is a very concise speech about a new business concept that is intended to capture the interest of an investor during the short time he spends with the speaker in an elevator between floors in a building (e.g., on the way to a venture capital office).
I believe most people who learn how to use the stochastic process concept and associated mathematical model tentatively accept the substantial level of abstraction it represents and, as time passes, become increasingly comfortable with that abstractness, and eventually accept it as a necessity and even as reality–something that should not be challenged. It is remarkable that our minds are able to adapt to such abstractions. At the same time, there are costs associated with unquestioning minds that accept such levels of abstraction without convincing themselves that there are no more-concrete alternatives. The position taken at this website is that the effectiveness with which the stochastic process model can be used in practice is limited by its level of abstraction—the typical absence of explicit specifications of both (1) its sample space (ensemble of sample paths) and (2) its probability measure defined on the sample space—and this in turn limits progress in conceiving, designing, and analyzing methods for statistical signal processing on the basis of such signal models.
There is a little-known (today) alternative to the stochastic process, which is much less abstract and, as a consequence, exposes fundamental misconceptions regarding stochastic processes and their use. The removal of these misconceptions, which results from adoption of the alternative, has enabled the Inventor to make significant advances in the theory and application of cyclostationary processes and, more generally, in data-adaptive statistical signal processing. Despite these advances, less questioning minds continue to ignore the role that the alternative has played in these advances and continue to try to force-fit the new knowledge into the unnecessarily abstract theory of stochastic processes. The alternative—the invention—is fully specified below on Page 3.1, and its consequential advances in understanding theory and method for random signals are taught on Pages 3.2 and 3.3, where the above generalized remarks are made specific and are proven mathematically. This alternative is called Fraction-of-Time (FOT) Probability.
As explained in this section, there are various choices to be made in deciding how best to present the more mathematical details of the theory of Non-Population Probability, and the choices made here are all based on pedagogy, not economics, or technology, or current fads, or idiosyncrasies.
These pedagogical considerations are explained in the following discussion.
The macroscopic world that our five senses experience—sight, hearing, smell, taste and touch—is analog: forces, locations of objects, sounds, smells, temperature, and so on change continuously in time and space. Such things varying in time and space can be mathematically modeled as functions of continuous time and space variables, and calculus can be used to analyze these mathematical functions. For this reason, developing an intuitive real-world understanding of time-series analysis, and as an example spectral analysis of time-records of data from the physical world, requires that continuous-time models and mathematics of continua be used.
Unfortunately, this is at odds with the technology that has been developed in the form of computer applications and digital signal processing (DSP) hardware for carrying out mathematical analysis, calculating spectra, and associated tasks. This technology is based on discrete time and discrete function values: the numerical values of quantized and digitized time samples of various quantitative aspects of phenomena or of continuous-time and -amplitude measurements. Therefore, in order for engineers, scientists, statisticians, and others to design and/or use the available computer tools and DSP hardware for data analysis and processing at a deeper-than-superficial level, they must learn the discrete-time theory of the methods available—the algorithms implemented on the computer or in DSP hardware. The discreteness of the data values that this equipment processes can be ignored in the basic theory of statistical spectral analysis until the question of accuracy of the data representations subjected to analysis and processing arises. Then the number of discrete amplitude values used to represent each time sample of the original analog data, which determines the number of bits in a digital word representing a data value, becomes of prime importance, as does the number of time samples per second. This discretization of both time-series data values and time indices affects the processing of data in undesirable ways, including spectral aliasing and nonlinear effects.
Consequently, essentially every treatment of the theory of spectral analysis and statistical spectral analysis available to today’s students of the subject presents a discrete-time theory. This theory must, in fact, be taught for obvious reasons but, from a pedagogical perspective, it is the Content Manager’s tenet that the discrete-time digital theory should be taught only after students have gained an intuitive real-world understanding of the principles of spectral analysis of continuous-time analog data, both statistical and non-statistical analysis. And this requires that the theory they learn be based on continuous-time mathematical models. This realization provides the motivation for the treatment presented at this website.
Certainly, for non-superficial understanding of the use of digital technology for time-series analysis, the discrete-time theory must be learned. But for even deeper understanding of the link between the physical phenomena being studied and the analysis and processing parameters available to the user of the digital technology, the continuous-time theory must also be learned. In fact, because of the additional layer of complexity introduced by the approximation of analog data with digital representations, which is not directly related to the principles of analog spectral analysis, the principles of spectral analysis, which are independent of the implementation technology, are more transparent and easier to grasp with the continuous-time theory.
Similarly, the theory of statistical spectral analysis found in essentially every treatment available to today’s students is based on the stochastic-process model. This model is, for many if not most signal analysis and processing applications, unnecessarily abstract and forces a detachment of the theory from the real-world data to be analyzed or processed, and this is so even when analysts think they need to perform Monte Carlo simulations of data analysis or processing methods involving stationary and cyclostationary time series. To be sure, such simulations are extremely common and of considerable utility. But the statistics sought with Monte Carlo simulations of stationary and cyclostationary time series can more easily be obtained from time averages on a single record instead of averages over independently produced records. Moreover, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. In fact, commercially available random-sequence generators used for Monte Carlo simulations actually produce time segments from a single long sequence. Consequently, knowing only a statistical theory of ensembles of data records (stochastic processes) is a serious impediment to intuitive real-world understanding of the principles of analysis, such as statistical spectral analysis, of single records of time-series data. Worse yet, as explained on Page 3.3, the theory of stochastic processes tells one nothing at all about a single record. For the most part, the theory of stochastic processes is not a statistical theory; it is a much more abstract probabilistic theory. And, when probabilistic analysis is desired, it can be carried out for a single time series using FOT probability, thereby avoiding the unnecessary abstraction of stochastic processes.
For this reason, it is the Content Manager’s tenet that for the sake of pedagogy the discrete-time digital stochastic-process theory of statistical spectral analysis should be taught only after students have gained an intuitive real-world understanding of the principles of statistical spectral analysis of continuous-time analog non-stochastic data models, and only as needed. This avoids the considerable distractions of the nitty-gritty details of digital implementations and the equally distracting abstractions of stochastic processes. No one who is able to be scientific can successfully argue against this fact. The arguments that exist and explain the other fact—that the theory and method of discrete-time digital spectral analysis of stochastic processes is essentially the exclusive choice of university professors and of instructors in industrial educational programs—are non-pedagogical. The arguments are based, directly or indirectly, on economics: 1) the transition in philosophy that occurred along with first the electrical revolution and second the digital revolution (not to mention the space-technology revolution and the military/industrial revolution)—from truly academic education to vocational training in schools of engineering (and in other fields of study as well); 2) economic considerations in the standard degree programs in engineering (and other technical fields)—B.S., M.S., and Ph.D. degrees—limit the amount of course-work that can be required for each subject in a discipline; 3) economic considerations of the students studying engineering limit the number of courses they take beyond what is required for the degree they seek; the motivations of too many students are shortsighted and focused on immediate employability and highest pay rate, which are usually found at employers chasing the latest economic opportunity; 4) the motivations of professors and industry instructors are affected by faculty-rating systems, which are affected by university-rating systems: numbers of employable graduates produced each year reign, and industry defines “employability”. Businesses within a capitalistic economy typically value immediate productivity (vocational training) over long-range return on investment (education) in their employees. The problem with vocational training in the modern world is that the lifetime of utility of the vocation trained for today is over in ten years, give or take a few years. Industry can discard those vocationally trained employees who peter out and hire a new batch.
In closing this argument for the pedagogy adopted for this website, the flaw in the argument “we don’t have time to teach both the non-stochastic and stochastic theories of statistical spectral analysis” is exposed, leaving no rational excuse for continuing with the poor pedagogy that we find today at essentially every place so-called statistical spectral analysis is taught. And the same argument applies more generally to other types of statistical analysis.
FACT: For many operational purposes, the relatively abstract stochastic-process theory and its significant difference from most things empirical can be ignored once the down-to-earth probabilistic interpretation of the non-stochastic theory is understood.
BASIS: The basis for this fact is that one can define all the members of an ensemble of time functions x(t, s), where s is the ensemble-member index for what can be called a stochastic process x(t), by the identity x(t, s) = x(t – s) (with some abuse of notation due to the use of x to denote two distinct functions). Then the time-averages in terms of which the non-stochastic theory is developed become ensemble averages, or expected values, which are operationally equivalent for many purposes to the expected values in terms of which the theory of the classically defined stochastic process is developed. In other words, the non-stochastic theory of statistical spectral analysis has a probabilistic interpretation that is operationally identical for many purposes to that of the stochastic-process theory. For convenience in discussion, the modifier “for many purposes” of the terms “operationally equivalent” and “operationally identical” can be replaced with the modified terms “almost operationally equivalent” and “almost operationally identical”. For stationary stochastic processes, which is the model adopted for the stochastic theory of statistical spectral analysis, this “trick”—which is rarely if ever mentioned in the manner it is here, in courses on the subject—is known as Wold’s Isomorphism [Bk1], [Bk2], [Bk3], [Bk5]. As a matter of fact, though, the ensemble of a classically defined stochastic process cannot actually be so transparently visualized; it is far more abstract than Wold’s ensemble. Yet, it has almost no operational advantage. To clarify those operational purposes where this equivalence does not hold, one must delve into the mathematical technicalities of measure theory. This is done on Page 3.3. Such technicalities of measure theory are rarely of any utility to practitioners, except in that they refute the shallow claim by those who are stuck in their ways that the FOT probability theory has no measure-theoretic basis.
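To make this identification concrete, here is a minimal sketch (an informal rendering, with ⟨·⟩ denoting the infinite time average and the ensemble index s averaged uniformly over an ever-growing interval): the ensemble average of Wold’s construction reduces to the time average of the single record,

$$E_s\{x(t, s)\} = \lim_{S \to \infty} \frac{1}{S} \int_{-S/2}^{S/2} x(t - s)\, ds = \lim_{S \to \infty} \frac{1}{S} \int_{t - S/2}^{t + S/2} x(u)\, du = \langle x \rangle,$$

which is independent of t, exactly as the expected value of a stationary process must be.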
The WCM introduced a counterpart of Wold’s Isomorphism that achieves a very similar stochastic-process interpretation of a single time-series for cyclostationary processes and something similar to that for poly-cyclostationary stochastic processes [Bk1], [Bk2], [Bk3], [Bk5]. This, together with a deep and broad discussion of the differences between the classically defined stochastic process and its almost operationally equivalent FOT-probabilistic model, is the subject of the subsections of this Page 3. An in-depth tutorial analysis and discussion of the similarities and differences between the classical stochastic process model and the alternative mathematical model based on Wold’s ensemble for stationary processes and Gardner’s complementary ensemble for cyclostationary processes is provided on Page 3.2. Further investigation of the differences between the measure-theoretic foundations for these two alternative approaches to signal modeling is reported on, in tutorial fashion, on Page 3.3. Page 3.4 presents a perspective from the past that is startling in terms of the number of unpleasant issues that arise from stochastic process models of cyclostationarity and identifies some still-unsolved problems; Page 3.5 provides a brief outline of the hierarchy, according to the level of empiricism, of statistical and probabilistic models for random signals; and Page 3.6 reproduces a published debate on the pros and cons of these two alternatives for modeling random signals. Unfortunately—as good debates go—the arguments against the FOT probability alternative are shallow, unconvincing, and in places erroneous. One can take this as an indication that opponents of FOT Probability simply do not have a strong position to argue from.
The history of the development of time-series analysis can be partitioned into the earlier empirically driven work, focused primarily on methodology, which extended over a period of about 300 years, and the later but overlapping mathematically driven work, in which the theory of stochastic processes surfaced, which ran its course of primary development in about 50 years. The mathematically driven development of stochastic processes has continued beyond that initial period, but has centered primarily on nonstationary processes rather than on stationary processes. The development of time-series analysis theory and methodology for cyclostationary and related stochastic processes, and for their non-stochastic time-series counterparts, came along later, during the latter half of the 20th century, and extends to the present.