Table Of Contents

11. Contributions to Assorted Topics in Time-Series Analysis and Signal Processing

  • 11.1 Optimization of Statistical Performance of Spectral Correlation Analyzers and Spectrum Analyzers

    As an example that suggests that the unnecessary use of abstract stochastic process models can interfere with conceptualization and progress in developing methodology and even performance analysis, a straightforward theoretical development in non-stochastic probabilistic analysis of methods of statistical spectral analysis is reviewed here. The source of this non-stochastic characterization of the problem of designing quadratic signal processors for estimating ideal statistical spectral densities (and, more generally, cross-spectral densities and spectral correlation densities) is Section C of Chapter 5 and Section B of Chapter 15 of the 1987 book [Bk2].

    As shown in [Bk2], essentially all traditional and commonly used methods of direct statistical spectral analysis, which excludes indirect methods based on data modeling reviewed on Page 11.2, can be characterized as quadratic functionals of the observed data which, in the most general case of streaming data, slide along the time series of data. These functionals are, in turn, characterized by the weighting kernels of quadratic forms in both cases of discrete-time and continuous-time data. It is further shown that the spectral resolution, temporal resolution, and spectral leakage properties of all these individual spectrum estimators (collectively causing estimator bias) and the reliability (as measured by variance and coefficient of variation) of these estimators are characterized in terms of properties of these kernels. These kernels are explicitly specified by the particular window functions used by the specific spectrum estimators, including data tapering windows, autocorrelation tapering windows, time-smoothing and frequency-smoothing windows, and variations on these. With the use of the simple tabular collection of these kernels, Table 5-1 in [Bk2], all the direct spectrum estimators are easily qualitatively and quantitatively compared. The apparent confusion, demonstrated by many users guided by traditional stochastic process treatments of spectrum estimation, concerning the true role, interaction, and quantitative impact of operations such as time-smoothing, frequency smoothing, data tapering, autocorrelation tapering, direct Fourier transformation of data, indirect Fourier transformation of autocorrelation functions, time-hopped time averaging vs continuously sliding time averaging, and more, should be resolved by Table 5-1. With this general approach, the comparison of spectrum estimators in practice is rendered transparent.
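
    To make the kernel characterization above concrete, the following is a minimal sketch in notation introduced here (not necessarily the notation of [Bk2]): a time-smoothed periodogram with data-tapering window a(·) and time-smoothing window g(·) is a quadratic functional of the data, and its kernel is determined entirely by those two windows.

```latex
% Sketch in notation introduced here (not necessarily that of [Bk2]).
% A direct spectrum estimate at time t and frequency f is a quadratic functional of the data x:
\hat{S}_x(t,f) = \int\!\!\int k_{t,f}(u,v)\, x(u)\, x^{*}(v)\, du\, dv .
% Example: the time-smoothed periodogram of data tapered by a(\cdot) and smoothed by g(\cdot),
\hat{S}_x(t,f) = \int g(t-s) \left| \int a(s-u)\, x(u)\, e^{-i 2\pi f u}\, du \right|^{2} ds ,
% has the kernel
k_{t,f}(u,v) = e^{-i 2\pi f (u-v)} \int g(t-s)\, a(s-u)\, a^{*}(s-v)\, ds ,
% so temporal resolution, spectral resolution, leakage, and reliability are all
% governed by the windows a and g through this one kernel.
```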

    Given Table 5-2 and the results in the above-cited chapter sections of [Bk2], which quantitatively compare all the direct methods of spectrum analysis, one can set about optimizing the design of a spectrum analyzer so as to achieve a desired level of temporal resolution, spectral resolution, spectral leakage, and reliability within the limits of this entire class of estimators. One can easily see the tradeoffs among all these performance parameters and select what one considers the optimal tradeoff for the particular application at hand. This is done through the selection, from among existing catalogs of window functions, of particular windows for the operations of time-smoothing, frequency smoothing, data tapering, and autocorrelation tapering. Results analogous to those in Tables 5-1 and 5-2, but for spectral correlation analysis, are presented in tables in Section B of Chapter 15 of [Bk2].
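
    As a hedged illustration of the kind of window-tradeoff selection described above (not a reproduction of the procedure or the catalogs referenced in [Bk2]), the Python sketch below compares a few common data-tapering windows for a time-averaged periodogram and reports crude resolution and leakage indicators; the window names, record lengths, and test signal are assumptions introduced here.

```python
# Sketch: comparing data-tapering windows for a time-averaged (Welch-type)
# periodogram.  The windows, lengths, and test signal are illustrative
# assumptions, not the catalogs or tables referenced in [Bk2].
import numpy as np
from scipy.signal import welch, get_window

fs = 1000.0                                   # sample rate (Hz), assumed
t = np.arange(0, 8.0, 1 / fs)
# Two closely spaced tones (resolution test) plus one strong distant tone (leakage test).
x = (np.sin(2 * np.pi * 100.0 * t) + np.sin(2 * np.pi * 104.0 * t)
     + 100.0 * np.sin(2 * np.pi * 300.0 * t) + 0.1 * np.random.randn(t.size))

nper, npad = 1024, 16 * 1024
for name in ("boxcar", "hann", "blackmanharris"):
    w = get_window(name, nper)
    f, Pxx = welch(x, fs=fs, window=w, nperseg=nper, noverlap=nper // 2)
    # Pxx would then be inspected for resolution of the 100/104 Hz pair and leakage near 300 Hz.
    # Window figures of merit: main-lobe (half-power) width vs. peak sidelobe level.
    W = np.abs(np.fft.rfft(w, npad))
    W /= W.max()
    halfpow_hz = 2 * np.argmax(W < 0.5) * fs / npad
    first_null = np.argmax(np.diff(W) > 0)    # approximate end of the main lobe
    sidelobe_db = 20 * np.log10(W[first_null:].max())
    print(f"{name:15s} resolution ~{halfpow_hz:5.2f} Hz, peak sidelobe {sidelobe_db:6.1f} dB")
```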

    Before concluding this overview, it should be mentioned that there is another set of performance characteristics of spectrum estimators, and especially of spectral correlation analyzers (which typically require the computation of many cross-spectra for a single record of data): computational cost and data storage requirements. These performance parameters are not characterized in Table 5-1 mentioned above, and their evaluation is generally more challenging than the statistical analysis discussed above, which Table 5-1 renders straightforward. Nevertheless, thorough analysis of the competing algorithms for spectral correlation analysis (also called cyclic spectrum analysis) has been reported in the literature on cyclic spectrum analysis dating back to the seminal paper by my colleagues R. Roberts, W. Brown, and H. Loomis on pp. 38–49 of the special issue of the 1991 IEEE Signal Processing Magazine [JP36].

    It is surmised that one of the reasons the achievements in [Bk2] have apparently not appeared in subsequent literature on stochastic processes is that the key tradeoff between spectral resolution and spectral leakage is not a stochastic phenomenon. It is simply a characteristic of deterministic functions. This fact also may be responsible for the misleading claims about the superiority of the multi-taper method of direct spectrum estimation relative to the classical methods, based on time-averaged and/or frequency-smoothed periodograms of possibly tapered data, treated in [Bk2]. An in-depth discussion is provided in the forthcoming research paper, “The Multi-Taper Method of Spectrum Estimation: New Conclusions on Comparison with Periodogram Methods” (a source in progress, being completed in January 2025, also cited under the title “The Multi-Taper Method of Spectrum Estimation: Another Comparison with Periodogram Methods”).

  • 11.2 FOT-Probability Theory of Parametric Spectrum Analysis

    The theory and methodology of parametric spectrum analysis developed over a long period of time, with increased emphasis during later periods, focused on observations of data containing multiple sine waves with similar frequencies, i.e., frequencies whose differences are comparable to or smaller than the reciprocal of the observation time, particularly when only one time record of data is available. After early initial work in time-series analysis on what was called the “problem of hidden periodicities” (see Page 4.1) using non-probabilistic models, a concerted effort based on the use of stochastic process models ensued and led to a substantial variety of methods, particularly for high-resolution spectral analysis (resolving spectral peaks associated with additive sine waves with closely spaced frequencies). This effort is another example of methodology development based on unnecessarily abstract data models; that is, stochastic process models that mask the fact that ensembles of sample paths and abstract probability measures on these ensembles (sample spaces) are completely unnecessary in the formulation and solution of the problems addressed (cf. Page 3).

    The first, and evidently still the only, comprehensive treatment of this methodology within the non-stochastic framework of Fraction-of-Time Probability Theory is presented in Chapter 9 of the book [Bk2]. The treatment provided covers the following topics:

    • Autoregressive Modeling Theory
      • Yule-Walker Equations
      • Levinson-Durbin Algorithm
      • Linear Prediction
      • Wold-Cramer Decomposition
      • Maximum-Entropy Model
      • Lattice Filters
      • Cholesky Factorization
    • Autoregressive Methods
      • Least-Squares Procedures
      • Model-Order Determination
      • Singular-Value Decomposition
      • Maximum-Likelihood
    • Autoregressive Moving Average Methods
      • Modified Yule-Walker Equations
      • Estimation of the AR parameters
      • Estimation of the MA parameters
    • Experimental Study
      • Periodogram Methods
      • Minimum-Leakage Method
      • Yule-Walker, Burg, and Forward-Backward Least-Squares AR Methods
      • Overdetermined-Normal-Equations AR Method
      • Singular-Value-Decomposition AR Method
      • Hybrid Method

    The extensive comparison of methods in the experimental study leads to the conclusion that, in general, for data having spectral densities including both smooth parts and impulsive parts (spectral lines), the best-performing methods are hybrids of direct methods based on processed periodograms and indirect methods based on model fitting. A well-designed hybrid method can take advantage of the complementary strengths of both direct and indirect methods.
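
    As a hedged illustration of the flavor of the indirect (model-fitting) methods listed above, and of a simple hybrid with a direct periodogram estimate, the Python sketch below solves the Yule-Walker equations from sample autocorrelations and evaluates the resulting AR spectrum alongside an averaged periodogram; the model order, test data, and the naive combination rule are assumptions made here for illustration only, not the specific methods compared in [Bk2].

```python
# Sketch: Yule-Walker AR spectrum estimate from sample autocorrelations,
# alongside an averaged periodogram and a crude "hybrid" of the two.
# Order, test data, and the combination rule are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import welch

rng = np.random.default_rng(0)
N, p = 4096, 8                                # record length and AR model order (assumed)
# Test data: colored noise (smooth spectrum) plus a weak sine wave (spectral line).
x = np.convolve(rng.standard_normal(N), np.ones(4) / 4, mode="same")
x += 0.2 * np.sin(2 * np.pi * 0.3 * np.arange(N))

# Biased sample autocorrelations r[0..p].
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])
# Yule-Walker equations: Toeplitz(r[0:p]) a = r[1:p+1].
a = solve_toeplitz(r[:p], r[1:p + 1])
sigma2 = r[0] - np.dot(a, r[1:p + 1])         # prediction-error variance

f_per, S_per = welch(x, nperseg=512)          # direct estimate (averaged periodogram)
A = 1 - np.exp(-2j * np.pi * np.outer(f_per, np.arange(1, p + 1))) @ a
S_ar = sigma2 / np.abs(A) ** 2                # indirect (AR model) estimate

# Naive hybrid: keep periodogram values that stand well above the smooth AR fit
# (candidate spectral lines); use the AR fit for the smooth part of the spectrum.
S_hyb = np.where(S_per > 3.0 * S_ar, S_per, S_ar)
print("AR order", p, "prediction-error variance", round(float(sigma2), 4))
```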

  • 11.3 Bayesian Imaging of RF Sources
    • 11.3.1 Star Ranging by Interplanetary-Baseline Interferometry

      Since the WCM’s initial theoretical considerations of (1) cyclostationary signal modeling of star radiation and (2) interplanetary-baseline interferometry, between the mid-1980s and the mid-2010s, technology has caught up and a number of relevant reports are available in the literature. The associated theoretical developments and increased capabilities for topic (2) over the last 10 years are quite impressive and render the WCM’s initial preliminary work irrelevant. Consequently, the modest content of this section—written in the mid-2010s—has been removed. The ongoing work in the field on topic (1) is revealing the utility of cyclostationarity modeling of astrophysical time-series measurements and observations.

    • 11.3.2 Bayesian Theoretical Basis for Source Location and Performance Quantification

      The purpose of this page is to describe a substantial advance in theory and methodology for high-performance location of RF emitters using multiple moving sensors, based on statistically optimum aperture synthesis. This advance was developed at SSPI during the several years preceding 2010 as part of the work outlined on Page 12.1. I developed the BASEL concept and mathematical formulation described below without SSPI support because of the lack of an indirect-cost budget for this work. The software implementation and testing with real data were carried out with full support by SSPI, originating with government-funded contracts.

      What’s New About Bayesian Emitter Location Technology?

      In this introductory discussion, key differences between the following alternative technologies are exposed: (1) classical TDOA/FDOA-based and AOA-based estimation of unknown (non-random) emitter location using multiple sensors and calculation of corresponding confidence regions using stochastic process models of the received data at the collection system’s sensors and averaging over the sample space of all possible received data; (2) the relatively novel Bayesian formulation of estimation of random emitter location coordinates using multiple sensors and calculation of Bayesian confidence regions conditioned on observation of the actual received data; and (3) ad hoc methods for RF Imaging of spatially discrete emitters, using multiple sensors, derived from VLBI (Very Long Baseline Interferometry) concepts and methods developed for Radio Astronomy.

      A key concept in (2) and (3) is aperture synthesis and the production of RF images: the output image produced, in which bright spots or corresponding peaks in amplitude are sought as the probable locations of sources, can be thought of as being somewhat analogous to an antenna pattern. The image is designed to exhibit peaks above the surface upon which the emitters reside (e.g., Earth) at whatever coordinates RF emitters are located, and the height of such peaks typically increases as the average transmitted power of the emitters increases, relative to background radiation and/or antenna and receiver noise.

      Before proceeding, the topic to be addressed is put in perspective relative to more well-known methods of aperture synthesis.

      Synthetic Aperture Photography (SAP) is analogous to Synthetic Aperture RF Imaging (SARFI), neither of which is as strongly analogous to Synthetic Aperture Radar (SAR). The former two are passive and endeavor to image single or spatial arrangements of radiating energy sources, whereas the latter is active and transmits energy at reflecting structures and synthesizes an image of the structure from the received reflections. To be more precise, SARFI normally images sources of RF energy, but can under certain circumstances be used to distinguish between sources and reflectors of sources and can thereby perform imaging without line of sight (given favorable characteristics of reflectors) and can mitigate multipath propagation degradation of images. In contrast, SAP typically images structures on the basis of the optical energy they reflect from the environment.

      Consequently, these three types of aperture synthesis are quite distinct. In Medical Imaging, a CAT scan using computer-aided tomography does perform a type of synthetic aperture imaging, but it is distinct from SAP, SARFI, and SAR. Somewhat more closely related is Radio Telescope Imaging (RTI), intended for imaging essentially contiguous, spatially extended sources of RF radiation, like stars. The assumption of a single RF point source to be located, or a configuration of spatially discrete sources of RF energy to be jointly located, justifies a unique mathematical model and statistical inference method.

      I formulated such a model and developed a method of aperture synthesis that appears to have capability advantages beyond that of earlier RF imaging techniques for locating RF emitters on Earth using non-geostationary satellite receivers.  I never received funding to compare performances of these two competing approaches, and I have no idea of what might have been done in this vein during the 13 years since I retired from that work. To my knowledge, any work in this area was likely classified by the US government on the basis of national defense.  My internal (independent) R&D on this approach, performed at my company SSPI in the first decade of the 21st Century, was not classified and is summarized in what follows here.

      In the Bayesian Aperture Synthesis for Emitter Location (BASEL) system described herein, the Image is actually either the likelihood function or the posterior probability density function of emitter location coordinates, given the received data impinging on the array. This image can be displayed as either grey-level or multicolor intensity overlaid on a map of the geographical area seen by the sensors, or it can be displayed as the height of a surface above the relevant region of Earth’s surface.

      Significant reductions in computational complexity can be realized by producing images that are only proportional to these statistical functions, or even only monotonic nonlinearly warped versions of them. Contours of constant elevation for such images produce probability containment regions, and these can be computed exactly, unlike the classical location-error containment regions (typically ellipses or ellipsoids) used with TDOA/FDOA-based location systems, which are only approximations and, in some cases, quite crude approximations.
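
      The Python sketch below illustrates, under assumptions made here (a gridded posterior over candidate coordinates, with a stand-in Gaussian-mixture surface), how a contour of constant elevation of such an image yields an exact probability containment region; it is not the BASEL implementation.

```python
# Sketch: exact highest-density containment region from a gridded posterior image.
# The Gaussian-mixture "posterior" is a stand-in assumption; BASEL's actual image
# is the likelihood or posterior computed from the received sensor data.
import numpy as np

# Grid of candidate emitter coordinates (arbitrary units).
xg, yg = np.meshgrid(np.linspace(-5, 5, 400), np.linspace(-5, 5, 400))
cell = (xg[0, 1] - xg[0, 0]) * (yg[1, 0] - yg[0, 0])

def gauss2d(x, y, mx, my, s):
    return np.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * s ** 2)) / (2 * np.pi * s ** 2)

post = 0.7 * gauss2d(xg, yg, 1.0, 0.5, 0.4) + 0.3 * gauss2d(xg, yg, -2.0, -1.0, 0.8)
post /= post.sum() * cell                     # normalize to a density on the grid

def containment_level(density, cell_area, prob):
    """Return the elevation whose super-level set contains `prob` of the probability mass."""
    vals = np.sort(density.ravel())[::-1]
    csum = np.cumsum(vals) * cell_area
    return vals[np.searchsorted(csum, prob)]

level = containment_level(post, cell, 0.90)
region = post >= level                        # boolean mask: exact 90% containment region
print("90% containment region covers", round(region.mean() * 100, 1), "% of the grid area")
```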

      In addition to the Image derived from the received data, explicit formulas for an ideal Image can be obtained by substituting a multiple-signal-plus-noise model for the received data in the image formula and then replacing the finite-time-average autocorrelation functions of the signals and the noise that appear in the formula with their idealized versions, which produces a sort of antenna pattern—but not in the usual sense. Rather, this produces an idealized (independent of specific received data) image for whatever multi-signal spatial scenario is modeled. 

      One of the keys to transitioning from TDOA/FDOA-based and AOA-based RF-emitter location to RF Imaging is replacing the unknown parameters TDOA, FDOA, and AOA in data models with their functional dependence on unknown emitter-location coordinate variables on Earth’s surface and known sensor locations overhead. This process is called geo-registration. In this way, when sensors move along known trajectories during data collection, the TDOA, FDOA, and AOA functions vary in a known manner while the unknown emitter-location coordinates remain unchanged as long as the emitter is not moving; this enables long coherent integration over periods during which the sensor positions change substantially. These sensor trajectories increase the size of the synthesized aperture beyond the distance between fixed sensors.
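
      A minimal Python sketch of the geo-registration idea is given below: TDOA and FDOA are evaluated as functions of candidate emitter coordinates and known, time-varying sensor positions rather than treated as unknown parameters. The flat-Earth geometry, units, carrier frequency, and sensor trajectories are simplifying assumptions for illustration only, not the BASEL geometry model.

```python
# Sketch: geo-registration. TDOA and FDOA are written as functions of candidate
# emitter coordinates and known sensor trajectories (toy flat-Earth geometry,
# arbitrary units; an assumption made here, not the BASEL geometry model).
import numpy as np

C = 3.0e8          # propagation speed (m/s)
FC = 1.0e9         # carrier frequency (Hz), assumed

def sensor_positions(t):
    """Known positions (m) of two moving sensors at time t (s); assumed trajectories."""
    s1 = np.array([-50e3 + 7.0e3 * t,  500e3, 600e3])
    s2 = np.array([ 50e3 + 7.0e3 * t, -500e3, 600e3])
    return s1, s2

def sensor_velocities(t, dt=1e-3):
    s1a, s2a = sensor_positions(t - dt)
    s1b, s2b = sensor_positions(t + dt)
    return (s1b - s1a) / (2 * dt), (s2b - s2a) / (2 * dt)

def tdoa_fdoa(emitter_xy, t):
    """TDOA (s) and FDOA (Hz) at collection time t for a stationary emitter at (x, y, 0)."""
    e = np.array([emitter_xy[0], emitter_xy[1], 0.0])
    s1, s2 = sensor_positions(t)
    v1, v2 = sensor_velocities(t)
    r1, r2 = e - s1, e - s2
    d1, d2 = np.linalg.norm(r1), np.linalg.norm(r2)
    tdoa = (d1 - d2) / C
    rdot1 = -np.dot(v1, r1) / d1              # range rates (emitter fixed, sensors moving)
    rdot2 = -np.dot(v2, r2) / d2
    fdoa = -FC * (rdot1 - rdot2) / C          # differential Doppler at the carrier
    return tdoa, fdoa

# Because the emitter coordinates are the only unknowns, the same candidate
# location can be tested coherently across many collection times t.
for t in (0.0, 5.0, 10.0):
    print(t, tdoa_fdoa((12e3, -8e3), t))
```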

      In many applications, prior probability densities for emitter locations used in the Bayesian formulation of statistical inference are unknown, in which case they can be specified as uniform over the entire region of practically feasible locations of interest.  This is not a weakness of this approach. It simply means that the location estimation optimization criterion is based on the likelihood function of candidate emitter coordinates over a specified region instead of the posterior probability density function of those coordinates.  Moreover, when prior information is available, this Bayesian approach incorporates that information in a statistically optimum manner through the prior probability densities. Furthermore, these priors play a critical role in location updating through Bayesian Learning for which posteriors from one period of collection become priors for the next period of collection. 
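
      A one-line statement of the Bayesian-learning update mentioned above, in notation introduced here (θ denotes the candidate emitter coordinates, and successive data records are assumed conditionally independent given θ):

```latex
% Bayesian learning across collection periods (notation introduced here): the
% posterior from records d_1, ..., d_n serves as the prior for record d_{n+1}.
p(\theta \mid d_1, \dots, d_{n+1}) \;\propto\; p(d_{n+1} \mid \theta)\; p(\theta \mid d_1, \dots, d_n).
```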

      Also, starting with the posterior probability formulation leads to a natural choice of alternative to the classical (and often low-quality) approximate error containment regions for performance quantification, and this choice is the regions contained within contours of constant elevation of the emitter location posterior probability density function above the surface defined by the set of all candidate emitter coordinates. 

      This alternative metric for quantifying location accuracy has the important advantage of not averaging over all possible received data (the population) in a stochastic process model, but rather conditioning the probability of emitter location on the actual data received. It also has the advantage of not requiring the classical approximation for containment regions which is known to be inaccurate in important types of applications where attainable location accuracy is not high.

      Besides this advantageous difference in location-accuracy quantification, the Bayesian approach produces optimal imaging algorithms that have aspects in common with now-classical Imaging systems for radio astronomy, referred to as VLBI, and also open the door to various methods for improving the capability of RF imaging for spatially discrete emitters. The improvements over ad hoc VLBI-like processing result from abandoning the interpretation in terms of imaging essentially continuous spatial distributions of radiation sources in favor of locating spatially discrete sources of radiation. In contrast to radio astronomy, where the sensors are on the surface of rotating Earth and the diffuse sources of radiating energy to be imaged are in outer space, in RF Imaging of manmade radio emitters, the sources are typically located on the surface of Earth and the sensors follow overhead trajectories.

      In conclusion, BASEL technology improves significantly on both (1) traditional radar emitter location methods based on sets of TDOA and/or FDOA and/or AOA measurements applied to communications emitters and (2) more recent methods of RF Imaging for communications emitters derived from classical VLBI technology for radio astronomy. The improvement comes in the form of not only higher sensitivity and higher spatial resolution resulting from increases in coherent signal processing gain, but also novel types of capability that address several types of location systems impairments.

      An especially productive change in the modeling of received data that is responsible for some of the unusual capability of BASEL technology, particularly coherent combining of statistics over multiple sensor pairs with unknown phase relationships, is the adoption of time-partitioned models for situations in which signals of interest are known in some subintervals of time and unknown in others. This occurs, for example, when receiver-training data is transmitted periodically, such as in the GSM cellular telephone standard.  For such signals, the posterior PDF calculator takes on two distinct forms, one of which applies during known-signal time intervals and the other of which applies during unknown-signal time intervals. When the emitter location is not changing over a contiguous set of such intervals of time, these two distinct posterior PDFs (images) can be combined into a single image, which enjoys benefits accruing from both models. Thus, BASEL can perform two types of time partitioning when appropriate: (1) partitioning the data into known-signal time-intervals and unknown-signal time intervals, and (2) partitioning time into subintervals that are short enough to ensure the narrowband approximation to the Doppler effect due to sensor motion is accurate in each reduced subinterval. It doesn’t matter which partitioning results in shorter time intervals: The data from all intervals is optimally combined.
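
      To make the combining step concrete, here is a hedged Python sketch under assumptions made here: each subinterval (known-signal or unknown-signal) yields a log-likelihood image over the same grid of candidate coordinates, and subintervals are conditionally independent given the emitter location. The actual forms of BASEL’s two posterior-PDF calculators are not reproduced here.

```python
# Sketch: combining per-subinterval images into one posterior image.  Assumes
# each subinterval supplies a log-likelihood surface over the same grid of
# candidate emitter coordinates and that subintervals are conditionally
# independent given the location; the specific BASEL calculators for the
# known-signal and unknown-signal interval types are not reproduced here.
import numpy as np

def combine_images(loglik_known, loglik_unknown, log_prior=None):
    """Sum log-likelihood images over all subintervals and normalize to a posterior."""
    total = np.zeros_like(log_prior if log_prior is not None else loglik_known[0])
    for surface in list(loglik_known) + list(loglik_unknown):
        total += surface
    if log_prior is not None:
        total += log_prior
    total -= total.max()                      # numerical stabilization before exponentiation
    post = np.exp(total)
    return post / post.sum()

# Toy usage: three subintervals' images on a 200 x 200 grid of candidate locations.
grids = [np.random.randn(200, 200) for _ in range(3)]
posterior = combine_images(grids[:2], grids[2:])
```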

      There is more than one way to beneficially combine RF images, depending on the benefits desired. For example, the presence of known portions of a signal in received data enables signal-selective geolocation. In addition, it also enables measurement of the phase offsets of the data from distinct sensor pairs. I have formulated a method for combining images that essentially equalizes these phases over multiple contiguous subintervals of time thereby enabling coherent combining of signal selective images over distinct pairs of moving sensors. This method combines CAFs (Cross Ambiguity Functions) from the data portions in which the signal is unknown with RCAFs (Radar CAFs) from the remaining data portions using a simple quadratic form. In this configuration, BASEL coherently combines images over both time and space, using narrowband RCAF and CAF measurements. 

      Another way the BASEL Processor provides innovative geolocation capability is through processing that mitigates multipath propagation and blocked line of sight. Highlights of this aspect of BASEL are listed below:

      • Traditional TDOA/FDOA-model fitting is theoretically limited by 1) unresolved multipath peaks and distinct false peaks in the CAFs (Cross Ambiguity Functions), and 2) missing peaks in the CAFs due to NLOS—no line of sight to the emitter. 
      • BASEL’s innovative received-data-model fitting theoretically overcomes these limitations, but testing has been limited to date (2010).
      • The maximum-likelihood (ML) data-model fitter combines geo-registered CAFs in a manner that enables not only super-resolution of emitter and reflector locations, but also emitter location with NLOS. The algorithm within the BASEL Processor for implementing this ML geolocator is called GRCM (Geo-Registered Channel Matcher).
      • Jointly for all channel pairs corresponding to all unique sensor pairs, GRCM matches the two channels in each pair with respect to a common set of reflector locations and complex attenuations and a single emitter location.
      • Channel matching means that data from sensor 1 is passed through a multipath channel model intended to match the actual channel the data from sensor 2 passed through, and vice versa (interchange 1 and 2). 
      • Channel matching is an approach to channel identification that is known to have a fundamental flaw that in many applications cannot be overcome:  the least-squares channel models are non-unique—they are concatenated with an ambiguous phantom channel that is unknown and non-unique.
      • This fundamental flaw is removed in the application to multipath mitigation when the multipath can be modeled in terms of a finite set of discrete reflectors. Then, instead of performing channel matching for each sensor pair independently, all channel-pair matchings are performed jointly with respect to a common set of reflector positions and complex attenuations. (This does not require that all sensors see all reflectors).
      • The structure of the multipath models expressed in terms of reflector locations, together with the constraint that all sensors see (some of) the reflectors from a single set, replaces the fundamental flaw of traditional channel matching with a far less problematic challenge of determining the best model order, which is the total number of reflectors in the model.
      • As with conventional super-resolution (e.g., for multiple closely spaced spectral lines), a model order that is too small will not achieve optimum resolution, and a model order that is too high will produce false features (e.g., spectral lines or, here, reflector locations).
      • In the geolocation application, the ramifications of non-optimal model order are of less concern because, typically, it is only the emitter that needs to be accurately located. False reflector locations are not nearly as problematic because the algorithm can discriminate between reflector locations and emitter locations.
      • The ML CAF combiner GRCM not only offers revolutionary multipath mitigation capability, but also offers more degrees of freedom for estimating nuisance parameters, such as the parameters in an ionospheric model. Theoretically, this enables GRCM to estimate a larger number of nuisance parameters for a specified number of sensors than the number for traditional TDOA/FDOA model fitting.
      • The degrees of freedom are theoretically even higher when multipath is present, due to the virtual sensor properties of reflectors when GRCM is used.
      • The high sensitivity of the BASEL Imaging system is theoretically made even higher when multipath is present due to the matched-filter type of processing gain GRCM produces.
      • The GRCM CAF combining technique not only accommodates known sensor motion, but also lends itself to the accommodation of unknown target motion. 
      • Target tracking can be implemented using an iterative locally linearized implementation of the GRCM estimator.
      • The effectiveness of emitter tracking can theoretically be higher with multipath, due to the virtual sensor properties of reflectors when GRCM is used.

       

      SOME EXAMPLES OF BASEL CAPABILITY

      The following material provides a few details about BASEL RF Imaging.

  • 11.4 A Radically Different Method of Moments

    This page presents a way to use the Bayesian Minimum-Risk Inference methodology subject to structural constraints on the functionals of available time-series data to be used for making inferences.

    Because the approach of minimizing risk subject to such a constraint is not tractable and, in fact, is even less tractable than unconstrained minimum-risk inference, an alternative suboptimum method is developed. This method produces minimum-risk (i.e., minimum-mean-squared-error) structurally constrained estimates of the required posterior probabilities or PDFs, and then uses these estimates as if they were exact in the standard Bayesian methodology for hypothesis testing and parameter estimation. Since all the computational complexity in the Bayesian methodology is contained in the computation of the posterior probabilities or PDFs, this approach to constraining the complexity of computation is appropriate, and it is tractable. It requires only inversion of linear operators, regardless of the nonlinearities allowed by the structural constraint. The dimension of the linear operators does, however, increase as the polynomial order of the allowed nonlinearities is increased.
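
    A minimal Python sketch of the structurally constrained idea, under assumptions made here: the posterior probability of a hypothesis is approximated by a polynomial (affine-plus-quadratic) functional of the data, and the MMSE coefficients are obtained by solving linear equations involving moments estimated from the data. This illustrates the general principle only, not the specific formulation in the source linked below.

```python
# Sketch: MMSE estimation of a posterior probability under a polynomial
# structural constraint.  The estimate is P(H=1 | x) ~ c' phi(x) with phi a
# fixed polynomial feature map; the MMSE coefficients solve the linear
# equations E[phi phi'] c = E[phi 1{H=1}].  The data model and second-order
# feature map are assumptions made here for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20000, 3
H = rng.integers(0, 2, size=n)                        # hypothesis labels
X = rng.standard_normal((n, d)) + 0.8 * H[:, None]    # data whose statistics depend on H

def phi(X):
    """Second-order polynomial features: 1, x_i, and x_i * x_j (i <= j)."""
    ones = np.ones((X.shape[0], 1))
    quads = np.stack([X[:, i] * X[:, j]
                      for i in range(X.shape[1]) for j in range(i, X.shape[1])], axis=1)
    return np.hstack([ones, X, quads])

F = phi(X)
# Linear equations for the MMSE coefficients (moments estimated from the data);
# raising the polynomial order only enlarges this linear system.
R = F.T @ F / n                    # estimate of E[phi phi']
p = F.T @ H / n                    # estimate of E[phi 1{H=1}]
c = np.linalg.solve(R, p)

post_hat = np.clip(phi(X) @ c, 0.0, 1.0)   # constrained estimate of P(H=1 | x)
decisions = (post_hat > 0.5).astype(int)   # then used as if exact in the Bayes decision rule
print("empirical error rate:", np.mean(decisions != H))
```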

    A full presentation of this different method of moments can be viewed here. An in-depth study, in the form of an unpublished PhD dissertation, on inference from scalar-valued time series based on the Constrained Bayesian Methodology invented by the WCM, upon which this method of moments is based, is available here.

    A tutorial podcast is available here.

    This method is an alternative to both the classical Method of Moments and the Generalized Method of Moments. The well-known four primary moment-based probability density estimation and associated moment-based parameter estimation methods (Pearson, Provost, Lindsay) are briefly described as background for introducing the new method. This method is radically different in approach yet provides a solution that requires essentially the same information as the existing methods: (1) Model moments with known dependence on unknown parameters and (2) associated sample moments. However, the new method, unlike the classical method of moments and its generalized counterparts, requires only the solution of simultaneous linear equations. A theoretical comparison between the new and old methods is made, and reference is made to the Author’s earlier work on analytical comparisons with Bayesian parameter estimation and decision for time series data arising in digital communication systems receivers.

    Considering the depth of technical detail in the above-linked source, the WCM has chosen to also give users access to a podcast which provides an excellent overview in the form of an easy-to-listen-to chat between two conversationalists. This podcast was produced by AI using Google’s experimental NotebookLM. The WCM confirms that the technical content is accurate and does an admirable job of getting across the main points of the research paper.

     

    Editorial Information on this Article

     

    The following paraphrased comments from a journal’s reviewers of the article linked to above indicate important future work needed to clarify the place of this method among other methods, thereby better exposing its relative advantages and disadvantages. This work is not being planned by the Author, so he is inviting others to consider taking on this work, which could qualify this new method for publication in a reputable journal. The appearance of this paper on a website and in an eBook puts it in the public domain but does not disqualify it from publication by others in a research journal. Attribution would be appreciated.

    Comments From Reviewer 1

    This paper takes a methodology previously introduced by the author and compares it with other types of method-of-moments estimators. It is my understanding that this approach is a Bayesian approach to method-of-moments estimation. Below, I provide some references that might be useful for the author. All of them are posterior to the author’s work (1970s) but are widely used today. I think putting the “radically different” method of moments in context would make it an interesting review that would be accessible to engineers and statisticians/economists alike.

    Diggle & Gratton (1984) proposed what is today known as Approximate Bayesian Computation (ABC). Instead of using a likelihood, sample moments are used to perform Bayesian inference, conditional on the moments rather than the full sample of observations. In many settings, the expectation of the moments is not tractable (the integral on p14 of this paper) so that Monte Carlo simulations are used. See Marin et al. (2012) for a review. A related, but less computationally intensive, approach is called Synthetic likelihood, proposed by Wood (2010). See also Gallant & McCulloch (2009). This is related to quasi-Bayesian inference, see below.

    Another approach is to replace the likelihood with its empirical counterpart: the empirical likelihood which is computed under model constraints. In a Bayesian framework, this yields the Bayesian Empirical likelihood estimator of Lazar (2003).

    Chernozhukov & Hong (2003) consider quasi-Bayesian inference which builds on the generalized method of moments. Related to the review in the paper, it is possible to use the characteristic function in a generalized method of moments setup, economists usually refer to Bierens & Ploberger (1997) and other papers by the authors. There are earlier references in statistics.

    References

    Bierens, H. J. & Ploberger, W. (1997). Asymptotic theory of integrated conditional moment tests. Econometrica: Journal of the Econometric Society, (pp. 1129–1151).

    Chernozhukov, V. & Hong, H. (2003). An MCMC approach to classical estimation. Journal of Econometrics, 115(2), 293–346.

    Diggle, P. J. & Gratton, R. J. (1984). Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 46(2), 193–212.

    Gallant, A. R. & McCulloch, R. E. (2009). On the determination of general scientific models with application to asset pricing. Journal of the American Statistical Association, 104(485), 117–131.

    Lazar, N. A. (2003). Bayesian empirical likelihood. Biometrika, 90(2), 319–326.

    Marin, J.-M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6), 1167–1180.

    Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), 1102–1104.

     

    Comments From Reviewer 2

    Summary

    The paper introduces a quasi-Bayesian approach applicable when the likelihood function is not explicitly available, yet certain moments can be obtained as an explicit function of unknown parameters. Consequently, the paper proposes a method for conducting Bayesian estimation in contexts where the GMM or the method of moments (MoM) would typically be employed. Unlike previous pseudo-Bayesian estimations that approximate the likelihood function to derive the pseudo-posterior distribution, this work proposes a technique for directly approximating the posterior distribution itself.

    While the paper introduces some new concepts, a response to the following comments could better clarify the place of this method among other methods.

    Comments

    1. The concept of Bayesian estimation based on moment conditions, as presented here, is not entirely novel. There have been several papers proposing Bayesian estimation methods in contexts analogous to those discussed in this manuscript. Examples include works by Kim (2003, Journal of Econometrics), Lazar (2003, Biometrika), Schennach (2005, Biometrika), and Chib, Shin, and Simoni (2018, Journal of the American Statistical Association). A comparison of the proposed method with these existing methods could better delineate its contribution.
    2. The theoretical foundation of the paper is not strong enough. While Table 1 compares the proposed method with the MoM, the comparison does not convincingly establish the superiority of the proposed approach. Employing approximate posterior distributions does not inherently ensure the validity of the inferences. Previous studies on pseudo-Bayesian estimation have rigorously examined the theoretical foundation of utilizing pseudo-likelihood for Bayesian estimation, from both Bayesian and frequentist perspectives. It would be beneficial for the authors to engage more deeply with this body of work to strengthen the theoretical basis of the method.
    3. This comment pertains to the above comment regarding Table 1, where the author suggests that the proposed estimator converges to the Bayesian minimum-risk estimator as the order of the polynomial increases. If this assertion holds, then it would represent a theoretical contribution of the paper, although the existence of all moments is required. Nonetheless, I was unable to find any results that substantiate this claim.
  • 11.5 Cyclic Point Processes, Marked and Filtered

    The work cited on page 11.4 ([JP2], [JP4], and [JP9]) provides the first derivations of linearly constrained Bayesian receivers for synchronous M-ary digital communications signals. These derivations reveal insightful receiver characterizations in terms of linearly constrained MMSE estimates of posterior probabilities of the transmitted digital symbols, which provides a basis for formulating data-adaptive versions of these receivers.

    Further work here reveals that linearly constrained Bayesian receivers for optical M-ary digital communications signals are closely related to the above receivers for additive-noise channels such as those used in radio-frequency transmission systems and cable transmission systems.  That is, these linear receivers for the optical signals can be insightfully characterized in terms of linearly constrained MMSE estimates of posterior probabilities of the optically transmitted digital symbols over fiber optic channels. 

    The signal models for both these classes of digital communications signals are cyclostationary and lead to a bank of matched filters for the M distinct pulse types followed by symbol-rate subsamplers and a matrix of Fractionally Spaced Equalizers which in fact perform more than simply equalization. They are more akin to discrete-time multi-variate Wiener filters. 
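
    As a structural illustration only (the pulse shapes, channel, and training scheme are assumptions made here, not those of [JP2], [JP4], [JP9], or [JP5]), the Python sketch below shows the receiver skeleton described above: a bank of matched filters for the M pulse types, symbol-rate subsampling, and a linear MMSE combiner, trained by solving linear equations, whose outputs serve as estimates of the symbol posterior probabilities.

```python
# Structural sketch of the receiver described above: a bank of M matched
# filters, symbol-rate subsampling, and an MMSE linear combiner whose outputs
# estimate the posterior probabilities of the M symbols.  Pulses, channel,
# and training are illustrative assumptions made here.
import numpy as np

rng = np.random.default_rng(2)
M, sps, n_sym = 2, 8, 5000                      # symbol alphabet size, samples/symbol, record length
pulses = np.stack([np.ones(sps),
                   np.r_[np.ones(sps // 2), -np.ones(sps // 2)]])   # M distinct pulse types

symbols = rng.integers(0, M, size=n_sym)
tx = np.zeros(n_sym * sps)
for k, s in enumerate(symbols):
    tx[k * sps:(k + 1) * sps] = pulses[s]
rx = tx + 0.7 * rng.standard_normal(tx.size)    # additive-noise channel (assumed)

# Matched-filter bank followed by symbol-rate subsampling.
mf = np.stack([np.convolve(rx, p[::-1], mode="full")[sps - 1::sps][:n_sym] for p in pulses])

# MMSE combiner: estimate posterior probabilities of each symbol value from the
# matched-filter statistics by solving linear equations (cf. Section 11.4).
F = np.vstack([np.ones(n_sym), mf]).T           # features: 1 and the M matched-filter outputs
targets = np.eye(M)[symbols]                    # indicator vectors of the transmitted symbols
C = np.linalg.solve(F.T @ F, F.T @ targets)     # least-squares / MMSE coefficients
post = np.clip(F @ C, 0, 1)                     # estimated symbol posterior probabilities
print("symbol error rate:", np.mean(post.argmax(axis=1) != symbols))
```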

    This similarity in receiver structures can be interpreted as a direct result of an equivalent linear model for the Marked and Filtered Doubly Stochastic Poisson Point Processes used to model the optical signals. This equivalent model is derived in the above-linked paper [JP5]. The signal models for both the RF and optical signals used in this work are cyclostationary.

  • 11.6 Exploiting Spectral Redundancy of Frequency Modulated Signals

    The initial concept discussed here is to use the approximation of narrowband FM by DSB-AM plus quadrature carrier, and then exploit the 100% spectral redundancy of DSB-AM, due to its cyclostationarity, to suppress co-channel interference. This suppression is achieved with the technique of FRESH filtering (see page 2.5.1).
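
    For readers unfamiliar with FRESH filtering, the Python sketch below illustrates the basic idea under assumptions made here: a DSB-AM signal’s spectral redundancy at twice the carrier frequency is exploited by jointly filtering the data and a frequency-shifted conjugate branch, with the branch filters obtained by least squares from a training segment. It is a generic FRESH-filtering sketch, not the specific interference-suppression design described on Page 2.5.1.

```python
# Sketch: two-branch FRESH filter exploiting the 100% spectral redundancy of
# DSB-AM (conjugate cycle frequency 2*fc).  Carrier, bandwidths, interference,
# and least-squares training are assumptions made here for illustration.
import numpy as np

rng = np.random.default_rng(3)
fs, fc, N, L = 8000.0, 1000.0, 40000, 33       # sample rate, carrier, record length, taps/branch
t = np.arange(N) / fs

def lowpass_noise(bw):
    """Crude band-limited Gaussian message of two-sided bandwidth ~bw (Hz)."""
    taps = int(4 * fs / bw) | 1
    h = np.sinc(bw / fs * (np.arange(taps) - taps // 2))
    return np.convolve(rng.standard_normal(N), h, mode="same")

m = lowpass_noise(200.0)
s = m * np.cos(2 * np.pi * fc * t)                              # desired DSB-AM signal
i = lowpass_noise(60.0) * np.cos(2 * np.pi * (fc + 60.0) * t)   # interference on one side of the carrier
x = s + i + 0.05 * rng.standard_normal(N)                       # received data

# FRESH branches: the data itself and its conjugate shifted by 2*fc.
branches = [x.astype(complex), np.conj(x) * np.exp(2j * np.pi * 2 * fc * t)]

def lagged(z, L):
    return np.stack([np.roll(z, k) for k in range(L)], axis=1)

A = np.hstack([lagged(b, L) for b in branches])                 # regression matrix of both branches
train = slice(L, N // 2)                                        # training segment (desired s assumed known)
w, *_ = np.linalg.lstsq(A[train], s[train].astype(complex), rcond=None)
s_hat = np.real(A @ w)                                          # FRESH estimate of the DSB-AM signal

err_in = np.mean((x - s) ** 2)
err_out = np.mean((s_hat[N // 2:] - s[N // 2:]) ** 2)
print(f"pre-filter MSE {err_in:.4f}  ->  FRESH output MSE {err_out:.4f}")
```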

    The simplest version of the problem addressed by this approach is that for which interference exists only on one side of the carrier. Nevertheless, it is possible to correct for interference on both sides of the carrier, provided that the set of corrupted sub-bands on one side of the carrier has a frequency-support set with a mirror image about the center frequency that does not intersect the set of corrupted sub-bands on the other side of the center frequency.

    For WBFM signals, in order to meet the conditions under which FM is approximately equal to DSB-AM plus a quadrature carrier, we must first pass the signal through a frequency divider which divides the instantaneous frequency of the FM signal by some sufficiently large integer. This approach is explained here, and the challenges presented by the impact of interference on the behavior of the frequency divider are identified and discussed. This leads to the identification of a threshold phenomenon for FM signals in additive interference that is similar to, but distinct from, the well-known threshold phenomenon for demodulation of FM signals in additive noise.

    The concepts here are preliminary, and attainable performance is expected to be limited by the threshold phenomenon for WBFM.

  • 11.7 A Historical Perspective on Nonlinear Systems Identification

    Content in preparation by Professor Davide Mattera to be posted to this website in the future.

  • 11.8 On Cycloergodicity

    Cycloergodicity is the word we use, by analogy with the meaning of the word Ergodicity, for the property of a stochastic process that guarantees the convergence of sinusoidally weighted time-averaged measurements on a stochastic process, as averaging time increases without bound, to sinusoidally weighted time averages of time-varying expected values of those same measurements. Ergodicity applies to stationary stochastic processes and, more generally, to Asymptotically Mean Stationary (perhaps not the best descriptor) stochastic processes, which are nonstationary processes for which the time averages of time-varying expected values converge to a limit value, the temporal mean of the stochastic mean. Cycloergodicity applies to cyclostationary processes and, more generally, almost cyclostationary processes, for which the expected values are periodically or almost periodically time-varying. More generally (by analogy with asymptotically mean stationary processes), cycloergodicity can apply to generally time-varying expected values, in which case cycloergodicity again refers to convergence to sinusoidally weighted time averages of the generally time-varying expected values.
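
    In symbols (notation introduced here), for a measurement g(·) applied to the process X and a cycle frequency α of interest, the convergence described above reads as follows.

```latex
% Cycloergodicity (notation introduced here): for measurements g(X(t)) and each
% cycle frequency \alpha of interest, the sinusoidally weighted time average of
% one sample path must converge to the corresponding sinusoidally weighted time
% average of the time-varying expected value,
\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g\big(X(t)\big)\, e^{-i 2\pi \alpha t}\, dt
\;=\;
\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} E\big\{ g\big(X(t)\big) \big\}\, e^{-i 2\pi \alpha t}\, dt
\quad \text{(with probability 1).}
```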

    As of today, 9 January 2025, there is no complete generalization of Birkhoff’s Ergodicity Theorem from stationary to almost cyclostationary processes (which include cyclostationary processes as a special case). Birkhoff’s theorem provides a necessary and sufficient condition on the probability measure of a discrete-time stationary process for that process to be ergodic. His theorem extends to asymptotically mean stationary processes. Almost cyclostationary processes are, in fact, asymptotically mean stationary processes [JP11], but this does not provide the theorem we desire, as specified above, which involves sinusoidally weighted time averages. The first problem encountered in seeking a cycloergodicity theorem that generalizes Birkhoff’s ergodicity theorem is that it appears as though we would need to consider translations of event sets in the sample space by integer multiples of the period of cyclostationarity of interest and, if that period is not an integer multiple of the time-sampling increment of the discrete-time process, such translations are undefined. We can circumvent this challenge by considering continuous-time processes. The remainder of this Section 11.8 briefly discusses my work on the challenge of seeking a counterpart to Birkhoff’s ergodicity theorem that generalizes it from stationary to almost cyclostationary processes. This work began in the late 1970s and spans nearly 50 years, but it was sporadic and involved no more than about 5 years of time during which actual work was occasionally carried out. (It is noted for the readers’ benefit that, because it is processes rather than theorems that are ergodic, I avoid in this discussion use of the popular term “ergodic theorem” in favor of “ergodicity theorem”.)

    The results from the first period of work are reported in the 1983 journal paper [JP11], which concluded that, despite concrete progress, the desired generalization of Birkhoff’s theorem had not yet been found. The second period of work led to a proposal for an approach to actually reaching such a generalization; it is reported in the 2023 paper [JP67] and was further refined in 2025, as reported below.

    In 1994, I returned to the subject for the purpose of summarizing notes I had made off and on over the preceding ten years, and I wrote a compendium of illustrative examples on why cycloergodicity is a problem of considerable practical interest. This revealing discussion exposes quite an array of undesirable properties of stochastic processes that are not cycloergodic.  This provided what I considered substantial motivation to those with interest in using cyclostationary or almost cyclostationary stochastic process models to pursue this unsolved problem. But to my knowledge, no one took the bait.

    My perspective on this, upon returning to this topic thirty years later in 2023, is that this is inherently a difficult concept to mathematize and the formal proposition I ultimately arrived at upon revisiting the topic 2 years later in 2025, while being as elegant as Birkhoff’s Ergodicity Theorem, may not lead to mathematical results that are of practical value. But it is also my perspective that this is the same as the status of Birkhoff’s Ergodicity Theorem. The discussion below is as non-technical as I could make it.

    2023 Results:

    How to Generalize Birkhoff’s Theorem of Ergodicity for Continuous-Time Almost Cyclostationary Kolmogorov Stochastic Processes.

    The content of this section does not contribute to the primary objective of this website, but it does follow easily from the concepts introduced in the highly relevant article [JP67], and it does provide what appears to be a genuine generalization of the ergodicity theory of stationary and cyclostationary processes to almost cyclostationary Kolmogorov stochastic processes. Strong cycloergodicity theory of Kolmogorov stochastic processes, which extends and generalizes existing ergodicity theory, is developed by Boyles and Gardner in the 1983 paper [JP11], where it is shown that sinusoidal and periodic components of time-varying probabilistic parameters can be consistently estimated w.p.1 from time averages on one sample path. It is also established that a strong theory of cycloergodicity inclusive enough to cover all applications of practical interest had, at that time, not yet been shown to exist. Moreover, it is shown in [JP11] that such a theory cannot presuppose the existence of a dominating stationary measure, as does the theory presented therein. Nevertheless, it would appear that it can be argued that, because a continuous-time cyclostationary process can be characterized as a discrete-time vector-valued (or function-valued) stationary process, Birkhoff’s Ergodicity Theorem for scalar-valued discrete-time stationary processes, if generalized to vector-valued processes, leads to a completely analogous cycloergodicity theorem for continuous-time cyclostationary processes. The vector (or function), at any discrete time equal to an integer multiple of the period of cyclostationarity, consists of the infinite set of process values over the period between that discrete time and the previous discrete time.
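
    The vector-valued (function-valued) representation referred to above can be written, in notation introduced here, as follows.

```latex
% Function-valued representation (notation introduced here): for a continuous-time
% process x(t) and a candidate period T, define the discrete-time sequence
X_n(\tau) \;=\; x(nT + \tau), \qquad 0 \le \tau < T, \quad n \in \mathbb{Z},
% whose "samples" are the segments of x over successive periods.  If x is
% cyclostationary with period T, then \{X_n\} is a (function-valued) stationary
% sequence, so ergodicity results for stationary (or AMS) discrete-time processes
% can be brought to bear, as argued above.
```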

    Furthermore, it is shown by Gray in Chapter 7 and references therein that Birkhoff’s ergodicity theorem has been extended from stationary to asymptotically mean-stationary (AMS) discrete-time processes. This extension guarantees the existence of consistent time-average estimators for the discrete-time averages of time-varying probabilistic parameters, such as probability density functions. Because almost-cyclostationary (ACS) discrete-time processes are AMS [JP11], this extended theorem applies to discrete-time ACS processes (and the same might well be true for continuous-time ACS processes after discrete-time sampling) but it does not apply directly to estimation of the sinusoidal and periodic components of almost-periodically time-varying probabilistic parameters.

    Nevertheless, Gray does discuss ergodicity of N-stationary discrete-time processes, which are N-dimensional vector-valued representations of discrete-time cyclostationary processes with period N. Furthermore, the discrete-time infinite-dimensional vector-valued process described above that represents a continuous-time scalar-valued process is AMS if that continuous-time process is ACS (a class which includes, as special cases, cyclostationary and stationary processes).

    Consequently, for any selected period of a continuous-time ACS process, one can form a discrete-time vector-valued AMS process as explained above. Then the time average of a probabilistic parameter of this vector-valued process will equal the periodic component of the corresponding probabilistic parameter of the original ACS process. In this way, any periodic component, for any real-valued period T, of the almost periodically time-varying probabilistic parameters of the original scalar-valued continuous-time ACS process can be guaranteed to be consistently estimable by applying the proposed ergodicity theorem to the infinite-dimensional vector-valued discrete-time AMS process.

    It follows that the discrete-time AMS version of the Birkhoff ergodicity theorem can be extended and generalized to accommodate cycloergodicity for continuous-time ACS processes by requiring that the ergodicity condition for discrete-time AMS processes be satisfied by the vector-valued representation for each and every period T of the continuous-time process. In addition, there appears to be a partial-cycloergodicity version of this proposed theorem that requires the ergodicity condition to be satisfied for some but not all periods.

    This leaves one class of ACS processes for which a cycloergodicity theorem remains to be proposed, and this is the class of discrete-time processes having measures that possess non-zero sinusoidal components with sine-wave frequencies that are incommensurate with the time-sampling rate. Some such processes do indeed allow for consistent estimation of such sinusoidal components [JP11], but others do not. A necessary and sufficient condition for consistent estimation has apparently not yet been proposed.

    2025 Results: Emboldened by the above reasoning, I have conjectured a Cycloergodicity Theorem for almost cyclostationary continuous-time processes, which I paraphrase here. The trouble with it is the same as the trouble with Birkhoff’s Ergodicity Theorem. It will almost never be useful in practice because the knowledge about the probability measure required for testing the necessary and sufficient condition for cycloergodicity will rarely be available.

    Proposition on Cycloergodicity: For any continuous-time Almost Cyclostationary (ACS) process, cycloergodicity can (in principle) be tested for one period at a time. The Cumulative Distribution Functions (CDFs) for any ACS process can be characterized in terms of an algebraic sum of periodic CDFs and a constant CDF (as I showed in 1987 [Bk2, page 512]). I conjecture that the same is true for the much more abstract probability measure that appears in the definition of a stochastic process; that is, the almost periodically time-varying probability measure can likewise be decomposed into periodic components. The conjectured condition for an ACS process to be cycloergodic with period T is that, for every event set of the process that is translation invariant under translations by integer multiples of T, the periodic component (with period T) of the time-varying measure of the process, evaluated at that event, must equal 0 or 1.
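
    Stated in symbols (notation introduced here, paraphrasing the proposition above, not a quotation of a published theorem):

```latex
% Conjectured condition (notation introduced here).  Let U_T denote translation of
% event sets by T, and let P^{(T)} denote the periodic component, with period T, of
% the almost periodically time-varying probability measure of the ACS process.
% The conjectured condition for cycloergodicity with period T is
P^{(T)}(A) \in \{0,\ 1\} \quad \text{for every event set } A \text{ such that } U_T A = A .
```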

    I have not seen this proposition or its equivalent elsewhere, and I have not tried to develop a mathematical proof. If this proposition is validated sometime in the future, it can then be said “that’s nice; now we know”. But it may still be a challenge to apply this result in practice to specific stochastic process models.  This is one of the many reasons I have given for preferring the non-orthodox Non-Population Probability theory of cyclostationarity to the orthodox stochastic process theory.

    For discrete time, I have no proposition on cycloergodicity for almost cyclostationary processes. Every ACS discrete-time process has periods that are incommensurate with the sampling increment. Therefore, one cannot directly construct the periodic components of its CDFs or its probability measure. But they can be indirectly constructed, in principle, by calculating all the sine-wave components of the CDFs or the measure, partitioning the cycle frequencies into harmonic sets, and then summing the component measures over each such set to obtain each of the periodic components. But what we cannot do is test for translation-invariant event sets of the process for translations that are not integer multiples of the sampling increment, because we can perform translations by only such integer multiples.