# Statistical Analysis from the Fourier Integral Theorem

## Preview text

arXiv:submit/3789648 [stat.ME] 11 Jun 2021

Statistical Analysis from the Fourier Integral Theorem
Nhat Ho Stephen G. Walker ,
Department of Statistics and Data Sciences, University of Texas at Austin , Department of Mathematics, University of Texas at Austin June 11, 2021
Abstract
Taking the Fourier integral theorem as our starting point, in this paper we focus on natural Monte Carlo and fully nonparametric estimators of multivariate distributions and conditional distribution functions. We do this without the need for any estimated covariance matrix or dependence structure between variables. These aspects arise immediately from the integral theorem. Being able to model multivariate data sets using conditional distribution functions we can study a number of problems, such as prediction for Markov processes, estimation of mixing distribution functions which depend on covariates, and general multivariate data. Estimators are explicit Monte Carlo based and require no recursive or iterative algorithms.
Keywords: Conditional distribution function; Kernel smoothing; Mixing distribution; Nonparametric estimator.
1 Introduction
Estimation of a multivariate distribution or conditional distribution function necessarily requires the construction of a dependence structure between variables. This is usually applied to a multivariate density function and then integrated to obtain the corresponding distribution function; see, for example, Jin and Shao . For example, for a bivariate set of observations, marginal distributions are easy to estimate and can be achieved either parametrically or nonparametrically; the latter including empirical distributions or smoothing estimators. On the other hand, constructing a dependent model is more problematic, particulary from a nonparametric perspective. See, for example, Panaretos and Konis , Wand , Wand and Jones , Staniswalis et al. , and Chacon and Duong .
The most common approach is to use smoothing methods based on kernels, such as the Gaussian kernel. However, this would need the estimation of a covariance matrix. Even if the interest focuses on a conditional distribution function with multiple conditioning variables, the same problem arises as with the multivariate kernel methods. As a consequence, a number of authors consider only a single covariate, such as Hall et al.  and Veraverbeke et al. .
The estimation of a dependence structure can be achieved using a nonparametric estimation of a copula function. See, for example, Chen and Huang  and Geenens et al. . However, this is far from trivial and is by no means a popular or common approach. Even with copulas, kernel methods are employed and now with the additional burden of dealing with boundary values.
We focus on distribution functions with multiple conditional variables, and we manage this fully nonparametrically without the need for the construction of a full covariance bandwidth
1

matrix. Hence, the aim in this paper is to show how it is possible to work with multivariate conditional distribution functions and obtain nonparametric estimators while avoiding the need to estimate any dependence structure. We also consider a variety of other nonparametric estimation of multivariate functions which are connected to the distribution functions, such as quantile functions.
The starting point is the Fourier Integral Theorem. Brieﬂy here, for a suitable function m on Rd, the theorem is given by

1

d

m(y) = Rl→im∞ πd Rd [0,R]d j=1 cos(sj(yj − xj)) m(x) ds dx, (1)

for any y ∈ Rd where each of the two integrals listed are d–fold. In Ho and Walker , the authors consider multivariate density and multivariate regression estimation using equation (1). Natural nonparametric Monte Carlo estimators are obtained. One drawback, despite excellent convergence properties, is that density function estimators are not guaranteed to be densities. However, it is easy to use equation (1) to obtain natural Monte Carlo estimators of distribution functions and these can easily be adapted, using for example, isotonic regression to be proper distribution functions. These are easily sampled, with an unlimited amount of samples available, which can then be used to undertake statistical inference via a “generative model” approach.
The important feature of equation (1) which forms the basis of the paper is that whatever dependence structure exists within m(x), it is transferred to m(y) via the integration and the use of independent terms of the type

d sin(R(yj − xj)) ,
j=1 yj − xj

for some suitable choice of R > 0. This is quite remarkable and as we shall see allows us to obtain nonparametric Monte Carlo estimators of, for example, distribution functions without requiring the construction of any dependence structure.
A number of authors have used the Fourier kernel for density estimation, see Parzen  and Davis . However, there has not to our knowledge been any attempt to use the kernel for multivariate density estimation. The reason could well be the apparent inability to model a covariance matrix within the kernel. On the other hand, from the foundations of the sin kernel arising from the Fourier integral theorem, it is clear that a covariance matrix is not required.
We will be focusing on distribution functions; hence adapting equation (1) to this scenario, and for ease of introduction, we ﬁrst restrict the setting to one dimension. Further, we exchange the limit with R to some ﬁxed value, to get

FR(y) = 12 + π1 Si(R(y − x)) f (x) dx, (2)

where Si(z) = z sin(x)/x dx. Equation (2) comes from equation (1) using an integration by

0

parts along with the fact that 0 (sin x)/x dx = π/2. Here f is a density function on R, FR(y)

the approximated distribution function, with FR(y) → F (y) as R → ∞, and

1n

FR(y)

=

1 2

+

Si(R(y − Yi))

(3)

i=1

2

is how we would estimate (2) from a sample (Y1, . . . , Yn). There are some practical aspects to using equation (3) to estimate the distribution function, the main one being isotonic regression, if required, to ensure one obtains a proper distribution function. We will also be using equation (3) to generate samples when implementing a generative model.
The important contribution of the paper is that we can estimate conditional distributions with more than one conditioning variable. If using alternative approaches, such as Gaussian kernels, there is a need to install a covariance matrix. We do not need to do this so we are estimating conditional distributions fully nonparametrically. Speciﬁc cases we consider include estimating missing outcomes in a Markov process, requiring two conditional variables, and also estimating a conditional mixing distribution.
The layout of the paper is as follows. In Section 2 we set down the theory for the estimators from the Fourier integral theorem used throughout the paper. We also hint at a class of function for which an integral theorem will hold, including the possibility of a Haar wavelet integral theorem. In Section 3 we illustrate with conditional distribution functions; in Section 4 with conditional quantile functions and Markov constructed conditional distributions. Section 5 considers conditional mixing distribution functions. Finally, Section 6 contains a brief discussions and the Appendix contains the proof to the main results.

2 Properties of the Fourier Integral Theorem

Before going into the details of applications of the Fourier integral theorem to estimate distribution functions and other related problems, we reconsider the approximation property of the Fourier integral theorem. In the previous work, Ho and Walker  utilize the tails of the Fourier transform of the function m(·) to characterize the approximation error of the Fourier integral theorem when truncating one of the integrals. However, the proof technique in that work is inherently based on the nice property of sin function in the Fourier integral theorem and is non-trivial to extend to other choices of useful cyclic functions; an example of such a function is in a remark after Theorem 1.
In this work, we provide insight into the approximation error between F (y) and FR(y) via the Riemann sum approximation theorem. This insight can be generalized into any cyclic function which integrates to 0 over the cyclic interval, thereby enriching the family of integral theorems beyond the Fourier integral theorem. Such extensions would provide for example a Haar wavelet integral theorem.
To simplify the presentation, we deﬁne

1

d

mR(y) := πd Rd [0,R]d j=1 cos(sj(yj − xj)) m(x) ds dx

1

d sin(R(yj − xj))

= πd d

m(x) dx. (yj − xj)

(4)

R j=1

We start with the following deﬁnition of the class of univariate functions that we use throughout our study.

Deﬁnition 1. The univariate function f (·) is said to belong to the class T K(R) if for any y ∈ R, the function g(x) = (f (x) − f (y))/(x − y) satisﬁes the following conditions:
3

0.4

1.0

0.8

0.3

0.6

0.2

Density

distribution

0.4

0.1

0.2

0.0

0.0

-3

-2

-1

0

1

2

3

y

-3

-2

-1

0

1

2

3

z

(a)

(b)

Figure 1. Simulation with the cumulative distribution function via the Fourier integral theorem in equations (5) and (6). (a) Estimated FR (bold line) and true distribution (dashed line); (b) Histogram of 10000 samples taken from FR.

1. The function g is diﬀerentiable, uniformly continuous up to the K-th order, and the limits lim|x|→+∞ |g(k)(x)| = 0 for any 0 ≤ k ≤ K where g(k)(.) denotes the k-th order derivative of g;
2. The integrals |g(k)(x)|dx are ﬁnite for all 0 ≤ k ≤ K.
R

Note that, for the function g in Deﬁnition 1, for any y ∈ R when x = y, we choose g(y) = f (1)(y). An example of function that satisﬁes Deﬁnition 1 is Beta density function. Based on
Deﬁnition 1, we now state the following result.

Theorem 1. Assume that the univariate functions mj ∈ T Kj (R) for any 1 ≤ j ≤ d where

K1, . . . , Kd are given positive integer numbers. Then, if we have m(x) =

d j=1

mj

(xj

)

or

m(x) =

d j=1

mj

(xj

)

for

any

x

=

(x1, . . . , xd),

there

exist

universal

constant

C

and

de-

pending on d such that as long as R ≥ C we obtain

|mR(y) − m(y)| ≤ C¯/RK ,

where K = min1≤j≤d{Kj}.

The proof is presented in the Appendix. To appreciate the proof we demonstrate the key idea in the one dimensional case. Here

1 +∞ sin(R(y − x))

mR(y) − m(y) = π −∞

y − x (m(x) − m(y)) dx

which we write as

1 +∞ mR(y) − m(y) = π −∞ sin(R(x − y)) g(x) dx,

4

1.0

0.8

0.6

distribution

0.4

0.2

0.0

-3

-2

-1

0

1

2

3

y2

Figure 2. Simulation with the conditional distribution function via the Fourier integral theorem. In this ﬁgure, we plot the estimated conditional distribution of y2 given y1 = y2 = 0 (bold line) and the true distribution (dashed line).

where g(x) = (m(x) − m(y))/(x − y). Without loss of generality set y = 0 to get

1 +∞ mR(y) − m(y) = π −∞ sin(z) g(z ) dz,

where = 1/R. Now due to the cyclic behaviour of the sin function we can write this as

1 mR(y) − m(y) = π

+∞

sin(t)

g( (t + 2πk)) dt.

0

k=−∞

The term

+∞ k=−∞

g( (t + 2πk)) is a Riemann sum approximation to an integral which con-

verges to a constant, for all t, as → 0. The overall convergence to 0 is then a consequence of

02π sin t dt = 0. Hence, it is how the Riemann sum converges to a constant which determines

the speed at which mR(y) − m(y) → 0.

Remark. It is now interesting to note that the sin function here could be replaced by any cyclic function which integrates to 0 over the cyclic interval. For example,

 π2 (x − 2mπ), φ(x) =
 π2 [(2m + 1)π − x] ,

(2m − 1/2)π < x < (2m + 1/2)π (2m + 1/2)π < x < (2m + 3/2)π.

This yields a wavelet integral theorem based on the Haar wavelet. There are potentially many other such functions such as φ(x) and sin x, and this line of research will be studied in the future.

5

y -4 -2 0 2 4

1

2

3

4

5

y -4 -2 0 2 4

1

2

3

4

5

Figure 3. Generated samples (top) from estimated conditional distribution and 100 data samples (bottom) from the multivariate normal distribution with a Markov structure in Example 3.

3 Conditional distribution function

The distribution estimator for one dimensional independent and identically distributed (i.i.d.) data is given in equation (3). In order to ensure that FR(y) lies between 0 and 1 we adapt to

FR(y) = min 1, max 0, FR(y) .

(5)

Hence, from now on, whenever we write FR(y) it is this adapted estimator we refer to. One of the key ideas we will be working on involves sampling from the distribution FR. We
can do this via the inverse distribution function approach; i.e., for a random uniform variable u from (0, 1) we take the sample as

arg inf FR(y) = u.

(6)

y

Example 1. In the ﬁrst toy example we take n = 100 and take the data Y1:n as independent standard normal random variables. We plot FR(y) in Figure 1(a) with R = 5. We then sample 10000 variables from FR and these are represented as a histogram in Figure 1(b). As we can see from these ﬁgures, both the estimated cumulative distribution and the histogram respectively yield good estimation of the true cumulative distribution and the density function of the standard normal random variable. Example 2. In the next example we consider a conditional distribution function with two conditional variables. We take n = 500 independent observations from a three sequence of Markov variables;

y1 = N(0, 1), [y2 | y1] = N(ρy1, 1 − ρ2), [y3 | y2] = N(ρy2, 1 − ρ2)

and take ρ = 0.6. We estimate the conditional distribution of y2 given y1 = y3 = 0 which in

general is

c(y1 + y3) 1 − c2 f (y2 | y1, y3) = N 1 + c2 , 1 + c2 .

6

2

1

0

y_2

-1

-2

-2

-1

0

1

y_3

Figure

4:

Plot

of

generated

samples

(

y

∗ 2

,

y3∗

)

in

Example

3.

The estimated distribution is a straightforward extension of the one dimensional case, which is given by:

FR1,R2 (y2 | y1, y3) = 12 + π1

n i=1

Si(R1(y2

y2i))

KR2 (y1

y1i)KR2 (y3

y3i)

n KR (y1 − y1i)KR (y3 − y3i)

.

i=1 2

2

where we are now writing KR(z) := sin(Rz)/z. We also allow for the R to be diﬀerent depending on its placement within the estimator; i.e., with the dependent or conditional variables.
The estimated distribution along with the true distribution are shown in Figure 2. For this we took R1 = 10 and R2 = 6. As usual we implemented adaption (5). For graphical representations when drawing the distribution using a grid we implement the isotonic regression technique to ensure the function is non–decreasing. However, for sampling, this is not necessary. For independent observations y from some d–dimensional distribution function F

we can provide a generative model by estimating a sequence of marginal and conditional distribution functions. That is, suppose d = 5 and we extend the set–up illustrated in Example 2.

Example 3. We take data with a sample size of n = 1000 from the d = 5 multivariate normal distribution with a Markov structure; so we take y1 = N(0, 1) and for j = 2, . . . , d we have [yj | yj−1] = N ρyj−1, 1 − ρ2 . We use ρ = 0.6.
For the generative estimator we assume we do not know the data structure. We do this
by obtaining marginal and conditional distributions; speciﬁcally we estimate

1n

F (y1)

=

1 2

+

Si(R1(y1 − y1i))

i=1

7

1.0

0.8

0.6

Distribution

0.4

0.2

0.0

0.0

0.2

0.4 share 0.6

0.8

1.0

Figure 5: Sequence of conditional distributions in the Engel95 dataset in Example 4.

and for j = 2 : d,

F (yj | y1:j−1) = 12 + π1

n i=1

Si(R1(yj

yji))

jl=−11 KR2 (yl − yli) .

n i=1

j−1 l=1

KR2 (yl

yli)

To generate samples we use the idea from equation (6) sequentially, starting with F (y1) to

get y1∗ and subsequently y2∗:d using the F (yj | y1∗:j−1). Figure 3 shows 100 generated y1∗:d and also 100 of the 1000 y1:d represented as a time

series plot. When viewed as samples from a joint distribution, each yj is marginally standard

normal and has correlation 0.6 with yj−1. The average of the means of the generated samples

is

-0.15

and

the

average

of

the

variances

is

0.96.

A

plot

of

the

generated

samples

(y

∗ 2

,

y3∗

)

is

presented in Figure 4 and the measured correlation between the pairs of samples is 0.50.

Example 4. A further example involves a dataset from the R package np and is described in Hayﬁeld and Racine . The dataset is “Engel95” and consists of household data from 1655 married families with 10 observations per family including expenditure on food, catering, fuel and other such commodities. One of the variables is the number of children in the household which is discrete and we leave that variable out of the dataset. We remain with 9 variables and condition the 7 expenditure shares of food, catering, alcohol, fuel, motor, fares, and leisure on the log of total expenditure and the log of total earnings. We number the variables in the same order as we have just written them here.
We do the inference via sequential conditioning, starting with the conditional for F (y1 | y8, y9) and then the conditionals F (yl | y1:l−1, y8, y9) for l = 2, . . . , 7. We estimate the conditional distributions at the means of each of the variables and the conditionals are presented in Figure 5. The jumps at the start of some of the distributions are due to a number of households recording zero share in some of the expenditures. In the analysis, we took R = 20.

8

y 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

0.00

0.05

0.10

0.15

0.20 x

0.25

0.30

0.35

Figure 6: Data plot with quartile functions from Engel95 dataset.

4 Quantile regression and Markov data
In this section, we discuss an application of the Fourier integral theorem to quantile regression and Markov data.

4.1 Quantile regression
Quantile regression has become an important area of statistical analysis since the prioneering work of Koenker and Bassett . The most popular approaches to nonparametric quantile regression use the “pinball” loss function

lu(ξ) =

ξ≥0

(u − 1)ξ ξ < 0,

with 0 < u < 1 representing the quantile of interest. The idea is that the quantile regression function Q(u | x) is the solution to the minimization problem;

n
min lu(yi − f (xi)) + λ||f ||
f i=1

where

the

observed

data

are

(x

i

,

yi

)

n i=1

and

|| · ||

is

some

chosen

penalty

function

while

f

is

modelled in traditional ways using for example splines or polynomials or kernels. See, for

example, Yu and Jones , Liu and Wu , Takeuchi et al. , and Koenker .

Another traditional approach is to get the inverse of a kernel estimated distribution function, e.g., Daouia et al. ; noting that there appears no direct kernel estimation of a quantile. However, here we show how the Fourier integral theorem can be used to obtain a

9

700

0.5

600

0.0

500

-0.5

400

y

y

300

-1.0

200

-1.5

100

-2.0

0

2000

4000

6000

8000

Index

0

2000

4000

6000

8000

(a)

(b)

Figure 7. (a) Raw data of 9311 daily records of NYSE Composite Index; (b) Transformed NYSE Composite Index data.

quantile regression function using the KR kernel. We are looking to ﬁnd the function Q(u | x), the quantile function corresponding to the distribution F (y | x). Now, ﬁxing R, we have

1 QR(u | x) = π

sin(R(u − v)) Q(v | x) dv.
u−v

Transforming v = F (y | x) we get

1 QR(u | x) = π

sin(R(u − F (y | x))) y f (y | x)dy.
u − F (y | x)

Consequently, the Monte Carlo estimator of QR(· | x) is given by

1 n sin(R(u − F (yi∗ | x))) ∗

QR(u | x) = nπ

u − F (y∗ | x) yi ,

i=1

i

where the (yi∗) are taken from F (· | x), which itself is estimated as

F (y | x) = 1 + 1 2π

n j=1

Si(R(y

yj

))

KR(x

xj

)

nj=1 KR(x − xj ) .

We took data from the Engel95 data set; the ﬁrst and second columns, regressing the ﬁrst column on the second column. With n = 1655 we took R = 10 and took 5000 samples from each F (· | x) with the x values being (0.02, 0.04, . . . , 0.2). We then computed the quartiles; the data with the quartile functions are presented in Figure 6.

4.2 Markov data
The data for this section is to be found in the R package fBasics and consists of n = 9311 data points of daily records of the NYSE Composite Index. A plot of the raw data is given

10 