# MicrostructureNoise.jl

`MicrostructureNoise`

is a Julia package for Bayesian volatility estimation in presence of market microstructure noise. The underlying model is the stochastic differential equation

$ dX*t=b(t,X*t)\,dt + s(t)\,d W*t, \quad X*0=x_0, \quad t\in [0,T] .$

The estimation method is minimalistic in its assumptions on the volatility function $s$, which in particular can be a stochastic process. The process $X$ is latent: observed is its noisy version on a discrete time grid,

$ Y*{i}=X*{t*{i}}+V*{i}, \quad 0<t*1<\cdots<t*n=T.$

Here $\{ V_i \}$ denote unobservable stochastic disturbances, and $n$ is the total number of observations.

For data $\{Y_i\}$ that are densely spaced in time, the drift function $b$ has little effect on estimation accuracy of the volatility function $s$, and can be set to zero. This reduces the original model to the linear state space model, and statistical tools developed for the latter can be used to infer the unknown volatility. The posterior inference is performed via the Gibbs sampler, and relies on Kalman filtering ideas to reconstruct unobservable states $\{X(t_i)\}$.

Essential details of the procedure are as follows: The unknown squared volatility function $s^2$ is a priori modelled as piecewise constant: Fix an integer $m<n$. Then a unique decomposition $n=mN+r$ with $0\leq r<m$ holds, where $N=\lfloor {n}/{m}\rfloor$. Now define bins $B_k=[t_{m(k-1)},t_{mk})$, $k=1,\ldots,N-1$, and $B_N=[t_{m(N-1)},T]$. The number $N$ of bins is a hyperparameter. Let $s$ be piecewise constant on bins $B_k$, so that

$ s^2=\sum*{k=1}^{N} \theta*k \mathbf{1}*{B*k}.$

The coefficients $\{ \theta_k \}$ are assigned the inverse Gamma Markov chain prior, which induces smoothing among adjacent pieces of the function $s^2$. This prior is governed by the smoothing hyperparameter $\alpha$, which in turn is equipped with a hyperprior. The errors $\{V_i\}$ are assumed to follow the Gaussian distribution with mean zero and variance $\eta$. The Bayesian model specification is completed by assigning the noise level $\eta$ the inverse Gamma prior, and equipping the initial state $X_0$ with the Gaussian prior. To sample from the joint posterior of the vector $\{\theta_k\}$, the noise level $\eta$ and the smoothing hyperparameter $\alpha$, the Gibbs sampler is used. In each cycle of the sampler, the unobservable state vector $\{X(t_i)\}$ is drawn from its full conditional distribution using the Forward Filtering Backward Simulation algorithm; this employs Kalman filter recursions in the forward pass.

Synthetic data examples show that the procedure adapts well to the unknown smoothness of the volatility $s$.

See the referenced article for additional details on prior specification, implementation, and numerical experiments.

## Example

```
using MicrostructureNoise, Distributions
# downloads a large file
Base.download("https://www.truefx.com/dev/data//2015/MARCH-2015/EURUSD-2015-03.zip","./data/EURUSD-2015-03.zip")
run(`unzip ./data/EURUSD-2015-03.zip -d ./data`)
dat = readcsv("./data/EURUSD-2015-03.csv")
times = map(a -> DateTime(a, "yyyymmdd H:M:S.s"), dat[1:10:130260,2])
tt = Float64[1/1000*(times[i].instant.periods.value-times[1].instant.periods.value) for i in 1:length(times)]
n = length(tt)-1
T = tt[end]
y = Float64.(dat[1:10:130260, 3])
prior = MicrostructureNoise.Prior(
N = 40,
α1 = 0.0,
β1 = 0.0,
αη = 0.3,
βη = 0.3,
Πα = LogNormal(1., 0.5),
μ0 = 0.0,
C0 = 5.0
)
α = 0.3
σα = 0.1
td, θs, ηs, αs, p = MicrostructureNoise.MCMC(prior, tt, y, α, σα, 1500)
posterior = MicrostructureNoise.posterior_volatility(td, θs)
```

## Library

`MicrostructureNoise.Prior`

— Type.```
MicrostructureNoise.Prior(; N, α1, β1, αη, βη, Πα, μ0, C0)
MicrostructureNoise.Prior(; kwargs...)
```

Struct holding prior distribution parameters. `N`

is the number of bins, `InverseGamma(α1, β1)`

is the prior of `θ[1]`

on the first bin, the prior on the noise variance `η`

is `InverseGamma(αη, βη)`

, the hidden state $X_0$ at start time is `Normal(μ0, C0)`

, and `Πα`

is a prior `Distribution`

for `α`

, for example `Πα = LogNormal(1., 0.5)`

.

Note: All keyword arguments `N, α1, β1, αη, βη, Πα, μ0, C0`

are mandatory.

Example:

```
prior = MicrostructureNoise.Prior(
N = 40, # number of bins
α1 = 0.0, # prior for the first bin
β1 = 0.0,
αη = 0.3, # noise variance prior InverseGamma(αη, βη)
βη = 0.3,
Πα = LogNormal(1., 0.5),
μ0 = 0.0,
C0 = 5.0
)
```

`MicrostructureNoise.MCMC`

— Function.```
MCMC(Π::Union{Prior,Dict}, t, y, α0::Float64, σα, iterations;
subinds = 1:1:iterations, η0::Float64 = 0.0, printiter = 100,
fixalpha = false, fixeta = false, skipfirst = false) -> td, θ, ηs, αs, pacc
```

Run the Markov Chain Monte Carlo procedure for `iterations`

iterations, on data `(t, y)`

, where `t`

are observation times and `y`

are observations. `α0`

is the initial guess for the smoothing parameter `α`

(necessary), `η0`

is the initial guess for the noise variance (optional), and `σα`

is the stepsize for the random walk proposal for `α`

.

Prints verbose output every `printiter`

iteration.

Returns `td, θs, ηs, αs, pacc`

, `td`

is the time grid of the bin boundaries, `ηs`

, `αs`

are vectors of iterates, possible subsampled at indices `subinds`

, `θs`

is a Matrix with iterates of `θ`

rows. `paccα`

is the acceptance probability for the update step of `α`

.

`y[i]`

is the observation at `t[i]`

.

If `skipfirst = true`

and `t`

and `y`

are of equal length, the observation `y[1]`

(corresponding to `t[1]`

) is ignored.

If `skipfirst = true`

and `length(t) = length(y) + 1`

, `y[i]`

is the observation at `t[i + 1]`

.

Keyword args `fixalpha`

, `fixeta`

when set to `true`

allow fixing `α`

and `η`

at their initial values.

`MicrostructureNoise.Posterior`

— Type.```
struct Posterior
post_t # Time grid of the bins
post_qlow # Lower boundary of marginal credible band
post_median # Posterior median
post_qup # Upper boundary of marginal credible band
post_mean # Posterior mean of `s^2`
post_mean_root # Posterior mean of `s`
qu # `qu*100`-% marginal credible band
end
```

Struct holding posterior information for squared volatility `s^2`

.

`MicrostructureNoise.posterior_volatility`

— Function.`posterior_volatility(td, θs; burnin = size(θs, 2)÷3, qu = 0.90)`

Computes the `qu*100`

-% marginal credible band for squared volatility `s^2`

from `θ`

.

Returns `Posterior`

object with boundaries of the marginal credible band, posterior median and mean of `s^2`

, as well as posterior mean of `s`

.

`MicrostructureNoise.piecewise`

— Function.`piecewise(t, y, [endtime]) -> t, xx`

If `(t, y)`

is a jump process with piecewise constant paths and jumps of size `y[i]-y[i-1]`

at `t[i]`

, piecewise returns coordinates path for plotting purposes. The second argument allows to choose the right endtime of the last interval.

## Contribute

See issue #1 (Roadmap/Contribution) for questions and coordination of the development.

## References

Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility estimation. arxiv:1801.09956, 2018.

Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility learning under microstructure noise. In preparation.