Hello all,
I’m interested in using a multi-output Gaussian process to predict compositional data of the form:
N x D (ordered by time), i = {1, 2, ..., I}, j = {1, 2, ..., J}
where
N_{i,j} = count for observation i, response d_j
I assume that there is some correlation along both the time axis and the response (‘space’) axis. For simplicity, and perhaps ease of computation, I assume that the time and space kernels are separable.
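To make the separability assumption concrete, here is a minimal sketch of what I have in mind, assuming an RBF kernel over time and a low-rank-plus-diagonal covariance over responses. The sizes, lengthscale, and the factor `W` are made-up placeholders, not values from my data.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on a 1-D input vector."""
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

# Hypothetical sizes: N time points, D responses.
N, D = 5, 3
t = np.arange(N, dtype=float)

# Covariance along the time axis (N x N).
K_time = rbf_kernel(t, lengthscale=2.0)

# Covariance along the response axis (D x D); W is an assumed low-rank factor.
W = np.array([[1.0], [0.5], [-0.3]])
K_resp = W @ W.T + 0.1 * np.eye(D)

# Separable covariance for the row-major flattening of the N x D latent matrix:
# entry ((i, j), (i2, j2)) equals K_time[i, i2] * K_resp[j, j2].
K_full = np.kron(K_time, K_resp)
```

The point of separability is that `K_full` never has to be formed explicitly in practice; the two small factors can be decomposed on their own.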
I’d like to try to apply a multi-output Gaussian process to this model. Some preliminaries I think should apply here:
Let y_i be the ith observation of the entire response vector. Then,
y_i ~ Dirichlet-Multinomial(M_i, a_i)
where M_i is just the sum of the d_j counts at i, and a_i are the respective alphas for the Dirichlet parameters on the prior.
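For reference, this is the likelihood I mean, written as a small sketch using only the standard library. The function name is my own, and it takes one count vector y_i with its alphas a_i; M_i is recovered as the sum of the counts.

```python
import math

def dirichlet_multinomial_logpmf(counts, alphas):
    """Log-probability of one count vector under a Dirichlet-Multinomial.

    counts: non-negative integer counts y_i (their sum is M_i)
    alphas: positive Dirichlet parameters a_i, same length as counts
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    total = sum(counts)        # M_i
    alpha_sum = sum(alphas)    # sum of the alphas
    logp = (math.lgamma(total + 1)
            + math.lgamma(alpha_sum)
            - math.lgamma(total + alpha_sum))
    for y, a in zip(counts, alphas):
        logp += math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
    return logp

# Two responses with a_i = (1, 1): every split of M_i = 3 is equally likely.
example_logp = dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0])
```

With two responses and both alphas equal to 1, the distribution is uniform over the M_i + 1 possible splits, so the example evaluates to log(1/4).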
Now for the modeling part: what portion of this should I directly apply the GP to?
My thoughts are similar to this paper here: https://arxiv.org/pdf/1903.05036 ; fit a GP to the a’s of the Dirichlet distribution so that they evolve over time/space via some periodic kernels, etc.
However, I may be missing some mathematical detail grounded in geostatistics or the like, as pages 6-7 of the linked text do not make clear exactly how their mean vector is chosen or defined.
I’m also not sure exactly where to start, nor how to choose candidate alphas corresponding to observed count vectors. My thoughts are:
- Use some max-ent tools to find a sensible prior on the a’s, based on the proportions observed in a ‘training’ set of the data up to the time t where the training ends.
- Define a kernel that separates the time effect and the ‘space’ effect of each response on the others.
- Do my prior predictive checks.
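A rough sketch of how the last two steps might fit together, assuming a zero-mean latent GP with a log link to the alphas (alpha = exp(F)). The link, the kernels, and all hyperparameters here are my own placeholder choices and not necessarily what the linked paper does.

```python
import numpy as np

def rbf_kernel(x, lengthscale, variance=1.0):
    """Squared-exponential kernel on a 1-D input vector."""
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def prior_predictive_counts(K_time, K_resp, totals, rng, jitter=1e-6):
    """Draw one prior predictive N x D count matrix.

    Latent F has covariance K_time (x) K_resp; alpha = exp(F) is an assumed link.
    """
    N, D = K_time.shape[0], K_resp.shape[0]
    L_t = np.linalg.cholesky(K_time + jitter * np.eye(N))
    L_r = np.linalg.cholesky(K_resp + jitter * np.eye(D))
    Z = rng.standard_normal((N, D))
    F = L_t @ Z @ L_r.T          # separable draw without forming the full kernel
    alpha = np.exp(F)            # positive Dirichlet parameters
    counts = np.empty((N, D), dtype=int)
    for i in range(N):
        p = rng.dirichlet(alpha[i])                  # proportions at time i
        counts[i] = rng.multinomial(totals[i], p)    # counts summing to M_i
    return alpha, counts

rng = np.random.default_rng(0)
N, D = 20, 4                                        # hypothetical sizes
t = np.arange(N, dtype=float)
K_time = rbf_kernel(t, lengthscale=3.0)
K_resp = 0.5 * np.eye(D) + 0.5 * np.ones((D, D))    # assumed response covariance
totals = np.full(N, 100)                            # assumed M_i per time point
alpha, counts = prior_predictive_counts(K_time, K_resp, totals, rng)
```

Comparing the simulated proportions `counts / totals[:, None]` against the proportions in the training window would be the actual check; a max-ent-derived prior mean for the latent F could replace the zero mean assumed here.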
Thanks so much for having a look!