Gaussian Processes with Compositional Data

Hello all,

I’m interested in using a multi-output Gaussian process to predict compositional data of the form:
N x D (rows ordered by time), i = {1, ..., I}, j = {1, 2, ..., J}

where

N_{i,j} = count for observation i, response d_j

I assume that there is some correlation along both the spatial axis and the response axis. For simplicity, and perhaps ease of computation, I assume that the time and space kernels are separable.
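For concreteness, here is a minimal sketch of what a separable kernel could look like: the full covariance is the Kronecker product of a response covariance and a time covariance. The RBF time kernel, the rank-1 coregionalization form B = W W^T + diag(kappa), and all sizes and hyperparameters are assumptions chosen for illustration, not something fixed by the problem.

```python
import numpy as np

def rbf_kernel(t, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over a 1-D array of time points."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def coregionalization_matrix(W, kappa):
    """Response covariance B = W W^T + diag(kappa), PSD by construction."""
    return W @ W.T + np.diag(kappa)

rng = np.random.default_rng(0)
N, D = 5, 3                          # hypothetical sizes: 5 time points, 3 responses
t = np.arange(N, dtype=float)

K_time = rbf_kernel(t, lengthscale=2.0)                  # N x N
W = rng.normal(size=(D, 1))                              # rank-1 mixing weights (assumed)
B = coregionalization_matrix(W, kappa=np.full(D, 0.1))   # D x D

# Separable covariance over all (response, time) pairs.
# Row/column index is j * N + i for response j and time point i.
K_full = np.kron(B, K_time)                              # (D*N) x (D*N)
```

The computational appeal is that K_full never has to be formed or factorized directly, since the eigendecomposition of a Kronecker product follows from the eigendecompositions of its two factors.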

I’d like to apply a multi-output Gaussian process to this model. Some preliminaries that I think should apply here:

Let y_i be the ith observation of the entire response vector. Then,

y_i ~ Dirichlet-Multinomial(M_i, a_i)
where M_i is the sum of the d_j counts at observation i, and a_i is the vector of alphas (the Dirichlet parameters of the prior) for observation i.
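To pin down the likelihood, here is a small sketch of the Dirichlet-Multinomial log-probability for a single count vector, written with only the standard library. The function name and the example numbers are hypothetical.

```python
from math import lgamma, exp

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log P(counts | M, alpha), where M = sum(counts) and alpha_j > 0.

    P = M! * Gamma(A) / Gamma(M + A)
        * prod_j Gamma(n_j + alpha_j) / (n_j! * Gamma(alpha_j)),  A = sum(alpha)
    """
    M = sum(counts)
    A = sum(alpha)
    logp = lgamma(M + 1) + lgamma(A) - lgamma(M + A)
    for n, a in zip(counts, alpha):
        logp += lgamma(n + a) - lgamma(a) - lgamma(n + 1)
    return logp

# Hypothetical example: one observation with D = 3 responses and M_i = 4.
logp = dirichlet_multinomial_logpmf([2, 1, 1], [0.5, 1.5, 2.0])
```

When every alpha equals 1, this reduces to a uniform distribution over all count vectors with the same total, which is a handy sanity check.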

Now for the modeling part: what portion of this should I directly apply the GP to?

My thoughts are similar to this paper: https://arxiv.org/pdf/1903.05036. That is, fit a GP to the a’s of the Dirichlet distribution so that they evolve over time/space, via some periodic kernels, etc.
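One way this could look, sketched under my own assumptions and not necessarily the construction used in the linked paper: put a GP with a periodic kernel on the log of each alpha, so the alphas stay positive and vary smoothly over time. The period, lengthscale, zero mean, and the choice of one independent GP per response are all placeholders.

```python
import numpy as np

def periodic_kernel(t, period, lengthscale=1.0, variance=1.0):
    """Exp-sine-squared (periodic) kernel over a 1-D array of time points."""
    diff = np.abs(t[:, None] - t[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2 / lengthscale ** 2)

rng = np.random.default_rng(1)
N, D = 24, 3                         # hypothetical: 24 time points, 3 responses
t = np.arange(N, dtype=float)

# Small jitter on the diagonal keeps the Cholesky factorization stable.
K = periodic_kernel(t, period=12.0) + 1e-6 * np.eye(N)
L = np.linalg.cholesky(K)

# One independent latent GP draw per response (an assumption; the responses
# could instead be coupled through a response covariance).
f = L @ rng.normal(size=(N, D))      # N x D latent values
mu = np.zeros(D)                     # hypothetical per-response mean of log-alpha
alpha = np.exp(mu + f)               # N x D, strictly positive Dirichlet parameters
```

Each row of alpha can then be passed as a_i to the Dirichlet-Multinomial for observation i.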

However, I may be missing some mathematical detail grounded in geostatistics or the like, as pages 6-7 of the linked text do not make clear exactly how their mean vector is chosen or defined.

I’m also not sure exactly where to start, nor how to choose candidate alphas corresponding to the observed count vectors. My thoughts are:

  1. Use some maximum-entropy tools to find a sensible prior on the a’s, based on the proportions observed in a ‘training’ set of the data up to the time t where the training ends.

  2. Define a kernel that separates the time effect from the ‘space’ effect (how each response relates to the others).

  3. Do my prior predictive checks.
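As a rough sketch of the prior predictive check in step 3: draw alphas from the prior, simulate count vectors from the Dirichlet-Multinomial, and look at the implied proportions. The independent log-normal prior on alpha below is only a placeholder for where a GP prior draw would go, and the sizes and totals are hypothetical.

```python
import numpy as np

def sample_dirichlet_multinomial(rng, totals, alpha):
    """One count vector per row: p_i ~ Dirichlet(alpha_i), y_i ~ Multinomial(M_i, p_i)."""
    counts = np.empty(alpha.shape, dtype=np.int64)
    for i, (M, a) in enumerate(zip(totals, alpha)):
        p = rng.dirichlet(a)
        counts[i] = rng.multinomial(M, p)
    return counts

rng = np.random.default_rng(2)
N, D, n_draws = 10, 3, 200           # hypothetical sizes
totals = np.full(N, 50)              # hypothetical M_i for each observation

sim_props = np.empty((n_draws, N, D))
for s in range(n_draws):
    # Placeholder prior on alpha (independent log-normal).
    # A draw from the GP prior on log-alpha would replace this line.
    alpha = np.exp(rng.normal(loc=0.0, scale=0.5, size=(N, D)))
    y = sample_dirichlet_multinomial(rng, totals, alpha)
    sim_props[s] = y / totals[:, None]

# Summaries to compare against what is plausible for the real proportions.
prop_mean = sim_props.mean(axis=0)                              # N x D
prop_interval = np.quantile(sim_props, [0.05, 0.95], axis=0)    # 2 x N x D
```

If the simulated proportions are far more extreme (or far more uniform) than anything seen in the training window, that suggests the prior scale on log-alpha needs adjusting.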

Thanks so much for having a look!