How Would You Conceive of This Iterative Decision Analysis?

I’ve been trying to think of how one would solve this problem analytically for two days now and have gotten almost nowhere. I’d be interested to hear how others would approach the below ‘toy’ problem (which is based on a real problem I have).

You have 100 people, and each of these people will do the following study twice (so a clustered design):

They need to perform three trials in which they flip a coin. The person keeps the largest monetary-value coin that lands ‘heads’ out of the three trials. If the person never gets a ‘heads’, they owe the study designer a dollar (so there is an imperative to get at least one heads). Everyone starts with a nickel (5-cent coin), which is a ‘fair’ coin with a 50/50 chance of heads/tails. After that flip, they can decide to flip either a penny (1-cent coin), which has an unknown higher likelihood of landing on its head (say somewhere between a 55–80% probability of heads); the nickel again (fair coin); a quarter, which has an unknown lower likelihood of landing on its head (say a 10–30% chance of heads); or a half-dollar, which has an unknown but even lower likelihood of landing on its head (say 0–15%). After trial 2, the person has to make the same decision a third time. Again, it is important that they get at least one coin to come up heads, but they also want to keep the largest monetary-value coin possible.
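To make the rules concrete, here is a minimal simulation sketch of one run. The heads probabilities for the weighted coins are assumed purely for illustration (only the nickel’s 50% is known), and the function and policy names are my own invention:

```python
import random

# Heads probabilities here are assumed purely for illustration -- only the
# nickel's 50% is known; the others are just picked from the stated ranges.
COINS = {"penny": (0.01, 0.65), "nickel": (0.05, 0.50),
         "quarter": (0.25, 0.20), "half_dollar": (0.50, 0.07)}  # (value, P(heads))

def play_one_run(choose):
    """Play one run of three flips. Flip 1 is always the nickel; choose(flip,
    best_so_far) picks the coin for flips 2 and 3. Returns the payoff."""
    best = 0.0
    for flip in (1, 2, 3):
        name = "nickel" if flip == 1 else choose(flip, best)
        value, p_heads = COINS[name]
        if random.random() < p_heads:       # heads: this coin is now keepable
            best = max(best, value)
    return best if best > 0 else -1.0       # never got heads: owe a dollar

# Example heuristic policy: play it safe with the penny until a coin is
# secured, then gamble on the quarter.
print(play_one_run(lambda flip, best: "quarter" if best > 0 else "penny"))
```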

Other than the even-probability nickel, all of the “higher probability” and “lower probability” weights are unknown, and the flips still have an element of chance even though the coins are weighted. Let’s say I have data on these 100 participants completing these trials. How would you model what the “best” decision rules are after each flip?

I have a few ideas, none of which I love, so I’d be interested to hear others’ thoughts.

EDIT: To be clear, this is a toy example and the real data and problem are more complex, so it won’t work to simply get a rough idea of the “weights” of the coins from the empirical data and work out the problem analytically. I actually need a statistical or machine learning model for the empirical data.

To clarify, it sounds like you are asking (assuming the trial participants are making optimal decisions under uncertainty) how to infer the p_{25c} and p_{50c} heads-chance parameters from the 100 subjects? Or are you interested in inferring the actual decision-making process of the test subjects, which may or may not be optimal?

That’s a good question; let’s assume the latter. In reality, the data are a bit more complicated because the ‘heads chance parameters’ for each coin are specific to the individual and also likely vary across time (a p_{25c} at ‘flip 2’ is unlikely to be the same as the p_{25c} at ‘flip 3’).

I’ve had a lot of good discussions with individuals in private, and the “best” solution (a technique I’m not all that familiar with) seems to be reinforcement learning, since the process could probably be described effectively as a hierarchical Markov decision process.
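For intuition on the MDP framing: if the heads probabilities were known and shared across people, the state is just (flip number, largest coin landed heads so far), and the optimal policy falls out of backward induction. The probabilities below are assumed only for illustration, not estimates of anything:

```python
# Backward induction for the toy game with *assumed*, fixed heads probabilities
# (the real problem has unknown, person- and time-specific probabilities; this
# only shows how the game maps onto a small Markov decision process).
COINS = {"penny": (0.01, 0.65), "nickel": (0.05, 0.50),
         "quarter": (0.25, 0.20), "half_dollar": (0.50, 0.07)}  # (value, assumed P(heads))
PENALTY = -1.00  # owe a dollar if no heads across the three trials

def best_action(flip, best_kept):
    """Optimal coin choice and expected payoff at a given flip (2 or 3), where
    best_kept is the largest coin value that has landed heads so far (0.0 if none)."""
    if flip > 3:                                   # game over
        return None, best_kept if best_kept > 0 else PENALTY
    best_name, best_value = None, float("-inf")
    for name, (value, p) in COINS.items():
        _, v_heads = best_action(flip + 1, max(best_kept, value))  # keep the better coin
        _, v_tails = best_action(flip + 1, best_kept)              # carry state forward
        ev = p * v_heads + (1 - p) * v_tails
        if ev > best_value:
            best_name, best_value = name, ev
    return best_name, best_value

# e.g. the optimal flip-2 choice after the mandatory nickel came up tails vs. heads:
print(best_action(2, 0.0))
print(best_action(2, 0.05))
```

Under these made-up weights, the policy is intuitive: go for the safe penny while you still have nothing, and gamble on the quarter once a coin is already secured.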

However, I am wondering if it would be “good enough” to model this as three sequential Bayesian multilevel logistic regressions (to model heads/tails at each flip) plus a final ordered multinomial Bayesian MLM (which predicts which coin they end up with at the end). In this way, M1’s posteriors can inform the priors on M2, M2’s posteriors inform M3, and so on to M4. This would not directly model the time effect and would treat these processes as somewhat independent, although each model provides information to the following one. I could then simulate across all models after fitting them to the empirical data. Is there an immediate fatal flaw in this approach? I realize it has a couple of minor flaws and some inelegance.
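For what it’s worth, here is a rough sketch of what one stage of that chain might look like in PyMC. The data layout, the coding of the coin choice, and the scheme for passing posteriors forward as priors are all assumptions on my part, not a worked-out implementation:

```python
import pymc as pm

def fit_flip_model(coin_idx, heads, subject_idx, n_subjects, prior_mu, prior_sd):
    """Hierarchical logistic regression for one flip's heads/tails outcome.

    coin_idx    : int array, which coin was flipped (0=penny, 1=nickel, 2=quarter, 3=half)
    heads       : 0/1 array of outcomes
    subject_idx : int array mapping each observation to one of the 100 people
    prior_mu/sd : per-coin prior means/SDs on the log-odds scale, e.g. posterior
                  summaries carried over from the previous flip's model (M1 -> M2 ...)
    """
    with pm.Model():
        # Coin-specific log-odds of heads, centred on the previous model's posterior
        beta_coin = pm.Normal("beta_coin", mu=prior_mu, sigma=prior_sd, shape=4)
        # Subject-level random intercepts for the clustered design (two runs per person)
        sigma_subj = pm.HalfNormal("sigma_subj", sigma=1.0)
        u_subj = pm.Normal("u_subj", mu=0.0, sigma=sigma_subj, shape=n_subjects)
        logit_p = beta_coin[coin_idx] + u_subj[subject_idx]
        pm.Bernoulli("y", logit_p=logit_p, observed=heads)
        idata = pm.sample(1000, tune=1000, target_accept=0.9)
    return idata

# The flip-2 model would then take its prior_mu/prior_sd from a summary of the
# flip-1 posterior, e.g. the posterior mean and SD of "beta_coin" per coin.
```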