Frank,

I have given some details of the design in the post "Cross sectional study design and analysis technique for evaluating the impact of COVID 19 on radiotherapy quality of care indicators". Briefly, my goal is to analyze the impact of the treatment period on the number of quality indicators (QIs) adhered to.

For each treatment course we will therefore have a count of the QIs adhered to: for some courses it may be 19, for others 0. Realistically, however, the number is likely to lie between 8 and 18, because some of the QIs are always adhered to in our setting (and, unfortunately, one is never).
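Spelled out, that realistic range gives 11 possible counts per treatment course. The short sketch below assumes the lower bound of 8 comes from 8 QIs that are always met; the variable names are made up for illustration.

```r
# Hypothetical bookkeeping of the possible adherence counts per treatment course.
# Assumption: the realistic lower bound of 8 reflects QIs that are always met.
n_qi     <- 19   # total number of quality indicators
n_always <- 8    # assumed: QIs always adhered to in our setting
n_never  <- 1    # the one QI that is never adhered to

possible_counts <- seq(n_always, n_qi - n_never)  # 8, 9, ..., 18
n_categories    <- length(possible_counts)        # number of ordinal categories
print(possible_counts)
print(n_categories)
```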

I started by doing a sample size calculation with the posamsize function (Hmisc package), using probability vectors like the ones below. Each value in a vector is the proportion of treatment courses expected to fall in one of the 11 possible counts.

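```
library(Hmisc)  # posamsize()
# Each element: proportion of treatment courses with one of the 11 possible counts
p <- c(0.9,0,0,0,0,0,0,0.1,0,0,0)   # Sample size = 415.5
q <- c(0.8,0,0,0,0,0,0,0.2,0,0,0)   # Sample size = 233.7
r <- c(0.7,0,0,0,0,0,0,0.2,0.1,0,0) # Sample size = 178.1
posamsize(p, odds.ratio = 2.5)$n    # same odds ratio as in the simulation code below
```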

and so on, but this quickly became tedious.

I noticed, though, that the sample size increased as the probabilities became more extreme (i.e., as more of the mass was concentrated in a single count), and that it did not matter where the values were placed in the vector. For example:

```
x <- c(0.7,0.0,0.01,0.02,0.03,0,0.03,0,0.2,0,0.01) # Sample Size = 172.9
y <- c(0.6,0.0,0.01,0.02,0.03,0,0.03,0,0.3,0,0.01) # Sample Size = 148.9
```
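Both observations follow if posamsize depends on the probability vector only through 1 - sum(p^3), which is how the Whitehead (1993) proportional-odds formula is usually written. The sketch below is a hand-written version of that formula under this assumption (the helper name po_samsize is hypothetical, and its output should be checked against Hmisc::posamsize()); with an odds ratio of 2.5 it reproduces the 415.5, 233.7 and 172.9 quoted above for p, q and x.

```r
# Sketch of the proportional-odds sample size formula (Whitehead 1993) that
# Hmisc::posamsize() is assumed to implement; verify against posamsize() itself.
po_samsize <- function(p, odds.ratio, fraction = 0.5, alpha = 0.05, power = 0.8) {
  stopifnot(abs(sum(p) - 1) < 1e-8)           # p must be a probability vector
  z  <- qnorm(1 - alpha / 2) + qnorm(power)   # two-sided alpha, given power
  ps <- 1 - sum(p^3)                          # the only place p enters the formula
  3 * (z / log(odds.ratio))^2 / (fraction * (1 - fraction)) / ps
}

p <- c(0.9,0,0,0,0,0,0,0.1,0,0,0)
q <- c(0.8,0,0,0,0,0,0,0.2,0,0,0)
x <- c(0.7,0.0,0.01,0.02,0.03,0,0.03,0,0.2,0,0.01)

n_p <- po_samsize(p, odds.ratio = 2.5)
n_q <- po_samsize(q, odds.ratio = 2.5)
n_x <- po_samsize(x, odds.ratio = 2.5)

# Reordering the cells leaves sum(p^3), and hence n, unchanged
set.seed(1)
n_x_shuffled <- po_samsize(sample(x), odds.ratio = 2.5)
print(round(c(p = n_p, q = n_q, x = n_x, x_shuffled = n_x_shuffled), 1))
```

A concentrated vector has a large sum(p^3), so 1 - sum(p^3) shrinks and n grows; a permutation of the cells cannot change a sum.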

As I outlined in my last post, I tried to use the expand.grid function to generate candidate probability vectors but ran out of memory. Essentially, I realized I needed sets of 11 probabilities that each sum to 1, which is where the Dirichlet distribution (and the rdirichlet function) came in, after some searching on StackExchange.
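The memory problem is expected: even a coarse grid of 0, 0.1, ..., 1 in each of the 11 positions gives 11^11 (about 2.9 × 10^11) rows before discarding those that do not sum to 1. A Dirichlet draw produces a valid vector directly. The sketch below uses the standard gamma-normalisation construction in base R; rdirichlet_base is a hypothetical name, and gtools::rdirichlet(n, alpha) or MCMCpack::rdirichlet(n, alpha) could be used instead.

```r
# Minimal Dirichlet sampler in base R: normalise independent Gamma(alpha_j, 1)
# draws so that each row sums to 1.
rdirichlet_base <- function(n, alpha) {
  k <- length(alpha)
  # n x k matrix; alpha is recycled so column j always uses alpha[j]
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)   # each row becomes a probability vector
}

set.seed(2024)
probs <- rdirichlet_base(100, alpha = rep(1, 11))  # 100 candidate vectors of 11 probabilities
print(dim(probs))
print(range(rowSums(probs)))
```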

I am posting this here because I also feel that the narrow range of sample sizes I am getting is unrealistic. However, I know it is equally unrealistic that we will have a probability distribution like p, for example. Most likely we will have something like x or y, but exactly what, I have no idea. Hence I decided to simulate this using random numbers.

After some playing around with the rdirichlet function, I realized that the alpha parameter influences how the probability is spread across the 11 counts, so I decided to put random numbers into the alpha parameter.
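The effect of alpha can be checked directly. In this sketch (base R, hypothetical helper names), a symmetric alpha is varied and two summaries are recorded: the average largest cell and the average sum(p^3), which is assumed here to be the term that drives the posamsize result. The random-alpha step at the end mirrors the idea above, but the runif bounds are an arbitrary choice.

```r
# Sketch: how the Dirichlet concentration parameter alpha shapes the 11 probabilities.
# Small alpha -> most mass in one or two counts; large alpha -> nearly uniform.
rdirichlet_base <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(42)
k <- 11
summarise_alpha <- function(a) {
  probs <- rdirichlet_base(2000, alpha = rep(a, k))
  c(alpha         = a,
    mean_max_prob = mean(apply(probs, 1, max)),  # typical size of the largest cell
    mean_sum_p3   = mean(rowSums(probs^3)))      # larger value -> larger sample size
}
res <- t(sapply(c(0.1, 1, 10), summarise_alpha))
print(round(res, 3))

# Random alpha, one value per cell (bounds are an assumption, not a recommendation)
alpha_random <- runif(k, min = 0.1, max = 5)
probs_random <- rdirichlet_base(100, alpha = alpha_random)
```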

After further searching, I came across the RandVec function in the Surrogate package, which does the same thing: it generates a set of vectors whose values have a fixed sum (in this case, 1).

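```
library(Surrogate)   # RandVec()
library(data.table)  # transpose()
library(dplyr)       # %>%, rowwise(), mutate(), c_across()
library(Hmisc)       # posamsize()

# Sample size for a single example distribution over the 11 possible counts
x <- c(0.6,0.0,0.01,0.02,0.03,0,0.03,0,0.3,0,0.01)
posamsize(x, odds.ratio = 2.5)

# 100 random vectors of length 11 with elements in [0, 1] summing to 1;
# RandVecOutput holds one vector per column
l <- RandVec(a = 0, b = 1, s = 1, n = 11, m = 100)$RandVecOutput
l <- as.data.frame(l)

# Transpose so that each row is one probability vector (columns V1 to V11)
l1 <- data.table::transpose(l)

# Sample size for each simulated distribution
l1 <- l1 %>%
  rowwise() %>%
  mutate(ss = posamsize(c_across(V1:V11), odds.ratio = 2.5,
                        power = 0.8, alpha = 0.05)$n) %>%
  ungroup()
```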

The resulting sample sizes are all very similar to one another.
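A possible explanation for the narrow range, sketched under two assumptions: that RandVec(a = 0, b = 1, s = 1, n = 11) draws uniformly from the set of 11 probabilities summing to 1 (the same distribution as a Dirichlet with every alpha equal to 1), and that posamsize uses n = 3 (z / log OR)^2 / [f (1 - f) (1 - sum(p^3))]. The code draws uniform vectors in base R and applies that formula directly rather than calling Hmisc.

```r
# Sketch: why uniformly drawn probability vectors give a narrow band of sample sizes.
# Assumptions: RandVec(a = 0, b = 1, s = 1, n = 11) ~ Dirichlet(1, ..., 1), and the
# Whitehead formula below is what Hmisc::posamsize() computes.
set.seed(123)
k <- 11
g <- matrix(rexp(1000 * k), ncol = k)   # Exp(1) = Gamma(1, 1) draws
probs <- g / rowSums(g)                  # 1000 uniform draws from the simplex

z <- qnorm(0.975) + qnorm(0.8)           # alpha = 0.05 (two-sided), power = 0.8
n <- 3 * (z / log(2.5))^2 / 0.25 / (1 - rowSums(probs^3))

print(quantile(n, c(0, 0.05, 0.5, 0.95, 1)))
# Floor over *all* 11-cell vectors, reached when every cell has p = 1/11
print(3 * (z / log(2.5))^2 / 0.25 / (1 - k * (1 / k)^3))
```

With 11 cells and an odds ratio of 2.5, n cannot fall below about 113, and uniformly drawn vectors rarely let one count dominate, so most simulated values sit just above that floor. Vectors like p or q, where one count holds most of the mass, only show up often if the simulation favours concentrated distributions (for example, a small alpha).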

Frank, I would love to learn from you how we should actually approach these sample size calculations.

The protocol for the study, which is still being drafted, is linked here: Link to protocol.