The question is why one would use IPTW instead of multivariate adjustment.
Let Y be the outcome, X the confounders, and A the binary treatment.
Bear in mind, all of this assumes no unmeasured confounders - that we know that X contains everything that impacts Y and A.
Let Y,X,A \sim P(Y,A,X)=P(Y|A,X)P(A|X)P(X).
Now focus on P(A|X). This is a treatment distribution - it is how treatments are assigned based on confounders (baseline covariates). It is the “true” treatment distribution - in other words, it generated the data. (For intuition, in a 1:1 randomized trial P(A=1|X)=0.5 for every X.) I am going to subscript expectations with the treatment distribution they are taken with respect to.
Now consider a different treatment distribution, which we would like to evaluate. In particular, consider the distribution I(A=a), which is 1 if the treatment is a and 0 otherwise. This is what we are after in causal inference. We ask: what would the average outcome be, were we to use treatment a, even if, in reality, we actually sampled the treatment random variable A from P(A|X), sometimes setting A to a and sometimes to not a.
EY(a) is the counterfactual expectation we would like to know - what would happen on average if we could set the treatment A to a.
Note that
EY(a)=E_{I(A=a)}Y=\int \sum_{a'} y P(y|a',x) I(a'=a) P(x) dy dx = \int y P(y|a,x) P(x) dy dx.
One way to think about IPTW is as a density transform.
Note (I only multiply by 1):
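E_{I(A=a)}Y=E_{I(A=a)}\frac{P(Y,A,X)}{P(Y,A,X)} Y = E_{P(A|X)}\frac{P(Y|A,X)I(A=a)P(X)}{P(Y,A,X)} Y = E_{P(A|X)}\frac{I(A=a)}{P(A|X)} Y, where the last step cancels P(Y|A,X) and P(X) against the factorization P(Y,A,X)=P(Y|A,X)P(A|X)P(X). This requires P(A=a|X)>0 for every X. The expectation can be estimated with \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i=a)}{P(A_i|X_i)} Y_i, which is an IPTW estimator with P(A_i|X_i) as the propensity.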
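To make the estimator above concrete, here is a minimal simulation sketch. The data-generating process, the coefficients, and the helper name `iptw_mean` are all assumptions made up for illustration, and the true propensity is treated as known, which would not be the case with real observational data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (all names and coefficients are assumptions).
x = rng.normal(size=n)                   # confounder X
p_a1 = 1.0 / (1.0 + np.exp(-x))          # true propensity P(A=1 | X)
a = rng.binomial(1, p_a1)                # treatment A drawn from P(A | X)
y = 2.0 * a + x + rng.normal(size=n)     # outcome Y, so EY(1) = 2 and EY(0) = 0


def iptw_mean(y, a, p_a1, a_target):
    """Compute (1/n) * sum_i I(A_i = a_target) / P(A_i | X_i) * Y_i."""
    # P(A_i | X_i) evaluated at the treatment each unit actually received.
    p_obs = np.where(a == 1, p_a1, 1.0 - p_a1)
    return np.mean((a == a_target) / p_obs * y)


ey1_iptw = iptw_mean(y, a, p_a1, a_target=1)   # IPTW estimate of EY(1)
ey1_naive = y[a == 1].mean()                   # unadjusted mean among the treated

print(f"IPTW estimate of EY(1): {ey1_iptw:.3f}")
print(f"Naive treated mean:     {ey1_naive:.3f}")
```

In this setup the IPTW estimate should land close to the true value of 2, while the naive treated mean should sit noticeably higher, because treated units have larger X on average.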
In contrast, multivariate adjustment would be something like Y=\theta A + \eta X+\epsilon, where we assume a normal distribution for Y|A,X whose mean is linear in its parameters. If that multivariate model is wrong (it almost certainly is), then the causal estimate is biased.
With IPW, only P(A_i|X_i) needs to be specified correctly, and this is often done with a nonparametric estimator (which will need more sample, of course). In theory it is easier to specify and estimate P(A|X) for IPW than P(Y|A,X) for multivariate adjustment.
Hence IPW is a very popular method, and one would not generally use multivariate adjustment - a multivariable model may require less sample, but that does not make up for the bias from misspecifying the model.
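The contrast between the two approaches can be sketched with a small simulation. Everything here is an assumption chosen for illustration: the data-generating process, the coefficients, the bounded propensity function, and the use of quantile binning as a simple stand-in for a nonparametric propensity estimator. The outcome depends on X through X^2, so the linear adjustment model is misspecified by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data-generating process; every coefficient here is an assumption.
x = rng.normal(size=n)                         # confounder X
p_true = 0.2 + 0.6 * x**2 / (1.0 + x**2)       # P(A=1 | X), bounded in [0.2, 0.8)
a = rng.binomial(1, p_true)                    # treatment A
y = 2.0 * a + x**2 + rng.normal(size=n)        # true effect EY(1) - EY(0) = 2

# Multivariate adjustment with a misspecified outcome model:
# Y = c + theta * A + eta * X + eps (linear in X, but the truth depends on X^2).
design = np.column_stack([np.ones(n), a, x])
theta_ols = np.linalg.lstsq(design, y, rcond=None)[0][1]

# Simple nonparametric propensity estimate: treated fraction within quantile bins of X.
n_bins = 50
edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
treated = np.bincount(bin_idx, weights=a, minlength=n_bins)
counts = np.bincount(bin_idx, minlength=n_bins)
p_hat = (treated / counts)[bin_idx]            # estimated P(A=1 | X_i) per unit

# IPTW estimate of EY(1) - EY(0) using the estimated propensity.
p_obs = np.where(a == 1, p_hat, 1.0 - p_hat)   # estimated P(A_i | X_i)
ey1 = np.mean((a == 1) / p_obs * y)
ey0 = np.mean((a == 0) / p_obs * y)
ate_iptw = ey1 - ey0

print(f"Linear adjustment estimate of the effect: {theta_ols:.3f}")
print(f"IPTW estimate of the effect:              {ate_iptw:.3f}")
```

In this particular setup the linear adjustment coefficient should overshoot the true effect of 2 by a wide margin, while the IPTW estimate should stay close to it. This is one constructed example; it does not show that IPTW does better in every setting.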