Hi there!
I have recently started experimenting with causal models, specifically the kind from the Judea Pearl school of thought. In practical applications, however, I find that an outcome usually has numerous causes. Since the approach (and libraries such as DoWhy and ananke) recommends starting with a partial causal model, is it a good idea to model only the top few causes that have a strong correlation with the outcome? Are there any pitfalls to this approach?
Here is an example:
I want to measure the causal impact of a couple of key weather-related attributes on the number of customers visiting a store. The weather forecast has several attributes, such as:
‘date’, ‘week_no’, ‘temperature’, ‘dew_point’,
‘pressure’, ‘ground_pressure’, ‘humidity’, ‘clouds’, ‘wind_speed’,
‘wind_deg’, ‘rain’, ‘snow’, ‘ice’, ‘fr_rain’, ‘convective’,
‘snow_depth’, ‘accumulated’, ‘hours’, ‘rate’
If my data shows that, historically, ‘week_no’, ‘temperature’, ‘dew_point’, and ‘pressure’ have the strongest correlation with store traffic (the number of customers visiting a store), is it all right to create a graphical model using just these?
In reality, I have 23-30 additional causes apart from weather that would also have to be added to the model.
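To make the weather example concrete, here is a rough sketch of the partial graph I have in mind. It is only an illustration: the outcome name `store_traffic` is a placeholder I made up, and the edges are my assumption (each selected attribute pointing straight at the outcome), not something I have verified. It just writes the graph out as a DOT string.

```python
# Hypothetical partial graph: only the four most strongly correlated
# attributes, each ASSUMED (not verified) to directly cause store traffic.
selected_causes = ["week_no", "temperature", "dew_point", "pressure"]
outcome = "store_traffic"  # placeholder name for number of customers visiting

# One directed edge per selected cause: cause -> outcome.
edges = [(cause, outcome) for cause in selected_causes]

# Render the edges as a DOT digraph string, a common text format for graphs.
dot_graph = "digraph { " + " ".join(f"{src} -> {dst};" for src, dst in edges) + " }"
print(dot_graph)
```

Every attribute left out of this graph (humidity, rain, snow, and the non-weather causes) is implicitly treated as if it does not influence both the selected attributes and store traffic. That implicit assumption is the part I am unsure about.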