Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to explore a number of practical applications across different business contexts.
In the last article we covered measuring the intrinsic causal influence of your marketing campaigns. In this article we'll move on to validating the causal impact of synthetic controls.
If you missed the last article on intrinsic causal influence, check it out here:
In this article we'll focus on understanding the synthetic control method and exploring how we can validate the estimated causal impact.
The following aspects will be covered:
- What is the synthetic control method?
- What challenge does it try to overcome?
- How can we validate the estimated causal impact?
- A Python case study using realistic Google Trends data, demonstrating how we can validate the estimated causal impact of synthetic controls.
The full notebook can be found here:
What is it?
The synthetic control method is a causal technique which can be used to assess the causal impact of an intervention or treatment when a randomised control trial (RCT) or A/B test was not possible. It was originally proposed in 2003 by Abadie and Gardeazabal. The following paper includes a great case study to help you understand the proposed method:
https://web.stanford.edu/~jhain/Paper/JASA2010.pdf
Let's cover some of the basics ourselves… The synthetic control method creates a counterfactual version of the treated unit by taking a weighted combination of control units that did not receive the intervention or treatment.
- Treated unit: The unit which receives the intervention.
- Control units: A set of similar units which did not receive the intervention.
- Counterfactual: Created as a weighted combination of the control units. The aim is to find weights for each control unit that result in a counterfactual which closely matches the treated unit in the pre-intervention period.
- Causal impact: The difference between the post-intervention treated unit and the counterfactual.
If we wanted to really simplify things, we could think of it as linear regression where each control unit is a feature and the treated unit is the target. The pre-intervention period is our train set, and we use the model to score our post-intervention period. The difference between actual and predicted is the causal impact. A minimal sketch of this framing is shown below.
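Here is that framing on toy data (the region names and all the numbers are made up purely for illustration):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy panel: 60 weeks of sales for three control regions and one treated region
rng = np.random.default_rng(0)
controls = pd.DataFrame(rng.normal(100, 10, size=(60, 3)), columns=["DE", "FR", "ES"])
treated = (0.5 * controls["DE"] + 0.3 * controls["FR"] + 0.2 * controls["ES"]).to_numpy()
treated[40:] += 15  # simulated uplift in the final 20 weeks

# The pre-intervention period is the train set: controls are features, treated is the target
model = LinearRegression().fit(controls.iloc[:40], treated[:40])

# Score the post-intervention period; actual minus predicted is the estimated causal impact
counterfactual = model.predict(controls.iloc[40:])
print("Estimated lift:", (treated[40:] - counterfactual).sum())  # ≈ 15 * 20 = 300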
Below are a couple of examples to bring to life when we might consider using it:
- When running a TV marketing campaign, we're unable to randomly assign the audience into those that can and can't see the campaign. We could, however, carefully select a region to trial the campaign in and use the remaining regions as control units. Once we have measured the effect, the campaign could be rolled out to other regions. This is often called a geo-lift test.
- Policy changes which are brought into some regions but not others — for example, a local council may bring a policy into force to reduce unemployment. Other regions where the policy wasn't in place could be used as control units.
What challenge does it try to overcome?
When we combine high-dimensionality (lots of features) with limited observations, we can get a model which overfits.
Let's take the geo-lift example as an illustration. If we use weekly data from the last year as our pre-intervention period, this gives us 52 observations. If we then decide to test our intervention across countries in Europe, that will give us an observation-to-feature ratio of 1:1!
Earlier we talked about how the synthetic control method could be implemented using linear regression. However, that observation-to-feature ratio means it is very likely that linear regression will overfit, resulting in a poor causal impact estimate in the post-intervention period.
In linear regression the weights (coefficients) for each feature (control unit) could be negative or positive, and they may sum to a number greater than 1. However, the synthetic control method learns the weights whilst applying the constraints below:
- Constraining weights to sum to 1
- Constraining weights to be ≥ 0
These constraints help with regularisation and avoid extrapolation beyond the range of the observed data.
It's worth noting that in terms of regularisation, Ridge and Lasso regression can achieve this too, and in some cases are reasonable alternatives. But we'll test this out in the case study!
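For intuition, here is a minimal sketch of how those two constraints can be expressed with SciPy's minimize (the data is made up; the case study below does this properly):

import numpy as np
from scipy.optimize import minimize

# Toy data: 52 pre-intervention weeks, 4 control units, treated unit is a convex mix of them
rng = np.random.default_rng(0)
C = rng.normal(100, 10, size=(52, 4))
t = C @ np.array([0.4, 0.3, 0.2, 0.1])

n = C.shape[1]
result = minimize(
    lambda w: np.sum((t - C @ w) ** 2),                        # pre-period fit error
    x0=np.full(n, 1 / n),                                      # start from equal weights
    bounds=[(0, 1)] * n,                                       # weights >= 0
    constraints={'type': 'eq', 'fun': lambda w: w.sum() - 1},  # weights sum to 1
    method='SLSQP',
)
print(result.x.round(3))  # recovers non-negative weights that sum to 1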
How can we validate the estimated causal impact?
An arguably bigger challenge is the fact that we are unable to validate the estimated causal impact in the post-intervention period.
How long should my pre-intervention period be? Are we sure we haven't overfit our pre-intervention period? How can we know whether our model generalises well in the post-intervention period? What if I want to try out different implementations of the synthetic control method?
We could randomly select a few observations from the pre-intervention period and hold them back for validation — but we have already highlighted the challenge which comes from having limited observations, so we may make things even worse!
What if we could run some sort of pre-intervention simulation? Could that help us answer some of the questions highlighted above and gain confidence in our model's estimated causal impact? All will be explained in the case study!
Background
After convincing Finance that brand marketing is driving some serious value, the marketing team approach you to ask about geo-lift testing. Someone from Facebook has told them it's the next big thing (although it was the same person who told them Prophet was a good forecasting model) and they want to know whether they could use it to measure their upcoming TV campaign.
You're a little concerned, as the last time you ran a geo-lift test the marketing analytics team thought it was a good idea to play around with the pre-intervention period used until they had a nice big causal impact.
This time round, you suggest that they run a "pre-intervention simulation", after which you propose that the pre-intervention period is agreed before the test starts.
So let's explore what a "pre-intervention simulation" looks like!
Creating the data
To make this as realistic as possible, I extracted some Google Trends data for the majority of countries in Europe. What the search term was isn't relevant — just pretend it's the sales for your company (and that you operate across Europe).
However, if you are interested in how I got the Google Trends data, check out my notebook:
Below we can see the dataframe. We have sales for the past 3 years across 50 European countries. The marketing team plan to run their TV campaign in Great Britain.
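A quick peek in code (assuming the same dataframe as in the notebook):

# Quick look at the panel: a date column plus one weekly sales column per country
print(df.shape)
print(df.head())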
Now here comes the clever bit. We will simulate an intervention in the last 7 weeks of the time series.
import numpy as np
import pandas as pd

np.random.seed(1234)

# Create intervention flag
mask = (df['date'] >= "2024-04-14") & (df['date'] <= "2024-06-02")
df['intervention'] = mask.astype(int)

row_count = len(df)

# Create intervention uplift
df['uplift_perc'] = np.random.uniform(0.10, 0.20, size=row_count)
df['uplift_abs'] = round(df['uplift_perc'] * df['GB'])
df['y'] = df['GB']
df.loc[df['intervention'] == 1, 'y'] = df['GB'] + df['uplift_abs']
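A quick sanity check on what we've just built — outside the intervention window, y should exactly match the original GB series:

# Outside the intervention window, y is untouched
assert (df.loc[df['intervention'] == 0, 'y'] == df.loc[df['intervention'] == 0, 'GB']).all()

# Inside it, y carries the simulated 10-20% uplift
print(df[df['intervention'] == 1][['date', 'GB', 'uplift_abs', 'y']].head())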
Now let's plot the actual and counterfactual sales in GB to bring what we have done to life:
import matplotlib.pyplot as plt
import seaborn as sns

def synth_plot(df, counterfactual):
    plt.figure(figsize=(14, 8))
    sns.set_style("white")

    # Create plot
    sns.lineplot(data=df, x='date', y='y', label='Actual', color='b', linewidth=2.5)
    sns.lineplot(data=df, x='date', y=counterfactual, label='Counterfactual', color='r', linestyle='--', linewidth=2.5)
    plt.title('Synthetic Control Method: Actual vs. Counterfactual', fontsize=24)
    plt.xlabel('Date', fontsize=20)
    plt.ylabel('Metric Value', fontsize=20)
    plt.legend(fontsize=16)
    plt.gca().xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter('%Y-%m-%d'))
    plt.xticks(rotation=90)
    plt.grid(True, linestyle='--', alpha=0.5)

    # Highlight the intervention point
    intervention_date = '2024-04-07'
    plt.axvline(pd.to_datetime(intervention_date), color='k', linestyle='--', linewidth=1)
    plt.text(pd.to_datetime(intervention_date), plt.ylim()[1]*0.95, 'Intervention', color='k', fontsize=18, ha='right')

    plt.tight_layout()
    plt.show()
synth_plot(df, 'GB')
So now we have simulated an intervention, we can explore how well the synthetic control method works.
Pre-processing
All of the European countries apart from GB are set as control units (features). The treated unit (target) is the sales in GB with the intervention applied.
# Delete the original target column so we don't use it as a feature by accident
del df['GB']

# Set features & target
X = df.columns[1:50]
y = 'y'
Regression
Below I've set up a function which we can re-use with different pre-intervention periods and different regression models (e.g. Ridge, Lasso):
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def train_reg(df, start_index, reg_class):
    df_temp = df.iloc[start_index:].copy().reset_index()

    X_pre = df_temp[df_temp['intervention'] == 0][X]
    y_pre = df_temp[df_temp['intervention'] == 0][y]
    X_train, X_test, y_train, y_test = train_test_split(X_pre, y_pre, test_size=0.10, random_state=42)

    model = reg_class
    model.fit(X_train, y_train)

    yhat_train = model.predict(X_train)
    yhat_test = model.predict(X_test)

    mse_train = mean_squared_error(y_train, yhat_train)
    mse_test = mean_squared_error(y_test, yhat_test)
    print(f"Mean Squared Error train: {round(mse_train, 2)}")
    print(f"Mean Squared Error test: {round(mse_test, 2)}")

    r2_train = r2_score(y_train, yhat_train)
    r2_test = r2_score(y_test, yhat_test)
    print(f"R2 train: {round(r2_train, 2)}")
    print(f"R2 test: {round(r2_test, 2)}")

    # Score the full period; actual minus predicted is the estimated lift
    df_temp['pred'] = model.predict(df_temp.loc[:, X])
    df_temp['delta'] = df_temp['y'] - df_temp['pred']

    pred_lift = df_temp[df_temp['intervention'] == 1]['delta'].sum()
    actual_lift = df_temp[df_temp['intervention'] == 1]['uplift_abs'].sum()
    abs_error_perc = abs(pred_lift - actual_lift) / actual_lift
    print(f"Predicted lift: {round(pred_lift, 2)}")
    print(f"Actual lift: {round(actual_lift, 2)}")
    print(f"Absolute error percentage: {round(abs_error_perc, 2)}")

    return df_temp, abs_error_perc
To start us off, we keep things simple and use linear regression to estimate the causal impact, using a small pre-intervention period:
df_lin_reg_100, pred_lift_lin_reg_100 = train_reg(df, 100, LinearRegression())
Looking at the results, linear regression doesn't do great. But this isn't surprising given the observation-to-feature ratio.
synth_plot(df_lin_reg_100, 'pred')
Synthetic control method
Let's jump right in and see how it compares to the synthetic control method. Below I've set up a similar function to before, but this time applying the synthetic control method using SciPy:
from scipy.optimize import minimize

def synthetic_control(weights, control_units, treated_unit):
    # The counterfactual is a weighted combination of the control units
    synthetic = np.dot(control_units.values, weights)
    return np.sqrt(np.sum((treated_unit - synthetic)**2))

def train_synth(df, start_index):
    df_temp = df.iloc[start_index:].copy().reset_index()

    X_pre = df_temp[df_temp['intervention'] == 0][X]
    y_pre = df_temp[df_temp['intervention'] == 0][y]
    X_train, X_test, y_train, y_test = train_test_split(X_pre, y_pre, test_size=0.10, random_state=42)

    # Start from equal weights, constrained to be in [0, 1] and sum to 1
    initial_weights = np.ones(len(X)) / len(X)
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
    bounds = [(0, 1) for _ in range(len(X))]

    result = minimize(synthetic_control,
                      initial_weights,
                      args=(X_train, y_train),
                      method='SLSQP',
                      bounds=bounds,
                      constraints=constraints,
                      options={'disp': False, 'maxiter': 1000, 'ftol': 1e-9},
                      )
    optimal_weights = result.x

    yhat_train = np.dot(X_train.values, optimal_weights)
    yhat_test = np.dot(X_test.values, optimal_weights)

    mse_train = mean_squared_error(y_train, yhat_train)
    mse_test = mean_squared_error(y_test, yhat_test)
    print(f"Mean Squared Error train: {round(mse_train, 2)}")
    print(f"Mean Squared Error test: {round(mse_test, 2)}")

    r2_train = r2_score(y_train, yhat_train)
    r2_test = r2_score(y_test, yhat_test)
    print(f"R2 train: {round(r2_train, 2)}")
    print(f"R2 test: {round(r2_test, 2)}")

    df_temp['pred'] = np.dot(df_temp.loc[:, X].values, optimal_weights)
    df_temp['delta'] = df_temp['y'] - df_temp['pred']

    pred_lift = df_temp[df_temp['intervention'] == 1]['delta'].sum()
    actual_lift = df_temp[df_temp['intervention'] == 1]['uplift_abs'].sum()
    abs_error_perc = abs(pred_lift - actual_lift) / actual_lift
    print(f"Predicted lift: {round(pred_lift, 2)}")
    print(f"Actual lift: {round(actual_lift, 2)}")
    print(f"Absolute error percentage: {round(abs_error_perc, 2)}")

    return df_temp, abs_error_perc
I keep the pre-intervention period the same to create a fair comparison to linear regression:
df_synth_100, pred_lift_synth_100 = train_synth(df, 100)
Wow! I'll be the first to admit I wasn't expecting such a significant improvement!
synth_plot(df_synth_100, 'pred')
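Beyond the fit metrics, one nice property of synthetic control is that the weights are interpretable. A hedged sketch, assuming you modify train_synth to also return result.x:

# Hypothetical: assumes train_synth is modified to also return result.x
df_synth_100, pred_lift_synth_100, optimal_weights = train_synth(df, 100)

# The counterfactual is typically built from a handful of control countries
weights = pd.Series(optimal_weights, index=X).sort_values(ascending=False)
print(weights[weights > 0.01])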
Comparison of results
Let's not get too carried away yet. Below we run a few more experiments, exploring model types and pre-intervention periods:
# Run regression experiments
df_lin_reg_00, pred_lift_lin_reg_00 = train_reg(df, 0, LinearRegression())
df_lin_reg_100, pred_lift_lin_reg_100 = train_reg(df, 100, LinearRegression())
df_ridge_00, pred_lift_ridge_00 = train_reg(df, 0, RidgeCV())
df_ridge_100, pred_lift_ridge_100 = train_reg(df, 100, RidgeCV())
df_lasso_00, pred_lift_lasso_00 = train_reg(df, 0, LassoCV())
df_lasso_100, pred_lift_lasso_100 = train_reg(df, 100, LassoCV())

# Run synthetic control experiments
df_synth_00, pred_lift_synth_00 = train_synth(df, 0)
df_synth_100, pred_lift_synth_100 = train_synth(df, 100)

# Collect the absolute error percentages for comparison
experiment_data = {
    "Method": ["Linear", "Linear", "Ridge", "Ridge", "Lasso", "Lasso", "Synthetic Control", "Synthetic Control"],
    "Data Size": ["Large", "Small", "Large", "Small", "Large", "Small", "Large", "Small"],
    "Value": [pred_lift_lin_reg_00, pred_lift_lin_reg_100, pred_lift_ridge_00, pred_lift_ridge_100,
              pred_lift_lasso_00, pred_lift_lasso_100, pred_lift_synth_00, pred_lift_synth_100]
}

df_experiments = pd.DataFrame(experiment_data)
We'll use the code below to visualise the results:
# Set the style
sns.set_style("whitegrid")

# Create the bar plot
plt.figure(figsize=(10, 6))
bar_plot = sns.barplot(x="Method", y="Value", hue="Data Size", data=df_experiments, palette="muted")

# Add labels and title
plt.xlabel("Method")
plt.ylabel("Absolute error percentage")
plt.title("Synthetic Controls - Comparison of Methods Across Different Data Sizes")
plt.legend(title="Data Size")

# Show the plot
plt.show()
The results for the small dataset are really interesting! As expected, regularisation helped improve the causal impact estimates, and synthetic control then took it one step further!
The results for the large dataset suggest that longer pre-intervention periods aren't always better.
However, the thing I want you to take away is how valuable carrying out a pre-intervention simulation is. There are so many avenues you could explore with your own dataset!
Today we explored the synthetic control method and how we can validate the causal impact. I'll leave you with a few final thoughts:
- The simplicity of the synthetic control method makes it one of the most widely used techniques from the causal AI toolbox.
- Unfortunately, it is also the most widely abused — let's run the R CausalImpact package, changing the pre-intervention period until we see an uplift we like. 😭
- This is where I highly recommend running pre-intervention simulations to agree the test design upfront.
- The synthetic control method is a heavily researched area. It's worth checking out the proposed adaptations: Augmented SC, Robust SC and Penalized SC.
Alberto Abadie, Alexis Diamond & Jens Hainmueller (2010) Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program, Journal of the American Statistical Association, 105:490, 493–505, DOI: 10.1198/jasa.2009.ap08746