This article is a summary of what I learned and thought while reading "Chapter 3: Analysis Using Propensity Scores" of "Introduction to Effectiveness Verification: Causal Reasoning for Correct Comparison / Basics of Econometrics".
- The probability of being assigned to the intervention group, conditional on the **observed** covariates $X$: $e(X) = P(Z = 1 \mid X)$ [^px]
  - Not all covariates are necessarily observed
[^px]: In this book the propensity score is written as $P(X)$, but for convenience I write it as $e(X)$ here.
[^ra-vs-ps]: It may be an odd comparison, but this is how it reads in the book.
- Assumption of regression analysis [^ra-vs-ps]
- CIA (Conditional Independence Assumption): the potential outcomes are independent of the assignment $Z$ conditional on the covariates $X$
  - The book uses the CIA, but strictly speaking **strongly ignorable treatment assignment** is the correct condition [^rosenbaum_1983]
  - Strongly ignorable treatment assignment adds the following two assumptions to the CIA:
    - The covariates $X$ consist only of observed variables
    - $0 < P(Z = 1 \mid X) < 1$: every sample has a positive probability of being assigned to both the intervention group and the non-intervention group
      - In short, I think this is **quite important**: the condition is violated if some samples have no chance of ending up in both the intervention group and the non-intervention group
      - For example, if all men were assigned to the intervention group, then $P(Z = 1 \mid \text{male}) = 1$, which violates the condition

Propensity score estimation

- Logistic regression is often used to estimate propensity scores, but machine-learning models are also fine
- The model should include not only the covariates but also other variables that affect the outcome [^tsugawa-ps2]
- Probability calibration is required when using machine-learning models [^rmizutaa] (a minimal sketch follows this list)
  - For example, random forest and SVM?
  - On the other hand, if the loss function is log loss, the prediction converges to the probability, so calibration is unnecessary [^calibration-unnecessary]
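As a side note, here is a minimal sketch (my own, not from the book) of what calibration could look like with scikit-learn: a random forest wrapped in `CalibratedClassifierCV`. `df_cps1_nsw`, `X_columns`, and `z_column` refer to the data prepared in the practical part below, and the hyperparameters are arbitrary assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

# Random-forest scores are generally not calibrated probabilities,
# so wrap the model in CalibratedClassifierCV (Platt scaling with cross-validation).
rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=10, random_state=0)
calibrated_rf = CalibratedClassifierCV(rf, method='sigmoid', cv=5)
calibrated_rf.fit(df_cps1_nsw[X_columns], df_cps1_nsw[z_column])

# Calibrated propensity scores.
ps_rf = calibrated_rf.predict_proba(df_cps1_nsw[X_columns])[:, 1]
```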
Estimating the intervention effect using propensity scores
Matching vs. IPW [^matching-vs-ipw]

[^matching-vs-ipw]: There are methods other than matching and IPW for estimating the intervention effect with propensity scores, but this book deals only with these two.

- **Matching**: pairs of samples with similar propensity score values are taken from the intervention group and the non-intervention group, and the missing measurement (the counterfactual) of one is complemented by the other

ATT vs. ATE

- ATT in matching: match samples with similar propensity score values from the non-intervention group to each sample in the intervention group, and let the average of the pair differences be the ATT
- ATE in matching: do the same for the samples in the non-intervention group as well, and let the average of the pair differences be the ATE
- ATT in IPW: calculated by the following formula

$$
\frac{\sum_{i=1}^{N}Z_{i}Y_{i}}{\sum_{i=1}^{N}Z_{i}}
-\frac{\sum_{i=1}^{N}\frac{(1-Z_{i})\hat{e}(X_{i})}{1-\hat{e}(X_{i})}Y_{i}}{\sum_{i=1}^{N}\frac{(1-Z_{i})\hat{e}(X_{i})}{1-\hat{e}(X_{i})}}
$$

- ATE in IPW: calculated by the following formula

$$
\frac{\sum_{i=1}^{N}\frac{Z_{i}}{\hat{e}(X_{i})}Y_{i}}{\sum_{i=1}^{N}\frac{Z_{i}}{\hat{e}(X_{i})}}
-\frac{\sum_{i=1}^{N}\frac{1-Z_{i}}{1-\hat{e}(X_{i})}Y_{i}}{\sum_{i=1}^{N}\frac{1-Z_{i}}{1-\hat{e}(X_{i})}}
$$
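As a sanity check on these formulas, here is a minimal sketch (mine, not the book's) that evaluates them directly on a DataFrame with hypothetical columns `z` (assignment), `y` (outcome), and `ps` (the estimated propensity score $\hat{e}(X)$).

```python
import numpy as np
import pandas as pd

def ipw_att(df: pd.DataFrame) -> float:
    """ATT by IPW: treated mean minus the e(X)/(1-e(X))-weighted mean of the untreated."""
    z, y, e = df['z'], df['y'], df['ps']
    control_w = (1 - z) * e / (1 - e)
    return np.average(y, weights=z) - np.average(y, weights=control_w)

def ipw_ate(df: pd.DataFrame) -> float:
    """ATE by IPW: 1/e(X)-weighted treated mean minus 1/(1-e(X))-weighted untreated mean."""
    z, y, e = df['z'], df['y'], df['ps']
    return np.average(y, weights=z / e) - np.average(y, weights=(1 - z) / (1 - e))
```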
About common support

- Common support is the range over which the propensity score distributions of the intervention group and the non-intervention group overlap
- If samples outside the common support are used in the IPW calculation, the reciprocal of the propensity score (i.e. the weight) becomes too large and the estimate breaks down [^tsugawa-ps1]
  - Are "outside the common support" and "the weight becomes too large" really equivalent?
- Is it common practice to exclude samples that lie outside the common support?
  - The book introduces propensity score trimming / clipping (see the sketch after this list)
  - Couldn't excluding samples outside the common support introduce a new bias?
  - So it is important to **design the experiment so that two properly comparable groups can be observed and such exclusion is unnecessary in the first place**
    - Any bias that remains on top of that should be adjusted with propensity scores and the like
  - Outside the common support there are no samples that can complement the counterfactual, so isn't "strongly ignorable treatment assignment" violated there to begin with?
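For illustration, a minimal sketch (mine, not the book's code) of trimming and clipping. The cut-offs 0.01 and 0.99 are arbitrary assumptions, and `df` is assumed to hold the estimated propensity score in a `ps` column and the assignment in `treat`, like the DataFrames in the practical part below.

```python
import numpy as np

# Trimming: drop samples whose estimated propensity score falls outside the chosen range.
df_trimmed = df[df['ps'].between(0.01, 0.99)]

# Clipping: keep every sample but cap the propensity score so the IPW weights stay bounded.
ps_clipped = df['ps'].clip(0.01, 0.99)
w_clipped = np.where(df['treat'] == 1, 1 / ps_clipped, 1 / (1 - ps_clipped))
```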
Evaluation of propensity score

- What matters is whether the propensity score balances the distributions of the covariates between the intervention group and the non-intervention group (a helper for checking this is sketched below)
- In other words, whether the assumptions are satisfied
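The check used below is the standardized mean difference. Here is a minimal helper sketch (my own, with hypothetical column names: `z` for the assignment and an optional weight column); values below roughly 0.1 are commonly read as acceptable balance, which is what the red dashed line in the balance plots later marks.

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(df: pd.DataFrame, covariate: str, weight: str = None) -> float:
    """Absolute difference of the (weighted) group means of a covariate, divided by its overall std."""
    w = df[weight] if weight is not None else pd.Series(1.0, index=df.index)
    means = df.groupby('z').apply(lambda g: np.average(g[covariate], weights=w.loc[g.index]))
    return abs(means.diff().iloc[-1]) / df[covariate].std()
```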
Datasets

- MineThatData E-Mail Analytics And Data Mining Challenge dataset [^1]
  - To be added at a later date
- LaLonde (1986)
  - Refer to this manual for the explanation of the data
  - Refer to here for the explanation of the paper
  - NSW: RCT dataset
  - CPS1+NSW: dataset in which the non-intervention group of NSW is replaced by CPS1
  - CPS3+NSW: dataset in which the non-intervention group of NSW is replaced by CPS3

[^1]: https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html

- Confirming the distributions of the intervention group and the non-intervention group
  - NSW: the data is well balanced, as you would expect from an RCT
  - CPS1+NSW: quite unbalanced data
  - CPS3+NSW: not as balanced as NSW, but more balanced than CPS1+NSW
- Regression analysis
  - The intervention effect in NSW is about \$1,600
  - The challenge is how close we can get to \$1,600 using propensity scores, starting from the biased datasets
  - Where do the `I(re74^2)` and `I(re75^2)` terms in the book's model come from?
- Propensity score estimation
  - CPS1+NSW: estimate the propensity score by logistic regression, check its distribution, and calculate the AUC
  - CPS3+NSW: estimate the propensity score by logistic regression, check its distribution, and calculate the AUC
- ATT estimation with matching, CPS1+NSW
  - Define a function (a recursive generator) that builds the pairs; I used this as a reference
  - Make the pairs and confirm the pairs of the intervention group
    - The orange points are discarded samples
    - Ideally the pairs should be formed so that the crossings seen in the figure do not occur (I think)
  - Calculate the ATT
    - The standard deviation is quite large
- ATT estimation with IPW, CPS1+NSW
  - Calculate the weights and estimate the ATT by weighted linear regression
  - Extract the samples in the range where the propensity score distributions overlap and estimate the ATT by weighted linear regression again
- Checking covariate balance, CPS1+NSW
  - Confirm the standardized mean differences of the covariates between the two groups
    - After adjustment they fall almost within the red dotted line
    - Does this match the graph in the book? (especially the black points before adjustment)
  - Confirm the distribution of the covariates after matching
    - The mean alone does not tell the whole story, so look at the distributions
    - Balance is much better than in the original distributions
  - Confirm the distribution of the covariates after IPW: balance is much better than in the original distributions
  - Confirm the distribution of the covariates after IPW (common support): balance is much better than in the original distributions
- ATT estimation, CPS3+NSW
  - Matching: make the pairs, confirm the pairs of the intervention group, and calculate the ATT
  - IPW: calculate the weights and estimate the ATT by weighted linear regression
  - Extract the samples in the range where the propensity score distributions overlap and estimate the ATT by weighted linear regression again
  - Confirm the standardized mean differences of the covariates between the two groups
  - Confirm the distributions of the covariates after matching, after IPW, and after IPW (common support)
  - The covariates are balanced, but the estimated intervention effect seems quite far from \$1,600
- ATE estimation, CPS1+NSW
  - Check the pairs and calculate the ATE
  - IPW: estimate the ATE by weighted linear regression, and again on the common support only
  - Confirm the standardized mean differences of the covariates between the two groups
  - Confirm the distributions of the covariates after matching, after IPW, and after IPW (common support)
- ATE estimation, CPS3+NSW
  - Check the pairs and calculate the ATE
  - IPW: estimate the ATE by weighted linear regression, and again on the common support only
  - Confirm the standardized mean differences of the covariates between the two groups
  - Confirm the distributions of the covariates after matching, after IPW, and after IPW (common support)
Practical edition
Data read
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
df_cps1 = pd.read_stata('https://users.nber.org/~rdehejia/data/cps_controls.dta')
df_cps3 = pd.read_stata('https://users.nber.org/~rdehejia/data/cps_controls3.dta')
df_nsw = pd.read_stata('https://users.nber.org/~rdehejia/data/nsw_dw.dta')
Data set creation
treat_label = {0: 'untreat', 1: 'treat'}
X_columns = ['age', 'education', 'black', 'hispanic', 'married', 'nodegree', 're74', 're75']
y_column = 're78'
z_column = 'treat'
df_cps1_nsw = pd.concat([df_nsw[df_nsw[z_column] == 1], df_cps1], ignore_index=True)
df_cps3_nsw = pd.concat([df_nsw[df_nsw[z_column] == 1], df_cps3], ignore_index=True)
Confirm distribution of intervention group and non-intervention group
df_nsw[z_column].map(treat_label).value_counts().plot.bar(title='# of treatment in NSW')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_nsw[c].groupby(df_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
df_cps1_nsw[z_column].map(treat_label).value_counts().plot.bar(title='# of treatment in CPS1+NSW')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps1_nsw[c].groupby(df_cps1_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
df_cps3_nsw[z_column].map(treat_label).value_counts().plot.bar(title='# of treatment in CPS3+NSW')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps3_nsw[c].groupby(df_cps3_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
regression analysis
import statsmodels.api as sm
results = sm.OLS(df_nsw[y_column], sm.add_constant(df_nsw[[z_column] + X_columns])).fit()
results.summary().tables[1]
results = sm.OLS(df_cps1_nsw[y_column], sm.add_constant(df_cps1_nsw[[z_column] + X_columns])).fit()
results.summary().tables[1]
results = sm.OLS(df_cps3_nsw[y_column], sm.add_constant(df_cps3_nsw[[z_column] + X_columns])).fit()
results.summary().tables[1]
result
| data set | Intervention effect [$] |
|---|---|
| NSW | 1676.3 |
| CPS1+NSW | 699.1 |
| CPS3+NSW | 1548.2 |
Game Start
Propensity score estimation
ps_model = sm.Logit(df_cps1_nsw[z_column], sm.add_constant(df_cps1_nsw[X_columns])).fit()
ps_model.summary().tables[1]
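# Note: fittedvalues of a statsmodels Logit result is the linear predictor,
# so apply the sigmoid to get probabilities (ps_model.predict() would return them directly).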
df_cps1_nsw['ps'] = 1 / (1 + np.exp(-ps_model.fittedvalues))
(df_cps1_nsw.groupby(df_cps1_nsw[z_column].map(treat_label))['ps']
.plot.hist(density=True, alpha=0.5, title=f'propensity score densities', legend=True))
from sklearn.metrics import roc_auc_score
roc_auc_score(df_cps1_nsw[z_column], df_cps1_nsw['ps'])
0.9708918648513447
ps_model = sm.Logit(df_cps3_nsw[z_column], sm.add_constant(df_cps3_nsw[X_columns])).fit()
ps_model.summary().tables[1]
df_cps3_nsw['ps'] = 1 / (1 + np.exp(-ps_model.fittedvalues))
(df_cps3_nsw.groupby(df_cps3_nsw[z_column].map(treat_label))['ps']
.plot.hist(density=True, alpha=0.5, title=f'propensity score densities', legend=True))
from sklearn.metrics import roc_auc_score
roc_auc_score(df_cps3_nsw[z_column], df_cps3_nsw['ps'])
0.8742266742266742
ATT estimation
matching
from sklearn.neighbors import NearestNeighbors
def pairs_generator(s0: pd.Series, s1: pd.Series, threshold: float) -> pd.DataFrame:
"""
A recursive generator that creates pairs using the K-nearest neighbor method.
Parameters
----------
s0, s1 : pd.Series
threshold : float
The threshold for the distance to pair. If even the nearest neighbor points exceed the threshold value, they will not be paired.
Returns
-------
pairs : pd.DataFrame
Index of a pair made up of the s0 sample and the closest s1 sample.
Column l is the index of s0 and column r is the index of s1.
Notes
-----
The same sample is never used for multiple pairs.
Examples
--------
s0 = df[df[z_column] == 0]['ps']
s1 = df[df[z_column] == 1]['ps']
threshold = df['ps'].std() * 0.1
pairs = pd.concat(pairs_generator(s1, s0, threshold), ignore_index=True)
"""
neigh_dist, neigh_ind = NearestNeighbors(n_neighbors=1).fit(s0.values.reshape(-1, 1)).kneighbors(s1.values.reshape(-1, 1))
pairs = pd.DataFrame(
{'d': neigh_dist[:, 0], 'l': s0.iloc[neigh_ind[:, 0]].index},
index=s1.index,
).query(f'd < {threshold}').groupby('l')['d'].idxmin().rename('r').reset_index()
if len(pairs) > 0:
yield pairs
yield from pairs_generator(s0.drop(pairs['l']), s1.drop(pairs['r']), threshold)
s1 = df_cps1_nsw[df_cps1_nsw[z_column] == 1]['ps']
s0 = df_cps1_nsw[df_cps1_nsw[z_column] == 0]['ps']
threshold = df_cps1_nsw['ps'].std() * 0.1
pairs1_cps1_nsw = pd.concat(pairs_generator(s1, s0, threshold), ignore_index=True)
pairs0_cps1_nsw = pd.concat(pairs_generator(s0, s1, threshold), ignore_index=True) #Used in ATE calculation
ax = df_cps1_nsw.loc[pairs1_cps1_nsw.unstack()].plot.scatter(x='ps', y=z_column, label='matched', alpha=0.3, figsize=(16, 4), title='pairs on treated')
df_cps1_nsw.loc[~df_cps1_nsw.index.isin(pairs1_cps1_nsw.unstack())].plot.scatter(x='ps', y=z_column, label='unmatched', alpha=0.3, c='tab:orange', ax=ax)
for x in zip(df_cps1_nsw['ps'].iloc[pairs1_cps1_nsw['l']].values, df_cps1_nsw['ps'].iloc[pairs1_cps1_nsw['r']].values):
plt.plot(x, [1, 0], c='tab:blue', alpha=0.3)
(pairs1_cps1_nsw['l'].map(df_cps1_nsw[y_column]) - pairs1_cps1_nsw['r'].map(df_cps1_nsw[y_column])).agg(['mean', 'std'])
mean 1929.083008
std 9618.346680
dtype: float64
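# IPW weight: 1/e(X) for the intervention group and 1/(1-e(X)) for the non-intervention group.
# Below it is multiplied by e(X) (w * ps), which gives the ATT weights:
# 1 for treated samples and e(X)/(1-e(X)) for untreated samples.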
df_cps1_nsw['w'] = df_cps1_nsw[z_column] / df_cps1_nsw['ps'] + (1 - df_cps1_nsw[z_column]) / (1 - df_cps1_nsw['ps'])
att_model = sm.WLS(df_cps1_nsw[y_column], df_cps1_nsw[[z_column]].assign(untreat=1-df_cps1_nsw[z_column]), weights=df_cps1_nsw['w'] * df_cps1_nsw['ps']).fit()
att_model.summary().tables[1]
att_model.params['treat'] - att_model.params['untreat']
1180.4078220960955
IPW (common support)
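# Keep only samples whose propensity score lies in the common support:
# between the larger of the two groups' minimums and the smaller of the two groups' maximums.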
df_cps1_nsw_cs = df_cps1_nsw[df_cps1_nsw['ps'].between(*df_cps1_nsw.groupby(z_column)['ps'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'}))].copy()
att_model = sm.WLS(df_cps1_nsw_cs[y_column], df_cps1_nsw_cs[[z_column]].assign(untreat=1-df_cps1_nsw_cs[z_column]), weights=df_cps1_nsw_cs['w'] * df_cps1_nsw_cs['ps']).fit()
att_model.summary().tables[1]
att_model.params['treat'] - att_model.params['untreat']
1243.8640545001808
pd.DataFrame([{
'unadjusted': np.subtract(*df_cps1_nsw.groupby(z_column)[c].mean()) / df_cps1_nsw[c].std(),
'adjusted(matching)': np.subtract(*df_cps1_nsw.loc[pairs1_cps1_nsw.unstack()].groupby(z_column)[c].mean()) / df_cps1_nsw[c].std(),
'adjusted(weighting)': np.subtract(*df_cps1_nsw.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w'] * x['ps']))) / df_cps1_nsw[c].std(),
'adjusted(weighting in common support)': np.subtract(*df_cps1_nsw_cs.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w'] * x['ps']))) / df_cps1_nsw_cs[c].std(),
} for c in X_columns], index=X_columns).abs().plot(marker='o', alpha=0.5, grid=True, ylim=(0, 100), logy=True, title='covariates balance')
plt.axhline(y=0.1, linestyle='--', c='r')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps1_nsw.loc[pairs1_cps1_nsw.unstack()][c].groupby(df_cps1_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps1_nsw.sample(1000000, replace=True, weights=df_cps1_nsw['w'] * df_cps1_nsw['ps'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps1_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps1_nsw_cs.sample(1000000, replace=True, weights=df_cps1_nsw_cs['w'] * df_cps1_nsw_cs['ps'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps1_nsw_cs[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
matching
s1 = df_cps3_nsw[df_cps3_nsw[z_column] == 1]['ps']
s0 = df_cps3_nsw[df_cps3_nsw[z_column] == 0]['ps']
threshold = df_cps3_nsw['ps'].std() * 0.1
pairs1_cps3_nsw = pd.concat(pairs_generator(s1, s0, threshold), ignore_index=True)
pairs0_cps3_nsw = pd.concat(pairs_generator(s0, s1, threshold), ignore_index=True) #Used in ATE calculation
ax = df_cps3_nsw.loc[pairs1_cps3_nsw.unstack()].plot.scatter(x='ps', y=z_column, label='matched', alpha=0.3, figsize=(16, 4), title='pairs on treated')
df_cps3_nsw.loc[~df_cps3_nsw.index.isin(pairs1_cps3_nsw.unstack())].plot.scatter(x='ps', y=z_column, label='unmatched', alpha=0.3, c='tab:orange', ax=ax)
for x in zip(df_cps3_nsw['ps'].iloc[pairs1_cps3_nsw['l']].values, df_cps3_nsw['ps'].iloc[pairs1_cps3_nsw['r']].values):
plt.plot(x, [1, 0], c='tab:blue', alpha=0.3)
(pairs1_cps3_nsw['l'].map(df_cps3_nsw[y_column]) - pairs1_cps3_nsw['r'].map(df_cps3_nsw[y_column])).agg(['mean', 'std'])
mean 1463.115845
std 8779.598633
dtype: float64
df_cps3_nsw['w'] = df_cps3_nsw[z_column] / df_cps3_nsw['ps'] + (1 - df_cps3_nsw[z_column]) / (1 - df_cps3_nsw['ps'])
att_model = sm.WLS(df_cps3_nsw[y_column], df_cps3_nsw[[z_column]].assign(untreat=1-df_cps3_nsw[z_column]), weights=df_cps3_nsw['w'] * df_cps3_nsw['ps']).fit()
att_model.summary().tables[1]
att_model.params['treat'] - att_model.params['untreat']
1214.0711733143507
IPW (common support)
df_cps3_nsw_cs = df_cps3_nsw[df_cps3_nsw['ps'].between(*df_cps3_nsw.groupby(z_column)['ps'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'}))].copy()
att_model = sm.WLS(df_cps3_nsw_cs[y_column], df_cps3_nsw_cs[[z_column]].assign(untreat=1-df_cps3_nsw_cs[z_column]), weights=df_cps3_nsw_cs['w'] * df_cps3_nsw_cs['ps']).fit()
att_model.summary().tables[1]
att_model.params['treat'] - att_model.params['untreat']
988.8657151185471
pd.DataFrame([{
'unadjusted': np.subtract(*df_cps3_nsw.groupby(z_column)[c].mean()) / df_cps3_nsw[c].std(),
'adjusted(matching)': np.subtract(*df_cps3_nsw.loc[pairs1_cps3_nsw.unstack()].groupby(z_column)[c].mean()) / df_cps3_nsw[c].std(),
'adjusted(weighting)': np.subtract(*df_cps3_nsw.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w'] * x['ps']))) / df_cps3_nsw[c].std(),
'adjusted(weighting in common support)': np.subtract(*df_cps3_nsw_cs.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w'] * x['ps']))) / df_cps3_nsw_cs[c].std(),
} for c in X_columns], index=X_columns).abs().plot(marker='o', alpha=0.5, grid=True, ylim=(0, 100), logy=True, title='covariates balance')
plt.axhline(y=0.1, linestyle='--', c='r')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps3_nsw.loc[pairs1_cps3_nsw.unstack()][c].groupby(df_cps3_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps3_nsw.sample(1000000, replace=True, weights=df_cps3_nsw['w'] * df_cps3_nsw['ps'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps3_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps3_nsw_cs.sample(1000000, replace=True, weights=df_cps3_nsw_cs['w'] * df_cps3_nsw_cs['ps'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps3_nsw_cs[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
result
| data set | matching | IPW | IPW (common support) |
|---|---|---|---|
| CPS1+NSW | 1929.1 | 1180.4 | 1243.9 |
| CPS3+NSW | 1463.1 | 1214.1 | 988.9 |
ATE estimation
matching
pairs_cps1_nsw = pd.concat([pairs1_cps1_nsw, pairs0_cps1_nsw.rename({'l': 'r', 'r': 'l'}, axis=1)], ignore_index=True)
ax = df_cps1_nsw.loc[pairs_cps1_nsw.unstack()].plot.scatter(x='ps', y=z_column, label='matched', alpha=0.3, figsize=(16, 4), title='pairs')
df_cps1_nsw.loc[~df_cps1_nsw.index.isin(pairs_cps1_nsw.unstack())].plot.scatter(x='ps', y=z_column, label='unmatched', alpha=0.3, c='tab:orange', ax=ax)
for x in zip(df_cps1_nsw['ps'].iloc[pairs_cps1_nsw['l']].values, df_cps1_nsw['ps'].iloc[pairs_cps1_nsw['r']].values):
plt.plot(x, [1, 0], c='tab:blue', alpha=0.3)
(pairs_cps1_nsw['l'].map(df_cps1_nsw[y_column]) - pairs_cps1_nsw['r'].map(df_cps1_nsw[y_column])).agg(['mean', 'std'])
mean 1836.516968
std 9634.636719
dtype: float64
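# For the ATE the raw IPW weights w are used as-is: 1/e(X) for the intervention group
# and 1/(1-e(X)) for the non-intervention group.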
ate_model = sm.WLS(df_cps1_nsw[y_column], df_cps1_nsw[[z_column]].assign(untreat=1-df_cps1_nsw[z_column]), weights=df_cps1_nsw['w']).fit()
ate_model.summary().tables[1]
ate_model.params['treat'] - ate_model.params['untreat']
-6456.300804768925
IPW (common support)
ate_model = sm.WLS(df_cps1_nsw_cs[y_column], df_cps1_nsw_cs[[z_column]].assign(untreat=1-df_cps1_nsw_cs[z_column]), weights=df_cps1_nsw_cs['w']).fit()
ate_model.summary().tables[1]
ate_model.params['treat'] - ate_model.params['untreat']
545.3149727568589
pd.DataFrame([{
'unadjusted': np.subtract(*df_cps1_nsw.groupby(z_column)[c].mean()) / df_cps1_nsw[c].std(),
'adjusted(matching)': np.subtract(*df_cps1_nsw.loc[pairs_cps1_nsw.unstack()].groupby(z_column)[c].mean()) / df_cps1_nsw[c].std(),
'adjusted(weighting)': np.subtract(*df_cps1_nsw.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w']))) / df_cps1_nsw[c].std(),
'adjusted(weighting in common support)': np.subtract(*df_cps1_nsw_cs.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w']))) / df_cps1_nsw_cs[c].std(),
} for c in X_columns], index=X_columns).abs().plot(marker='o', alpha=0.5, grid=True, ylim=(0, 100), logy=True, title='covariates balance')
plt.axhline(y=0.1, linestyle='--', c='r')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps1_nsw.loc[pairs_cps1_nsw.unstack()][c].groupby(df_cps1_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps1_nsw.sample(1000000, replace=True, weights=df_cps1_nsw['w'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps1_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps1_nsw_cs.sample(1000000, replace=True, weights=df_cps1_nsw_cs['w'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps1_nsw_cs[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
matching
pairs_cps3_nsw = pd.concat([pairs1_cps3_nsw, pairs0_cps3_nsw.rename({'l': 'r', 'r': 'l'}, axis=1)], ignore_index=True)
ax = df_cps3_nsw.loc[pairs_cps3_nsw.unstack()].plot.scatter(x='ps', y=z_column, label='matched', alpha=0.3, figsize=(16, 4), title='pairs')
df_cps3_nsw.loc[~df_cps3_nsw.index.isin(pairs_cps3_nsw.unstack())].plot.scatter(x='ps', y=z_column, label='unmatched', alpha=0.3, c='tab:orange', ax=ax)
for x in zip(df_cps3_nsw['ps'].iloc[pairs_cps3_nsw['l']].values, df_cps3_nsw['ps'].iloc[pairs_cps3_nsw['r']].values):
plt.plot(x, [1, 0], c='tab:blue', alpha=0.3)
(pairs_cps3_nsw['l'].map(df_cps3_nsw[y_column]) - pairs_cps3_nsw['r'].map(df_cps3_nsw[y_column])).agg(['mean', 'std'])
mean 1486.608276
std 8629.639648
dtype: float64
ate_model = sm.WLS(df_cps3_nsw[y_column], df_cps3_nsw[[z_column]].assign(untreat=1-df_cps3_nsw[z_column]), weights=df_cps3_nsw['w']).fit()
ate_model.summary().tables[1]
ate_model.params['treat'] - ate_model.params['untreat']
224.6762634665365
IPW (common support)
ate_model = sm.WLS(df_cps3_nsw_cs[y_column], df_cps3_nsw_cs[[z_column]].assign(untreat=1-df_cps3_nsw_cs[z_column]), weights=df_cps3_nsw_cs['w']).fit()
ate_model.summary().tables[1]
ate_model.params['treat'] - ate_model.params['untreat']
801.0736196669177
pd.DataFrame([{
'unadjusted': np.subtract(*df_cps3_nsw.groupby(z_column)[c].mean()) / df_cps3_nsw[c].std(),
'adjusted(matching)': np.subtract(*df_cps3_nsw.loc[pairs_cps3_nsw.unstack()].groupby(z_column)[c].mean()) / df_cps3_nsw[c].std(),
'adjusted(weighting)': np.subtract(*df_cps3_nsw.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w']))) / df_cps3_nsw[c].std(),
'adjusted(weighting in common support)': np.subtract(*df_cps3_nsw_cs.groupby(z_column).apply(lambda x: np.average(x[c], weights=x['w']))) / df_cps3_nsw_cs[c].std(),
} for c in X_columns], index=X_columns).abs().plot(marker='o', alpha=0.5, grid=True, ylim=(0, 100), logy=True, title='covariates balance')
plt.axhline(y=0.1, linestyle='--', c='r')
_, ax = plt.subplots(3, 3, figsize=(16, 12))
for i, c in enumerate(X_columns + [y_column]):
(df_cps3_nsw.loc[pairs_cps3_nsw.unstack()][c].groupby(df_cps3_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps3_nsw.sample(1000000, replace=True, weights=df_cps3_nsw['w'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps3_nsw[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
_, ax = plt.subplots(3, 3, figsize=(16, 12))
df = df_cps3_nsw_cs.sample(1000000, replace=True, weights=df_cps3_nsw_cs['w'], random_state=2020)
for i, c in enumerate(X_columns + [y_column]):
(df[c].groupby(df_cps3_nsw_cs[z_column].map(treat_label))
.plot.hist(density=True, alpha=0.5, title=f'{c} densities', legend=True, ax=ax[i // 3, i % 3]))
result
| data set | matching | IPW | IPW (common support) |
|---|---|---|---|
| CPS1+NSW | 1836.5 | -6456.3 | 545.3 |
| CPS3+NSW | 1486.6 | 224.7 | 801.0 |
Impressions

- I can't get rid of the "bias" at all ...