Matching

MatchIt.matchit — Function

matchit(df::DataFrame, f::FormulaTerm; dist::String="glm", link::GLM.Link01=LogitLink(), exact=[], maxDist::Float64=Inf, replacement::Bool=true, order::String="data", tolerance::Float64=-1.)

Performs propensity score or Mahalanobis distance matching. Also allows for exact matching on certain variables.

Arguments:

df::DataFrame: The data to be matched.
f::FormulaTerm: The formula used in the GLM regression. (use @formula() to create it)
dist::String: The distance measure to use. One of "glm" (default) or "mahalanobis". The "mahalanobis" option is still highly experimental and is not recommended.
link::GLM.Link01: The link function used for the regression. Either LogitLink() (default) or ProbitLink()
exact::Vector: The names of the variables that should be exactly matched. Defaults to no variables ([])
maxDist::Float64: The maximum distance allowed between a matched treatment and control observation; pairs farther apart are not included in the sample. (defaults to Inf)
replacement::Bool: Whether the matching should occur with or without replacement. Matching without replacement is much slower, and its result depends on the row order. (defaults to true)
order::String: The order in which matching without replacement takes place (defaults to "data"). Allows for:
- "data": Follows the row order of the data
- "smallest": Matches the smallest distances/propensity scores first
- "largest": Matches the largest distances/propensity scores first
- "random": Matches in a randomised order
tolerance::Float64: When matching with replacement, the distance threshold at which a match counts as good enough and the search for better matches stops. (defaults to -1, meaning that every control observation is compared to each treatment observation)
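
The interaction between replacement, maxDist and tolerance can be illustrated with a minimal sketch. This is not the MatchIt.jl implementation: the helper nearest_control, its keyword names, the use of `<=` for both thresholds, and the toy scores are all assumptions made for illustration only.

```julia
# Illustrative sketch only; NOT the MatchIt.jl implementation.
# `nearest_control` and its keyword names are hypothetical.

# Return the index (into `control_scores`) of the control unit nearest to
# `treated_score`, or `nothing` if no control lies within `max_dist`.
# A non-negative `tolerance` stops the scan at the first match that is
# "good enough"; a negative value forces a scan of every control.
function nearest_control(treated_score::Float64, control_scores::Vector{Float64};
                         max_dist::Float64 = Inf, tolerance::Float64 = -1.0)
    best_idx = nothing
    best_dist = Inf
    for (i, c) in enumerate(control_scores)
        d = abs(treated_score - c)
        if d < best_dist
            best_idx, best_dist = i, d
        end
        # Early exit once the current best match is within the tolerance.
        best_dist <= tolerance && break
    end
    # Discard the match if it is farther away than the allowed maximum.
    return best_dist <= max_dist ? best_idx : nothing
end

# Toy propensity scores (made up for this example).
treated  = [0.62, 0.33, 0.90]
controls = [0.30, 0.60, 0.40, 0.65]

# Matching with replacement: every treated unit searches the full control
# pool, so the same control may be selected more than once.
matches = [nearest_control(t, controls; max_dist = 0.1) for t in treated]

# With a tolerance, the scan may stop before reaching the closest control.
early = nearest_control(0.62, [0.65, 0.60]; tolerance = 0.05)
full  = nearest_control(0.62, [0.65, 0.60])
```

In this sketch the third treated unit stays unmatched because its closest control is farther away than max_dist, and the tolerance call returns the first acceptable control rather than the closest one. That is the trade-off between speed and match quality that the tolerance argument describes.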

Output:

Returns a MatchedIt struct. It contains:

df: the original DataFrame
matched: the matched DataFrame
link: the Link used (either LogitLink() or ProbitLink())
f: the formula used in the matching process
T: the name of the treatment indicator variable
dist_type: one of glm or mahalanobis

Example

Simple nearest-neighbor propensity score matching with replacement.

julia> m = matchit(input_data, @formula(treat ~ age + educ + race + nodegree + re74 + re75))

A MatchIt object:
Treatmeant variable: treat
Matching formula: treat ~ age + educ + race + nodegree + re74 + re75
Matched by: GLM
Link: LogitLink
Number of observations: 614 (original), 370 (matched)

Returning the matched sample:

julia> m.matched
370×12 DataFrame
 Row │ Column1  treat  age    educ   race     married  nodegree  re74      re75        ⋯
     │ String7  Int64  Int64  Int64  String7  Int64    Int64     Float64   Float64     ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────
   1 │ PSID368      0     40     11  black          1         1      0.0       0.0     ⋯
   2 │ PSID202      0     20      9  hispan         1         1      0.0    1283.66     
   3 │ PSID293      0     31     12  black          1         0      0.0      42.9677   
   4 │ PSID196      0     18     11  black          0         1      0.0    1367.81     
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮        ⋮        ⋮         ⋮          ⋮       ⋱
 367 │ NSW182       1     25     14  black          1         0  35040.1   11536.6     ⋯
 368 │ NSW183       1     35      9  black          1         1  13602.4   13830.6      
 369 │ NSW184       1     35      8  black          1         1  13732.1   17976.2      
 370 │ NSW185       1     33     11  black          1         1  14660.7   25142.2      
                                                          3 columns and 362 rows omitted

View the matching summary:

Base.summary — Method

Base.summary(obj::MatchedIt, test::Bool=false, pre::Bool=false)

Gives a summary output for the matched sample. The output includes:

A summary of the number of observations matched
A table of the means of all variables in the original and matched DataFrames, split into treatment and control groups.
If test=true, the p-value of a two-sample Welch test is reported for each variable used in the matching.

Arguments:

obj::MatchedIt: The output from a call to matchit
test::Bool: Whether a difference-in-means test (two-sample Welch test) should be performed between treatment and control observations. (defaults to false)
pre::Bool: Whether to also show the means and test results for the sample before matching. (defaults to false)
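
For readers unfamiliar with the Welch test, the statistic behind the reported p-values can be sketched as follows. This is a generic textbook formulation, not code taken from MatchIt.jl; the helper welch_t and the sample values are hypothetical.

```julia
using Statistics  # standard library: mean, var

# Illustrative sketch; `welch_t` is a hypothetical helper, not part of MatchIt.jl.
# Returns the Welch t statistic and the Welch-Satterthwaite degrees of freedom
# for two samples that may have unequal variances and unequal sizes.
function welch_t(x::Vector{Float64}, y::Vector{Float64})
    nx, ny = length(x), length(y)
    vx, vy = var(x) / nx, var(y) / ny   # squared standard error of each mean
    t   = (mean(x) - mean(y)) / sqrt(vx + vy)
    dof = (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
    return t, dof
end

# Made-up covariate values for a control and a treatment group.
control_age = [30.0, 22.0, 41.0, 27.0, 35.0]
treated_age = [25.0, 24.0, 28.0, 23.0]

t, dof = welch_t(control_age, treated_age)
```

A two-sided p-value would then be obtained from a t distribution with dof degrees of freedom. A large p-value after matching suggests that the group means of that variable are no longer distinguishable.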

Example

julia> m = matchit(input_data, @formula(treat ~ age + educ + race + nodegree + re74 + re75));

julia> summary(m,true,true)

Match Summary
═════════════════════════════════════════════

Sampel Sizes:
       │    All    Matched    Unmatched
───────│───────────────────────────────
Treat  │    185    185        0
Control│    429    185        244



Summary of Balance for All Data:
Var     │    Mean1        Mean2        P_Value
────────│─────────────────────────────────────
treat   │    0.0          1.0          missing
age     │    28.0303      25.8162      0.0029
educ    │    10.2354      10.3459      0.5848
married │    0.5128       0.1892       0.0
nodegree│    0.5967       0.7081       0.007
re74    │    5619.2365    2095.5737    0.0
re75    │    2466.4844    1532.0553    0.0012
re78    │    6984.1697    6349.1435    0.3491
Dist    │    0.185        0.5711       0.0


Summary of Balance for Matched Data:
Var     │    Mean1        Mean2        P_Value
────────│─────────────────────────────────────
treat   │    0.0          1.0          missing
age     │    24.5622      25.8162      0.1537 
educ    │    10.1351      10.3459      0.3418
married │    0.2919       0.1892       0.0208
nodegree│    0.7568       0.7081       0.2918
re74    │    1948.422     2095.5737    0.7488
re75    │    1819.4709    1532.0553    0.3913
re78    │    6458.5599    6349.1435    0.8863
Dist    │    0.5714       0.5711       0.9905

Propensity Score Balance Plot:

MatchIt.balance_plot — Function

balance_plot(df::MatchedIt, barmode="overlay", logscale=false)

Creates a plot with two histograms showing the balance of propensity scores before and after matching.

Arguments:

df::MatchedIt: The output of a function call to matchit
barmode: Whether the bars of the histograms should be overlaid ("overlay", the default).
logscale::Bool: Whether the y-axis should be drawn on a log scale. (defaults to false)

Output:

A PlotlyJS.SyncPlot
