All Functions
Base.summary
— Function
Base.summary(obj::MatchedIt, test::Bool=false, pre::Bool=false)
Gives a summary output for the matched sample. The output includes:
- A summary of the number of observations matched
- A table of the means of all variables in the original and matched data frames, split by treatment and control group.
- If test=true, the p-value of a two-sample Welch test is reported for each variable used in the matching.
Arguments:
- obj::MatchedIt: The output from a call to matchit
- test::Bool: Whether a difference-in-means test (two-sample Welch test) should be performed between treatment and control observations (defaults to false)
- pre::Bool: Whether to also show the means and t-tests for the sample before matching (defaults to false)
Example
julia> m = matchit(input_data, @formula(treat ~ age + educ + race + nodegree + re74 + re75));
julia> summary(m,true,true)
Match Summary
═════════════════════════════════════════════
Sampel Sizes:
       │ All  Matched  Unmatched
───────│───────────────────────────────
Treat  │ 185  185      0
Control│ 429  185      244
Summary of Balance for All Data:
Var     │ Mean1      Mean2      P_Value
────────│─────────────────────────────────────
treat   │ 0.0        1.0        missing
age     │ 28.0303    25.8162    0.0029
educ    │ 10.2354    10.3459    0.5848
married │ 0.5128     0.1892     0.0
nodegree│ 0.5967     0.7081     0.007
re74    │ 5619.2365  2095.5737  0.0
re75    │ 2466.4844  1532.0553  0.0012
re78    │ 6984.1697  6349.1435  0.3491
Dist    │ 0.185      0.5711     0.0
Summary of Balance for Matched Data:
Var     │ Mean1      Mean2      P_Value
────────│─────────────────────────────────────
treat   │ 0.0        1.0        missing
age     │ 24.5622    25.8162    0.1537
educ    │ 10.1351    10.3459    0.3418
married │ 0.2919     0.1892     0.0208
nodegree│ 0.7568     0.7081     0.2918
re74    │ 1948.422   2095.5737  0.7488
re75    │ 1819.4709  1532.0553  0.3913
re78    │ 6458.5599  6349.1435  0.8863
Dist    │ 0.5714     0.5711     0.9905
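To make the P_Value column easier to interpret, here is a minimal sketch of the statistic behind a two-sample Welch test, written with only the Statistics standard library. The helper name welch_t and the sample vectors are illustrative assumptions and are not part of MatchIt; the package's own computation may differ in detail.

```julia
using Statistics

# Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two
# samples whose variances are allowed to differ.
# `welch_t` is a hypothetical helper, not a MatchIt function.
function welch_t(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    nx, ny = length(x), length(y)
    vx, vy = var(x) / nx, var(y) / ny            # squared standard errors of the means
    t  = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
    return (t = t, df = df)
end

# Made-up ages for a control and a treatment group.
control = [28.0, 31.0, 25.0, 40.0, 22.0]
treated = [24.0, 26.0, 23.0, 27.0, 25.0]
res = welch_t(control, treated)
```

Turning t and df into a p-value requires the CDF of Student's t distribution, which Julia's standard library does not provide; a package such as HypothesisTests.jl (UnequalVarianceTTest) is one option.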
MatchIt.NearestNeighbor
— Function
NearestNeighbour(data::DataFrame, T::String, exact=String[], maxDist::Float64=Inf, replacement::Bool=true, order::String="data", tolerance::Float64=-1.)
Performs Nearest Neighbor matching. Allows for exact matches on certain variables before performing the nearest neighbor search.
Arguments:
- data::DataFrame: Needs to contain a treatment variable and the propensity score (:Dist), and optionally the variables that need to be exactly matched
- T::String: The name of the treatment indicator (0 or 1)
- exact::Vector{String}: A vector of the variable names used for exact matching
- maxDist::Float64: The maximum allowed distance between nearest neighbors (defaults to Inf)
- replacement::Bool: Whether the matching should occur with or without replacement. Matching without replacement is much slower and the result depends on the row order. (defaults to true)
- order::String: The order in which matching without replacement takes place (defaults to "data"). Allows for:
  - "data": Follows the row order of the data
  - "smallest": Matches the smallest distances/propensity scores first
  - "largest": Matches the largest distances/propensity scores first
  - "random": Matches in a randomised order
- tolerance::Float64: In matching with replacement, the distance that is considered good enough, so that no better matches are searched for (defaults to -1, meaning that all control observations are compared to each treatment observation)
Output:
A DataFrame containing the matched sample. Includes all variables from the provided DataFrame plus the distance metric :Dist.
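The search described above can be pictured with a minimal sketch that works directly on vectors of propensity scores. This is an illustration under assumptions, not the package's implementation: the function name nearest_controls is hypothetical, exact matching is left out, and tolerance is interpreted as an early-stopping threshold on the distance.

```julia
# Hypothetical helper, not MatchIt's code: for every treated propensity score,
# find the index of the control with the closest score (matching with
# replacement, so a control may be chosen more than once).
function nearest_controls(treated::Vector{Float64}, controls::Vector{Float64};
                          maxDist::Float64 = Inf, tolerance::Float64 = -1.0)
    matches = Vector{Union{Int,Nothing}}(nothing, length(treated))
    for (i, p) in enumerate(treated)
        best, bestdist = nothing, Inf
        for (j, q) in enumerate(controls)
            d = abs(p - q)
            if d < bestdist
                best, bestdist = j, d
            end
            d <= tolerance && break      # good enough: stop searching early
        end
        # Keep the match only if it lies within the maximum allowed distance.
        matches[i] = bestdist <= maxDist ? best : nothing
    end
    return matches
end

# Made-up propensity scores.
treated  = [0.30, 0.80]
controls = [0.10, 0.28, 0.55, 0.79]
matches  = nearest_controls(treated, controls)
```

With the default tolerance of -1 the early exit never triggers, so every control is compared to every treated unit. A finite maxDist leaves treated units without a sufficiently close control unmatched (nothing in this sketch).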
MatchIt.balance_plot
— Function
balance_plot(df::MatchedIt, barmode="overlay", logscale=false)
Creates a plot with two histograms showing the balance of propensity scores before and after matching.
Arguments:
- df::MatchedIt: The output of a function call to matchit
- barmode: Whether the bars of the histogram should be overlaid ("overlay")
- logscale::Bool: Whether the y-axis should be printed on a log scale (defaults to false)
Output
A PlotlyJS.SyncPlot
MatchIt.get_propensities!
— Method
get_propensities!(data::DataFrame, f::FormulaTerm, link::GLM.Link01)
Calculates the propensity scores and returns a DataFrame with a new column :Dist.
Arguments:
- data::DataFrame: The DataFrame containing the data. This should have no missing values. Rows with missing values will be deleted.
- f::FormulaTerm: A formula specifying the functional form of the regression (use @formula to create it)
- link::GLM.Link01: The link function used. One of LogitLink() or ProbitLink()
Output
DataFrame containing the original data without missing values that has a new column (:Dist) with the propensity scores.
MatchIt.get_propensities
— Method
get_propensities(data::DataFrame, f::FormulaTerm, link::GLM.Link01)
Calculates the propensity scores and returns a DataFrame with a new column :Dist.
Arguments:
- data::DataFrame: The DataFrame containing the data. This should have no missing values. Rows with missing values will be deleted.
- f::FormulaTerm: A formula specifying the functional form of the regression (use @formula to create it)
- link::GLM.Link01: The link function used. One of LogitLink() or ProbitLink()
Output
DataFrame containing the original data without missing values that has a new column (:Dist) with the propensity scores.
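As an illustration of what the :Dist column represents, the sketch below fits a logistic regression by Newton's method and evaluates the fitted probability of treatment for each row. It is a hand-rolled stand-in under stated assumptions, not the package's code: the documentation above says the scores come from a GLM regression, and the names logistic and fit_logit as well as the toy data are hypothetical.

```julia
using LinearAlgebra

# Inverse of the logit link.
logistic(z) = 1 / (1 + exp(-z))

# Hypothetical stand-in for a GLM fit with a logit link: Newton's method for
# logistic regression. X has one row per observation (first column is the
# intercept) and y holds the 0/1 treatment indicator.
function fit_logit(X::Matrix{Float64}, y::Vector{Float64}; iters::Int = 25)
    β = zeros(size(X, 2))
    for _ in 1:iters
        p = logistic.(X * β)
        W = Diagonal(p .* (1 .- p))
        β += (X' * W * X) \ (X' * (y .- p))   # Newton step
    end
    return β
end

# Made-up data: treatment status and a single covariate.
age   = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0]
treat = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
X     = hcat(ones(length(age)), age)
β     = fit_logit(X, treat)
dist  = logistic.(X * β)   # fitted P(treat = 1), the analogue of :Dist
```

Each entry of dist lies strictly between 0 and 1, and observations with similar scores are the candidates that nearest neighbor matching pairs up.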
MatchIt.matchit
— Method
matchit(df::DataFrame, f::FormulaTerm; dist::String="glm", link::GLM.Link01=LogitLink(), exact=[], maxDist::Float64=Inf, replacement::Bool=true, order::String="data", tolerance::Float64=-1.)
Performs propensity score or Mahalanobis distance matching. Also allows for exact matching on certain variables.
Arguments:
- df::DataFrame
- f::FormulaTerm: The formula used in the GLM regression (use @formula() to create it)
- dist::String: Distance measure to be used. One of "glm" (default) or "mahalanobis" ("mahalanobis" is still very experimental and should probably not be used)
- link::GLM.Link01: The link function used for the regression. Either LogitLink() (default) or ProbitLink()
- exact::Vector: The names of the variables that should be exactly matched. Defaults to no variables ([])
- maxDist::Float64: The maximum distance between matched treatment and control observations to be included in the sample (defaults to Inf)
- replacement::Bool: Whether the matching should occur with or without replacement. Matching without replacement is much slower and the result depends on the row order. (defaults to true)
- order::String: The order in which matching without replacement takes place (defaults to "data"). Allows for:
  - "data": Follows the row order of the data
  - "smallest": Matches the smallest distances/propensity scores first
  - "largest": Matches the largest distances/propensity scores first
  - "random": Matches in a randomised order
- tolerance::Float64: In matching with replacement, the distance that is considered good enough, so that the search for better matches is halted (defaults to -1, meaning that all control observations are compared to each treatment observation)
Output:
Returns a MatchedIt struct. It contains:
- df: the original DataFrame
- matched: the matched DataFrame
- link: the Link used (either LogitLink() or ProbitLink())
- f: the formula used in the matching process
- T: the name of the treatment indicator variable
- dist_type: one of glm or mahalanobis
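The effect of the order option on matching without replacement can be sketched as follows. The helpers match_order and match_without_replacement are hypothetical, and the sketch assumes that "smallest" and "largest" refer to the propensity scores of the treated units; the package may order and break ties differently.

```julia
using Random

# Hypothetical helper: the sequence in which treated units are processed.
function match_order(dist::Vector{Float64}, order::String; rng = Random.default_rng())
    order == "data"     && return collect(eachindex(dist))
    order == "smallest" && return sortperm(dist)
    order == "largest"  && return sortperm(dist; rev = true)
    order == "random"   && return shuffle(rng, collect(eachindex(dist)))
    throw(ArgumentError("unknown order: $order"))
end

# Greedy 1:1 matching without replacement: each control is used at most once,
# so units processed earlier get first pick.
function match_without_replacement(treated, controls, order)
    available = trues(length(controls))
    matched = Pair{Int,Int}[]                 # treated index => control index
    for i in match_order(treated, order)
        best, bestdist = 0, Inf
        for j in eachindex(controls)
            d = abs(treated[i] - controls[j])
            if available[j] && d < bestdist
                best, bestdist = j, d
            end
        end
        best == 0 && continue                 # no control left for this unit
        available[best] = false
        push!(matched, i => best)
    end
    return matched
end

# Made-up propensity scores.
treated  = [0.50, 0.42]
controls = [0.45, 0.90]
by_data     = match_without_replacement(treated, controls, "data")      # [1 => 1, 2 => 2]
by_smallest = match_without_replacement(treated, controls, "smallest")  # [2 => 1, 1 => 2]
```

Both treated units are closest to the first control, so whichever unit is processed first claims it. This is why the result of matching without replacement depends on the chosen order.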
Example
Simple nearest neighbor PSM matching with replacement.
julia> m = matchit(input_data, @formula(treat ~ age + educ + race + nodegree + re74 + re75))
A MatchIt object:
Treatmeant variable: treat
Matching formula: treat ~ age + educ + race + nodegree + re74 + re75
Matched by: GLM
Link: LogitLink
Number of observations: 614 (original), 370 (matched)
Returning the matched sample:
julia> m.matched
370×12 DataFrame
 Row │ Column1  treat  age    educ   race     married  nodegree  re74     re75     ⋯
     │ String7  Int64  Int64  Int64  String7  Int64    Int64     Float64  Float64  ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────
   1 │ PSID368  0      40     11     black    1        1         0.0      0.0      ⋯
   2 │ PSID202  0      20     9      hispan   1        1         0.0      1283.66
   3 │ PSID293  0      31     12     black    1        0         0.0      42.9677
   4 │ PSID196  0      18     11     black    0        1         0.0      1367.81
  ⋮  │ ⋮        ⋮      ⋮      ⋮      ⋮        ⋮        ⋮         ⋮        ⋮        ⋱
 367 │ NSW182   1      25     14     black    1        0         35040.1  11536.6  ⋯
 368 │ NSW183   1      35     9      black    1        1         13602.4  13830.6
 369 │ NSW184   1      35     8      black    1        1         13732.1  17976.2
 370 │ NSW185   1      33     11     black    1        1         14660.7  25142.2
3 columns and 362 rows omitted