## Statistical Methods – McNemar’s Test

What is McNemar’s Test?

The McNemar Chi-Square test is used in situations where data is not independent.

Specifically, the data are either matched (e.g. 2 people matched on similar characteristics, who typically have different exposures, with their outcomes compared) or before-after data (e.g. one person measured before and after an exposure, with the outcomes compared).

• It is a non-parametric test
• Tests the difference between paired proportions
• Uses RAW numbers

Formulae… How to calculate McNemar’s test

1. Data should be placed into a trusty 2×2 table
2. As this is paired data, we will be ignoring concordant cells (a and d)
3. Plug the discordant counts (b and c) into the formula: McNemar χ² = (b – c)^2 / (b + c)… et voila!

As with the Chi-Square test, the cut-off point for the 95% significance level (1 degree of freedom) is 3.84 (which is 1.96 squared). Therefore, if the result is bigger than 3.84, we can say it is statistically significant at the 95% level (i.e. p < 0.05).
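A minimal Python sketch of the calculation, assuming the standard uncorrected McNemar statistic χ² = (b – c)^2 / (b + c). Some texts apply a continuity correction, (|b – c| – 1)^2 / (b + c), which gives a slightly smaller value. The function name `mcnemar_chi2` and the counts are hypothetical, for illustration only.

```python
def mcnemar_chi2(b, c):
    """McNemar's chi-square from the discordant cells of a paired 2x2 table.

    Only b and c (pairs whose outcomes differ) are used; the concordant
    cells a and d are ignored. No continuity correction is applied.
    """
    if b + c == 0:
        raise ValueError("need at least one discordant pair")
    return (b - c) ** 2 / (b + c)


CUT_OFF_95 = 1.96 ** 2  # about 3.84 (1 degree of freedom)

# Hypothetical before/after study: 15 people changed one way, 5 the other way
chi2 = mcnemar_chi2(b=15, c=5)
print(f"McNemar chi-square = {chi2:.2f}, significant at 95%: {chi2 > CUT_OFF_95}")
```

With these counts (15 – 5)^2 / (15 + 5) = 5.0, which is above the 3.84 cut-off.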

Here is a handy flow-chart to help decide which test to use.

…This was ‘borrowed’ from this useful website on commonly used statistical tests

## Statistical Methods – Chi-Square and 2×2 tables

Definition

When data is binary (i.e. exposure and outcome each have only 2 options), the data can be arranged into a trusty 2×2 table (they really are trusty – they pop up all over the place!)

The Chi-Square statistic (denoted χ²) is used in a non-parametric test which examines whether there is an association between 2 variables in a sample. It determines whether a distribution of observed frequencies differs from the expected frequencies.

• Observations must be independent (each individual contributes to only one cell)
• Values of independent and dependent variables must be mutually exclusive
• Data must be raw counts (frequencies) of nominal or ordinal categories, not percentages
• Data must be randomly drawn from the population
• Expected frequencies must not be too small (each expected count should be at least 5… if any is below 5, then Fisher’s Exact Test must be used instead)

Formulae, and how to…

1. For each observed number, calculate the expected number = [(row total x col total) / table total]
2. Subtract expected from observed [O-E]
3. Square the result and divide by the expected number [(O-E)^2 / E]
4. Chi-Square = Sum of results for all cells

In a 2×2 table (1 degree of freedom), the cut-off for the 95% significance level (p = 0.05) is 3.84 (1.96^2).

If Chi-square is greater than 3.84, we can say the result is statistically significant (at the 95% level).

The bigger the chi-square result, the smaller the p-value and the stronger the evidence of an association.
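The four steps above can be sketched in Python as follows. This is an illustrative sketch only: `chi_square_2x2` is a hypothetical helper, the table is made-up data, and no Yates continuity correction is applied (some packages, e.g. SciPy's `chi2_contingency`, apply one by default for 2×2 tables, so their results can differ slightly).

```python
import warnings


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of raw counts [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Step 1: expected = (row total x col total) / table total
            expected = row_totals[i] * col_totals[j] / total
            if expected < 5:
                warnings.warn(f"expected count {expected:.2f} < 5: consider Fisher's Exact Test")
            # Steps 2-3: (O - E)^2 / E for this cell
            chi2 += (observed - expected) ** 2 / expected
    # Step 4: chi-square is the sum over all four cells
    return chi2


CUT_OFF_95 = 3.84  # 1 degree of freedom, p = 0.05

# Hypothetical counts: rows = exposed / unexposed, columns = outcome yes / no
chi2 = chi_square_2x2([[30, 20], [15, 35]])
print(f"chi-square = {chi2:.2f}, significant at 95%: {chi2 > CUT_OFF_95}")
```

For these counts every expected value is 22.5 or 27.5, giving a chi-square of about 9.09, comfortably above the 3.84 cut-off.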

DONE!

## Statistical Methods – Standard Error and Confidence Intervals

This post covers the 3 applications of standard error required for the MFPH Part A: means, proportions and differences between proportions (and their corresponding confidence intervals)…

a) What is the standard error (SE) of a mean?

The SE measures the amount of variability in the sample mean (i.e. how much it would vary between repeated samples). It indicates how closely the population mean is likely to be estimated by the sample mean.

(NB: this is different from the Standard Deviation (SD), which measures the amount of variability among individual observations in the population. The SE incorporates the SD to assess the difference between sample and population measurements due to sampling variation)

• Calculation of SE for mean = SD / sqrt(n)

…so the sample mean and its SE provide a range of likely values for the true population mean.

How can you calculate the Confidence Interval (CI) for a mean?

Assuming a normal distribution, we can state that 95% of sample means would lie within 1.96 SEs above or below the population mean, since 1.96 is the two-sided 5% point of the standard normal distribution.

• Calculation of CI for mean = (mean – (1.96 x SE)) to (mean + (1.96 x SE))
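A short Python sketch of the SE and 95% CI for a mean, using the formulas above. It assumes the 1.96 multiplier from the normal distribution (with small samples a t-distribution multiplier is usually preferred) and uses the sample SD with an n – 1 denominator; the data values and the name `mean_ci_95` are hypothetical.

```python
import math
import statistics


def mean_ci_95(values):
    """Return (mean, SE, lower, upper) for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)         # SE = SD / sqrt(n)
    return mean, se, mean - 1.96 * se, mean + 1.96 * se


# Hypothetical measurements
mean, se, lower, upper = mean_ci_95([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean {mean:.2f}, SE {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```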

b) What is the SE of a proportion?

SE for a proportion (p) = sqrt [(p (1 – p)) / n]

95% CI = sample value +/- (1.96 x SE)
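The same idea for a proportion, as a minimal Python sketch. This is the simple normal-approximation interval from the formula above, which can behave poorly when n is small or p is close to 0 or 1; `proportion_ci_95` and the counts are hypothetical.

```python
import math


def proportion_ci_95(events, n):
    """Return (p, SE, lower, upper) where SE = sqrt(p(1 - p) / n)."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - 1.96 * se, p + 1.96 * se


# Hypothetical survey: 40 of 100 respondents report the outcome
p, se, lower, upper = proportion_ci_95(40, 100)
print(f"p = {p:.2f}, SE = {se:.4f}, 95% CI {lower:.3f} to {upper:.3f}")
```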

c) What is the SE of a difference in proportions?

SE for the difference between two proportions (p1 – p2) = sqrt [(SE of p1)^2 + (SE of p2)^2] = sqrt [(p1 (1 – p1) / n1) + (p2 (1 – p2) / n2)]

95% CI = (p1 – p2) +/- (1.96 x SE)
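A minimal Python sketch for a difference in proportions between two independent groups. The key step is that the two SEs are squared (turned into variances) and added before taking the square root; the function name and counts are hypothetical, and the same normal-approximation caveats apply.

```python
import math


def diff_proportions_ci_95(events1, n1, events2, n2):
    """Return (p1 - p2, SE, lower, upper) for two independent groups."""
    p1, p2 = events1 / n1, events2 / n2
    var1 = p1 * (1 - p1) / n1  # (SE of p1)^2
    var2 = p2 * (1 - p2) / n2  # (SE of p2)^2
    se = math.sqrt(var1 + var2)
    diff = p1 - p2
    return diff, se, diff - 1.96 * se, diff + 1.96 * se


# Hypothetical trial: 40/100 events in group 1 vs 25/100 in group 2
diff, se, lower, upper = diff_proportions_ci_95(40, 100, 25, 100)
print(f"difference {diff:.2f}, SE {se:.4f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Here the interval runs from about 0.022 to 0.278; because it excludes 0, the difference would be significant at the 95% level.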

## Epidemiology – Attributable Risk (including AR%, PAR + PAR%)

These are really important measures for public health as they indicate the magnitude of risk in absolute terms.

Attributable Risk (AR)

• AR is the portion of disease incidence *in the exposed* that is due to the exposure.
• Therefore, it is the incidence of disease *in the exposed* that would be eliminated if the exposure were eliminated
• Calculation of AR = risk(incidence) in exposed – risk(incidence) in non-exposed (i.e. the risk difference)

Attributable Risk % (proportion or fraction)

• AR is sometimes expressed as a proportion (%) of the disease incidence in the exposed – this is the proportion of disease incidence *in the exposed* that is due to the exposure.
• Therefore it is the proportion of the disease incidence *in the exposed* that would be eliminated if exposure were eliminated
• Calculation of AR% = AR / risk(incidence) in exposed x 100%
• …When data on disease incidence is not available we can use the RR…
• Calculation of AR% = (RR-1) / RR x 100%
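A small Python sketch of AR and AR%, using made-up risks (20% in the exposed, 5% in the non-exposed) to show that both AR% formulas above give the same answer. The function names are hypothetical.

```python
def attributable_risk(risk_exposed, risk_unexposed):
    """Return (AR, AR%) where AR is the risk difference among the exposed."""
    ar = risk_exposed - risk_unexposed
    ar_percent = ar / risk_exposed * 100
    return ar, ar_percent


def ar_percent_from_rr(rr):
    """AR% = (RR - 1) / RR x 100%."""
    return (rr - 1) / rr * 100


# Hypothetical risks
ar, ar_pct = attributable_risk(0.20, 0.05)
rr = 0.20 / 0.05
print(f"AR = {ar:.2f}, AR% = {ar_pct:.0f}%, AR% from RR = {ar_percent_from_rr(rr):.0f}%")
```

Both routes give 75%: three quarters of the disease in the exposed would be eliminated if the exposure were removed (assuming the association is causal).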

Population Attributable Risk (PAR)

• This is a similar measure to AR, except that it is concerned not with the excess rate of disease *in the exposed* but with the excess rate of disease *in the population* (compared with the rate of disease in the non-exposed group)
• PAR is the excess disease incidence *in the population* (i.e. exposed and non-exposed) that is due to the exposure
• Therefore it is the disease incidence *in the population* that would be eliminated if the exposure were eliminated
• Calculation of PAR = risk(incidence) in population – risk(incidence) in non-exposed

Population Attributable Risk % (proportion or fraction)

• PAR% is the proportion of disease incidence *in the population* (i.e. exposed and non-exposed) that is due to the exposure
• Therefore it is the % of disease incidence *in the population* that would be eliminated if the exposure were eliminated
• Calculation of PAR% = PAR / risk(incidence) in population x 100%
• …When data on disease incidence is not available we can use the RR…
• Calculation of PAR% = [prevalence of exposure in the population x (RR-1)] / [1 + prevalence of exposure in the population x (RR-1)] x 100%
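A Python sketch comparing the two PAR% routes. It assumes, as a simplification, that the population risk is the prevalence-weighted average of the risks in the exposed and non-exposed; under that assumption the direct calculation and the RR-based formula (often called Levin's formula) agree. All figures and function names are hypothetical.

```python
def par_from_risks(risk_exposed, risk_unexposed, prevalence_exposure):
    """Return (population risk, PAR, PAR%)."""
    # Assumption: population risk = weighted mix of exposed and non-exposed risk
    risk_population = (prevalence_exposure * risk_exposed
                       + (1 - prevalence_exposure) * risk_unexposed)
    par = risk_population - risk_unexposed
    par_percent = par / risk_population * 100
    return risk_population, par, par_percent


def par_percent_from_rr(rr, prevalence_exposure):
    """PAR% = Pe(RR - 1) / [1 + Pe(RR - 1)] x 100%, Pe = prevalence of exposure."""
    excess = prevalence_exposure * (rr - 1)
    return excess / (1 + excess) * 100


# Hypothetical: risk 20% in exposed, 5% in non-exposed, 30% of population exposed
risk_pop, par, par_pct = par_from_risks(0.20, 0.05, 0.30)
print(f"population risk {risk_pop:.3f}, PAR {par:.3f}, PAR% {par_pct:.1f}%")
print(f"PAR% from RR: {par_percent_from_rr(4.0, 0.30):.1f}%")
```

Even though the exposed have four times the risk, only about 47% of cases in the whole population are attributable to the exposure, because only 30% of people are exposed. This is the sense in which PAR varies with how common the exposure is.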

PAR% is an important index for prioritising population interventions

• However, it assumes that all the association between disease and exposure is causal… and PAR varies according to how common an exposure to the risk factor is in the population
• AR + PAR are hypothetical constructs… there is no temporal depth
• Important for the counterfactual to be defined in order to explain their meaning (e.g. if smoking = lung cancer, no smoking = no lung cancer)
• Based on logic of risk subtraction (rather than risk explanation)

## Epidemiology – Positive and Negative Predictive Value (PPV + NPV)

Strongly related to the concepts of sensitivity and specificity are the concepts of PPV and NPV. These terms are quite similar (and can be confused), so it’s important to remember that sensitivity and specificity measure the accuracy of the test itself (they do not depend on how common the disease is in the population), whereas PPV and NPV measure the proportion of people whose test results reflect their true health status, and therefore *are* affected by disease prevalence… it’s a subtle, but important, distinction! (Especially for differentiating what is being asked of you in an exam…)

Definition – Positive Predictive Value (PPV)

• PPV = the proportion of individuals who test positive (a+b) who truly have the disease (a)
• Formulae = a / (a+b)
• *Important* PPV increases with higher prevalence of disease

Definition – Negative Predictive Value (NPV)

• NPV = the proportion of individuals who test negative (c+d) who truly do not have the disease (d)
• Formulae = d / (c+d)
• *Important* NPV decreases with higher prevalence of disease
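To illustrate why PPV and NPV depend on prevalence, here is a Python sketch that fills a hypothetical 2×2 table for a test with fixed sensitivity (90%) and specificity (95%) at three different prevalences. It uses the layout above (test result on the side, disease along the top, so a = true positives, b = false positives, c = false negatives, d = true negatives); the figures and function names are made up for illustration.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV): PPV = a/(a+b), NPV = d/(c+d)."""
    return a / (a + b), d / (c + d)


def table_from_prevalence(sensitivity, specificity, prevalence, n=10_000):
    """Build hypothetical 2x2 counts (a, b, c, d) for n people."""
    diseased = prevalence * n
    healthy = n - diseased
    a = sensitivity * diseased  # true positives
    c = diseased - a            # false negatives
    d = specificity * healthy   # true negatives
    b = healthy - d             # false positives
    return a, b, c, d


for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(*table_from_prevalence(0.90, 0.95, prev))
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

As prevalence rises from 1% to 50%, PPV climbs (from roughly 15% to 95%) while NPV falls, even though the test itself has not changed.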

## Epidemiology – Sensitivity and Specificity

Sensitivity and specificity are two statistical measures of test performance. The origins of these measures come (unsurprisingly) from screening tests for diseases, whereby the purpose of the test is to differentiate between those who do and do not have the disease (so that appropriate diagnosis and treatment can occur).

The key thing here is to acknowledge that tests are rarely 100% accurate… but the purpose of Sensitivity and Specificity is to identify how accurate tests are in their discrimination between diseased and non-diseased individuals.

Definition – Sensitivity

• Sensitivity identifies the proportion of individuals who truly DO have the disease who are given a positive test result
• I find it helpful to remember: sensiTivity = sensitive to the Truth (i.e. do have disease + do have positive result)

Formulae – Sensitivity

The trusty 2×2 table (on the right) always has the outcome along the top (disease, death…etc.) and the intervention or exposure on the side (in this case, the test).

We want to know what proportion of individuals who have the disease (a+c) were given a positive test result (a), therefore…

• Sensitivity = a / (a+c)

Definition – Specificity

• Specificity identifies the proportion of individuals who truly DO NOT have the disease who are given the correct negative test result
• I find it helpful to remember: specificity = speciFies the False (i.e. do not have disease and do not have positive test result)

Formulae – Specificity

Back to the trusty 2×2 table…

This time we want to know what proportion of people who do not have the disease (b+d) were given the correct negative test result (d), therefore…

• Specificity = d / (b+d)
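The two formulas above as a minimal Python sketch, using the same 2×2 layout (a = true positives, b = false positives, c = false negatives, d = true negatives). The counts and the function name are hypothetical.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from a 2x2 table of counts."""
    sensitivity = a / (a + c)  # true positives / all with the disease
    specificity = d / (b + d)  # true negatives / all without the disease
    return sensitivity, specificity


# Hypothetical screening results for 1,000 people, 50 of whom have the disease
a, b, c, d = 45, 30, 5, 920
sens, spec = sensitivity_specificity(a, b, c, d)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
print(f"false negatives (missed cases): {c}, false positives: {b}")
```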

Interpretation – So what does it mean…?

Calculating sensitivity and specificity helps us understand how accurate tests are at providing the correct result. This is really important information for understanding how much harm individuals could be subjected to by taking the test.

For example, in screening, harm can come from receiving a false positive (b – you get a positive test result, but you don’t have the disease) or a false negative (c – you get a negative test result, but unknowingly you do have the disease). These psychological implications for the individual should never be taken lightly, and it is therefore important to minimise such harms by a) explaining the potential risks to all participating individuals and b) using tests which are as accurate as possible.

Ideally a test would be 100% sensitive and specific. Yet in reality, there is usually a trade-off between the two properties. The cut-off point (or ‘criterion for positivity’) depends on the consequences of missing positives and of falsely classifying negatives as positives. For example:

Sensitivity is often prioritised when…

• Disease is serious (we want to identify as many true cases as possible)
• Treatment is effective + available (we want to identify + treat as many cases as possible)
• High risk of infectivity if individuals are not treated (we want to minimise harm to others)
• Subsequent test is cheap and low-risk

Specificity is often prioritised when…

• Treatment is unpalatable (we only want to treat those we are confident have the disease and would benefit from the treatment)
• Subsequent test is expensive and risky

So sensitivity and specificity are all about how accurately the test discriminates between those who are healthy and those with the disease.

## Epidemiology – Relative Risk (RR)

Firstly, a few points need to be made regarding what is meant by risk

• Risk = the statistical likelihood of having an adverse event (e.g. illness or death) following exposure to some factor
• Risk is a measure of association NOT causation… it cannot tell us whether the exposure actually causes the harm

Definition

Relative Risks (RR) are used to compare the risks of different groups.   Defined as:

The probability that a member of an exposed group will develop disease relative to the probability that a member of an unexposed group will develop the same disease

• As such, RR measures the strength of association between an exposure and an outcome, comparing the risk of the outcome in the exposed with the risk in the non-exposed.
• RR can be assessed using 3 calculations: risk ratio, rate ratio and odds ratio
• Risk calculations require knowledge of the total numbers of exposed and unexposed individuals (the denominators)
• It is the risk of developing the disease (or outcome) relative to the exposure

Risk Ratio

• Risk Ratio = (risk of disease in the exposed) / (risk of disease in the non-exposed)
• Requires complete follow-up of all participants – calculation of risk is based on the population at risk at the start of the study
• Risk Ratio doesn’t account for time to event between groups, only final outcome
• Risk ratio is most appropriate to assess protective effects of an intervention (e.g. vaccinations)

Rate Ratio

• Rate Ratio = (rate of disease in the exposed) / (rate of disease in the non-exposed)
• Calculation of rate is based on the total person-years at risk during the study, therefore reflecting the changing population at risk
• Preferential choice for longitudinal studies as it incorporates changes over time
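A minimal Python sketch contrasting the two ratios: the risk ratio divides by the number of people at risk at the start, whereas the rate ratio divides by person-years at risk. The counts, person-years and function names are hypothetical.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk = cases / people at risk at the start (needs complete follow-up)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def rate_ratio(cases_exposed, py_exposed, cases_unexposed, py_unexposed):
    """Rate = cases / person-years at risk during the study."""
    return (cases_exposed / py_exposed) / (cases_unexposed / py_unexposed)


# Hypothetical cohort: 1,000 exposed and 1,000 non-exposed people
print(f"risk ratio {risk_ratio(30, 1000, 10, 1000):.2f}")
# Same cases, but the exposed contributed fewer person-years
# (e.g. because of loss to follow-up), so the rate ratio differs
print(f"rate ratio {rate_ratio(30, 4500, 10, 4800):.2f}")
```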

Odds Ratio

• Odds Ratio = (odds of disease in exposed) / (odds of disease in the non-exposed)
• Always the measure of association for case-control studies
• For rare diseases (or diseases with long latency periods) the OR can be an approximate measure to the RR (relative risk)
• Doesn’t require denominator (i.e. total number in population) unlike measuring risk
• For more info, see my blog post on Odds Ratios

• RR is a measure of association, so we cannot infer causation from any of these calculations
• RR assesses the risk of developing disease relative to exposure… *but* gives no indication of the magnitude of the excess risk in absolute terms.  For this we need to understand the Attributable Risk
• Can sometimes be confusing deciding which RR calculation to use when… best advice is to think about a) what is the study design? (OR is always used for case-control) and b) has follow-up data been completed for all participants? (Yes = risk ratio, No = rate ratio)

NB: I couldn’t find any ‘risk’ appropriate pictures, so here are some dogs having fun instead.  Enjoy!

## Epidemiology – Numbers Needed to Treat (NNT)

Definition

NNT = the number of patients that need to be treated in order for 1 extra patient to benefit

Alternatives to NNT include:

• Numbers Needed to Screen (NNS = No. needed to be screened for 1 to benefit)
• Numbers Needed to Harm (NNH = No. needed to be exposed to a risk factor for 1 to be harmed)

Formulae

• NNT = 1/ARR
• Absolute Risk Reduction (ARR) is the difference between the event rate in controls and the event rate in the treated group = (a/(a+c)) – (b/(b+d)), where a+c are the controls and b+d are the treated group
• NNTs should always be reported with 95% Confidence Intervals for interpretation

Interpretation

• The lower the NNT the better.
• E.g. Drug FAB helps prevent strokes and has an NNT of 1. By treating Bob with FAB, this should prevent him having a stroke. On the other hand, drug BAD has an NNT of 50, so you would have to treat 50 Bobs in order to prevent one stroke.
• If the treatment or exposure is harmful (i.e. the result is a negative number), then omit the sign and the measure is renamed NNH
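A Python sketch of NNT = 1/ARR, including the NNH case. It assumes the common convention of rounding NNT up to the next whole number (you cannot treat part of a patient); the event rates are hypothetical, chosen so that the first example matches drug BAD's NNT of 50, and the function name is illustrative.

```python
import math


def nnt(control_event_rate, treated_event_rate):
    """Return ("NNT" or "NNH", whole number) from two event rates."""
    arr = control_event_rate - treated_event_rate  # ARR = CER - EER
    if arr == 0:
        raise ValueError("event rates are equal: no benefit or harm to count")
    label = "NNT" if arr > 0 else "NNH"  # a negative ARR means harm
    # round() first so floating-point noise (e.g. 50.0000001) is not rounded up
    return label, math.ceil(round(1 / abs(arr), 9))


# Hypothetical stroke rates: 10% on control vs 8% on drug BAD
print(nnt(0.10, 0.08))  # ('NNT', 50)
# Hypothetical harmful exposure: event rate rises from 5% to 10%
print(nnt(0.05, 0.10))  # ('NNH', 20)
```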

• Useful to communicate benefit and harm – easy to understand (risk communication)
• NNTs can be used for summarising the results of trials
• A clinically useful measure of the absolute benefit of an active treatment over a control (better than RR or OR)
• Takes into account the frequency of the outcome – thus reflects the public health impact of the intervention

• Cannot be used for performing a meta-analysis. Pooled NNTs derived from meta-analyses can be seriously misleading because the baseline risk often varies appreciably between the trials
• Do not compare NNTs for different therapies *unless* the baseline risks of the disease are similar…

For further info, check out http://www.thennt.com/the-nnt-explained/

## Epidemiology – Odds Ratio (OR)

Definition

The Odds Ratio is a measure of association which compares the odds of disease in those exposed with the odds of disease in those unexposed.

Formulae

• OR = (odds of disease in exposed) / (odds of disease in the non-exposed)

Example

I often think food poisoning is a good scenario to consider when interpreting ORs: imagine a group of 20 friends went out to the pub, and the next day 7 were ill. They suspect that it may have been something they ate, maybe the fish casserole… here are the numbers:

|  | Cases (ill) | Controls (not ill) | Total |
|---|---|---|---|
| Exposed (ate fish) | a = 5 | b = 3 | 8 |
| Unexposed (didn’t eat fish) | c = 2 | d = 10 | 12 |
| Total | 7 | 13 | 20 |

• Odds of exposure in cases = a/c = 5/2 = 2.5
• Odds of exposure in controls = b/d = 3/10 = 0.3
• Odds Ratio = (a/c) / (b/d) = 2.5/0.3 = 8.33
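The same calculation as a minimal Python sketch, using the fish casserole counts from the table. It relies on the fact that (a/c) / (b/d) simplifies to the cross-product ratio (a x d) / (b x c); the function name `odds_ratio` is illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: rows = exposed / unexposed, columns = cases / controls.

    (a/c) / (b/d) is algebraically the same as (a x d) / (b x c).
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when b or c is zero")
    return (a * d) / (b * c)


# Fish casserole example: a = 5, b = 3, c = 2, d = 10
print(round(odds_ratio(5, 3, 2, 10), 2))  # 8.33
```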

Interpretation: What does this mean?

• An OR of 1 suggests that there is no difference between the groups; i.e. there is no association between the suggested exposure (fish) and the outcome (being ill)
• An OR > 1 suggests that the exposure is positively associated with the adverse outcome (the odds of exposure are higher among cases than among controls)
• An OR < 1 suggests that the exposure is negatively associated with the adverse outcome. Potentially, there could be a protective effect

In the example above, we can conclude that those who ate the fish casserole (exposure) had 8.3 times the odds (OR = 8.33) of being ill (outcome) compared with those who did not eat the fish casserole. Of course, this is an entirely fictitious example, and I have nothing against fish

• Appropriate to analyse associations between groups from case-control and prevalent (or cross-sectional) data.
• For rare diseases (or diseases with long latency periods) the OR can be an approximate measure to the RR (relative risk)
• Doesn’t require denominator (i.e. total number in population) unlike measuring risk
• Good method to estimate the strength of an association between exposures and outcomes

• Association does not imply causation! *epidemiology golden rule*

## Exam Techniques

This week I attended: “Part A Revision Seminar: How to Answer Questions!”

This was a useful session which (thankfully) built up my confidence and also provided some useful practical tips, which I thought might be helpful to share.

(I would like to acknowledge and thank Kirsteen Macleod (ST5) for sharing her wisdom and collated advice – the following information has been taken from her presentation)

ARGH I’ve got to answer a question – how do I start?

• Identify what aspect of the curriculum the question covers
• Do a brain dump of all ideas/knowledge/critiques… e.g. spider diagram
• Think laterally – why is this important to public health? Have wider determinants been considered?
• Identify an appropriate structure (something logical, e.g. framework)
• Write a structured answer with bullet points, neat handwriting and headings