Showing posts with label Biostatistics. Show all posts

Histograms vs. Bar graphs: what is the difference?

Histograms and bar graphs look very similar and are easy to confuse. Essentially, a histogram is a type of bar graph. Both charts show frequency on the Y (vertical) axis; what differs is what goes on the X (horizontal) axis.

In a bar graph the X axis holds discrete categories, and their order is arbitrary. For example, if you ask one hundred 8-year-old students to pick their favorite ice cream flavor, the result can be plotted as a bar graph as follows. Note that the bars do not touch each other.
Bar graph


On the other hand, in a histogram the X axis shows a continuous numerical variable divided into adjacent ranges (bins), and the Y axis shows how many observations fall into each range. For example, say we measured the heights of the same 100 children. We could plot the data as a histogram as follows. The X axis shows ranges of height in inches. Note that the bars touch each other. In the USMLE you might be given some data and asked which kind of chart is most appropriate to present it. With the increased emphasis on biostatistics and evidence-based medicine, interpretation and presentation of such data may be tested more frequently in the USMLE.
Histogram
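
The difference between the two charts comes down to how the data are counted before plotting. The short Python sketch below illustrates this under stated assumptions: the flavor answers and the heights are made-up values (not the data behind the charts above), and `bin_heights` is a hypothetical helper written for this example.

```python
from collections import Counter

# Hypothetical survey answers (categorical data): suited to a bar graph.
flavors = ["vanilla", "chocolate", "chocolate", "strawberry", "vanilla", "chocolate"]
flavor_counts = Counter(flavors)  # one bar per category, order is arbitrary

# Hypothetical heights in inches (continuous data): suited to a histogram.
heights = [47.5, 48.2, 49.9, 50.1, 50.8, 51.3, 52.0, 53.7]


def bin_heights(values, start, width):
    """Count values in adjacent, equal-width ranges [low, low + width)."""
    counts = {}
    for v in values:
        low = start + width * int((v - start) // width)
        key = (low, low + width)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


# Adjacent 2-inch ranges starting at 46 inches; the ranges share edges,
# which is why histogram bars touch each other.
height_counts = bin_heights(heights, start=46, width=2)

print(flavor_counts)
print(height_counts)
```

The flavor counts could be listed in any order without changing their meaning, whereas the height ranges have a natural order and no gaps between them.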


Sensitivity, Specificity, Positive Predictive Value, Likelihood Ratio explained with an example

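Sensitivity-SnOut: A sensitive test is used to rule out a disease, meaning it tries not to miss any case. With TP as the number of true positives and FN as the number of false negatives, the formula for sensitivity is
Sensitivity = TP / (TP + FN)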


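Specificity-SpIn: A specific test is used to rule in a disease, meaning it tries not to misdiagnose a normal case as diseased. With TN as the number of true negatives and FP as the number of false positives, the formula for specificity is
Specificity = TN / (TN + FP)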

Positive Predictive Value (PPV): Positive predictive value, also known as the precision rate, is the proportion of subjects with positive test results who are correctly diagnosed. It is a measure of the performance of a diagnostic method, as it reflects the probability that a positive test reflects the underlying condition being tested for. Its value does, however, depend on the prevalence of the outcome of interest. With TP as true positives and FP as false positives, the formula for PPV is as follows:
PPV = TP / (TP + FP)


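If the specificity, sensitivity and prevalence of the disease in question are known, PPV can be calculated with the following formula:
PPV = (Sensitivity × Prevalence) / (Sensitivity × Prevalence + (1 - Specificity) × (1 - Prevalence))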


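Negative predictive value (NPV) is the counterpart of PPV: the proportion of subjects with negative test results who are correctly diagnosed. With TN as true negatives and FN as false negatives, the formula for NPV is
NPV = TN / (TN + FN)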




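Based on specificity, sensitivity and prevalence, the formula for NPV is
NPV = (Specificity × (1 - Prevalence)) / (Specificity × (1 - Prevalence) + (1 - Sensitivity) × Prevalence)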
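
To see how strongly the predictive values depend on prevalence, the prevalence-based formulas can be sketched in Python. This is only an illustration: `predictive_values` is a hypothetical helper name, and the 90% sensitivity and specificity are made-up inputs chosen for the example. It uses PPV = Sens × Prev / (Sens × Prev + (1 - Spec) × (1 - Prev)) and NPV = Spec × (1 - Prev) / (Spec × (1 - Prev) + (1 - Sens) × Prev).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV); all inputs are proportions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs (assumed for this example): a test that is
# 90% sensitive and 90% specific, applied at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test, PPV rises and NPV falls as the disease becomes more common, which is why a positive screening result in a low-prevalence population is often a false positive.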





Likelihood Ratio: The likelihood that a given test result would be expected in a patient with a disease compared to the likelihood that the same result would be expected in a patient without that disease.

Likelihood Ratio Positive (LR+): The odds that a positive test result would be found in a patient with, versus without, a disease. Formula for LR+ is as follows:
Likelihood Ratio Positive (LR+) = Sensitivity / (1 - Specificity).

Likelihood Ratio Negative (LR-): The odds that a negative test result would be found in a patient with, versus without, a disease. Formula for LR- is as follows:
Likelihood Ratio Negative (LR-) = (1- Sensitivity) / Specificity.

Now let us solve a question using the above formulae.


Imagine that a group of 203 patients had a chest X-ray to look for cancer. Of these 203 patients, 20 had an abnormal chest X-ray (positive test) and 183 had a normal chest X-ray (negative test). 2 of the 20 with a positive chest X-ray actually had cancer, while one of the patients with a normal chest X-ray had cancer.

True positives are those patients who had an abnormal X-ray and had cancer, so TP=2

False positives are those patients who had an abnormal X-ray but did not have cancer, so FP=18 (20 abnormal X-rays but only 2 cancers)

True negatives are those patients who had a normal X-ray and did not have cancer, so TN=182 (183 normal X-rays but 1 had cancer)

False negatives are those patients who had a normal X-ray but had cancer, so FN=1

Hence based on these numbers the sensitivity is:
TP/(TP+FN)
that is 2/(2+1)=0.667 or 66.7%

Specificity is:
TN/(TN+FP)
that is 182/(182+18)=0.91 or 91%

PPV is:
TP/(TP+FP)
that is 2/(2+18)=0.1 or 10%

NPV is:
TN / (FN + TN)
that is 182/(1+182)= 0.995 or 99.5%
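
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses the counts from the chest X-ray example (TP=2, FP=18, TN=182, FN=1) and also applies the two likelihood ratio formulas given earlier; the variable names are chosen for this example only.

```python
# Counts from the chest X-ray example above.
tp, fp, tn, fn = 2, 18, 182, 1

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

# Likelihood ratios, from sensitivity and specificity.
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(f"Sensitivity = {sensitivity:.3f}")
print(f"Specificity = {specificity:.3f}")
print(f"PPV         = {ppv:.3f}")
print(f"NPV         = {npv:.3f}")
print(f"LR+         = {lr_pos:.2f}")
print(f"LR-         = {lr_neg:.3f}")
```

For these counts LR+ comes out at about 7.4 and LR- at about 0.37, so an abnormal X-ray is roughly seven times more likely in a patient with cancer than in one without.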

Hardy–Weinberg equilibrium with example

The Hardy–Weinberg principle states that both allele and genotype frequencies in a population remain in an equilibrium (are constant) as long as no specific disturbing influences are introduced.

Those disturbing influences could be mutations, non-random mating, selection, limited population size, gene flow and random genetic drift. Because these influences are always present in real populations, the Hardy–Weinberg principle is never exactly accurate. Although it is a theoretical idealization, it can still predict genotype frequencies with reasonable accuracy.

Take for example a single gene that can occur as either a dominant allele (A) or a recessive allele (a), with frequencies denoted by p and q respectively, so that p + q = 1. Assuming population equilibrium, the frequency of AA (A homozygosity) is p^2 and the frequency of aa (a homozygosity) is q^2. Similarly, the frequency of Aa (heterozygous) is 2pq.

To better understand this, consider an autosomal recessive mutation that causes sickle cell anemia in homozygous recessive children. The parents of a boy with the disease want to know the probability of their grandchildren inheriting it. To answer this we need the chance that the boy will have children with a carrier of the recessive mutation, that is, the frequency of heterozygous girls in the population. This can be derived from the above equation if the incidence of the homozygous (disease) state is known.

Let us assume that the homozygous state occurs at a rate of 64 per 10,000 people. The frequency of aa (disease) is then 0.0064, so q^2 = 0.0064 and q = 0.08.

Since p + q = 1, p = 1 - q = 1 - 0.08 = 0.92. The frequency of AA is p^2, which is 0.8464. The heterozygote frequency is 2pq, which is 2 × 0.92 × 0.08 = 0.1472.

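Hence the chance that the boy will have children with a heterozygous girl is about 15% (0.1472). Because the boy is homozygous (aa), he always passes on the recessive allele, while a heterozygous mother passes it on half the time. Half of their children will therefore be homozygous, so the chance that a child has the disease through a carrier mother is 0.1472 × 0.5 = 0.0736, or about 7%.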
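
The Hardy–Weinberg steps above can be sketched in Python as a check on the arithmetic. The function name `hardy_weinberg` is a hypothetical helper for this example, and the calculation assumes, as in the text, that the boy is homozygous (aa) and that his partner is drawn at random from a population in equilibrium. An aa parent always passes on the a allele and a carrier (Aa) partner passes it on half the time, so the risk through a carrier partner is the carrier frequency multiplied by 1/2.

```python
import math


def hardy_weinberg(disease_frequency):
    """Genotype frequencies from the frequency of the aa (disease) state."""
    q = math.sqrt(disease_frequency)  # recessive allele frequency
    p = 1 - q                         # dominant allele frequency
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Disease (aa) occurs in 64 per 10,000 people, as assumed in the text.
freqs = hardy_weinberg(64 / 10_000)
carrier_frequency = freqs["Aa"]

# Affected father (aa) with a carrier mother (Aa): half the children are aa.
risk_via_carrier = carrier_frequency * 0.5

print(f"AA = {freqs['AA']:.4f}, Aa = {freqs['Aa']:.4f}, aa = {freqs['aa']:.4f}")
print(f"Risk of an affected child via a carrier partner = {risk_via_carrier:.4f}")
```

This gives a carrier frequency of about 0.147 and a risk of about 0.074, or roughly 7%, for a child of the affected boy and a carrier partner.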