Descriptive Statistics

**QUESTION**

This is a three-part assignment related to a study of contraceptive drug use among women. Table 2A is a distribution of systolic blood pressures cross-tabulated by age and pill use for women.

Table 2A. Distribution of systolic blood pressure, cross-tabulated by age and pill use.

| Blood pressure (mm Hg) | Non-users, age 35-44 (%) | Users, age 35-44 (%) |
|---|---|---|
| Under 90 | 1 | 1 |
| 91-95 | 2 | 1 |
| 96-100 | 5 | 4 |
| 101-105 | 9 | 5 |
| 106-110 | 11 | 7 |
| 111-115 | 15 | 12 |
| 116-120 | 16 | 14 |
| 121-125 | 9 | 11 |
| 126-130 | 10 | 11 |
| 131-135 | 8 | 10 |
| 136-140 | 5 | 7 |
| 141-145 | 4 | 6 |
| 146-150 | 2 | 5 |
| 151-155 | 1 | 3 |
| 156-160 | 1 | 1 |
| 160 and over | 1 | 2 |
| Total percent | 100 | 100 |
| Total number | 3,494 | 1,028 |

Part 1: First, state whether blood pressure in Table 2A would be a continuous variable or a discrete variable. Explain. Then, supposing that the number of women in each category of the two groups (non-users and users) were identified, would the number of women in each category be a continuous variable or a discrete variable? Explain.

Part 2: Use any free online histogram maker* to draw histograms for the blood pressures of the users and nonusers ages 35–44. Discuss one conclusion that can be made about blood pressure and pill use. *Here are some free resources:

http://www.zweigmedia.com/RealWorld/stats/histogram.html

Part 3: Based on what you’ve learned in this module about normal distributions, explain why a normal approximation of data would be helpful to view the data. For example, you could describe the steps that one would take to estimate the percentage of women with blood pressures in an age group.

Supporting information:

As we learned in Module 1, data can be classified into various types. We now turn our attention to statistical techniques that Health Scientists use to analyze data. At this point we concern ourselves with descriptive statistics to examine a sample. Later in the course we will turn our attention to inferential statistics—those techniques used to make generalizations to a wider population (Dancey et al., 2012).

Suppose we want to find out whether stroke patients differ from heart attack patients in their ability to come to terms with their illness. We could design a questionnaire that measures the ability of patients to cope after they have left the hospital (Dancey et al., 2012). In the example below, suppose that a higher score means that a patient has a higher coping ability.

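| Stroke | Stroke | Heart Attack | Heart Attack |
|---|---|---|---|
| 39 | 27 | 27 | 27 |
| 26 | 1 | 29 | 23 |
| 26 | 25 | 27 | 26 |
| 9 | 23 | 27 | 35 |
| 14 | 23 | 33 | 25 |
| 28 | 40 | 22 | 32 |
| 21 | 9 | 29 | 32 |
| 26 | 13 | 23 | 22 |
| 23 | 13 | 29 | 25 |
| 18 | 21 | 30 | 30 |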

Examining these scores, think about how you would describe them to a friend who couldn’t see them. What would be a typical score for stroke? As Norman & Streiner (2008) explain, a measure of central tendency is the typical value for a data set. It is also important to study measures of spread, such as the range.
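As a minimal sketch, the scores above can be summarized with Python's standard `statistics` module. It assumes that the first two columns of the table hold the stroke scores and the last two hold the heart attack scores; the variable and function names are illustrative only.

```python
from statistics import mean, median

# Coping scores read from the table above. The split into two columns
# per group is an ASSUMPTION about the original layout.
stroke = [39, 27, 26, 1, 26, 25, 9, 23, 14, 23,
          28, 40, 21, 9, 26, 13, 23, 13, 18, 21]
heart_attack = [27, 27, 29, 23, 27, 26, 27, 35, 33, 25,
                22, 32, 29, 32, 23, 22, 29, 25, 30, 30]


def summarize(scores):
    """Return the mean, median, and range of a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
    }


stroke_summary = summarize(stroke)
heart_summary = summarize(heart_attack)
print("Stroke:      ", stroke_summary)
print("Heart attack:", heart_summary)
```

Under that assumption, the stroke scores have a lower typical value (mean 21.25, median 23) than the heart attack scores (mean 27.65, median 27) and a much wider range (39 versus 13).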

When we perform many statistical tests, we are assuming that the data come from a normal distribution.

When describing how data are distributed, we concern ourselves with Shape (e.g. symmetry, skew, modality), Center (e.g. mean, median, mode), Spread (e.g. Range, Interquartile Range), and Outliers.

The Central Limit Theorem posits that regardless of how data are distributed, if we were to repeatedly draw reasonably sized samples, then the distribution of the means of those samples would be approximately normal (Norman & Streiner, 2008).
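The theorem can be illustrated with a small simulation. The sketch below is not taken from the course materials: it assumes a right-skewed exponential population with mean 1 and standard deviation 1, and the sample size (50) and number of samples (2,000) are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50   # assumed "reasonably sized" sample
N_SAMPLES = 2000   # number of repeated samples


def one_sample_mean(sample_size):
    """Mean of one sample drawn from a skewed (exponential) population."""
    return mean(random.expovariate(1.0) for _ in range(sample_size))


sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

center = mean(sample_means)   # expected to be near the population mean, 1
spread = stdev(sample_means)  # expected to be near 1 / sqrt(50)

# Share of sample means within 2 SDs of their center; for a roughly
# normal distribution this is about 95%.
within_2_sd = sum(abs(m - center) <= 2 * spread for m in sample_means) / N_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f}")
print(f"share within 2 SDs:   {within_2_sd:.3f}")
```

Even though individual values are far from normal, the sample means cluster symmetrically around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.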

There are many ways to plot data. A histogram is one useful way that we can visually display a large amount of data. In a histogram, the relative frequency of observations is displayed as a bar graph. Notice in the example below that the histogram illustrates the underlying distribution of data (i.e. Body Mass Index for Patients), revealing the “shape” of the data and variation in the data.

A continuous variable is one that can take on any value between two specified values. Otherwise, it is called a discrete variable (i.e. one that can take on only distinct, countable values, such as whole numbers).

When describing continuous data, it is useful to organize them in a way that readily enables visual interpretation. A frequency distribution is a useful way to summarize data for these purposes. One useful way to describe data is to express a measure of central tendency, which refers to the “typical” or the “average” value in a distribution (Hays, 1994).

Even in an introductory statistics course, some mathematical notation is important. A capital Greek sigma ∑ is used to show the sum of the values represented by the expression following the symbol. For example, we could sum a set of values that are labeled x1, x2, and so on, up to xN as follows:

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_N$$

As you study the background materials for this module, it is important that you understand the following terms:

Mode: The midpoint of the most frequent measurement class

Median: the point exactly midway between the top and bottom halves of a distribution

Arithmetic Mean: the familiar “average”.
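These three measures can be estimated from grouped data such as Table 2A. The sketch below uses class midpoints (for example, 93 for the class 91-95); the midpoints 88 and 163 for the two open-ended classes are assumptions, as are the function names, and the median is reported only as the midpoint of the class containing the 50th percentile.

```python
# Class midpoints for Table 2A. Closed classes use their true midpoint;
# the open-ended classes ("Under 90", "160 and over") are ASSUMED to
# have midpoints 88 and 163.
midpoints = [88, 93, 98, 103, 108, 113, 118, 123,
             128, 133, 138, 143, 148, 153, 158, 163]
non_users = [1, 2, 5, 9, 11, 15, 16, 9, 10, 8, 5, 4, 2, 1, 1, 1]  # percent
users = [1, 1, 4, 5, 7, 12, 14, 11, 11, 10, 7, 6, 5, 3, 1, 2]     # percent


def grouped_mean(mids, pcts):
    """Mean estimated by weighting each class midpoint by its percentage."""
    return sum(m * p for m, p in zip(mids, pcts)) / sum(pcts)


def modal_midpoint(mids, pcts):
    """Midpoint of the most frequent class."""
    return mids[pcts.index(max(pcts))]


def median_class_midpoint(mids, pcts):
    """Midpoint of the class where the cumulative percentage reaches half."""
    half = sum(pcts) / 2
    running = 0
    for m, p in zip(mids, pcts):
        running += p
        if running >= half:
            return m


for name, pcts in (("Non-users", non_users), ("Users", users)):
    print(name,
          "mean:", round(grouped_mean(midpoints, pcts), 2),
          "mode:", modal_midpoint(midpoints, pcts),
          "median class:", median_class_midpoint(midpoints, pcts))
```

With these assumed midpoints, the estimated mean is 119.45 mm Hg for non-users and 124.5 mm Hg for users; both groups share the modal class 116-120.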

By contrast, measures of variability describe the “spread” or extent of difference among the observations or events that make up the distribution (Hays, 1994).

Standard Deviation is computed as the square root of the variance. It is an index of the variability in the original measurement units.

Variance: the average of the squared deviations from the mean.

Standard Error of the Mean: a statistic used to gauge the ability of a single sample mean to estimate the true mean. It is the standard deviation of the population of sample means.

Modality: the number of peaks (modes) in a distribution, e.g. unimodal or bimodal.

Skewness refers to the lack of symmetry of a distribution curve (Norman & Streiner, 2008).

Kurtosis refers to how flat or peaked a distribution curve is.
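The variability and shape terms above can be computed directly from their definitions. The sketch below reuses the stroke coping scores from earlier in this module (assuming the first two table columns are the stroke group). The moment-based formulas for skewness and kurtosis are one common convention, not necessarily the one used in the readings.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev

# Stroke coping scores (ASSUMED to be the first two table columns).
stroke = [39, 27, 26, 1, 26, 25, 9, 23, 14, 23,
          28, 40, 21, 9, 26, 13, 23, 13, 18, 21]

n = len(stroke)
m = mean(stroke)

# Variance as defined above: the average squared deviation from the mean.
variance = pvariance(stroke)
# Standard deviation: square root of the variance, in the original units.
sd = pstdev(stroke)

# Standard error of the mean, estimated from this one sample using the
# sample standard deviation (n - 1 in the denominator).
sem = stdev(stroke) / sqrt(n)

# Moment-based shape measures: 0 skewness means symmetric; a normal
# curve has kurtosis 3 under this convention.
skewness = sum((x - m) ** 3 for x in stroke) / n / sd ** 3
kurtosis = sum((x - m) ** 4 for x in stroke) / n / sd ** 4

print(f"variance={variance:.4f} sd={sd:.4f} sem={sem:.4f}")
print(f"skewness={skewness:.4f} kurtosis={kurtosis:.4f}")
```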

The sampling distribution of a sample mean is a theoretical probability distribution (McGready, 2009); it describes the distribution of all sample means from all possible random samples of the same size taken from a population.

Once the standard error has been calculated, we can estimate confidence intervals. For example, if a single sample of size n is drawn and yields one sample mean, then we can move 2 SEs in either direction from that mean and expect that the resulting interval will contain μ most (about 95 out of 100) of the time (The Johns Hopkins University and John McGready, 2009).
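A minimal sketch of this "mean plus or minus 2 SEs" rule, applied to the heart attack coping scores from earlier in this module (assuming the last two table columns are the heart attack group). The function name is hypothetical; for small samples a t-based multiplier would be slightly wider than 2.

```python
from math import sqrt
from statistics import mean, stdev

# Heart attack coping scores (ASSUMED to be the last two table columns).
heart_attack = [27, 27, 29, 23, 27, 26, 27, 35, 33, 25,
                22, 32, 29, 32, 23, 22, 29, 25, 30, 30]


def approx_95_ci(sample):
    """Approximate 95% confidence interval: sample mean +/- 2 standard errors."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # estimated standard error
    return m - 2 * se, m + 2 * se


low, high = approx_95_ci(heart_attack)
print(f"sample mean: {mean(heart_attack):.2f}")
print(f"approximate 95% CI: ({low:.2f}, {high:.2f})")
```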

Please proceed to the background readings for this module.

Sources:

Hays, W.L. (1994). Statistics (5th ed.) Harcourt Brace College Publishers. Fort Worth, TX. ISBN: 0-03-074467-9

Minnesota Department of Health. Histogram. Retrieved July 1, 2013 from http://www.health.state.mn.us/divs/cfh/ophp/consultation/qi/resources/toolbox/histogram.html

Norman, G. & Streiner, D. (2008). Biostatistics the bare essentials (3rd ed.) BC Decker Inc. PMPH USA, Ltd. Shelton, CT. eISBN: 9781607950585 pISBN: 9781550093476.


**ANSWER**



Part 1: First, state whether blood pressure in Table 2A would be a continuous variable or a discrete variable. Explain. Then, supposing that the number of women in each category of the two groups (non-users and users) were identified, would the number of women in each category be a continuous variable or a discrete variable? Explain.

Blood pressure in Table 2A is a continuous variable, since it can take any value between two specified values. The values are grouped into classes only to make the table easier to read and interpret; within a class, a woman's blood pressure can take any value (Glen, 2020). On the other hand, the number of women in a given category is a count that can take only distinct, separate values; hence it is a discrete variable (Glen, 2020). Discrete variables do not always have to be whole numbers (a shoe size of 3.5 is an example), but here we are counting women, and there is no such thing as half a woman, so the count is always a whole number.
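To make the discrete nature of the counts concrete, the percentages in Table 2A can be converted back to approximate numbers of women. This is only a sketch: the function name is hypothetical, and because the published percentages are rounded to whole numbers, the resulting counts are approximations.

```python
# Group totals from Table 2A.
TOTAL_NON_USERS = 3494
TOTAL_USERS = 1028


def approx_count(percent, total):
    """Approximate number of women in a class from its rounded percentage."""
    return round(percent * total / 100)


# Class 116-120: 16% of non-users and 14% of users.
n_non_users = approx_count(16, TOTAL_NON_USERS)
n_users = approx_count(14, TOTAL_USERS)

print(n_non_users, n_users)  # whole numbers of women
```

Whatever the percentage, the result is a whole number of women, which is what makes the count a discrete variable.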

Part 2: Use any free online histogram maker* to draw histograms for the blood pressures of the users and nonusers ages 35–44. Discuss one conclusion that can be made about blood pressure and pill use.

The histograms are roughly bell-shaped, resembling a normal distribution curve. They show that the majority of oral contraceptive users and non-users aged 35-44 have normal or slightly elevated blood pressure, with about half of the readings in each group falling between 111 and 130 mm Hg. The two groups follow a similar trend: moving upward from the normal systolic value of about 120 mm Hg, the percentages drop as we approach stage 1 hypertension, stage 2 hypertension, and hypertensive crisis. However, there is a notable difference between the two groups for systolic blood pressure above 120 mm Hg: the percentage of contraceptive users in these higher classes (56%), who are therefore more likely to be hypertensive, is higher than that of non-users (41%). From the graphs, several conclusions and recommendations can be made:

1. The use of contraceptive pills has a small effect on blood pressure for the majority of women aged between 35-44 years.

2. On the other hand, contraceptives are likely to induce hypertension in a small minority of women aged 35-44.

3. The adverse effect of oral contraceptives on blood pressure should be assessed before any prescription is made, to avoid inducing hypertension in clients who are at risk of high blood pressure. Nurses should take care when applying a general prescribing policy.
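The comparison behind these conclusions can be reproduced from Table 2A without an online tool. The following sketch prints a simple text histogram for each group and totals the percentages from a chosen class upward; the function names are illustrative, not part of any histogram library.

```python
CLASSES = ["Under 90", "91-95", "96-100", "101-105", "106-110", "111-115",
           "116-120", "121-125", "126-130", "131-135", "136-140", "141-145",
           "146-150", "151-155", "156-160", "160 and over"]
NON_USERS = [1, 2, 5, 9, 11, 15, 16, 9, 10, 8, 5, 4, 2, 1, 1, 1]  # percent
USERS = [1, 1, 4, 5, 7, 12, 14, 11, 11, 10, 7, 6, 5, 3, 1, 2]     # percent


def text_histogram(labels, pcts):
    """One line per class, with one '#' per percentage point."""
    return "\n".join(f"{label:>12} | {'#' * pct} {pct}%"
                     for label, pct in zip(labels, pcts))


def share_from(label, labels, pcts):
    """Total percentage in the class `label` and every class above it."""
    return sum(pcts[labels.index(label):])


print("Non-users\n" + text_histogram(CLASSES, NON_USERS))
print("Users\n" + text_histogram(CLASSES, USERS))

for start in ("121-125", "141-145"):
    print(f"{start} and above:",
          f"non-users {share_from(start, CLASSES, NON_USERS)}%,",
          f"users {share_from(start, CLASSES, USERS)}%")
```

In the table, 41% of non-users and 56% of users fall in the classes above 120 mm Hg, and 9% of non-users and 17% of users fall in the classes above 140 mm Hg.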

Part 3: Based on what you’ve learned in this module about normal distributions, explain why a normal approximation of data would be helpful to view the data. For example, you could describe the steps that one would take to estimate the percentage of women with blood pressures in an age group.

A normal distribution describes values that are symmetrically distributed around a central peak. Most values cluster around the peak, and the probabilities of values farther from the middle taper off in the same manner in both directions (Campbell, 2016). The normal curve derives its name from the fact that it fits many natural phenomena. In a normal distribution the mean, mode, and median are all equal, so half of the population lies above the mean and the other half lies below it. The empirical rule for the normal distribution states that about 68% of observations fall within ±1 standard deviation of the mean. According to the central limit theorem, as the sample size increases, the sampling distribution of the mean approaches a normal distribution even if the underlying distribution of the variable is non-normal (Karney, 2016). The normal distribution is therefore useful to analysts in multiple ways.

For example, suppose one needs to estimate the percentage of women in an age group whose blood pressure falls in a given range. One can compute the mean and standard deviation of the data, express the limits of the range as numbers of standard deviations from the mean, and use the known percentage of a normal distribution lying between those limits as the estimate. Using the central limit theorem, a caregiver can also use results from a sample of participants to estimate the average blood pressure of women in an age group, which can then be generalized to the whole group.
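The steps described above can be sketched with `statistics.NormalDist` from the Python standard library. The class midpoints (including 88 and 163 for the open-ended classes) and the boundary of 140.5 mm Hg between the 136-140 and 141-145 classes are assumptions made for illustration.

```python
from statistics import NormalDist

# Class midpoints for Table 2A; 88 and 163 are ASSUMED midpoints for
# the open-ended classes "Under 90" and "160 and over".
MIDPOINTS = [88, 93, 98, 103, 108, 113, 118, 123,
             128, 133, 138, 143, 148, 153, 158, 163]
NON_USERS = [1, 2, 5, 9, 11, 15, 16, 9, 10, 8, 5, 4, 2, 1, 1, 1]  # percent
USERS = [1, 1, 4, 5, 7, 12, 14, 11, 11, 10, 7, 6, 5, 3, 1, 2]     # percent


def fit_normal(mids, pcts):
    """Normal curve with the grouped-data mean and standard deviation."""
    total = sum(pcts)
    mu = sum(m * p for m, p in zip(mids, pcts)) / total
    var = sum(p * (m - mu) ** 2 for m, p in zip(mids, pcts)) / total
    return NormalDist(mu, var ** 0.5)


def percent_above(dist, threshold):
    """Estimated percentage of the group above `threshold` mm Hg."""
    return 100 * (1 - dist.cdf(threshold))


# ASSUMED class boundary between 136-140 and 141-145.
BOUNDARY = 140.5

est_non_users = percent_above(fit_normal(MIDPOINTS, NON_USERS), BOUNDARY)
est_users = percent_above(fit_normal(MIDPOINTS, USERS), BOUNDARY)
print(f"non-users above 140: about {est_non_users:.1f}% (table: 9%)")
print(f"users above 140:     about {est_users:.1f}% (table: 17%)")
```

Under these assumptions the normal approximation gives roughly 8% for non-users and 16% for users, close to the tabulated 9% and 17%, which shows how two summary numbers (mean and standard deviation) can stand in for the whole table.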

References

Campbell, M. (2016). Normal distributions. African Journal of Midwifery and Women’s Health, 10(2), 59-61.

Glen, S. (2020). Discrete vs continuous variables: How to tell the difference. StatisticsHowTo.com: Elementary statistics for the rest of us! https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/discrete-vs-continuous-variables/

Karney, C. F. (2016). Sampling exactly from the normal distribution. ACM Transactions on Mathematical Software (TOMS), 42(1), 1-14.