A researcher interested in security events suspected that people in the 20-29 age group were more likely to say they had experienced a security event than people in the 30-39 age group. The researcher obtained separate random samples of people from each age group. Here are the results:

Impacted?    20-29    30-39
Yes             24       24
No              56       96
Total           80      120

The researcher wants to use the results above to construct a 95% confidence interval to estimate the difference between the proportions of people in each age group who would say they have been impacted (x=p20s-p30s). Assume that all of the conditions for inference have been met.

What is the correct 95% confidence interval for x based on the researcher's samples?

in Statistics Answers by Level 1 User (300 points)

1 Answer

Best answer

We are dealing with proportions so the proportion of 20s saying yes is 24/80=0.3, while that for 30s is 24/120=0.2. So the difference is 0.1. It's this difference we need to examine more closely later. We also need to note the sample sizes.

Let p20=0.3 and p30=0.2 and let μ=p20-p30=0.1. Each sample proportion is the mean of a yes/no (1/0) response, so we can treat the two proportions as the means of two probability distributions. We can determine the variance because the response is binary: yes and no are the only two states. So for the individual groups:

μ20=p20=0.3 and μ30=p30=0.2, variance σ20²=p20(1-p20)/n20=0.21/80=0.002625; variance σ30²=p30(1-p30)/n30=0.16/120=0.001333. (We can deduce the standard deviations from these if we need them.)

What we called μ (p20-p30) is the mean for the probability distribution for the difference in proportions. Because the two samples are independent, the variance of the difference is simply the sum of the individual variances:

σ²=σ20²+σ30²=0.002625+0.001333=0.003958, so standard deviation σ=√0.003958=0.06292 approx.
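The variance arithmetic can be checked with a short Python sketch. The function and variable names here (proportion_variance, var_20, var_30, se_diff) are made up for illustration and are not part of the original answer.

```python
import math

def proportion_variance(successes, n):
    """Variance of a sample proportion: p(1-p)/n."""
    p = successes / n
    return p * (1 - p) / n

# Counts taken from the table in the question.
var_20 = proportion_variance(24, 80)    # 20-29 group
var_30 = proportion_variance(24, 120)   # 30-39 group

# The samples are independent, so the variances add.
se_diff = math.sqrt(var_20 + var_30)

print(f"variance (20-29): {var_20:.6f}")
print(f"variance (30-39): {var_30:.6f}")
print(f"standard deviation of the difference: {se_diff:.5f}")
```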

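The sample sizes n20=80 and n30=120 are reasonably large (bigger than 30, with at least 10 yes and 10 no responses in each sample), so we can assume the difference in sample proportions is approximately normally distributed.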

A 95% confidence interval corresponds to a Z-score of 1.96. This is a two-tailed value because we are considering a range [low,high] on either side of the mean, so the remaining 100-95=5% is split equally between the left and right tails of the distribution. We therefore look for the Z-score corresponding to a cumulative probability of 100-5/2=97.5%, or 0.975, in the body of the distribution table, which gives Z=1.96. Z=(X-μ)/σ for a value X of the difference in proportions. We are looking for a low X value and a high X value to give us a 95% confidence interval. So:
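Instead of reading a table, the critical Z-score can be obtained from Python's standard library. This is a minimal sketch; the helper name z_critical is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical Z-score for a given confidence level."""
    tail = (1 - confidence) / 2          # area in each tail, e.g. 0.025
    return NormalDist().inv_cdf(1 - tail)  # Z at cumulative probability 0.975

z_95 = z_critical(0.95)
print(f"Z for 95% confidence: {z_95:.2f}")
```

The same helper can be called with 0.90 or 0.99 to get the critical value for other confidence levels.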

|X-μ|/σ=1.96, from which X=μ±1.96σ=0.1±1.96×0.06292=0.1±0.1233. The confidence interval is about [-0.02,0.22]. We can be 95% confident that the difference between the two population proportions lies between these two values, that is, -0.02≤p20-p30≤0.22. (Because the interval includes 0, the samples do not rule out the two age groups being equally likely to say they were impacted, or even the 30-39 group being slightly more likely. The interval describes the difference between the two population proportions, not individual people.)
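The whole calculation can be put together as a short Python sketch, assuming the normal approximation described in this answer. The function name diff_proportion_ci and its parameters are hypothetical, chosen only for this example.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard deviation of the difference: square root of the summed variances.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-tailed critical value (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 24 of 80 said yes in the 20-29 group, 24 of 120 in the 30-39 group.
low, high = diff_proportion_ci(24, 80, 24, 120)
print(f"95% CI for p20-p30: [{low:.2f}, {high:.2f}]")
```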

by Top Rated User (1.1m points)
Hello Rod, thank you for your help as always.
