Late homework is not accepted. If you cannot make it to lecture, arrange to hand in your homework ahead of time. For data, you must use the August 2015 CPS data set.
Simulate the Central Limit Theorem. [First construct a series x. Then fill it with random integers using the command rndint(x,138). Open the variable wages and view ‘stats by classification’. Classify by series x, and uncheck the two ‘Group into bins if’ conditions below. Select and copy the means associated with each integer category (not the total), and paste them as a new workfile.] Print the resulting histogram and statistics. Do this two more times (you only need to ‘re-seed’ the variable x). Create a new random variable Y such that Y ~ N(μ, σ²/100), where μ and σ are the mean and standard deviation of the original wages variable from the CPS data set. (Use the command series y = μ + (σ/10)*@nrnd, substituting the appropriate numbers for μ and σ.) Take a random sample (of size n = 139) from Y, and print the histogram and statistics. Do this two more times. Write a short paragraph comparing the statistics and distributions of your category means and your random samples from Y with those of the original wages variable.
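The EViews steps above can be mirrored in a short Python sketch to see what the Central Limit Theorem predicts. It assumes rndint(x,138) labels each observation with an integer from 0 through 138 (139 categories, matching n = 139 later on), and it uses a synthetic, right-skewed lognormal list as a stand-in for the CPS wages variable; the function names are hypothetical. The category means should look far more symmetric, and have a much smaller spread, than the raw wages, while draws from Y ~ N(μ, σ²/100) have standard deviation σ/10.

```python
import random
from statistics import mean, stdev


def group_means(wages, max_label=138, seed=None):
    """Give each wage a random integer label 0..max_label (assumed to mimic
    rndint(x,138)) and return the mean wage within each non-empty label group."""
    rng = random.Random(seed)
    groups = {}
    for w in wages:
        groups.setdefault(rng.randint(0, max_label), []).append(w)
    return [mean(g) for g in groups.values()]


def sample_y(mu, sigma, n=139, seed=None):
    """Draw n values from N(mu, sigma**2 / 100); gauss() takes the sd, sigma / 10."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma / 10) for _ in range(n)]


if __name__ == "__main__":
    # Synthetic, right-skewed stand-in for the CPS wages variable (NOT real data).
    data_rng = random.Random(0)
    wages = [data_rng.lognormvariate(10.5, 0.7) for _ in range(10_000)]
    mu, sigma = mean(wages), stdev(wages)
    print(f"wages:   mean={mu:.0f} sd={sigma:.0f}")
    for seed in (1, 2, 3):  # three re-seeds of x
        m = group_means(wages, seed=seed)
        print(f"means {seed}: n={len(m)} mean={mean(m):.0f} sd={stdev(m):.0f}")
    for seed in (1, 2, 3):  # three samples from Y
        y = sample_y(mu, sigma, seed=seed)
        print(f"Y {seed}:    n={len(y)} mean={mean(y):.0f} sd={stdev(y):.0f}")
```

With roughly 72 wages per category, the sd of the category means should come out near σ/√72, which is the same order of magnitude as the σ/10 used for Y.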
Unless specified otherwise, your Basic Sample should be persons aged 23-28 (inclusive) who work at least 40 hours per week. Modify those, and only those, parameters that you must in order to follow the instructions for the particular exercises below. If a (sub-)sample has fewer than 4 observations, simply note “insufficient sample size” and move on to the next item.
1. Produce the “histogram and statistics”, and confidence intervals (variance unknown) at the .95 and .99 levels, for the following groups’ wages:
a. Persons with a bachelor’s degree.
b. Persons with less education than a bachelor’s degree.
c. Persons with more education than a bachelor’s degree.
• Calculate the lengths of these intervals. Describe what the differing lengths tell you about the uncertainty in your point estimates of the mean. What is responsible for the changes in length?
• For each pair of these groups, what conclusions, if any, can you draw about the plausibility that they come from the same underlying population (in terms of wages earned)? Explain your answers.
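As a cross-check on the EViews output for exercise 1, here is a minimal Python sketch of the Basic Sample filter and the variance-unknown interval x̄ ± t·s/√n, whose length is 2·t·s/√n (so it depends on the confidence level through t, on the sample sd s, and on n). The record fields `age`, `hours`, `educ`, and `wage` and the education codes are hypothetical, the records are synthetic, and `t_crit` computes Student-t critical values by numerical integration so that no SciPy is needed.

```python
import math
import random
from statistics import mean, stdev


def t_crit(level, df):
    """Two-sided Student-t critical value (P(T <= t) = 1 - (1 - level) / 2),
    via Simpson's rule for the CDF and bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x, steps=2000):  # valid for x >= 0: 0.5 + integral of pdf over [0, x]
        h = x / steps
        inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + (pdf(0.0) + inner + pdf(x)) * h / 3

    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < target else (lo, mid)
    return (lo + hi) / 2


def basic_sample(records):
    """Hypothetical field names: keep persons aged 23-28 inclusive working >= 40 hours/week."""
    return [r for r in records if 23 <= r["age"] <= 28 and r["hours"] >= 40]


def t_interval(values, level):
    """Mean CI with variance unknown; None signals 'insufficient sample size' (n < 4)."""
    n = len(values)
    if n < 4:
        return None
    xbar = mean(values)
    margin = t_crit(level, n - 1) * stdev(values) / math.sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Synthetic records for illustration only; use the CPS workfile's data instead.
    rng = random.Random(1)
    records = [{"age": rng.randint(18, 64), "hours": rng.choice([20, 32, 40, 45, 50]),
                "educ": rng.choice(["less", "ba", "more"]), "wage": rng.lognormvariate(10.8, 0.6)}
               for _ in range(2_000)]
    sample = basic_sample(records)
    for educ in ("less", "ba", "more"):
        wages = [r["wage"] for r in sample if r["educ"] == educ]
        for level in (0.95, 0.99):
            ci = t_interval(wages, level)
            if ci is None:
                print(f"{educ} @ {level}: insufficient sample size")
            else:
                print(f"{educ} @ {level}: n={len(wages)} CI=({ci[0]:.0f}, {ci[1]:.0f}) "
                      f"length={ci[1] - ci[0]:.0f}")
```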
2. Return to the Basic Sample, but further restrict it to men living in California whose highest level of education is a bachelor’s degree. Produce a point estimate of the mean wage of this group. Assume the variance is known and construct confidence intervals at the .95 and .99 levels. Then assume the variance is unknown and reconstruct these confidence intervals. What is the ratio of the length of the latter confidence interval to that of the former? What is the ratio of the corresponding margins of error? (Compare the known and unknown variance assumptions at each confidence level, and compare the confidence levels under each variance assumption.) What effect is the substantial decrease in sample size having on your statistical analyses?
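The known/unknown comparison in exercise 2 can be sketched in Python, assuming the sample sd is plugged in as the “known” σ (substitute whatever σ your course specifies). Because an interval’s length is twice its margin, the two ratios coincide, and with the same σ value in both they reduce to t/z, which depends only on n and the level: for n = 5 at 95%, 2.776/1.960 ≈ 1.42. The data are synthetic and `t_crit` is a stdlib-only numerical helper, not an EViews function.

```python
import math
import random
from statistics import NormalDist, stdev


def t_crit(level, df):
    """Two-sided Student-t critical value via Simpson's rule and bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x, steps=2000):  # valid for x >= 0
        h = x / steps
        inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + (pdf(0.0) + inner + pdf(x)) * h / 3

    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < target else (lo, mid)
    return (lo + hi) / 2


def margins(values, sigma_known, level):
    """Return (z-based margin with sigma treated as known, t-based margin with sigma unknown)."""
    n = len(values)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    known = z * sigma_known / math.sqrt(n)
    unknown = t_crit(level, n - 1) * stdev(values) / math.sqrt(n)
    return known, unknown


if __name__ == "__main__":
    # Synthetic small subsample standing in for California men with a BA (NOT real data).
    rng = random.Random(7)
    wages = [rng.lognormvariate(11.0, 0.5) for _ in range(12)]
    sigma = stdev(wages)  # ASSUMPTION: the sample sd plays the role of the "known" sigma
    for level in (0.95, 0.99):
        known, unknown = margins(wages, sigma, level)
        # Length = 2 x margin, so the length ratio equals the margin ratio.
        print(f"{level}: known +/-{known:.0f}, unknown +/-{unknown:.0f}, ratio {unknown / known:.3f}")
```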
3. Do problem 2, but replacing men with women. What, if anything, do the lengths and locations of the two sets of intervals tell you about the plausibility of the relevant men’s and women’s wages being drawn from the same population?
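One rough way to read the two sets of intervals in exercise 3 is an overlap check, sketched below with hypothetical helper names and made-up interval endpoints. Treat it as a heuristic only: two non-overlapping intervals at the same level are evidence against a common mean, but overlapping intervals do not establish that the populations are the same, and a formal two-sample test can detect a difference even when intervals overlap somewhat.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def read_comparison(ci_men, ci_women):
    """Rough reading of two same-level confidence intervals for mean wages."""
    if not intervals_overlap(ci_men, ci_women):
        return "no overlap: evidence against a common population mean"
    return "overlap: the intervals alone do not rule out a common mean"


if __name__ == "__main__":
    # Hypothetical 95% intervals, for illustration only.
    print(read_comparison((52_000, 68_000), (41_000, 50_000)))
    print(read_comparison((52_000, 68_000), (47_000, 58_000)))
```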
4. Use the entire data set (not just the Basic Sample) to construct (i) point estimates, (ii) margins of error, and (iii) confidence intervals (all at the 95% level, variance unknown) to address the following questions:
a. What is the average age of persons who make at least 100k/year?
b. What is the average age of persons with a bachelor’s or higher degree who make at least 100k/year?
c. What is the average age of Californians who make at least 100k/year?
d. What is the average age of Missourians who make at least 100k/year?
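The four questions in exercise 4 share one pattern: filter the whole data set, then report x̄, the margin t·s/√n, and x̄ ± margin for age. The Python sketch below assumes hypothetical fields `age`, `earnings` (annual), `educ`, and `state` with made-up codes, runs on synthetic records, and carries over the “fewer than 4 observations” rule from the Basic Sample instructions.

```python
import math
import random
from statistics import mean, stdev


def t_crit(level, df):
    """Two-sided Student-t critical value via Simpson's rule and bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x, steps=2000):  # valid for x >= 0
        h = x / steps
        inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + (pdf(0.0) + inner + pdf(x)) * h / 3

    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < target else (lo, mid)
    return (lo + hi) / 2


def mean_ci(values, level=0.95):
    """Point estimate, margin of error, and CI for a mean with the variance unknown."""
    n = len(values)
    xbar = mean(values)
    margin = t_crit(level, n - 1) * stdev(values) / math.sqrt(n)
    return xbar, margin, (xbar - margin, xbar + margin)


# Hypothetical field names and codes; the CPS workfile's variables are named differently.
QUESTIONS = {
    "a": lambda r: r["earnings"] >= 100_000,
    "b": lambda r: r["educ"] in ("ba", "more") and r["earnings"] >= 100_000,
    "c": lambda r: r["state"] == "CA" and r["earnings"] >= 100_000,
    "d": lambda r: r["state"] == "MO" and r["earnings"] >= 100_000,
}


def answer(records, level=0.95):
    """Map each question to (mean age, margin, CI), or None when n < 4."""
    results = {}
    for key, keep in QUESTIONS.items():
        ages = [r["age"] for r in records if keep(r)]
        results[key] = mean_ci(ages, level) if len(ages) >= 4 else None
    return results


if __name__ == "__main__":
    rng = random.Random(3)  # synthetic records, NOT CPS data
    records = [{"age": rng.randint(16, 80), "earnings": rng.lognormvariate(10.8, 0.9),
                "educ": rng.choice(["less", "ba", "more"]), "state": rng.choice(["CA", "MO", "TX"])}
               for _ in range(5_000)]
    for key, res in answer(records).items():
        if res is None:
            print(f"{key}: insufficient sample size")
        else:
            xbar, margin, (lo, hi) = res
            print(f"{key}: mean age {xbar:.1f} +/- {margin:.1f} -> ({lo:.1f}, {hi:.1f})")
```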
5. Use any combination of variables (except wages) in the entire data set to create a subsample that (i) is most like you personally but (ii) contains at least 8 observations (i.e., n ≥ 8). (There will be many correct choices; just be specific about what your subsample is.) Create point estimates and confidence intervals (95%, variance unknown) of the average wages of your group. Also calculate the difference and the ratio of the averages compared to the Basic Sample (your group minus the Basic Sample, and your group over the Basic Sample). Do all of this a second time with a different group that meets conditions (i) and (ii). Write a couple of sentences about what you think might account for the similarities and differences between these two groups of people like you, and their contrasts with the Basic Sample. (Don’t forget that sheer, unexplained random variation, aka sampling error, is often a legitimate part of such accounts.) Which of the two groups you created do you think better represents you? Why?
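The difference and ratio in exercise 5 are simple arithmetic on two means, with the group required to have n ≥ 8. A minimal Python sketch, with a hypothetical helper name and made-up wage lists used only to show the call:

```python
from statistics import mean


def compare_to_basic(group_wages, basic_wages, min_n=8):
    """Return (group mean - Basic Sample mean, group mean / Basic Sample mean).
    Raises ValueError if the group has fewer than min_n observations (condition ii)."""
    if len(group_wages) < min_n:
        raise ValueError(f"group has n={len(group_wages)}; the exercise requires n >= {min_n}")
    g, b = mean(group_wages), mean(basic_wages)
    return g - b, g / b


if __name__ == "__main__":
    # Made-up wage lists for illustration only.
    mine = [40_000, 38_000, 52_000, 47_000, 39_000, 44_000, 50_000, 36_000]
    basic = [45_000, 40_000, 55_000, 38_000, 60_000, 42_000]
    diff, ratio = compare_to_basic(mine, basic)
    print(f"difference = {diff:.0f}, ratio = {ratio:.3f}")
```

A ratio above 1 (equivalently, a positive difference) means your group’s average wage exceeds the Basic Sample’s; whether that gap is meaningful still depends on the width of the confidence intervals.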