CS 5014 Homework 1
Due October 29
The purpose of these exercises is to help you review the material
in Chapters 12-23 of Jain. You may work together on these problems, but
each student should turn in their own solution.
- Suppose we collect data from 32 VT CS students on the number of times per
semester they stay up all night working.
Here is the raw data:
13 18 16 14 19 17 23 24
20 19 18 23 15 19 19 18
25 16 20 17 25 23 15 19
16 21 15 21 21 16 15 17
- Construct a histogram of this data. Use your own judgement in choosing
the cell size so that the histogram gives a useful indication of
the apparent distribution of the data.
- Construct a normal quantile-quantile plot for the data. Does the
distribution appear to be normal?
- Compute the sample mean, variance and standard deviation.
- Compute a 95% confidence interval for the mean.
- What sample size would be needed to estimate the mean with an
accuracy of 3% and a confidence level of 95%?
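The computations in this problem can be checked with a short script. The following is a minimal sketch using only the Python standard library, not a required method. All function names are illustrative. It assumes the normal approximation is acceptable for a sample of 32 (a t-quantile with 31 degrees of freedom would give a slightly wider interval), uses quantiles at (i - 0.5)/n for the quantile-quantile plot, and reads "accuracy of r%" as a half-width of r% of the sample mean.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev, variance

# Raw data from the problem statement (32 observations).
nights = [
    13, 18, 16, 14, 19, 17, 23, 24,
    20, 19, 18, 23, 15, 19, 19, 18,
    25, 16, 20, 17, 25, 23, 15, 19,
    16, 21, 15, 21, 21, 16, 15, 17,
]


def histogram_counts(sample, cell_width):
    """Count observations per cell; each key is the lower bound of a cell."""
    low = min(sample)
    cells = Counter(low + cell_width * ((x - low) // cell_width) for x in sample)
    return dict(sorted(cells.items()))


def normal_qq_points(sample):
    """Pair each sorted observation with the unit-normal quantile at (i - 0.5)/n."""
    n = len(sample)
    unit_normal = NormalDist()
    return [(unit_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(sample), start=1)]


def summary(sample):
    """Sample mean, sample variance (n - 1 divisor), and standard deviation."""
    return mean(sample), variance(sample), stdev(sample)


def z_quantile(confidence):
    """Two-sided unit-normal quantile, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_confidence_interval(sample, confidence=0.95):
    """Interval x_bar -/+ z * s / sqrt(n) (normal approximation)."""
    half_width = z_quantile(confidence) * stdev(sample) / math.sqrt(len(sample))
    return mean(sample) - half_width, mean(sample) + half_width


def required_sample_size(sample, accuracy_percent, confidence=0.95):
    """n = (100 * z * s / (r * x_bar))^2, rounded up to a whole observation."""
    z = z_quantile(confidence)
    n = (100 * z * stdev(sample) / (accuracy_percent * mean(sample))) ** 2
    return math.ceil(n)
```

For example, `histogram_counts(nights, 2)` tabulates the data in cells of width 2, `normal_qq_points(nights)` yields the (normal quantile, observation) pairs to plot, and `required_sample_size(nights, 3)` applies the sample-size formula with the mean and standard deviation estimated from this preliminary sample.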
- Consider the following data set that records the pain level of
CS 1044 GTAs as a function of the lines of code in students' programs:
| L.O.C. | Pain |
| 30 | 73 |
| 20 | 50 |
| 60 | 128 |
| 80 | 170 |
| 40 | 87 |
| 50 | 108 |
| 60 | 135 |
| 30 | 69 |
| 70 | 148 |
| 60 | 132 |
- Fit a simple linear regression model to this data, i.e., compute
the parameters $b_0$ and $b_1$ such that $\hat{y} = b_0 + b_1 x$
is the best-fit linear model for this data in the least-squares sense.
Report the coefficient of determination, $R^2$, for your model.
- Prepare two plots to help evaluate the goodness of your model:
- A scatter plot of the data and the model (e.g., Figure 14.2 in Jain).
- A plot of residuals vs. predicted response (e.g., Figure 14.7 in Jain).
- Compute a 90% confidence interval for the mean pain level (taken over
many future observations) for a GTA grading a 50-line program.
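The regression steps in this problem can be sketched as follows. This is one possible way to check hand computations, not a prescribed solution. The function names are illustrative, and the t-quantile is passed in as an argument because the standard library has no t-distribution; the value 1.860 mentioned in the comments is the usual table entry for the 0.95 quantile with n - 2 = 8 degrees of freedom. The interval uses the standard deviation of the mean of a large number of future observations, s_e * sqrt(1/n + (x_p - x_bar)^2 / S_xx).

```python
import math

# Data from the table above: lines of code (x) and pain level (y).
loc = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
pain = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus predicted response, for the residual plot."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals(xs, ys, b0, b1))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


def mean_response_interval(xs, ys, x_p, t_quantile):
    """Confidence interval for the mean response at x_p over many future observations.

    t_quantile must be looked up for n - 2 degrees of freedom,
    e.g. 1.860 for a 90% interval when n = 10.
    """
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum(e ** 2 for e in residuals(xs, ys, b0, b1))
    s_e = math.sqrt(sse / (n - 2))  # standard deviation of errors
    s_y = s_e * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)
    y_p = b0 + b1 * x_p
    return y_p - t_quantile * s_y, y_p + t_quantile * s_y
```

Calling `mean_response_interval(loc, pain, 50, 1.860)` gives the interval for a 50-line program; plotting `residuals(...)` against the predicted values `b0 + b1 * x` gives the second diagnostic plot.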
- In an effort to predict the productivity of CS Department faculty
members, the data shown here
was collected:
| Factor A -- Salary | Factor B -- Ofc. Space: Small | Factor B -- Ofc. Space: Big |
| Low | (52,47,44,53) | (69,63,70,70) |
| High | (70,77,71,68) | (69,74,76,82) |
It records productivity (on a mysterious 100-point scale) for 16
different faculty members, divided into four groups by
salary level (low or high) and office size (small or big).
- Analyze this data by computing a simple model
(as in Chapter 18 of Jain). Compute the effects and the allocation
of variation (as in Example 18.3).
- Compute 90% confidence intervals for the four effects in this model.
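The analysis in this problem can be sketched in code as follows, assuming the intended model is the two-factor, two-level design with replications, y = q0 + qA*xA + qB*xB + qAB*xA*xB + e. That reading is an assumption based on the chapter reference and the mention of four effects. The coding of levels (low salary and small office as -1, high salary and big office as +1) and all names are illustrative choices. The t-quantile is passed in; 1.782 is the usual table entry for the 0.95 quantile with 2^2 * (4 - 1) = 12 degrees of freedom.

```python
import math

# (x_A, x_B) -> replicated observations; -1 = low/small, +1 = high/big (assumed coding).
cells = {
    (-1, -1): [52, 47, 44, 53],
    (-1, +1): [69, 63, 70, 70],
    (+1, -1): [70, 77, 71, 68],
    (+1, +1): [69, 74, 76, 82],
}


def effects(cells):
    """Effects q0, qA, qB, qAB computed from the cell means via the sign table."""
    k = len(cells)  # number of factor-level combinations (4)
    means = {levels: sum(obs) / len(obs) for levels, obs in cells.items()}
    return {
        "q0": sum(means.values()) / k,
        "qA": sum(a * m for (a, b), m in means.items()) / k,
        "qB": sum(b * m for (a, b), m in means.items()) / k,
        "qAB": sum(a * b * m for (a, b), m in means.items()) / k,
    }


def sum_squared_errors(cells):
    """SSE: squared deviations of each observation from its cell mean."""
    return sum((y - sum(obs) / len(obs)) ** 2
               for obs in cells.values() for y in obs)


def allocation_of_variation(cells):
    """Fraction of total variation explained by A, B, AB, and experimental error."""
    n = sum(len(obs) for obs in cells.values())  # 2^2 * r observations
    q = effects(cells)
    ss = {name: n * q[name] ** 2 for name in ("qA", "qB", "qAB")}
    ss["error"] = sum_squared_errors(cells)
    sst = sum(ss.values())
    return {name: value / sst for name, value in ss.items()}


def effect_intervals(cells, t_quantile):
    """Confidence interval q -/+ t * s_q for each of the four effects."""
    k = len(cells)
    r = len(next(iter(cells.values())))  # replications per cell
    s_e = math.sqrt(sum_squared_errors(cells) / (k * (r - 1)))
    s_q = s_e / math.sqrt(k * r)  # same standard deviation for every effect
    return {name: (q - t_quantile * s_q, q + t_quantile * s_q)
            for name, q in effects(cells).items()}
```

Calling `effect_intervals(cells, 1.782)` gives the four 90% intervals; an effect whose interval includes zero is not statistically significant at that level.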