CS 5014 Homework 1 Due October 29

The purpose of these exercises is to help you review the material in Chapters 12-23 of Jain. You may work together on these problems, but each student should turn in a solution.

1. Suppose we collect data from 32 VT CS students on the number of times per semester they stay up all night working. Here is the raw data:
```
13 18 16 14 19 17 23 24
20 19 18 23 15 19 19 18
25 16 20 17 25 23 15 19
16 21 15 21 21 16 15 17
```

1. Construct a histogram of this data. Use your own judgement in choosing the cell size so that the histogram gives a useful indication of the apparent distribution of the data.

2. Construct a normal quantile-quantile plot for the data. Does the distribution appear to be normal?

3. Compute the sample mean, variance and standard deviation.

4. Compute a 95% confidence interval for the mean.

5. What sample size would be needed to estimate the mean with an accuracy of ±3% and a confidence level of 95%?
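As a starting point, the computations in parts 1 through 5 might be set up as in the sketch below, which uses only the Python standard library. Several choices here are assumptions rather than part of the assignment: the cell width of 2, the quantile positions q_i = (i - 0.5)/n for the quantile-quantile pairs, the use of the normal quantile z (rather than a t quantile) because n is at least 30, and the reading of "accuracy of 3%" as ±3% of the sample mean, which gives n = (100 z s / (r x̄))².

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev, variance

data = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]
n = len(data)

# Part 1: count observations per histogram cell.
# CELL_WIDTH = 2 is an assumed choice; try other widths and compare.
CELL_WIDTH = 2
low = min(data)
cells = Counter((x - low) // CELL_WIDTH for x in data)
for k in range(max(cells) + 1):
    start = low + k * CELL_WIDTH
    print(f"[{start:2d}, {start + CELL_WIDTH:2d}) {'*' * cells[k]}")

# Part 2: pair the i-th smallest observation with the standard normal
# quantile at q_i = (i - 0.5) / n. Plot observed value vs. normal quantile.
std_normal = NormalDist()
qq_pairs = [(std_normal.inv_cdf((i - 0.5) / n), y)
            for i, y in enumerate(sorted(data), start=1)]

# Part 3: sample mean, sample variance (divisor n - 1), standard deviation.
x_bar = mean(data)
s2 = variance(data)
s = stdev(data)

# Part 4: 95% confidence interval for the mean, using the normal quantile.
z = std_normal.inv_cdf(0.975)
half_width = z * s / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

# Part 5: sample size for +/- r percent accuracy at the same confidence level.
r = 3
n_needed = math.ceil((100 * z * s / (r * x_bar)) ** 2)

print(f"mean={x_bar:.3f} variance={s2:.3f} sd={s:.3f}")
print(f"95% CI=({ci[0]:.3f}, {ci[1]:.3f}) sample size={n_needed}")
```

The pairs in `qq_pairs` can be handed to any plotting tool; points falling close to a straight line suggest that a normal distribution is a reasonable description of the data.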

2. Consider the following data set, which records the pain level of CS 1044 GTAs as a function of the number of lines of code in students' programs:

L.O.C. Pain
30 73
20 50
60 128
80 170
40 87
50 108
60 135
30 69
70 148
60 132

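1. Fit a simple linear regression model to this data, i.e., compute the parameters $b_0$ and $b_1$ such that

$\hat{y} = b_0 + b_1 x$

is the best-fit linear model for this data in the least-squares sense, where $x$ is the number of lines of code and $\hat{y}$ is the predicted pain level. Report the coefficient of determination, $R^2$, for your model.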

2. Prepare two plots to help evaluate the goodness of your model:
• A scatter plot of the data and the model (e.g., Figure 14.2 in Jain).
• A plot of residuals vs. predicted response (e.g., Figure 14.7 in Jain).

3. Compute a 90% confidence interval for the mean pain level (taken over many future observations) for a GTA grading a 50-line program.
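The three parts above can be sketched with the standard least-squares formulas, as in the following minimal example. The variable names are illustrative, and the t quantile t[0.95; 8] = 1.860 is hard-coded from a standard t table because the Python standard library has no t distribution; both are assumptions of this sketch. The standard deviation used for the mean of many future observations at x_p is s_e · sqrt(1/n + (x_p − x̄)² / (Σx² − n x̄²)).

```python
import math

loc = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
pain = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(loc)

# Part 1: least-squares estimates of the slope b1 and intercept b0.
x_bar = sum(loc) / n
y_bar = sum(pain) / n
sxx = sum(x * x for x in loc) - n * x_bar ** 2
sxy = sum(x * y for x, y in zip(loc, pain)) - n * x_bar * y_bar
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Part 2: predicted responses and residuals. Plot (loc, pain) with the
# fitted line, and (predicted, residuals), using any plotting tool.
predicted = [b0 + b1 * x for x in loc]
residuals = [y - p for y, p in zip(pain, predicted)]

# Coefficient of determination: fraction of variation explained by the model.
sse = sum(e * e for e in residuals)
sst = sum((y - y_bar) ** 2 for y in pain)
r_squared = 1 - sse / sst

# Part 3: 90% confidence interval for the mean response at x_p = 50.
T_95_8 = 1.860  # t[0.95; n - 2 = 8], taken from a t table (assumption)
x_p = 50
s_e = math.sqrt(sse / (n - 2))  # standard deviation of errors
s_mean = s_e * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
y_p = b0 + b1 * x_p
ci = (y_p - T_95_8 * s_mean, y_p + T_95_8 * s_mean)

print(f"b0={b0:.3f} b1={b1:.3f} R^2={r_squared:.4f}")
print(f"90% CI at x={x_p}: ({ci[0]:.2f}, {ci[1]:.2f})")
```

A residual plot with no visible trend or change in spread supports the linear model; a pattern in the residuals would suggest trying a transformation or a different model.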

3. In an effort to predict the productivity of CS Department faculty members, the data shown here was collected:

| Factor A -- Salary | Factor B -- Ofc. Space: Small | Factor B -- Ofc. Space: Big |
|---|---|---|
| Low | (52,47,44,53) | (69,63,70,70) |
| High | (70,77,71,68) | (69,74,76,82) |

It records productivity (on a mysterious 100-point scale) for 16 different faculty members, divided into four groups according to their salary level (low or high) and their office size (small or big).

1. Analyze this data by computing a simple model (as in Chapter 18 of Jain). Compute the effects and the allocation of variation (as in Example 18.3).

2. Compute 90% confidence intervals for the four effects in this model.
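One way to organize both parts is the sign-table method for a two-factor, two-level design with replications, sketched below. The coding (-1 for low salary or small office, +1 for high salary or big office) and the variable names are assumptions of this sketch, and the t quantile t[0.95; 12] = 1.782 is hard-coded from a standard t table because the Python standard library has no t distribution.

```python
import math

# Each entry: (x_A, x_B, replicated observations).
# Assumed coding: x_A = -1 low salary, +1 high; x_B = -1 small office, +1 big.
cells = [
    (-1, -1, [52, 47, 44, 53]),
    (-1, +1, [69, 63, 70, 70]),
    (+1, -1, [70, 77, 71, 68]),
    (+1, +1, [69, 74, 76, 82]),
]
r = 4  # replications per cell
means = [sum(obs) / r for _, _, obs in cells]

# Part 1: effects from the sign table applied to the cell means.
q0 = sum(means) / 4
qA = sum(a * m for (a, _, _), m in zip(cells, means)) / 4
qB = sum(b * m for (_, b, _), m in zip(cells, means)) / 4
qAB = sum(a * b * m for (a, b, _), m in zip(cells, means)) / 4

# Allocation of variation: SS_j = 2^2 * r * q_j^2, plus the error term.
ss = {"A": 4 * r * qA ** 2, "B": 4 * r * qB ** 2, "AB": 4 * r * qAB ** 2}
sse = sum((y - m) ** 2 for (_, _, obs), m in zip(cells, means) for y in obs)
ss["error"] = sse
sst = sum(ss.values())
percent = {name: 100 * value / sst for name, value in ss.items()}

# Part 2: 90% confidence intervals for the effects.
T_95_12 = 1.782  # t[0.95; 2^2 * (r - 1) = 12], from a t table (assumption)
s_e = math.sqrt(sse / (4 * (r - 1)))  # standard deviation of errors
s_q = s_e / math.sqrt(4 * r)          # standard deviation of each effect
half = T_95_12 * s_q
effects = {"q0": q0, "qA": qA, "qB": qB, "qAB": qAB}
intervals = {name: (q - half, q + half) for name, q in effects.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: {effects[name]:8.4f}  90% CI ({lo:.3f}, {hi:.3f})")
for name, pct in percent.items():
    print(f"variation due to {name}: {pct:.2f}%")
```

An effect whose interval does not include zero is significant at the 90% level.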