CS 5014 Homework 1
Due October 29

The purpose of these exercises is to help you review the material in Chapters 12-23 of Jain. You may work together on these problems, but each student should turn in a solution.

  1. Suppose we collect data from 32 VT CS students on the number of times per semester they stay up all night working. Here is the raw data:
          13 18 16 14 19 17 23 24 
          20 19 18 23 15 19 19 18 
          25 16 20 17 25 23 15 19 
          16 21 15 21 21 16 15 17
    

    1. Construct a histogram of this data. Use your own judgement in choosing the cell size so that the histogram gives a useful indication of the apparent distribution of the data.
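Before drawing the histogram, it can help to tabulate the counts per cell for a few candidate cell sizes and see which one shows the shape without leaving cells nearly empty. The sketch below assumes integer data and half-open cells of a fixed width starting at a chosen value. The names `DATA` and `histogram`, and the start of 13 and width of 2, are illustrative choices, not part of the assignment.

```python
# Raw data from Problem 1: all-nighters per semester for 32 students.
DATA = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]

def histogram(data, start, width):
    """Count integer observations in half-open cells [start + k*width, start + (k+1)*width).

    Assumes start <= min(data).
    """
    ncells = (max(data) - start) // width + 1
    counts = [0] * ncells
    for x in data:
        counts[(x - start) // width] += 1
    return [(start + k * width, start + (k + 1) * width, c) for k, c in enumerate(counts)]

# Text histogram: one '*' per observation in each cell.
for lo, hi, c in histogram(DATA, start=13, width=2):
    print(f"[{lo:2d}, {hi:2d})  {'*' * c}")
```

Rerunning with `width=3` or another starting point is a quick way to compare cell sizes.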

    2. Construct a normal quantile-quantile plot for the data. Does the distribution appear to be normal?
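A normal quantile-quantile plot pairs each sorted observation with the standard normal quantile at a matching probability. This sketch assumes the common plotting position q_i = (i - 0.5)/n and uses Python's `statistics.NormalDist.inv_cdf` for the N(0,1) quantile. The pairs can be plotted with any tool, and points lying close to a straight line suggest approximate normality. `normal_qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist

DATA = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]

def normal_qq_points(data):
    """Pair the i-th smallest observation with the N(0,1) quantile at q_i = (i - 0.5)/n."""
    n = len(data)
    ordered = sorted(data)
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Columns: normal quantile (x-axis), observed quantile (y-axis).
for x_q, y_q in normal_qq_points(DATA):
    print(f"{x_q:7.3f}  {y_q}")
```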

    3. Compute the sample mean, variance and standard deviation.
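The sample statistics can be checked with Python's `statistics` module. Note that `variance` and `stdev` compute the sample versions, dividing by n - 1 rather than n. This is a minimal sketch for checking hand calculations.

```python
from statistics import mean, variance, stdev

DATA = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]

xbar = mean(DATA)      # sample mean
s2 = variance(DATA)    # sample variance, divides by n - 1
s = stdev(DATA)        # sample standard deviation, sqrt(s2)
print(f"mean = {xbar:.4f}, variance = {s2:.4f}, std dev = {s:.4f}")
```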

    4. Compute a 95% confidence interval for the mean.
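A two-sided confidence interval for the mean has the form x̄ ∓ c · s/√n, where c is a critical value. With n = 32 the large-sample normal quantile z[0.975] is commonly used; the t quantile with 31 degrees of freedom gives a slightly wider, more conservative interval. The sketch below defaults to z and lets a t value be passed in. The value 2.040 for t[0.975; 31] is read from a t table, and `mean_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

DATA = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]

def mean_ci(data, confidence=0.95, crit=None):
    """Two-sided CI for the population mean: xbar -/+ crit * s / sqrt(n).

    If crit is None, the standard normal quantile z[1 - alpha/2] is used;
    pass a t quantile from a table to get the t-based interval instead.
    """
    n = len(data)
    xbar, s = mean(data), stdev(data)
    if crit is None:
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = crit * s / sqrt(n)
    return xbar - half, xbar + half

print("z interval:", mean_ci(DATA))
print("t interval:", mean_ci(DATA, crit=2.040))  # t[0.975; 31] from a table (assumed)
```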

    5. What sample size would be needed to estimate the mean with an accuracy of 3% and a confidence level of 95%?
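Treating the 32 observations as a pilot sample, the sample size needed to estimate the mean to within ±r percent at a given confidence can be computed from n = (100 · z · s / (r · x̄))², rounded up. This sketch assumes the normal quantile z[0.975] and r = 3; `sample_size_for_mean` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist, mean, stdev

DATA = [13, 18, 16, 14, 19, 17, 23, 24,
        20, 19, 18, 23, 15, 19, 19, 18,
        25, 16, 20, 17, 25, 23, 15, 19,
        16, 21, 15, 21, 21, 16, 15, 17]

def sample_size_for_mean(data, accuracy_pct, confidence=0.95):
    """Smallest integer n >= (100 * z * s / (r * xbar))**2, using data as a pilot sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    xbar, s = mean(data), stdev(data)
    return ceil((100 * z * s / (accuracy_pct * xbar)) ** 2)

print(sample_size_for_mean(DATA, accuracy_pct=3))
```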

  2. Consider the following data set, which records the pain level of CS 1044 GTAs as a function of the number of lines of code in students' programs:

    Lines of code  Pain level
               30          73
               20          50
               60         128
               80         170
               40          87
               50         108
               60         135
               30          69
               70         148
               60         132

    1. Fit a simple linear regression model to this data, i.e., compute the parameters $b_0$ and $b_1$ such that

      $$\hat{y} = b_0 + b_1 x$$

      is the best-fit linear model to this data in the least-squares sense, where $x$ is the number of lines of code and $\hat{y}$ is the predicted pain level. Report the coefficient of determination, $R^2$, for your model.
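The least-squares estimates for the intercept b0 and slope b1 are b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1 · x̄, and the coefficient of determination is R² = SSR/SST = 1 - SSE/SST. The sketch below applies those formulas directly; `fit_line`, `LOC`, and `PAIN` are illustrative names.

```python
from statistics import mean

LOC = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]            # predictor x
PAIN = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]     # response y

def fit_line(x, y):
    """Least-squares b0, b1 for y_hat = b0 + b1*x, plus the coefficient of determination R^2."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sst = sum((yi - ybar) ** 2 for yi in y)                         # total variation about the mean
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation
    return b0, b1, 1 - sse / sst                                    # R^2 = 1 - SSE/SST

b0, b1, r2 = fit_line(LOC, PAIN)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.4f}")
```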

    2. Prepare two plots to help evaluate the goodness of your model:
      • A scatter plot of the data and the model (e.g., Figure 14.2 in Jain).
      • A plot of residuals vs. predicted response (e.g., Figure 14.7 in Jain).
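For the residual plot, each point is a predicted response ŷ_i paired with its residual e_i = y_i - ŷ_i. If the model is adequate, the residuals should show no visible trend and roughly constant spread across the range of ŷ. This sketch only computes the pairs and leaves the plotting to any tool; `residuals_vs_predicted` is a hypothetical helper.

```python
LOC = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
PAIN = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

def residuals_vs_predicted(x, y):
    """Return (y_hat_i, e_i) pairs from the least-squares line, for a residual plot."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return [(b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]

# Columns: predicted response, residual (sorted by predicted response).
for y_hat, e in sorted(residuals_vs_predicted(LOC, PAIN)):
    print(f"{y_hat:7.2f}  {e:6.2f}")
```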

    3. Compute a 90% confidence interval for the mean pain level (taken over many future observations) for a GTA grading a 50-line program.
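For the mean of many future observations at x_p, the standard deviation of the prediction is s_e · sqrt(1/n + (x_p - x̄)² / Σ(x - x̄)²), with s_e = sqrt(SSE/(n - 2)); the interval for a single future observation would add 1 under the square root. The sketch below assumes the value t[0.95; 8] ≈ 1.860 from a t table for a two-sided 90% interval, and `mean_response_ci` is a hypothetical helper.

```python
from math import sqrt

LOC = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
PAIN = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
T_95_8 = 1.860  # t[0.95; n - 2 = 8] read from a t table (assumed, not computed)

def mean_response_ci(x, y, xp, t_crit):
    """CI for the mean response at xp: y_hat_p -/+ t * s_e * sqrt(1/n + (xp - xbar)^2 / Sxx)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))                      # standard deviation of errors
    y_hat = b0 + b1 * xp
    half = t_crit * s_e * sqrt(1 / n + (xp - xbar) ** 2 / sxx)
    return y_hat - half, y_hat + half

print(mean_response_ci(LOC, PAIN, xp=50, t_crit=T_95_8))
```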

  3. In an effort to predict the productivity of CS Department faculty members, the data shown here was collected:

                        Factor B -- Office space
    Factor A -- Salary  Small               Big
    Low                 (52, 47, 44, 53)    (69, 63, 70, 70)
    High                (70, 77, 71, 68)    (69, 74, 76, 82)

    It records productivity (on a mysterious 100-point scale) for 16 different faculty members, grouped into four groups depending on their salary level (low or high) and their office size (small or big).

    1. Analyze this data by computing a simple model (as in Chapter 18 of Jain). Compute the effects and the allocation of variation (as in Example 18.3).
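In a 2^2 r design, each effect is a quarter of the sum of the cell means weighted by that effect's sign column (I, A, B, AB), each factor's sum of squares is 2² · r · q², and SSE collects the squared deviations of the observations from their cell means. The sketch below assumes low salary and small office are coded -1 and high salary and big office +1; `CELLS` and `effects_and_variation` are hypothetical names.

```python
# Cells keyed by (A, B) signs: A = salary (-1 low, +1 high), B = office (-1 small, +1 big).
CELLS = {
    (-1, -1): [52, 47, 44, 53],
    (-1, +1): [69, 63, 70, 70],
    (+1, -1): [70, 77, 71, 68],
    (+1, +1): [69, 74, 76, 82],
}

def effects_and_variation(cells):
    """Effects q0, qA, qB, qAB of a 2^2 r design and the allocation of variation."""
    r = len(next(iter(cells.values())))
    means = {k: sum(v) / r for k, v in cells.items()}
    # q_j = (1/4) * sum over cells of (sign in column j) * (cell mean)
    q0 = sum(means.values()) / 4
    qA = sum(a * m for (a, b), m in means.items()) / 4
    qB = sum(b * m for (a, b), m in means.items()) / 4
    qAB = sum(a * b * m for (a, b), m in means.items()) / 4
    ss = {"A": 4 * r * qA ** 2, "B": 4 * r * qB ** 2, "AB": 4 * r * qAB ** 2}
    ss["E"] = sum((y - means[k]) ** 2 for k, v in cells.items() for y in v)
    sst = sum(ss.values())                      # SST = SSA + SSB + SSAB + SSE
    fractions = {name: val / sst for name, val in ss.items()}
    return (q0, qA, qB, qAB), ss, fractions

effects, ss, fractions = effects_and_variation(CELLS)
print("effects q0, qA, qB, qAB:", effects)
for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of variation")
```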

    2. Compute 90% confidence intervals for the four effects in this model.
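For a 2^2 r design, the error standard deviation is s_e = sqrt(SSE / (2²(r - 1))), and every effect has the same standard deviation s_e / sqrt(2² r), so each interval is q_j ∓ t · s_e / sqrt(2² r). An effect whose interval excludes zero is significant at that confidence level. The sketch assumes the same -1/+1 coding as before and the table value t[0.95; 12] ≈ 1.782 for a two-sided 90% interval; `effect_cis` is a hypothetical helper.

```python
from math import sqrt

# Same coding: A = salary (-1 low, +1 high), B = office (-1 small, +1 big).
CELLS = {
    (-1, -1): [52, 47, 44, 53],
    (-1, +1): [69, 63, 70, 70],
    (+1, -1): [70, 77, 71, 68],
    (+1, +1): [69, 74, 76, 82],
}
T_95_12 = 1.782  # t[0.95; 2^2 (r - 1) = 12] read from a t table (assumed)

def effect_cis(cells, t_crit):
    """CIs q_j -/+ t * s_e / sqrt(4r) for the four effects of a 2^2 r design."""
    r = len(next(iter(cells.values())))
    means = {k: sum(v) / r for k, v in cells.items()}
    signs = {"q0": lambda a, b: 1, "qA": lambda a, b: a,
             "qB": lambda a, b: b, "qAB": lambda a, b: a * b}
    sse = sum((y - means[k]) ** 2 for k, v in cells.items() for y in v)
    s_e = sqrt(sse / (4 * (r - 1)))   # error std dev with 2^2 (r - 1) degrees of freedom
    s_q = s_e / sqrt(4 * r)           # standard deviation of each effect
    cis = {}
    for name, sign in signs.items():
        q = sum(sign(a, b) * m for (a, b), m in means.items()) / 4
        cis[name] = (q - t_crit * s_q, q + t_crit * s_q)
    return cis

for name, (lo, hi) in effect_cis(CELLS, T_95_12).items():
    print(f"{name}: ({lo:.2f}, {hi:.2f})")
```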