Saturday, February 25, 2012

Statistics Problem Sets and Remaining Schedule

Week 2
Exercise 1

Week 3
Week 4

  1. Call Centers have high turnover rates because of the stressful environment. The national average is approximately 50%. The director of HR for a large bank has compiled data about 70 former employees at one of the bank's call centers.  Use Call Center Data and Pivot Tables to determine the average length of service for:
    1. Males and females in the sample
      1. Males 1.8 Years average length of service
      2. Females 2.0 Years average length of service
    2. Individuals with and without a college degree
      1. 1.5 Years with College Degree
      2. 2.1 Years without College Degree
    3. Males and females with and without prior call center experience.
      1. 1.7 Years with experience for males
      2. 1.8 Years with experience for females
      3. 1.8 Years without experience for males
      4. 2.2 Years without experience for females
    4. Conclusion: We should target inexperienced females for hiring. They stay longer! 
  2. Construct a PivotTable to find the average credit score, years of credit history, revolving balance, and revolving utilization based on whether the applicant was a homeowner and whether the application was approved. Credit Approval Decisions
  3. Homeowners, Approved or not



    Homeowners, Approved Only
    Homeowners, Dis-Approved Only
    Non-Homeowners, Approved Only
    Non-Homeowners, Dis-Approved Only
    1. Ave Credit Score: 652.3
    2. Ave Years Credit History: 11.2 Years
    3. Ave Revolving Balance: $13,939
    4. Note: Utilization not shown above, but just check the utilization box to add it as an additional data item. 
    5. Conclusions: Approved Homeowners have on average, 3 times the credit history than Approved Non-homeowners. This is most likely due to their mortgage.
    6. Approved Non-homeowners have almost twice the revolving credit balance, on average, as their Home-owning counterparts.
    7. Disapproved Homeowners have, on average, a lower credit score than their non-home owning counterparts, although they have a longer credit history (10.7 yrs on average) and a lower revolving credit balance ($14,867)
  4. The Spreadsheet Energy Production & Consumption provides various energy data from 1949 to 2007.
    1. construct and area chart showing the fossil fuel production as a proportion of total energy production
    2. construct line charts for each of the variables
    3. construct a line chart showing the total energy production and consumption during these years
    4. construct a scatter diagram for total energy exports and production
    5. discuss what the information in the chart conveys
      1. There is a growing gap in the total consumption and production of energy.
      2. Fossil Fuel Consumption is rising in a fixed proportion to total consumption.
      3. Fossil fuel production is flat, and not keeping pace with fossil fuel consumption.
  5. Referring to the Call-Center Data, how strongly is length of service correlated with starting age? 
    1. The correlation coefficient is obtained using the Excel CORREL(array 1, array 2) function.
    2. R=-0.61 meaning that as the starting age increases, the length of service tends to decrease.
    3.  This means that the firm is retaining fewer employees who join the company later in life.
  6.  Compute the mean, variance and standard deviation for all of the variables in the File: Major League Baseball
  7.  Construct a chart to show the proportion of funds in each investment category in the spreadsheet: Retirement Portfolio.
  8.  Week 2, Question 1: Based on the data in the spreadsheet Consumer Transportation Survey, develop a probability mass function and cumulative distribution function (both tabular and as charts) for the random variable Number of Children. What is the probability that an individual in this survey has:
    1. Less than 2 children -> Answer 72%
    2. At least 2 children -> Answer 28%
    3. 5 or more children-> Answer 0%
  9.  Week 2, Question 2: A Bus company knows that travel time is normally distributed with a mean time of 15 mins, and a standard deviation of 2 mins. What is the probability that:
    1. Answers above.
  10. Week 2, Question 3
  11. Week 3, Question 1: Using the data in the spreadsheet Accounting Professionals, find and interpret the 95% CI for the following
    1.  mean years of service
    2. proportion of employees with a graduate degree
  12. Week 3, Question 2: Using the spreadsheet Consumer Transportation survey, develop 90%, 95%, and 99% confidence intervals for the following:
    1. mean hours per week spent in vehicles
    2. average weekly miles driven
    3. proportion satisfied with vehicles
    4. proportion with at least 1 child
  13. Week 3, Question 3: Trade associations such as the United Dairy Farmers frequently conduct surveys to identify characteristics of their membership. If this organization conducted a survey to estimate the annual per-capita consumption of milk and wanted to be 90% confident that the estimate was no more than +/-0.3 gallons away from the actual average, what sample size is needed? Past data have indicated that the standard deviation is approximately four gallons.
    1. standard error of z distribution: Zval*(s / sqrt (n))
    2. s = 4
    3. z(90) = -1.64
    4. 0.3 = -1.64 * 4 / sqrt (n)
    5. n= 478.151 or 479 people (round up) (note there is rounding error due to rounding to 2 decimal places, excel will report 481...)
  14.  Week 4, Question 1: A reporter claims that a call center (Call Center Data) has an average tenure of no more than 3 years. Test this hypothesis.
    1. Set H0 (Null Hypothesis) to disprove the reporter, that tenure exceeds 3 years.
    2. Set H1 (Alternative Hypothesis) to Reporter's claim (< 3 years)
    3. Because H1 is <, a lower tail test is called for. Calculate the "Lower Critical Value" for t. Use the Excel Function T.INV(0.05, 69) for 95% confidence and DOF = 69.
    4. This lower critical value should equal -1.67.
    5. Now calculate your test statistic = sample mean - hypothesis mean (3) / standard error of the mean = -8.4
    6. -8.4 < -1.67 within the rejection region of the left-tail. Therefore, reject the Null Hypothesis. You 95% confident that the mean will be less than 3 years.
  15. Week 4, Question 2: Use the spreadsheet called "Sales Data". An industry trade publication stated that the average profit per customer was $4450. Can this company claim that they meet the standard? 
    1. Set H0 to prove that our sales average = 4450. This is a two tailed test. H1 states that our sales average does not equal 4450. Calculate the upper and lower limits for rejection, using the following formula in excel: =-T.INV.2T(0.05, 59) (lower limit). T.INV.2T(0.05, 59) (upper limit). Next calculate the t-test statistic = ( sample mean - 4450 ) / sample standard dev / sqrt(60) We fail to reject the Null Hypothesis, and we state that we are at the industry mean sales.
    2. To prove that we exceed the average, set H0: our average < industry average and H1: our average > industry average using a upper tail test. The upper limit is -T.INV(0.05,59) (minus because the excel function gives you the lower test by default... Conclude that our average cannot be higher, and fail to reject H0. We have a smaller or equal mean to the industry...
  16. Week 4, Question 3: Assume equal variance between two populations and
    1. test H0: service length for males = for females
    2. test H0: service length for experience = no experience
    3. test H0: service length no college = with college
    4. test hypothesis for variance equality. Repeat above problems with unequal variances as required.

     
To Apply Pivot tables, you need data with Column labels in the first row.

What is a Pivot Table you say? According to Microsoft: 
PivotTable report An interactive, cross-tabulated Excel report that summarizes and analyzes data, such as database records, from various sources, including ones that are external to Excel.
Here's how to access this feature: 


How to access Pivot Tables

  1. Select any cell in the data set and choose PivotTable from the Data Tab
  2. Select Pivot Table

Make a new worksheet from your new Pivot Table


No comments:

Post a Comment