Instructions for assignment file 2 of 3
Overview
You need to submit a Word file with the answers to 9 questions.
You must use the datasets in file 1 and the automatic dataset summarizer to get the descriptive statistics that are used in questions 1 to 5 and the inferential statistics that are used in questions 6 to 8.
The word count can be under 1500 words as long as your answers demonstrate that you have understood the material.
File 1 Dataset overview
Each student is given two datasets. Students MUST use the datasets they are given; they CANNOT use datasets they make themselves or take from other sources. One dataset contains information about each staff member in call centre 1. The other dataset contains information about each staff member in call centre 2.
Each of the datasets consists of the following variables:
*Original staff or replacement staff? Is the staff member one of the original staff or one of the replacement staff? This is a categorical variable.
*Median call time? The duration of each of the staff member’s calls is recorded and the median call time is calculated. This is a quantitative variable.
*Median above 3 minutes? This is a categorical variable because the possible answers are yes or no.
*Number of complaints? The number of complaints lodged against the staff member. This is a quantitative variable. For the original staff it is the number of complaints in their last month; for the replacement staff it is the number of complaints in their first month.
Question 1
THERE ARE TWO OPTIONS FOR QUESTION 1. JUST PICK ONE OPTION.
OPTION 1
a) Just using the information for call centre 1
i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “median call time?” using the sample
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
b) Just using the information for call centre 2
i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median call time?”
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
c) Compare the results in parts (a) and (b)
OPTION 2
a) Just using the information for call centre 1
i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “number of complaints?” using the sample
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
b) Just using the information for call centre 2
i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “number of complaints?”
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
c) Compare the results in parts (a) and (b)
Question 2
a) Just using the information for call centre 1
i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “Median above 3 minutes?” using the sample
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
b) Just using the information for call centre 2
i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median above 3 minutes?”
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
c) Compare the results in parts (a) and (b)
Question 3
a) Just using the information for call centre 1
i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “median call time?” and “number of complaints?” using the sample
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
b) Just using the information for call centre 2
i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “median call time?” and “number of complaints?”
ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics
Difference between sample means – Difference between sample proportions – correlation coefficient r
c) Compare the results in parts (a) and (b)
Question 4
Just using the call centre 1 data set
a) Just considering the original staff
i) What are the sample size and the sample mean of median call time?
ii) What is the z-score of the sample mean if the population mean is 3 and the population standard deviation is 1.1?
b) Just considering the replacement staff
i) What are the sample size and the sample mean of median call time?
ii) What is the z-score of the sample mean if the population mean is 3 and the population standard deviation is 1.1?
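The z-score calculation in Question 4 can be sketched as follows. This is only an illustration and it assumes the usual formula for the z-score of a sample mean, z = (sample mean - population mean) / (population standard deviation / square root of sample size). The function name and the example values 3.4 and 25 are hypothetical; they are not taken from any dataset, so you must use the sample size and sample mean from your own output.

```python
import math


def z_score_of_sample_mean(sample_mean, sample_size, pop_mean=3.0, pop_sd=1.1):
    """Return (xbar - mu) / (sigma / sqrt(n)) for a sample mean xbar."""
    # Standard error of the sample mean when the population sd is known.
    standard_error = pop_sd / math.sqrt(sample_size)
    return (sample_mean - pop_mean) / standard_error


# Hypothetical values for illustration only (not from the assignment datasets).
z = z_score_of_sample_mean(sample_mean=3.4, sample_size=25)
print(round(z, 3))
```

A positive z-score means the sample mean is above the population mean of 3; a negative z-score means it is below.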
Question 5
Just using the call centre 1 data set
a) Just considering the original staff
i) What is the sample size
ii) What is the sample proportion of workers that have a median call time above 3 minutes
iii) Use the answers in parts (i) and (ii) to find a 95% confidence interval for the population proportion
b) Just considering the replacement staff
i) What is the sample size
ii) What is the sample proportion of workers that have a median call time above 3 minutes
iii) Use the answers in parts (i) and (ii) to find a 95% confidence interval for the population proportion
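The confidence interval in Question 5 can be sketched as follows. This is a minimal illustration that assumes the standard large-sample interval, sample proportion ± z* × sqrt(sample proportion × (1 - sample proportion) / sample size), with z* of about 1.96 for 95% confidence; your course notes may use a different method. The function name and the example values 0.6 and 40 are hypothetical and are not taken from any dataset.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(sample_proportion, sample_size, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    # Critical value z* (about 1.96 when confidence is 0.95).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(
        sample_proportion * (1 - sample_proportion) / sample_size
    )
    margin = z_star * standard_error
    return sample_proportion - margin, sample_proportion + margin


# Hypothetical values for illustration only (not from the assignment datasets).
lower, upper = proportion_confidence_interval(sample_proportion=0.6, sample_size=40)
print(round(lower, 3), round(upper, 3))
```

The interval is centred on the sample proportion, and it gets narrower as the sample size increases.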
Question 6
a) Just using the information for call centre 1
i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median call time?” if you consider the whole population
ii) make suitable comments about the output in part (i)
b) Just using the information for call centre 2
i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median call time?” if you consider the whole population
ii) make suitable comments about the output in part (i)
c) Compare the results in parts (a) and (b)
Question 7
a) Just using the information for call centre 1
i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median above 3 minutes?” if you consider the whole population
ii) make suitable comments about the output in part (i)
b) Just using the information for call centre 2
i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median above 3 minutes?” if you consider the whole population
ii) make suitable comments about the output in part (i)
c) Compare the results in parts (a) and (b)
Question 8
a) Just using the information for call centre 1
i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “median call time?” and “number of complaints?” if you consider the whole population
ii) make suitable comments about the output in part (i)
b) Just using the information for call centre 2
i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “median call time?” and “number of complaints?” if you consider the whole population
ii) make suitable comments about the output in part (i)
c) Compare the results in parts (a) and (b)
Question 9
Comment on the sample report on pages 7 to 11 of this document. Discuss the main message of the report and how it is communicated.
Hint: If you have no idea how to comment on a report, you can download the following example; it is a report with a comment on the first page.
Sample report (you must use this to answer question 9)
“Report Title: Market research checking if the new version of a product is more popular
Introduction: Company XYZ has tried a new version of a product and has conducted some market research, using a sample of 100 people, to check if people prefer the new version of the product and if there is a difference in popularity in two different countries.”
“Description of the dataset:
The first 8 rows of the dataset are given below. The dataset has 100 rows. Each row gives information about a person who is reviewing a version of the product.
Which country? | Which version? | Would they buy the product? | Age? |
Country 1 | old version | would buy | 40 |
Country 1 | old version | would not buy | 45 |
Country 1 | new version | would buy | 42 |
Country 1 | new version | would not buy | 41 |
Country 2 | new version | would buy | 35 |
Country 2 | old version | would buy | 36 |
Country 2 | old version | would not buy | 33 |
Country 2 | old version | would buy | 23 |
The variables are
Which country? Country 1 or country 2, this is a categorical variable
Which version? Which version was reviewed, the old version or new version. This is a categorical variable
Would they buy the product? Would the person buy the product or not buy the product? This is a categorical variable
Age? How many years old is the person reviewing the product? This is a quantitative variable.”
“Main findings
Descriptive sample statistics
Just considering country 1, what is the popularity of the new version and the old version?
The following output lets you see the relationship between the variables “which version?” and “would they buy the product?”
Would you buy the product | |||
Row Labels | Would buy | Would not buy | Grand Total |
New version count | 35 | 15 | 50 |
New version % | 70% | 30% | 100% |
Old version count | 25 | 25 | 50 |
Old version % | 50% | 50% | 100% |
70% of people want to buy the new version, whereas only 50% of people want to buy the old version of the product. Formally speaking, the difference in sample proportions is 0.7 - 0.5 = 0.2 = 20%, so there is a 20 percentage point upward swing.
Just considering country 2, what is the popularity of the new version and the old version?
The following output lets you see the relationship between the variables “which version?” and “would they buy the product?”
Would you buy the product | |||
Row Labels | Would buy | Would not buy | Grand Total |
New version count | 40 | 10 | 50 |
New version % | 80% | 20% | 100% |
Old version count | 25 | 25 | 50 |
Old version % | 50% | 50% | 100% |
80% of people want to buy the new version, whereas only 50% of people want to buy the old version of the product. The difference in sample proportions is 0.8 - 0.5 = 0.3 = 30%, so there is a 30 percentage point upward swing.
When you compare country 1 and country 2, country 2 has the larger upswing (30% compared with 20%), so the new product is more successful in country 2.
Comparing the ages in country 1 and country 2
Country | Average age | Standard deviation | Sample size |
Country 1 | 40 | 18 | 100 |
Country 2 | 41 | 17 | 100 |
Difference between sample means = 40 - 41 = -1
This means the average age in country 1 is 1 year lower than the average age in country 2.
Comparing the two countries, there is not much of a difference between the average ages, so you cannot use age to explain why there is a difference between the two countries.
Inferential statistics
Computer output that measures the amount of evidence for the relationship between the variables “which version?” and “would they buy the product?” for country 1
n1 | n2 | phat 1 | phat 2 | |
50 | 50 | 0.7 | 0.5 | |
Estimate of the difference between sample proportions | ||||
0.2 | ||||
standard error of estimate | test stat | two sided pvalue | ||
0.09797959 | 2.041241452 | 0.0412268 | ||
To calculate the p-value H0:p1=p2 is assumed to be true | ||||
since the test is two sided H1 is H1:p1≠p2 |
The p-value is less than 0.05 so there is strong evidence there is a difference between population proportions
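The figures in the output above can be checked with a short sketch. It assumes the standard error was calculated from the pooled proportion, which is the usual choice when H0:p1=p2 is assumed to be true; the function name is hypothetical and is not part of the automatic dataset summarizer. The difference is taken as phat 1 minus phat 2, so the sign of the test statistic depends only on that order of subtraction.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(n1, n2, phat1, phat2):
    """Two-sided z-test of H0: p1 = p2 using the pooled standard error."""
    # Pooled proportion: the single proportion assumed under H0.
    pooled = (n1 * phat1 + n2 * phat2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    estimate = phat1 - phat2
    test_stat = estimate / standard_error
    # Two-sided p-value: both tails of the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(test_stat)))
    return estimate, standard_error, test_stat, p_value


# Values from the country 1 output: n1 = 50, n2 = 50, phat 1 = 0.7, phat 2 = 0.5.
estimate, standard_error, test_stat, p_value = two_proportion_z_test(50, 50, 0.7, 0.5)
print(estimate, standard_error, test_stat, p_value)
```

The same function can be called with the country 2 values (50, 50, 0.8, 0.5) to check the second set of output.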
Checking the claim there is a relationship between the variable “which version?” and “would they buy the product?” for country 2
n1 | n2 | phat 1 | phat 2 | |
50 | 50 | 0.8 | 0.5 | |
Estimate of the difference between sample proportions | ||||
0.3 | ||||
standard error of estimate | test stat | two sided pvalue | ||
0.0954 | 3.14 | 0.00166 | ||
To calculate the p-value H0:p1=p2 is assumed to be true | ||||
since the test is two sided H1 is H1:p1≠p2 |
The p-value is less than 0.05 so there is strong evidence there is a difference between population proportions
Comparison of the amount of evidence for an upswing in country 1 and country 2
Country 2 has a lower p-value (0.00166 compared with 0.0412), so there is more evidence of an upswing in the population of country 2. This is because the upswing in the sample is larger in country 2 and the sample sizes are the same.
Computer output that measures evidence for the claim there is a relationship between the variables “age?” and “which country?”
Estimate | ||
xbar1-xbar2 | ||
-1 | ||
standard error of estimate xbar1-xbar2 | ||
2.475883681 | ||
t test stat | df | two sided pvalue |
-0.403896196 | 197 | 0.68673 |
To calculate the p-value H0:μ1=μ2 is assumed to be true | ||
since the test is two sided H1 is H1:μ1≠μ2 |
The p-value is not less than 0.05 so there is not strong evidence there is a difference between population means
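The standard error, t test statistic and degrees of freedom in the output above can be checked from the summary table of ages. This sketch assumes the output comes from an unequal-variance (Welch) two-sample t-test, which is consistent with df being about 197; the function name is hypothetical. The p-value is not recomputed here because the Python standard library has no Student t distribution.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Standard error, t statistic and Welch df from summary statistics."""
    var_term1 = sd1 ** 2 / n1
    var_term2 = sd2 ** 2 / n2
    standard_error = math.sqrt(var_term1 + var_term2)
    t_stat = (mean1 - mean2) / standard_error
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_term1 + var_term2) ** 2 / (
        var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
    )
    return standard_error, t_stat, df


# Values from the age table: country 1 (40, 18, 100) and country 2 (41, 17, 100).
standard_error, t_stat, df = welch_t_from_summary(40, 18, 100, 41, 17, 100)
print(standard_error, t_stat, df)
```

The degrees of freedom from this formula are not a whole number, so software usually reports them rounded or truncated.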
Conclusion
Just considering the sample, the new version is more popular in both countries, but it is more successful in country 2.
There is strong evidence that the results above also apply to the whole population of country 1. There is stronger evidence that the results above also apply to the whole population of country 2.
The existing dataset does not have any information that explains why the countries are different. The only other variable is age, and there is no significant difference in age between the countries.
More variables could be gathered to find out why the new version of the product is more successful in country 2.”