Instructions for assignment file 2 of 3

Overview

You need to submit a Word file with the answers to 9 questions.

You must use the datasets in file 1 and the automatic dataset summarizer to get the descriptive statistics used in questions 1 to 5 and the inferential statistics used in questions 6 to 8.

The word count can be less than 1500 words, as long as your answers demonstrate that you have understood the material.

File 1 Dataset overview

Each student is given two datasets. Students MUST use the datasets they are given; they CANNOT use datasets they make themselves or take from other sources. One dataset contains information about each staff member in call centre 1, and the other contains information about each staff member in call centre 2.

Each of the datasets consists of the following variables:

*Original staff or replacement staff? Is the staff member one of the original staff or a replacement? This is a categorical variable.

*Median call time? The duration of each of the staff member’s calls is recorded and the median call time is calculated. This is a quantitative variable.

*Median above 3 minutes? This is a categorical variable because the possible answers are yes or no.

*Number of complaints? The number of complaints lodged against the staff member. This is a quantitative variable. For the original staff it is the number of complaints in their last month; for the replacement staff it is the number of complaints in their first month.

Question 1

THERE ARE TWO OPTIONS FOR QUESTION 1. JUST PICK ONE OPTION.

OPTION 1

a) Just using the information for call centre 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “median call time?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

b) Just using the information for call centre 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median call time?”

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

c) Compare the results in parts (a) and (b)

OPTION 2

a) Just using the information for call centre 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “number of complaints?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

b) Just using the information for call centre 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “number of complaints?”

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

c) Compare the results in parts (a) and (b)
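The difference between sample means used in question 1 is one group's average minus the other's. The following minimal Python sketch shows that calculation; it is meant only for checking the summarizer's output, not replacing it. The rows, the labels `Original` and `Replacement` and the helper `group_summary` are hypothetical placeholders, not the real datasets or the summarizer's interface.

```python
from statistics import mean, stdev

# Hypothetical rows: (staff type, value). In practice the value would be the
# "Median call time?" or "Number of complaints?" entry for each staff member.
rows = [
    ("Original", 2.8), ("Original", 3.4), ("Original", 3.1),
    ("Replacement", 3.9), ("Replacement", 4.2), ("Replacement", 3.5),
]


def group_summary(rows, group):
    """Return (sample size, sample mean, sample standard deviation) for one group."""
    values = [value for label, value in rows if label == group]
    return len(values), mean(values), stdev(values)


n1, xbar1, s1 = group_summary(rows, "Original")
n2, xbar2, s2 = group_summary(rows, "Replacement")

# Difference between sample means, taken as original minus replacement.
diff = xbar1 - xbar2

print(f"Original:    n = {n1}, mean = {xbar1:.3f}, sd = {s1:.3f}")
print(f"Replacement: n = {n2}, mean = {xbar2:.3f}, sd = {s2:.3f}")
print(f"Difference between sample means = {diff:.3f}")
```

The order of subtraction is a choice. Here a negative difference means the replacement staff have the larger sample mean, so state the order you used in your answer.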

Question 2

a) Just using the information for call centre 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Original staff or Replacement staff?” and “Median above 3 minutes?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

b) Just using the information for call centre 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median above 3 minutes?”

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

c) Compare the results in parts (a) and (b)
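In question 2 both variables are categorical, so the relevant statistic is the difference between sample proportions, read from a two-way table like the one in the sample report. Below is a minimal Python sketch, assuming each row is a (staff type, yes/no) pair. The rows and the helper `count_and_proportion` are hypothetical.

```python
# Hypothetical rows: (staff type, answer to "Median above 3 minutes?").
rows = [
    ("Original", "yes"), ("Original", "no"), ("Original", "no"), ("Original", "yes"),
    ("Replacement", "yes"), ("Replacement", "yes"), ("Replacement", "yes"), ("Replacement", "no"),
]


def count_and_proportion(rows, group, answer="yes"):
    """Return (sample size, number answering `answer`, sample proportion) for one group."""
    answers = [a for label, a in rows if label == group]
    hits = sum(1 for a in answers if a == answer)
    return len(answers), hits, hits / len(answers)


n1, yes1, p1 = count_and_proportion(rows, "Original")
n2, yes2, p2 = count_and_proportion(rows, "Replacement")

# Two-way table with counts and row percentages, similar in shape to a pivot table.
for label, n, yes in (("Original", n1, yes1), ("Replacement", n2, yes2)):
    print(f"{label:<12} yes = {yes} ({yes / n:.0%})  no = {n - yes} ({(n - yes) / n:.0%})  total = {n}")

# Difference between sample proportions, taken as original minus replacement.
diff = p1 - p2
print(f"Difference between sample proportions = {diff:.2f}")
```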

Question 3

a) Just using the information for call centre 1

i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “median call time?” and “number of complaints?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

b) Just using the information for call centre 2

i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “median call time?” and “number of complaints?”

ii) Use the output in part (i) to describe the relationship between the two variables; your discussion must use one of the following sample statistics:

Difference between sample means – Difference between sample proportions – correlation coefficient r

c) Compare the results in parts (a) and (b)
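In question 3 both variables are quantitative, so the correlation coefficient r is the statistic to use. The sketch below computes Pearson's r directly from its definition. The five data points and the helper `pearson_r` are hypothetical; the summarizer's own output is still what you should paste in.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five staff members.
median_call_time = [2.5, 3.0, 3.5, 4.0, 4.5]
complaints = [1, 1, 2, 3, 3]

r = pearson_r(median_call_time, complaints)
print(f"r = {r:.3f}")
```

On Python 3.10 or later, `statistics.correlation(x, y)` returns the same Pearson coefficient. For the graph the question asks for, a scatter plot of median call time against number of complaints (for example with matplotlib's `pyplot.scatter`) is the usual choice.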

Question 4

Just using the call centre 1 data set

a) Just considering the original staff

i) What are the sample size and the sample mean of the median call time?

ii) What is the z-score of the sample mean if the population mean is 3 and the population standard deviation is 1.1?

b) Just considering the replacement staff

i) What are the sample size and the sample mean of the median call time?

ii) What is the z-score of the sample mean if the population mean is 3 and the population standard deviation is 1.1?
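Question 4 does not spell out the formula. The usual reading of "the z-score of the sample mean" is the mean's position in its sampling distribution, whose standard deviation is σ/√n. That gives z = (x̄ − μ)/(σ/√n), with μ = 3 and σ = 1.1. A minimal Python sketch under that assumption, using a hypothetical sample mean and sample size rather than values from the datasets:

```python
from math import sqrt


def z_score_of_sample_mean(sample_mean, n, pop_mean=3.0, pop_sd=1.1):
    """z-score of a sample mean within its sampling distribution.

    The standard deviation of the sample mean (standard error) is pop_sd / sqrt(n).
    """
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Hypothetical example: 25 original staff with a sample mean of 3.4 minutes.
z = z_score_of_sample_mean(3.4, 25)
print(f"z = {z:.3f}")
```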

Question 5

Just using the call centre 1 data set

a) Just considering the original staff

i) What is the sample size?

ii) What is the sample proportion of workers that have a median call time above 3 minutes?

iii) Use the answers to parts (i) and (ii) to find a 95% confidence interval for the population proportion.

b) Just considering the replacement staff

i) What is the sample size?

ii) What is the sample proportion of workers that have a median call time above 3 minutes?

iii) Use the answers to parts (i) and (ii) to find a 95% confidence interval for the population proportion.
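A common way to answer the confidence interval part of question 5 is the normal-approximation (Wald) interval p̂ ± 1.96·√(p̂(1 − p̂)/n). Whether the course expects exactly this interval is an assumption here, so check it against your notes. A minimal Python sketch with hypothetical counts and a hypothetical function name:

```python
from math import sqrt


def proportion_ci_95(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for a population proportion."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical example: 30 of 50 original staff have a median call time above 3 minutes.
low, high = proportion_ci_95(30, 50)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```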

Question 6

a) Just using the information for call centre 1

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median call time?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

b) Just using the information for call centre 2

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “Median call time?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)
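In question 6 the explanatory variable is categorical and the response is quantitative, so the inferential output is typically a two-sample t-test on the difference between means. The sketch below assumes Welch's unequal-variance version. That choice appears consistent with the age output in the sample report, but the summarizer may use a different variant. The sketch uses only the Python standard library and computes the two-sided p-value by integrating the t density with Simpson's rule. All function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using symmetry: 1 - 2 * (area under the density from 0 to |t|).

    The area is found with Simpson's rule (steps must be even); very small
    p-values lose precision because of the subtraction.
    """
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def welch_t_test(xbar1, s1, n1, xbar2, s2, n2):
    """Welch (unequal variances) two-sample t-test from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    t = (xbar1 - xbar2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return xbar1 - xbar2, se, t, df, two_sided_p_value(t, df)


def welch_from_samples(sample1, sample2):
    """Same test, starting from two lists of raw values (e.g. median call times)."""
    return welch_t_test(mean(sample1), stdev(sample1), len(sample1),
                        mean(sample2), stdev(sample2), len(sample2))


# Summary statistics for age taken from the sample report (country 1 vs country 2).
estimate, se, t, df, p = welch_t_test(40, 18, 100, 41, 17, 100)
print(f"estimate = {estimate}, SE = {se:.4f}, t = {t:.4f}, df = {df:.1f}, p = {p:.4f}")
```

Run on the sample report's age summary (40, 18, 100 versus 41, 17, 100), it gives an SE of about 2.476, a t of about −0.404, roughly 197 degrees of freedom and a p of about 0.687, close to the report's output. If SciPy is installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test on raw lists.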

Question 7

a) Just using the information for call centre 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median above 3 minutes?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

b) Just using the information for call centre 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Original staff or Replacement staff?” and “median above 3 minutes?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)
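In question 7 both variables are categorical, so the inferential output is typically a test of the difference between population proportions. The sketch below assumes a two-sided z-test with the pooled proportion in the standard error. That assumption appears consistent with the numbers in the sample report's outputs, but it is not confirmed for the summarizer. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion in the standard error.

    x1, x2 are the numbers of "yes" answers; n1, n2 are the group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written with the complementary error function.
    p_value = erfc(abs(z) / sqrt(2))
    return p1 - p2, se, z, p_value


# Counts from the sample report, country 1: 35 of 50 would buy the new version,
# 25 of 50 would buy the old version.
estimate, se, z, p = two_proportion_z_test(35, 50, 25, 50)
print(f"estimate = {estimate:.2f}, SE = {se:.8f}, z = {z:.4f}, p = {p:.5f}")
```

With the report's country 1 counts it gives an SE of about 0.09798, a z of about 2.041 and a p of about 0.0412, matching the magnitudes in that output. With the country 2 counts (40 of 50 versus 25 of 50) it gives an SE of about 0.0954, a z of about 3.14 and a p of about 0.0017.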

Question 8

a) Just using the information for call centre 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “median call time?” and “number of complaints?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

b) Just using the information for call centre 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “median call time?” and “number of complaints?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)
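In question 8 the usual test of a relationship between two quantitative variables checks H0: the population correlation is 0. It uses t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. A minimal Python sketch of the test statistic, with hypothetical values for r and n and a hypothetical function name:

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Test statistic for H0: population correlation = 0, given sample r and sample size n.

    t = r * sqrt(n - 2) / sqrt(1 - r**2), compared with a t distribution on n - 2 df.
    """
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r * r), df


# Hypothetical example: r = 0.5 between median call time and complaints for 30 staff.
t, df = correlation_t_statistic(0.5, 30)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` returns r together with a two-sided p-value for the same null hypothesis.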

Question 9

Comment on the sample report on pages 7 to 11 of this document: discuss the main message of the report and how it is communicated. Hint: if you have no idea how to comment on a report, you can download the following example, which is a report with a comment on its first page.

https://app.box.com/s/2epgchvf2ljhlx9chgx8iey5rugxwn89

Sample report (you must use this to answer question 9)

“Report title: Market research checking if the new version of a product is more popular

Introduction: Company XYZ has a new version of a product and has conducted some market research, using a sample of 100 people, to check whether people prefer the new version of the product and whether there is a difference in popularity between two different countries.”

“Description of the dataset:

The first 8 rows of the dataset are given below. The dataset has 100 rows, and each row gives information about a person who is reviewing a version of the product.

Which country?   Which version?   Would they buy the product?   Age?
Country 1        old version      would buy                     40
Country 1        old version      would not buy                 45
Country 1        new version      would buy                     42
Country 1        new version      would not buy                 41
Country 2        new version      would buy                     35
Country 2        old version      would buy                     36
Country 2        old version      would not buy                 33
Country 2        old version      would buy                     23

The variables are:

Which country? Country 1 or country 2. This is a categorical variable.

Which version? Which version was reviewed, the old version or the new version? This is a categorical variable.

Would they buy the product? Would the person buy the product or not buy the product? This is a categorical variable.

Age? How many years old is the person reviewing the product? This is a quantitative variable.”

“Main findings

Descriptive sample statistics

Just considering country 1, what is the popularity of the new version and old version

The following output lets you see the relationship between the variables “which version?” and “would they buy the product?”

Would you buy the product
Row Labels           Would buy    Would not buy    Grand Total
New version count    35           15               50
New version %        70%          30%              100%
Old version count    25           25               50
Old version %        50%          50%              100%

70% of people want to buy the new version, whereas only 50% want to buy the old version of the product. Formally speaking, the difference in sample proportions is 0.7 - 0.5 = 0.2 = 20%, so there is a 20% upward swing.

Just considering country 2, what is the popularity of the new version and old version

The following output lets you see the relationship between the variables “which version?” and “would they buy the product?”

we can also check if there is a relationship between the variable “Does the manager want to keep the product” and region

Would you buy the product
Row Labels           Would buy    Would not buy    Grand Total
New version count    40           10               50
New version %        80%          20%              100%
Old version count    25           25               50
Old version %        50%          50%              100%

80% of people want to buy the new version, whereas only 50% want to buy the old version of the product. Formally speaking, the difference in sample proportions is 0.8 - 0.5 = 0.3 = 30%, so there is a 30% upward swing.

When you compare country 1 and country 2, country 2 has a larger upswing, so the new version is more successful in country 2.

Comparing the ages in country 1 and country 2

             Average age    Standard deviation    Sample size
Country 1    40             18                    100
Country 2    41             17                    100

Difference between sample means = 40 - 41 = -1

This means the average age in country 1 is 1 year lower than the average age in country 2.

Comparing the average ages, there is not much of a difference between the two countries, so you cannot use age to explain why there is a difference between the two countries.

Inferential statistics

Computer output that measures the amount of evidence for the relationship between the variables “which version?” and “would they buy the product?” for country 1

n1    n2    phat 1    phat 2
50    50    0.7       0.5

Estimate of the difference between sample proportions
0.2

Standard error of estimate    Test stat       Two-sided p-value
0.09797959                    -2.041241452    0.0412268

To calculate the p-value, H0: p1 = p2 is assumed to be true.
Since the test is two-sided, H1 is H1: p1 ≠ p2.

The p-value is less than 0.05 so there is strong evidence there is a difference between population proportions

Checking the claim that there is a relationship between the variables “which version?” and “would they buy the product?” for country 2

n1    n2    phat 1    phat 2
50    50    0.8       0.5

Estimate of the difference between sample proportions
0.3

Standard error of estimate    Test stat    Two-sided p-value
0.0954                        3.14         0.00166

To calculate the p-value, H0: p1 = p2 is assumed to be true.
Since the test is two-sided, H1 is H1: p1 ≠ p2.

The p-value is less than 0.05 so there is strong evidence there is a difference between population proportions
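Both outputs above appear consistent with a two-proportion z-test that uses the pooled sample proportion in the standard error. This is inferred from the numbers; the report does not name the method. Under that assumption, the standard errors and the size of each test statistic can be reproduced by hand:

```latex
\text{Country 1: } \hat p_{\text{pooled}} = \frac{35 + 25}{50 + 50} = 0.6, \qquad
SE = \sqrt{0.6 \times 0.4 \times \left(\frac{1}{50} + \frac{1}{50}\right)} = \sqrt{0.0096} \approx 0.09798, \qquad
z = \frac{0.7 - 0.5}{0.09798} \approx 2.041

\text{Country 2: } \hat p_{\text{pooled}} = \frac{40 + 25}{50 + 50} = 0.65, \qquad
SE = \sqrt{0.65 \times 0.35 \times \left(\frac{1}{50} + \frac{1}{50}\right)} = \sqrt{0.0091} \approx 0.0954, \qquad
z = \frac{0.8 - 0.5}{0.0954} \approx 3.14
```

The sign of the statistic depends on which proportion is subtracted from which, so only its size is compared here. The two-sided p-value is the same either way.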

Comparison of the amount of evidence for an upswing in country 1 and country 2

Country 2 has more evidence of an upswing: the upswing in the sample is larger (30% compared with 20%), so country 2 has a lower p-value (0.00166 compared with 0.0412).

Computer output that measures evidence for the claim that there is a relationship between the variables “Age?” and “Which country?”

Estimate
xbar1 - xbar2
-1

Standard error of estimate xbar1 - xbar2
2.475883681

t test stat     df     Two-sided p-value
-0.403896196    197    0.68673

To calculate the p-value, H0: μ1 = μ2 is assumed to be true.
Since the test is two-sided, H1 is H1: μ1 ≠ μ2.

The p-value is not less than 0.05, so there is no significant evidence of a difference between the population means.
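The standard error, test statistic and degrees of freedom in the age output appear consistent with Welch's unequal-variance two-sample t-test. This is an inference from the numbers; the report does not name the test. The calculation uses the averages, standard deviations and sample sizes from the age table:

```latex
SE = \sqrt{\frac{18^2}{100} + \frac{17^2}{100}} = \sqrt{3.24 + 2.89} = \sqrt{6.13} \approx 2.4759, \qquad
t = \frac{40 - 41}{2.4759} \approx -0.4039

\nu = \frac{(3.24 + 2.89)^2}{\dfrac{3.24^2}{99} + \dfrac{2.89^2}{99}} \approx 197.4
```

The output's df of 197 appears to be this value rounded down.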

Conclusion

Just considering the sample, the new version is more popular in both countries, but it is more successful in country 2.

There is strong evidence that the results above also apply to the whole population of country 1, and there is stronger evidence that they also apply to the whole population of country 2.

The existing dataset does not have any information that explains why the countries are different. The only other variable is age and there is no significant difference.

More variables could be gathered to find out why the new version of the product is more successful in country 2.”