Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data
Dr. Ka-fu Wong

Feb 09, 2016

Page 1: Dr. Ka-fu Wong

Dr. Ka-fu Wong

ECON1003 Analysis of Economic Data

Page 2: Dr. Ka-fu Wong

Chapter Twelve: Analysis of Variance

GOALS

1. Discuss the general idea of analysis of variance.
2. List the characteristics of the F distribution.
3. Conduct a test of hypothesis to determine whether the variances of two populations are equal.
4. Organize data into a one-way and a two-way ANOVA table.
5. Define and understand the terms treatments and blocks.
6. Conduct a test of hypothesis among three or more treatment means.
7. Develop confidence intervals for the difference between treatment means.
8. Conduct a test of hypothesis to determine if there is a difference among block means.

Page 3: Dr. Ka-fu Wong

Two Sample Tests

[Figure: two panels, TEST FOR EQUAL VARIANCES and TEST FOR EQUAL MEANS, each sketching Population 1 and Population 2 under H0 and under H1.]

Page 4: Dr. Ka-fu Wong

Characteristics of F-Distribution

There is a “family” of F distributions.

Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.

F cannot be negative, and it is a continuous distribution.

The F distribution is positively skewed.

Its values range from 0 to ∞. As F → ∞, the curve approaches the X-axis.

Page 5: Dr. Ka-fu Wong

The F-Distribution, F(m,n)

[Figure: density curve of F(m,n) plotted against F, with axis marks at 0 and 1.0.]

Not symmetric (skewed to the right)

Nonnegative values only

Each member of the family is determined by two parameters: the numerator degrees of freedom (m) and the denominator degrees of freedom (n).
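
The properties listed above can be checked by simulation. The following is a minimal sketch using only Python's standard library; the helper name `simulate_f` and the choice m = 5, n = 10 are illustrative assumptions, not part of the original slides. It relies on the standard construction of an F(m, n) variable as the ratio of two independent chi-square variables, each divided by its degrees of freedom.

```python
import random
import statistics

def simulate_f(m, n, draws=20000, seed=0):
    """Draw values from F(m, n) as (chi2_m / m) / (chi2_n / n).

    A chi-square variable with d degrees of freedom is built as the
    sum of d squared standard normal draws.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        num = sum(rng.gauss(0, 1) ** 2 for _ in range(m)) / m
        den = sum(rng.gauss(0, 1) ** 2 for _ in range(n)) / n
        values.append(num / den)
    return values

sample = simulate_f(5, 10)  # m = 5, n = 10 are arbitrary illustrative choices
print(min(sample) >= 0)     # F takes nonnegative values only
# A mean above the median is the usual signature of right skew.
print(statistics.mean(sample) > statistics.median(sample))
```

Changing m and n in the call gives other members of the family, which is one way to see how the two degrees-of-freedom parameters shape the curve.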

Page 6: Dr. Ka-fu Wong

Test for Equal Variances

For the two-tail test, the test statistic is given by:

$$F = \frac{\text{Larger of } (S_1^2, S_2^2)}{\text{Smaller of } (S_1^2, S_2^2)}$$

where $S_1^2$ and $S_2^2$ are the sample variances for the two samples.

The null hypothesis is rejected at level of significance α if the computed value of the test statistic is greater than the upper-tail critical value at α/2 with the corresponding numerator and denominator degrees of freedom.

Page 7: Dr. Ka-fu Wong

Test for Equal Variances

For the one-tail test, the test statistic is given by:

$$F = \frac{S_1^2}{S_2^2} \quad \text{if } H_1: \sigma_1^2 > \sigma_2^2$$

where $S_1^2$ and $S_2^2$ are the sample variances for the two samples.

The null hypothesis is rejected at level of significance α if the computed value of the test statistic is greater than the upper-tail critical value at α with the corresponding numerator and denominator degrees of freedom.

Page 8: Dr. Ka-fu Wong

EXAMPLE 1

Colin, a stockbroker at Critical Securities, reported that the mean rate of return on a sample of 10 internet stocks was 12.6 percent with a standard deviation of 3.9 percent. The mean rate of return on a sample of 8 utility stocks was 10.9 percent with a standard deviation of 3.5 percent. At the .05 significance level, can Colin conclude that there is more variation in the internet stocks?

Page 9: Dr. Ka-fu Wong

EXAMPLE 1 continued

Step 1: The hypotheses are:

$$H_0: \sigma_I^2 \le \sigma_U^2 \qquad H_1: \sigma_I^2 > \sigma_U^2$$

Step 2: The significance level is .05.

Step 3: The test statistic follows the F distribution.

Step 4: H0 is rejected if F > 3.68. The degrees of freedom are 9 in the numerator and 7 in the denominator.

Step 5: The value of F is

$$F = \frac{(3.9)^2}{(3.5)^2} = 1.2416$$

H0 is not rejected. There is insufficient evidence to show more variation in the internet stocks.
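
The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal sketch; the function name `variance_ratio` is a hypothetical helper, and the critical value 3.68 is simply copied from Step 4 rather than computed.

```python
def variance_ratio(s_num, s_den):
    """F statistic for H1: sigma_num^2 > sigma_den^2, from standard deviations."""
    return s_num ** 2 / s_den ** 2

# Figures from Example 1: internet stocks (n = 10, s = 3.9),
# utility stocks (n = 8, s = 3.5).
f_stat = variance_ratio(3.9, 3.5)
df_num, df_den = 10 - 1, 8 - 1
f_crit = 3.68  # critical value quoted in Step 4 for (9, 7) df at the .05 level

print(round(f_stat, 4), (df_num, df_den))
print("reject H0" if f_stat > f_crit else "do not reject H0")
```

The sample with the larger hypothesized variance goes in the numerator, so the degrees of freedom are taken in the same order.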

Page 10: Dr. Ka-fu Wong

Analysis of Variance (ANOVA)

Page 11: Dr. Ka-fu Wong

Underlying Assumptions for ANOVA

The F distribution is also used for testing whether two or more sample means came from the same or equal populations.
- It tests whether any group mean differs from the mean of all groups combined.
- It answers: “Are all groups equal or not?”

This technique is called analysis of variance or ANOVA.

ANOVA requires the following conditions:
- The sampled populations follow the normal distribution.
- The populations have equal standard deviations.
- The samples are randomly selected and are independent.

Page 12: Dr. Ka-fu Wong

The hypothesis

Suppose that we have independent samples of n1, n2, . . ., nK observations from K populations. If the population means are denoted by μ1, μ2, . . ., μK, the one-way analysis of variance framework is designed to test the null hypothesis against the alternative:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$$

$$H_1: \mu_i \ne \mu_j \quad \text{for at least one pair } (i, j)$$

Page 13: Dr. Ka-fu Wong

Sample Observations from Independent Random Samples of K Populations

Population            1       2       . . .   K
Mean                  μ1      μ2      . . .   μK
Variance              σ²      σ²      . . .   σ²      (Same!!)
Sample observations   x11     x21     . . .   xK1
from the population   x12     x22     . . .   xK2
                      .       .               .
                      x1n1    x2n2    . . .   xKnK
Sample size           n1      n2      . . .   nK      (unequal!!)

Unequal number of observations in the K samples in general. nT = n1 + … + nK

Page 14: Dr. Ka-fu Wong

Sum of Squares Decomposition for one-way analysis of variance

Suppose that we have independent samples of n1, n2, . . ., nK observations from K populations.

Denote by $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_K$ the K group sample means and by $\bar{x}$ the overall sample mean. We define the following sums of squares:

Sum of Squares Error (Within-Groups):

$$SSE = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

Sum of Squares Treatment (Between-Groups):

$$SST = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$$

Sum of Squares Total:

$$SSTotal = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$$

where $x_{ij}$ denotes the jth sample observation in the ith group.

Page 15: Dr. Ka-fu Wong

A Numerical Example of Sum of Squares Decomposition

Population           1         2            3
Mean                 μ1        μ2           μ3
Variance             σ²        σ²           σ²
Sample obs (x_ij)    1, 2, 3   2, 3, 4, 5   1, 3, 5
Sample size (n_i)    3         4            3
Sample mean          2         3.5          3

Grand mean: 2.9

$$SSTotal = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = [(1-2.9)^2 + \cdots + (3-2.9)^2] + \cdots + [(1-2.9)^2 + \cdots + (5-2.9)^2] = 18.9$$

$$SST = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 = 3(2-2.9)^2 + 4(3.5-2.9)^2 + 3(3-2.9)^2 = 3.9$$

$$SSE = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = [(1-2)^2 + \cdots + (3-2)^2] + \cdots + [(1-3)^2 + \cdots + (5-3)^2] = 15$$

SSTotal = SST + SSE (18.9 = 3.9 + 15)
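
The decomposition in this numerical example can be verified directly. The sketch below assumes plain Python lists for the three samples; `sums_of_squares` is a hypothetical helper name introduced only for illustration.

```python
def sums_of_squares(groups):
    """Return (SST, SSE, SSTotal) for a list of samples, one list per group."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups: squared deviation of each group mean, weighted by n_i.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups: squared deviation of each observation from its group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total: squared deviation of each observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    return sst, sse, ss_total

groups = [[1, 2, 3], [2, 3, 4, 5], [1, 3, 5]]
sst, sse, ss_total = sums_of_squares(groups)
print(round(sst, 2), round(sse, 2), round(ss_total, 2))
```

Computing SSTotal separately, rather than as SST + SSE, makes the identity a genuine check on the other two quantities.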

Page 16: Dr. Ka-fu Wong

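A proof of SSTotal = SST + SSE

Population     1       2       . . .   K
Sample obs     x11     x21     . . .   xK1
               x12     x22     . . .   xK2
               .       .               .
               x1n1    x2n2    . . .   xKnK
Sample size    n1      n2      . . .   nK

$$
\begin{aligned}
SSTotal &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 \\
&= \sum_{i=1}^{K} \sum_{j=1}^{n_i} [(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})]^2 \\
&= \sum_{i=1}^{K} \sum_{j=1}^{n_i} [(x_{ij} - \bar{x}_i)^2 + (\bar{x}_i - \bar{x})^2 + 2(x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x})] \\
&= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{K} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 + 2 \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) \\
&= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 + 2 \sum_{i=1}^{K} (\bar{x}_i - \bar{x}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \\
&= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 \\
&= SSE + SST
\end{aligned}
$$

The cross term vanishes because, within each group,

$$\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = \sum_{j=1}^{n_i} x_{ij} - \sum_{j=1}^{n_i} \bar{x}_i = n_i \bar{x}_i - n_i \bar{x}_i = 0$$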

Page 17: Dr. Ka-fu Wong

Two Ways to estimate the population variance

Note that the variance is assumed to be identical across populations.

If the population means are identical, we have two ways to estimate the population variance:
- Based on the K sample variances.
- Based on the deviation of the K sample means from the grand mean.

Page 18: Dr. Ka-fu Wong

An estimate of the population variance based on sample variances

Any one of the K sample variances can be used to estimate the population variance.

$$\hat{\sigma}^2 = s_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 / (n_i - 1)$$

We can get a more precise estimate if we use all the information from the K samples.

$$\hat{\sigma}^2 = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \Big/ \sum_{i=1}^{K} (n_i - 1) = \sum_{i=1}^{K} (n_i - 1) s_i^2 / (n_T - K) = SSE / (n_T - K)$$

Page 19: Dr. Ka-fu Wong

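An estimate of the population variance based on deviation of the K sample means from the grand sample mean.

If the sample sizes are the same (n) for all samples, the Central Limit Theorem suggests that each sample mean will be approximately normally distributed, with mean equal to the population mean and variance equal to the population variance divided by the sample size. The variance of the K sample means around the grand mean therefore estimates σ²/n, so multiplying by n gives an estimate of σ²:

$$\hat{\sigma}^2 = n \sum_{i=1}^{K} (\bar{x}_i - \bar{x})^2 / (K - 1)$$

When sample sizes are different across samples, we will have to weight each squared deviation by its sample size $n_i$:

$$\hat{\sigma}^2 = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 / (K - 1) = SST / (K - 1)$$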

Page 20: Dr. Ka-fu Wong

Comparing the Variance Estimates: The F Test

If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of the ratio of the two variance estimates follows the F distribution with K - 1 and nT - K degrees of freedom.

$$F\text{-stat} = \frac{SST / (K - 1)}{SSE / (\sum_{i=1}^{K} n_i - K)} = \frac{SST / (K - 1)}{SSE / (n_T - K)}$$

If the means of the K populations are not equal, the value of F-stat will be inflated because SST/(K-1) will overestimate σ².

Hence, we will reject H0 if the resulting value of F-stat appears to be too large to have been selected at random from the appropriate F distribution.

Page 21: Dr. Ka-fu Wong

Test for the Equality of K Population Means

Hypotheses: H0: μ1 = μ2 = μ3 = … = μK

H1: Not all population means are equal

Test Statistic: F = [SST/(K-1)] / [SSE/(nT-K)]

Rejection Rule: Reject H0 if F > Fα

where the value of Fα is based on an F distribution with K - 1 numerator degrees of freedom and nT - K denominator degrees of freedom.

Page 22: Dr. Ka-fu Wong

Sampling Distribution of MST/MSE

The figure below shows the rejection region associated with a level of significance equal to α, where Fα denotes the critical value.

[Figure: F density plotted against MST/MSE. Do not reject H0 to the left of the critical value Fα; reject H0 to the right.]

Page 23: Dr. Ka-fu Wong

The ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatment             SST              K-1                  MST            MST/MSE
Error                 SSE              nT-K                 MSE
Total                 SSTotal          nT-1

Page 24: Dr. Ka-fu Wong

Does learning method affect students’ exam scores?

Consider 3 methods: standard, osmosis, shock therapy.

Convince 15 students to take part. Assign 5 students randomly to each method.

Wait eight weeks. Then, test students to get exam scores.

Are the three learning methods equally effective? That is, are the population mean exam scores the same?

Page 25: Dr. Ka-fu Wong

“Analysis of Variance” (Study #1)

The variation between the group means and the grand mean is larger than the variation within each of the groups.

Page 26: Dr. Ka-fu Wong

ANOVA Table for Study #1

One-way Analysis of Variance

Source   DF   SS       MS       F       P
Factor   2    2510.5   1255.3   93.44   0.000
Error    12   161.2    13.4
Total    14   2671.7

“Source” means “find the components of variation in this column”

“DF” means “degrees of freedom”

“SS” means “sums of squares”

“MS” means “mean squares”

“F” means “F test statistic”

“P” means “P-value”

Page 27: Dr. Ka-fu Wong

ANOVA Table for Study #1

One-way Analysis of Variance

Source   DF   SS       MS       F       P
Factor   2    2510.5   1255.3   93.44   0.000
Error    12   161.2    13.4
Total    14   2671.7

“Factor” means “Variability between groups” or “Variability due to the factor of interest”

“Error” means “Variability within groups” or “unexplained random variation”

“Total” means “Total variation from the grand mean”

Page 28: Dr. Ka-fu Wong

ANOVA Table for Study #1

One-way Analysis of Variance

Source   DF   SS       MS       F       P
Factor   2    2510.5   1255.3   93.44   0.000
Error    12   161.2    13.4
Total    14   2671.7

14 = 2 + 12

2671.7 = 2510.5 + 161.2

1255.3 = 2510.5/2 (rounded)     13.4 = 161.2/12 (rounded)

93.44 = MS(Factor)/MS(Error), computed from the unrounded mean squares 1255.25 and 13.433
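
The relationships among the table entries can be checked with a short calculation. This is a minimal sketch; `f_from_table` is a hypothetical helper, and its inputs are the SS and DF columns of the Study #1 table. Note that dividing the rounded mean squares (1255.3/13.4) gives about 93.68, so the tabulated 93.44 is reproduced only when the unrounded mean squares are used.

```python
def f_from_table(ss_factor, df_factor, ss_error, df_error):
    """Return (MS factor, MS error, F) from the SS and DF columns."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error

# SS and DF values copied from the Study #1 table.
ms_factor, ms_error, f_stat = f_from_table(2510.5, 2, 161.2, 12)
print(round(ms_factor, 2), round(ms_error, 2), round(f_stat, 2))
```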

Page 29: Dr. Ka-fu Wong

“Analysis of Variance” (Study #2)

The variation between the group means and the grand mean is smaller than the variation within each of the groups.

Page 30: Dr. Ka-fu Wong

ANOVA Table for Study #2

One-way Analysis of Variance

Source   DF   SS       MS     F      P
Factor   2    80.1     40.1   0.46   0.643
Error    12   1050.8   87.6
Total    14   1130.9

The P-value (0.643) is large, so we cannot reject the null hypothesis. There is insufficient evidence to conclude that the average exam scores differ for the three learning methods.

Page 31: Dr. Ka-fu Wong

Do Holocaust survivors have more sleep problems than others?

Page 32: Dr. Ka-fu Wong

ANOVA Table for Sleep Study

One-way Analysis of Variance

Source   DF    SS       MS      F       P
Factor   2     1723.8   861.9   61.69   0.000
Error    117   1634.8   14.0
Total    119   3358.6

The P-value is so small that we reject the null hypothesis of equal population means in favor of the alternative hypothesis that at least one pair of population means differs.

Page 33: Dr. Ka-fu Wong

Potential problem with the analysis

What is driving the rejection of the null of equal population means?

From the plot, the Healthy and Depressed groups seem to have different mean sleep quality. It looks as though the rejection is due to the difference between these two groups.

If we pooled Healthy and Depressed, the distribution would look more like Survivor. That is, failing to reject the null would be more likely.

This example illustrates that we have to be careful about our analysis and interpretation of the result when we conduct a test of equal population means.

Page 34: Dr. Ka-fu Wong

EXAMPLE 2

Rosenbaum Restaurants specialize in meals for senior citizens. Katy Polsby, President, recently developed a new meat loaf dinner. Before making it a part of the regular menu she decides to test it in several of her restaurants. She would like to know if there is a difference in the mean number of dinners sold per day at the Aynor, Loris, and Lander restaurants. Use the .05 significance level.

Page 35: Dr. Ka-fu Wong

Example 2 continued

Number of dinners sold per day

Obs   Aynor   Loris   Lander
1     13      10      18
2     12      12      16
3     14      13      17
4     12      11      17
5                     17

Page 36: Dr. Ka-fu Wong

EXAMPLE 2 continued

Step 1: H0: μ1 = μ2 = μ3

H1: Treatment means are not the same

Step 2: H0 is rejected if F > 4.10. There are 2 df in the numerator and 10 df in the denominator.

Page 37: Dr. Ka-fu Wong

Example 2 continued

To find the value of F:

Source      SS      df   MS       F       p-value
Treatment   76.25   2    38.125   39.10   1.87E-05
Error       9.75    10   0.975
Total       86.00   12

The decision is to reject the null hypothesis because F = 39.10 > 4.10. The treatment means are not the same. The mean number of meals sold at the three locations is not the same.
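
The table for Example 2 can be rebuilt from the raw observations. The following is a sketch in plain Python; `one_way_anova` is a hypothetical helper written for this illustration, and the critical value 4.10 is taken from Step 2 rather than computed.

```python
def one_way_anova(groups):
    """Return a dict with the entries of a one-way ANOVA table."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mst, mse = sst / (k - 1), sse / (n_total - k)
    return {"SST": sst, "SSE": sse, "df": (k - 1, n_total - k),
            "MST": mst, "MSE": mse, "F": mst / mse}

# Dinners sold per day, copied from the Example 2 data table.
aynor = [13, 12, 14, 12]
loris = [10, 12, 13, 11]
lander = [18, 16, 17, 17, 17]
table = one_way_anova([aynor, loris, lander])
print(table)
print("reject H0" if table["F"] > 4.10 else "do not reject H0")  # 4.10 from Step 2
```

The function accepts samples of unequal size, which matters here because Lander has five observations and the other two restaurants have four.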

Page 38: Dr. Ka-fu Wong

Inferences About Treatment Means

When we reject the null hypothesis that the means are equal, we may want to know which treatment means differ.

One of the simplest procedures is through the use of confidence intervals.

Page 39: Dr. Ka-fu Wong

Confidence Interval for the Difference Between Two Means

$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

where t is obtained from the t table with degrees of freedom (nT - K), because MSE = SSE/(nT - K).

Page 40: Dr. Ka-fu Wong

EXAMPLE 3

From EXAMPLE 2 develop a 95% confidence interval for the difference in the mean number of meat loaf dinners sold in Lander and Aynor. Can Katy conclude that there is a difference between the two restaurants?

$$(17 - 12.75) \pm 2.228 \sqrt{0.975 \left( \frac{1}{5} + \frac{1}{4} \right)} = 4.25 \pm 1.48 = (2.77, 5.73)$$

Because zero is not in the interval, we conclude that this pair of means differs.

The mean number of meals sold in Aynor is different from Lander.
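
The interval in Example 3 can be recomputed as follows. This is a minimal sketch; `mean_difference_ci` is a hypothetical helper, and the t value 2.228 (10 degrees of freedom, 95% confidence) is supplied from the table rather than computed.

```python
import math

def mean_difference_ci(mean1, mean2, n1, n2, mse, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled MSE."""
    margin = t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Lander: mean 17, n = 5. Aynor: mean 12.75, n = 4. MSE = 0.975 from Example 2.
low, high = mean_difference_ci(17, 12.75, 5, 4, mse=0.975, t_crit=2.228)
print(round(low, 2), round(high, 2))
print("means differ" if low > 0 or high < 0 else "no evidence of a difference")
```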

Page 41: Dr. Ka-fu Wong

Two-Factor ANOVA

For the two-factor ANOVA we test whether there is a significant difference among the treatment effects and whether there is a significant difference among the block effects.

Page 42: Dr. Ka-fu Wong

Sample Observations from Independent Random Samples of K Populations

                TREATMENT
BLOCK   1      2      . . .   K
1       x11    x21    . . .   xK1
2       x12    x22    . . .   xK2
.       .      .              .
B       x1B    x2B    . . .   xKB

Page 43: Dr. Ka-fu Wong

Sum of Squares Decomposition for Two-Way Analysis of Variance

Suppose that we have a sample of observations with $x_{ij}$ denoting the observation in the ith group and jth block. Suppose that there are K groups and B blocks, for a total of n = KB observations. Denote the group sample means by $\bar{x}_{i\cdot}$ (i = 1, 2, …, K), the block sample means by $\bar{x}_{\cdot j}$ (j = 1, 2, …, B), and the overall sample mean by $\bar{x}$.

$$SST = B \sum_{i=1}^{K} (\bar{x}_{i\cdot} - \bar{x})^2$$

$$SSB = K \sum_{j=1}^{B} (\bar{x}_{\cdot j} - \bar{x})^2$$

$$SSTotal = \sum_{i=1}^{K} \sum_{j=1}^{B} (x_{ij} - \bar{x})^2$$

$$SSE = \sum_{i=1}^{K} \sum_{j=1}^{B} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2$$

SSTotal = SSE + SST + SSB

Page 44: Dr. Ka-fu Wong

General Format of Two-Way Analysis of Variance Table

Source of Variation   Sums of Squares   Degrees of Freedom   Mean Squares            F Ratios
Treatments            SST               K-1                  MST=SST/(K-1)           MST/MSE
Blocks                SSB               B-1                  MSB=SSB/(B-1)           MSB/MSE
Error                 SSE               (K-1)(B-1)           MSE=SSE/[(K-1)(B-1)]
Total                 SSTotal           nT-1 (= KB-1)

Page 45: Dr. Ka-fu Wong

EXAMPLE 4

The Bieber Manufacturing Co. operates 24 hours a day, five days a week. The workers rotate shifts each week. Todd Bieber, the owner, is interested in whether there is a difference in the number of units produced when the employees work on various shifts. A sample of five workers is selected and their output recorded on each shift. At the .05 significance level, can we conclude there is a difference in the mean production by shift and in the mean production by employee?

Page 46: Dr. Ka-fu Wong

EXAMPLE 4 continued

Employee    Day Output   Evening Output   Night Output
McCartney   31           25               35
Neary       33           26               33
Schoen      28           24               30
Thompson    30           29               28
Wagner      28           26               27

Page 47: Dr. Ka-fu Wong

EXAMPLE 4 continued

TREATMENT EFFECT

Step 1: H0: µ1 = µ2 = µ3 versus H1: Not all means are equal.

Step 2: H0 is rejected if F > 4.46. The degrees of freedom are 2 and 8.

Page 48: Dr. Ka-fu Wong

Example 4 continued

Step 3: Compute the various sums of squares:

Source       SS       df   MS       F      p-value
Treatments   62.53    2    31.267   5.75   .0283
Blocks       33.73    4    8.433    1.55   .2762
Error        43.47    8    5.433
Total        139.73   14

Step 4: H0 is rejected because F = 5.75 > 4.46. There is a difference in the mean number of units produced for the different time periods.

Page 49: Dr. Ka-fu Wong

EXAMPLE 4 continued

BLOCK EFFECT

Step 1: H0: µ1 = µ2 = µ3 = µ4 = µ5 versus H1: Not all means are equal.

Step 2: H0 is rejected if F > 3.84. The degrees of freedom are 4 and 8.

Step 3: F = [33.73/4]/[43.47/8] = 1.55

Step 4: H0 is not rejected because F = 1.55 < 3.84. There is no significant difference in the average number of units produced by the different employees.
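
Both F ratios in Example 4 can be reproduced from the output table. The sketch below assumes the data are stored as one row per employee (block) and one column per shift (treatment); `two_way_anova` is a hypothetical helper for a two-way layout without replication, with SSE obtained by subtraction from SSTotal.

```python
def two_way_anova(rows):
    """Two-way ANOVA without replication.

    rows[j][i] is the observation for block j and treatment i.
    """
    b, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (b * k)
    treat_means = [sum(r[i] for r in rows) / b for i in range(k)]
    block_means = [sum(r) / k for r in rows]
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    sse = ss_total - sst - ssb  # uses SSTotal = SSE + SST + SSB
    mse = sse / ((k - 1) * (b - 1))
    return {"SST": sst, "SSB": ssb, "SSE": sse, "SSTotal": ss_total,
            "F_treatment": (sst / (k - 1)) / mse,
            "F_block": (ssb / (b - 1)) / mse}

# Rows are employees (blocks); columns are Day, Evening, Night (treatments).
output = [
    [31, 25, 35],  # McCartney
    [33, 26, 33],  # Neary
    [28, 24, 30],  # Schoen
    [30, 29, 28],  # Thompson
    [28, 26, 27],  # Wagner
]
result = two_way_anova(output)
print({name: round(value, 2) for name, value in result.items()})
```

Comparing `F_treatment` with 4.46 and `F_block` with 3.84, the critical values quoted in the steps above, leads to the same two decisions.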

Page 50: Dr. Ka-fu Wong

- END -

Chapter Twelve: Analysis of Variance