Principal Component Analysis with R

7/17/2019

Principal Component Analysis (Theory)

• The goal is to reduce a set of mutually correlated variables to a smaller set of new, uncorrelated variables while retaining as much of the variability of the original data as possible (rule of thumb: 80%).

Suppose there are 1000 variables. What are the advantages and disadvantages?

1. Too complicated

2. Difficult to interpret

The data therefore need to be reduced. The prerequisite is that the variables are strongly correlated.

Steps of PCA:

Test a hypothesis about the correlation matrix to see whether the variables are closely correlated, using Bartlett's test:

H0: ρ = Ip (all off-diagonal elements are 0, meaning the correlation between variables is 0)

H1: ρ ≠ Ip (not all off-diagonal elements are 0, meaning the variables are closely correlated)

Bartlett's test:

$\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln|R|$, with $p(p - 1)/2$ degrees of freedom

n = number of observations, p = number of variables, R = (estimated) correlation matrix, |R| = determinant of the correlation matrix

Reject H0 if

computed χ² > tabulated χ²
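As a rough illustration, Bartlett's statistic can be computed by hand in R. The sketch below assumes the built-in USArrests data as a stand-in data set and uses the usual form of the statistic, -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom. The variable names are illustrative only.

```r
# Bartlett's test of sphericity computed by hand (USArrests is only a stand-in data set)
X <- USArrests
n <- nrow(X)                  # number of observations
p <- ncol(X)                  # number of variables
R <- cor(X)                   # estimated correlation matrix

chi2  <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))   # test statistic
df    <- p * (p - 1) / 2                               # degrees of freedom
crit  <- qchisq(0.95, df = df)                         # tabulated value at the 5% level
p_val <- pchisq(chi2, df = df, lower.tail = FALSE)

cat("chi-square =", chi2, " df =", df, " critical value =", crit, " p-value =", p_val, "\n")
cat("Reject H0:", chi2 > crit, "\n")
```

If H0 is rejected, the variables are correlated enough for a reduction with PCA to make sense.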

Since we intend to use PCA, we hope to reject H0. Rejecting it means the original variables are correlated, so the goal of reducing the dimension of the data can be achieved.

1. Find the eigenvalues of the covariance matrix (S) or the correlation matrix (R). If the variables share the same unit, use the covariance matrix; if the units differ, use the correlation matrix.

2. Sort the eigenvalues from largest to smallest (λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0).

3. Build the new variables (principal components) as linear combinations of the original variables. For each eigenvalue, compute the corresponding normalized eigenvector (the eigenvectors are made orthonormal):

$Y_1 = e_1'X = e_{11}X_1 + \dots + e_{1p}X_p$

$Y_2 = e_2'X = e_{21}X_1 + \dots + e_{2p}X_p$

…

$Y_p = e_p'X = e_{p1}X_1 + \dots + e_{pp}X_p$

(No reduction has taken place at this point.)

where $X' = [X_1 \; \dots \; X_p]$.

Properties of the new variables: they are mutually uncorrelated and ordered by importance, from Y1 (most important) to Yp.

4. Reduce the principal components that were formed. There are 3 ways:

1. By the proportion of variance (each eigenvalue divided by the sum of the eigenvalues)

2. Eigenvalue > 1

3. Scree plot

Example: proportion of variance

$Y_1 = e_1'X = e_{11}X_1 + \dots + e_{1p}X_p$ 76%

$Y_2 = e_2'X = e_{21}X_1 + \dots + e_{2p}X_p$ 23%

…

$Y_p = e_p'X = e_{p1}X_1 + \dots + e_{pp}X_p$

The proportion of variance of the first new variable is not yet sufficient, so the second new variable is added. The number of principal components retained is therefore 2.

Eigenvalue: the number of eigenvalues greater than 1 is the number of principal components to retain.

Scree plot: look at where the curve changes from steep to flat, and at the size of the eigenvalues. (A scree plot plots the eigenvalues against the component number.)

5. Name the principal components that are kept after the reduction. There are 2 ways:

1. Correlation between each principal component and the original variables. The variables with large correlations characterize the principal component.

2. By looking at the weights:

$Y_1 = e_1'X = e_{11}X_1 + \dots + e_{1p}X_p$

The weights are the $e$ coefficients. Look for the largest weight; if several weights differ only slightly, the principal component is characterized by all of those variables.
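The steps above can be sketched in a few lines of base R. This is only a minimal illustration, assuming the built-in USArrests data (variables in different units, so the correlation matrix is used); the names lambda, E, Y, k_prop and k_kaiser are made up for the example.

```r
# Steps 1-2: eigenvalues of the correlation matrix, returned in decreasing order
R      <- cor(USArrests)
eig    <- eigen(R)
lambda <- eig$values
E      <- eig$vectors                  # orthonormal eigenvectors, one per column
rownames(E) <- colnames(USArrests)

# Step 3: new variables Y = Z E, where Z holds the standardized data
Z <- scale(USArrests)
Y <- Z %*% E
colnames(Y) <- paste0("Y", seq_along(lambda))
print(round(cor(Y), 6))                # identity matrix: the new variables are uncorrelated

# Step 4: three ways to decide how many components to keep
prop     <- lambda / sum(lambda)
k_prop   <- which(cumsum(prop) >= 0.80)[1]   # cumulative proportion of variance >= 80%
k_kaiser <- sum(lambda > 1)                  # eigenvalues greater than 1
plot(lambda, type = "b", xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
cat("80% rule keeps", k_prop, "component(s); eigenvalue > 1 keeps", k_kaiser, "\n")

# Step 5: correlations between the retained components and the original variables
print(round(cor(USArrests, Y[, 1:k_prop]), 3))
```

The criteria do not have to agree: on this data set the 80% rule and the eigenvalue rule suggest different numbers of components, so the scree plot and interpretability still matter.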



5 functions to do Principal Components Analysis in R

Posted on June 17, 2012

Principal Component Analysis (PCA) is a multivariate technique that allows us to

summarize the systematic patterns of variations in the data.

From a data analysis standpoint, PCA is used for studying one table of observations and

variables with the main idea of transforming the observed variables into a set of new

variables, the principal components, which are uncorrelated and explain the variation in

the data. For this reason, PCA allows us to reduce a “complex” data set to a lower

dimension in order to reveal the structures or the dominant types of variations in both

the observations and the variables.

PCA in R

In R, there are several functions from different packages that allow us to perform PCA.

In this post I’ll show you 5 different ways to do a PCA using the following functions (with

their corresponding packages in parentheses):

• prcomp() (stats)

• princomp() (stats)

• PCA() (FactoMineR)

• dudi.pca() (ade4)

• acp() (amap)

Brief note: It is no coincidence that the three external packages ("FactoMineR", "ade4", and "amap") have been developed by French data analysts, who have a long tradition of and preference for PCA and other related exploratory techniques.

No matter what function you decide to use, the typical PCA results should consist of a

set of eigenvalues, a table with the scores or Principal Components (PCs), and a table

of loadings (or correlations between variables and PCs). The eigenvalues provide

information about the variability in the data. The scores provide information about the

structure of the observations. The loadings (or correlations) allow you to get a sense of

the relationships between variables, as well as their associations with the extracted

PCs.

http://en.wikipedia.org/wiki/Principal_component




The Data

To make things easier, we’ll use the dataset USArrests that already comes with R. It’s a data frame with 50 rows (USA states) and 4 columns containing information about violent crime rates by US State. Since most of the time the variables are measured in different scales, the PCA must be performed with standardized data (mean = 0, variance = 1). The good news is that all of the functions that perform PCA come with parameters to specify that the analysis must be applied on standardized data.
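To see why standardization matters here, the following sketch compares a PCA on the raw USArrests columns with one on standardized columns. It only uses prcomp() from "stats"; the object names are arbitrary.

```r
# PCA on raw data: the variable with the largest variance (Assault) dominates
pca_raw <- prcomp(USArrests)
prop_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
print(round(prop_raw, 3))
print(round(pca_raw$rotation[, 1], 3))

# PCA on standardized data: two equivalent ways to ask for it
pca_a <- prcomp(USArrests, scale. = TRUE)   # let prcomp standardize
pca_b <- prcomp(scale(USArrests))           # standardize first, then run prcomp
print(all.equal(pca_a$sdev, pca_b$sdev))
print(round(pca_a$sdev^2 / sum(pca_a$sdev^2), 3))
```

Without scaling, nearly all of the variance is attributed to the first component simply because Assault is measured on a much larger numeric scale than the other variables.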

Option 1: using prcomp()

The function prcomp() comes with the default "stats" package, which means that you don’t have to install anything. It is perhaps the quickest way to do a PCA if you don’t want to install other packages.

# PCA with function prcomp
pca1 = prcomp(USArrests, scale. = TRUE)

# sqrt of eigenvalues
pca1$sdev

[1] 1.5749 0.9949 0.5971 0.4164

# loadings
head(pca1$rotation)

             PC1     PC2     PC3      PC4
Murder   -0.5359  0.4182 -0.3412  0.64923
Assault  -0.5832  0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780  0.13388
Rape     -0.5434 -0.1673  0.8178  0.08902

# PCs (aka scores)
head(pca1$x)

               PC1     PC2      PC3      PC4
Alabama    -0.9757  1.1220 -0.43980  0.15470
Alaska     -1.9305  1.0624  2.01950 -0.43418
Arizona    -1.7454 -0.7385  0.05423 -0.82626
Arkansas    0.1400  1.1085  0.11342 -0.18097
California -2.4986 -1.5274  0.59254 -0.33856
Colorado   -1.4993 -0.9776  1.08400  0.00145



Option 2: using princomp()

The function princomp() also comes with the default "stats" package, and it is very similar to its cousin prcomp(). What I don’t like about princomp() is that sometimes it won’t display all the values for the loadings, but this is a minor detail.

# PCA with function princomp
pca2 = princomp(USArrests, cor = TRUE)

# sqrt of eigenvalues
pca2$sdev

Comp.1 Comp.2 Comp.3 Comp.4
1.5749 0.9949 0.5971 0.4164

# loadings
unclass(pca2$loadings)

          Comp.1  Comp.2  Comp.3   Comp.4
Murder   -0.5359  0.4182 -0.3412  0.64923
Assault  -0.5832  0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780  0.13388
Rape     -0.5434 -0.1673  0.8178  0.08902

# PCs (aka scores)
head(pca2$scores)

            Comp.1  Comp.2   Comp.3    Comp.4
Alabama    -0.9856  1.1334 -0.44427  0.156267
Alaska     -1.9501  1.0732  2.04000 -0.438583
Arizona    -1.7632 -0.7460  0.05478 -0.834653
Arkansas    0.1414  1.1198  0.11457 -0.182811
California -2.5240 -1.5429  0.59856 -0.341996
Colorado   -1.5146 -0.9876  1.09501  0.001465
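The scores from princomp() are slightly larger in magnitude than those from prcomp(). A likely explanation, sketched below, is that princomp() standardizes with the divisor n while prcomp() uses n - 1, so the scores differ by a constant factor of sqrt(n / (n - 1)) (and possibly by sign). The object names are illustrative.

```r
n  <- nrow(USArrests)
p1 <- prcomp(USArrests, scale. = TRUE)
p2 <- princomp(USArrests, cor = TRUE)

# Both report the square roots of the eigenvalues of the correlation matrix
print(all.equal(p1$sdev, unname(p2$sdev)))

# Ratio of absolute scores is constant and equals sqrt(n / (n - 1))
ratio <- abs(p2$scores) / abs(p1$x)
print(range(ratio))
print(sqrt(n / (n - 1)))
```

Absolute values are compared because the sign of a principal component is arbitrary and can differ between functions.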

Option 3: using PCA()

A highly recommended option, especially if you want more detailed results and assessing tools, is the PCA() function from the package "FactoMineR". It is by far the best PCA function in R and it comes with a number of parameters that allow you to tweak the analysis in a very nice way.

# PCA with function PCA
library(FactoMineR)

# apply PCA
pca3 = PCA(USArrests, graph = FALSE)

# matrix with eigenvalues
pca3$eig

       eigenvalue percentage of variance cumulative percentage of variance
comp 1     2.4802                 62.006                             62.01
comp 2     0.9898                 24.744                             86.75
comp 3     0.3566                  8.914                             95.66
comp 4     0.1734                  4.336                            100.00

# correlations between variables and PCs
pca3$var$coord

          Dim.1   Dim.2   Dim.3    Dim.4
Murder   0.8440 -0.4160  0.2038  0.27037
Assault  0.9184 -0.1870  0.1601 -0.30959
UrbanPop 0.4381  0.8683  0.2257  0.05575
Rape     0.8558  0.1665 -0.4883  0.03707

# PCs (aka scores)
head(pca3$ind$coord)

             Dim.1   Dim.2    Dim.3     Dim.4
Alabama     0.9856 -1.1334  0.44427  0.156267
Alaska      1.9501 -1.0732 -2.04000 -0.438583
Arizona     1.7632  0.7460 -0.05478 -0.834653
Arkansas   -0.1414 -1.1198 -0.11457 -0.182811
California  2.5240  1.5429 -0.59856 -0.341996
Colorado    1.5146  0.9876 -1.09501  0.001465

Option 4: using dudi.pca()

Another option is to use the dudi.pca() function from the package "ade4", which has a huge amount of other methods as well as some interesting graphics.

# PCA with function dudi.pca
library(ade4)

# apply PCA
pca4 = dudi.pca(USArrests, nf = 5, scannf = FALSE)

# eigenvalues
pca4$eig

[1] 2.4802 0.9898 0.3566 0.1734

# loadings
pca4$c1

             CS1     CS2     CS3      CS4
Murder   -0.5359  0.4182 -0.3412  0.64923
Assault  -0.5832  0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780  0.13388
Rape     -0.5434 -0.1673  0.8178  0.08902

# correlations between variables and PCs
pca4$co

           Comp1   Comp2   Comp3    Comp4
Murder   -0.8440  0.4160 -0.2038  0.27037
Assault  -0.9184  0.1870 -0.1601 -0.30959
UrbanPop -0.4381 -0.8683 -0.2257  0.05575
Rape     -0.8558 -0.1665  0.4883  0.03707

# PCs
head(pca4$li)

             Axis1   Axis2    Axis3     Axis4
Alabama    -0.9856  1.1334 -0.44427  0.156267
Alaska     -1.9501  1.0732  2.04000 -0.438583
Arizona    -1.7632 -0.7460  0.05478 -0.834653
Arkansas    0.1414  1.1198  0.11457 -0.182811
California -2.5240 -1.5429  0.59856 -0.341996
Colorado   -1.5146 -0.9876  1.09501  0.001465

Option 5: using acp()

A fifth possibility is the acp() function from the package "amap".

# PCA with function acp
library(amap)

# apply PCA
pca5 = acp(USArrests)

# sqrt of eigenvalues
pca5$sdev

Comp 1 Comp 2 Comp 3 Comp 4
1.5749 0.9949 0.5971 0.4164

# loadings
pca5$loadings

         Comp 1  Comp 2  Comp 3   Comp 4
Murder   0.5359  0.4182 -0.3412  0.64923
Assault  0.5832  0.1880 -0.2681 -0.74341
UrbanPop 0.2782 -0.8728 -0.3780  0.13388
Rape     0.5434 -0.1673  0.8178  0.08902

# scores
head(pca5$scores)

            Comp 1  Comp 2   Comp 3   Comp 4
Alabama     0.9757  1.1220 -0.43980  0.15470
Alaska      1.9305  1.0624  2.01950 -0.43418
Arizona     1.7454 -0.7385  0.05423 -0.82626
Arkansas   -0.1400  1.1085  0.11342 -0.18097
California  2.4986 -1.5274  0.59254 -0.33856
Colorado    1.4993 -0.9776  1.08400  0.00145

Of course these are not the only options to do a PCA, but I’ll leave the other approaches

for another post.

PCA plots

Everybody uses PCA to visualize the data, and most of the discussed functions come with their own plot functions. But you can also make use of the great graphical displays of "ggplot2". Just to show you a couple of plots, let’s take the basic results from prcomp().

Plot of observations

# load ggplot2
library(ggplot2)

# create data frame with scores
scores = as.data.frame(pca1$x)

# plot of observations
ggplot(data = scores, aes(x = PC1, y = PC2, label = rownames(scores))) +
  geom_hline(yintercept = 0, colour = "gray65") +
  geom_vline(xintercept = 0, colour = "gray65") +
  geom_text(colour = "tomato", alpha = 0.8, size = 4) +
  ggtitle("PCA plot of USA States - Crime Rates")



Circle of correlations

# function to create a circle
circle <- function(center = c(0, 0), npoints = 100) {
  r = 1
  tt = seq(0, 2 * pi, length = npoints)
  xx = center[1] + r * cos(tt)
  yy = center[2] + r * sin(tt)   # y coordinate uses the second element of center
  return(data.frame(x = xx, y = yy))
}
corcir = circle(c(0, 0), npoints = 100)

# create data frame with correlations between variables and PCs
correlations = as.data.frame(cor(USArrests, pca1$x))

# data frame with arrows coordinates
arrows = data.frame(x1 = c(0, 0, 0, 0), y1 = c(0, 0, 0, 0),
                    x2 = correlations$PC1, y2 = correlations$PC2)

# geom_path will do open circles
ggplot() +
  geom_path(data = corcir, aes(x = x, y = y), colour = "gray65") +
  geom_segment(data = arrows, aes(x = x1, y = y1, xend = x2, yend = y2), colour = "gray65") +
  geom_text(data = correlations, aes(x = PC1, y = PC2, label = rownames(correlations))) +
  geom_hline(yintercept = 0, colour = "gray65") +
  geom_vline(xintercept = 0, colour = "gray65") +
  xlim(-1.1, 1.1) + ylim(-1.1, 1.1) +
  labs(x = "pc1 axis", y = "pc2 axis") +
  ggtitle("Circle of correlations")


© Gaston Sanchez. All contents under (CC) BY-NC-SA license, unless otherwise noted.







Principal Components and Factor Analysis

This section covers principal components and factor analysis. The latter includes both exploratory and confirmatory methods.

Principal Components

The princomp( ) function produces an unrotated principal component analysis.

# Principal Components Analysis
# entering raw data and extracting PCs
# from the correlation matrix
fit <- princomp(mydata, cor=TRUE)
summary(fit) # print variance accounted for
loadings(fit) # pc loadings
plot(fit,type="lines") # scree plot
fit$scores # the principal components
biplot(fit)


Use cor=FALSE to base the principal components on the covariance matrix. Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs=.
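As a small sketch of the covmat= route, the example below feeds a correlation matrix to princomp( ) and compares the result with the raw-data analysis. USArrests stands in for mydata here, which is an assumption made only so the example runs on its own.

```r
# Stand-in for mydata (assumption for this example only)
mydata <- USArrests

R <- cor(mydata)
fit_cov <- princomp(covmat = R)          # PCA from a correlation matrix only
fit_raw <- princomp(mydata, cor = TRUE)  # PCA from the raw data

# Same standard deviations and (up to sign) the same loadings
print(all.equal(unname(fit_cov$sdev), unname(fit_raw$sdev)))
print(round(abs(unclass(fit_cov$loadings)) - abs(unclass(fit_raw$loadings)), 6))

# No scores are available, because no observations were supplied
print(is.null(fit_cov$scores))
```

This route is handy when only a published correlation or covariance matrix is available, but component scores cannot be computed from it.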

The principal( ) function in the psych package can be used to extract and rotate principal components.

# Varimax Rotated Principal Components
# retaining 5 components
library(psych)
fit <- principal(mydata, nfactors=5, rotate="varimax")
fit # print results

mydata can be a raw data matrix or a covariance matrix. Pairwise deletion of missing data is used. rotate can be "none", "varimax", "quartimax", "promax", "oblimin", "simplimax", or "cluster".

Exploratory Factor Analysis

http://cran.r-project.org/web/packages/psych/index.html




The factanal( ) function produces maximum likelihood factor analysis.

# Maximum Likelihood Factor Analysis
# entering raw data and extracting 3 factors,
# with varimax rotation
fit <- factanal(mydata, 3, rotation="varimax")
print(fit, digits=2, cutoff=.3, sort=TRUE)
# plot factor 1 by factor 2
load <- fit$loadings[,1:2]
plot(load,type="n") # set up plot
text(load,labels=names(mydata),cex=.7) # add variable names


The rotation= options include "varimax", "promax", and "none". Add the option scores="regression" or "Bartlett" to produce factor scores. Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs=.

The factor.pa( ) function in the psych package offers a number of factor analysis related functions, including principal axis factoring.

# Principal Axis Factor Analysis
library(psych)
# (in recent versions of psych, factor.pa appears to be superseded by fa( ),
# e.g. fa(mydata, nfactors=3, rotate="varimax", fm="pa"))
fit <- factor.pa(mydata, nfactors=3, rotation="varimax")
fit # print results

mydata can be a raw data matrix or a covariance matrix. Pairwise deletion of missing data is used. Rotation can be "varimax" or "promax".

Determining the Number of Factors to Extract

A crucial decision in exploratory factor analysis is how many factors to extract. The nFactors package offers a suite of functions to aid in this decision. Details on this methodology can be found in a PowerPoint presentation by Raiche, Riopel, and Blais. Of course, any factor solution must be interpretable to be useful.

http://cran.r-project.org/web/packages/nFactors/index.html

http://www.er.uqam.ca/nobel/r17165/RECHERCHE/COMMUNICATIONS/2006/IMPS/IMPS_2006.ppt#1

# Determine Number of Factors to Extract
library(nFactors)
ev <- eigen(cor(mydata)) # get eigenvalues
ap <- parallel(subject=nrow(mydata),var=ncol(mydata),
  rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)


Going Further

The FactoMineR package offers a large number of additional functions for exploratory factor analysis. This includes the use of both quantitative and qualitative variables, as well as the inclusion of supplementary variables and observations. Here is an example of the types of graphs that you can create with this package.

# PCA Variable Factor Map
library(FactoMineR)
result <- PCA(mydata) # graphs generated automatically

The GPArotation package offers a wealth of rotation options beyond varimax and promax.

http://cran.r-project.org/web/packages/FactoMineR/index.html

http://cran.r-project.org/web/packages/GPArotation/index.html



Principal Component Analysis (PCA)

Introduction

Principal Component Analysis (PCA) is a powerful tool when you have many variables and you want to look into things that these variables can explain. As the name suggests, PCA finds the combination of your variables which explains the phenomena. In this sense, PCA is useful when you want to reduce the number of the variables. One common scenario of PCA is that you have n variables and you want to combine them and make them 3 or 4 variables without losing much of the information that the original data have. More mathematically, PCA is trying to find some linear projections of your data which preserve the information your data have.

PCA is one of the methods you may want to try if you have lots of Likert data and try to understand what these data tell you. Let's say we asked the participants four 7-scale Likert questions about what they care about when choosing a new computer, and got the results like this.

Participant  Price  Software  Aesthetics  Brand
P1           6      5         3           4
P2           7      3         2           2
P3           6      4         4           5
P4           5      7         1           3
P5           7      7         5           5
P6           6      4         2           3
P7           5      7         2           1
P8           6      5         4           4
P9           3      5         6           7
P10          1      3         7           5
P11          2      6         6           7
P12          5      7         7           6
P13          2      4         5           6
P14          3      5         6           5
P15          1      6         5           5
P16          2      3         7           7

Price: A new computer is cheap to you (1: strongly disagree - 7: strongly agree),

Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree - 7: strongly agree),

Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree - 7: strongly agree),

Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree - 7: strongly agree)

Now what you want to find out is what combination of these four variables can explain the phenomena you observed. I will explain this with the example code.

R code example

Let's prepare the same data shown in the table above.

Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

At this point, data looks pretty much the same as the table above. Now, we do PCA. In R, there are two functions for PCA: prcomp() and princomp(). prcomp() computes the components from a singular value decomposition of the centered (and optionally scaled) data, whereas princomp() uses the eigendecomposition of the variance-covariance matrix, or of the correlation matrix when cor=TRUE. The results become similar in many cases (which I haven't formally tested, so be careful), and the results gained from princomp() have nice features, so here I use princomp().

pca <- princomp(data, cor=TRUE)
summary(pca, loadings=TRUE)

And here is the result of the PCA.

Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion  0.6075727 0.8478733 0.9640409 1.00000000

Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977 -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674

I will explain how to interpret this result in the next section.

Interpretation of the results of PCA

Let's take a look at the table for loadings, which mean the coefficients for the "new" variables.

            Comp.1  Comp.2  Comp.3  Comp.4
Price       -0.523           0.848
Software    -0.177   0.977  -0.120
Aesthetics   0.597   0.134   0.295  -0.734
Brand        0.583   0.167   0.423   0.674

From the second table (loadings), PCA found four new variables which can explain the same information as the original four variables (Price, Software, Aesthetics, and Brand), which are Comp.1 to Comp.4. And Comp.1 is calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand

(Because the analysis was run with cor=TRUE, Price, Software, Aesthetics, and Brand in this formula are the standardized values of each variable.)

Thus, PCA successfully found a new combination of the variables, which is good. The next thing we want to know is how much each of the new variables has a power to explain the information that the original data have. For this, you need to look at Standard deviation and Cumulative Proportion (of Variance) in the result.

                       Comp.1  Comp.2  Comp.3  Comp.4
Standard deviation       1.56    0.98    0.68    0.38
Cumulative Proportion    0.61    0.85    0.96    1.00

Standard deviation means the standard deviation of the new variables. PCA calculates the combination of the variables such that new variables have a large standard deviation. Thus, generally a larger standard deviation means a better variable. A heuristic is that we take all the new variables whose standard deviations are roughly over 1.0 (so, we will take Comp.1 and Comp.2).

Another way to determine how many new variables we want to take is to look at the cumulative proportion of variance. This means how much of the information that the original data have can be described by the combination of the new variables. For instance, with only Comp.1, we can describe 61% of the information the original data have. If we use Comp.1 and Comp.2, we can describe 85% of it. Generally, 80% is considered as the percentage which describes the data well. So, in this example, we can take Comp.1 and Comp.2, and ignore Comp.3 and Comp.4.
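The two rules of thumb can also be checked directly from the princomp() object rather than read off the printed summary. The following is a minimal sketch that rebuilds the survey data from the table above; prop, cum and k_80 are names chosen for this example.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

pca <- princomp(data, cor = TRUE)

# Proportion and cumulative proportion of variance from the standard deviations
prop <- pca$sdev^2 / sum(pca$sdev^2)
cum  <- cumsum(prop)
print(round(pca$sdev, 2))
print(round(cum, 2))

# Smallest number of components whose cumulative proportion reaches 80%
k_80 <- unname(which(cum >= 0.80)[1])
cat("Components needed for 80% of the variance:", k_80, "\n")
```

Note that the standard deviation of Comp.2 is only roughly 1.0, so the two heuristics agree here only if "roughly over 1.0" is read generously.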

In this manner, we can decrease the number of the variables (in this example, from 4 variables to 2 variables). Your next task is to understand what the new variable means in the context of your data. As we have seen, the first new variable can be calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand

It is a very good idea to plot the data to see what this new variable means. You can use scores to take the values of each variable modeled by PCA.

plot(pca$scores[,1])
barplot(pca$scores[,1])

With the graphs (sorry, I was kinda lazy to upload the graph, but you can quickly generate it by yourself), you can see Participants 1 - 8 get negative values, and the other participants get positive values. It seems that this new variable indicates whether a user cares about Price and Software or Aesthetics and Brand for her computer. So, we probably can name this variable as "Feature/Fashion index" or something. There is no definitive answer for this part of PCA. You need to go through your data and make sense of what the new variables mean by yourself.
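To connect the Comp.1 formula with pca$scores, the sketch below recomputes the first score by hand. It assumes that princomp() with cor=TRUE standardizes each variable using the divisor n; sd_n and Z are helper names invented for this example.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)
pca  <- princomp(data, cor = TRUE)

# Standardize each column with the divisor n (not n - 1)
sd_n <- function(v) sqrt(mean((v - mean(v))^2))
Z <- sapply(data, function(v) (v - mean(v)) / sd_n(v))

# Comp.1 = weighted sum of the standardized variables, weights = first column of loadings
comp1 <- as.vector(Z %*% pca$loadings[, 1])
print(all.equal(comp1, unname(pca$scores[, 1])))

# Participants 1-8 fall on one side of zero, participants 9-16 on the other
print(sign(comp1))
```

The sign of a component is arbitrary, so on another system the two groups may swap sides; the split between them is what matters.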



PCA and Logistic regression

Once you have done the analysis with PCA, you may want to look into whether the new variables can predict some phenomena well. This is kinda like machine learning: whether features can classify the data well. Let's say you have asked the participants one more thing, which OS they are using (Windows or Mac) in your survey, and the results are like this.

Participant  Price  Software  Aesthetics  Brand  OS
P1           6      5         3           4      0
P2           7      3         2           2      0
P3           6      4         4           5      0
P4           5      7         1           3      0
P5           7      7         5           5      1
P6           6      4         2           3      0
P7           5      7         2           1      0
P8           6      5         4           4      0
P9           3      5         6           7      1
P10          1      3         7           5      1
P11          2      6         6           7      0
P12          5      7         7           6      1
P13          2      4         5           6      1
P14          3      5         6           5      1
P15          1      6         5           5      1
P16          2      3         7           7      1



Here what we are going to do is to see whether the new variables given by PCA can predict the OS people are using. OS is 0 or 1 in our case, which means the dependent variable is binomial. Thus, we are going to do logistic regression. I will skip the details of logistic regression here. If you are interested, the details of logistic regression are available in a separate page.

First, we prepare the data about OS.

OS <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)

Then, fit the first variable we found through PCA (i.e., Comp.1) to a logistic function.

model <- glm(OS ~ pca$scores[,1], family=binomial)
summary(model)

Now you get the logistic function model.

Call:
glm(formula = OS ~ pca$scores[, 1], family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.19746  -0.44586   0.01932   0.60018   1.65268

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.08371    0.74216  -0.113   0.9102
pca$scores[, 1]  1.42973    0.62129   2.301   0.0214 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 22.181  on 15  degrees of freedom
Residual deviance: 12.033  on 14  degrees of freedom
AIC: 16.033

Number of Fisher Scoring iterations: 5

Let's see how well this model predicts the kind of OS. You can use the fitted() function to see the prediction (rounded to two decimals here).

round(fitted(model), 2)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
0.15 0.04 0.35 0.04 0.26 0.08 0.03 0.22 0.89 0.94 0.91 0.73 0.85 0.76 0.78 0.96

These values represent the probabilities of being 1. For example, we can expect a 15% chance that Participant 1 is using OS 1 based on the variable derived by PCA. Thus, in this case, Participant 1 is more likely to be using OS 0, which agrees with the survey response. In this way, PCA can be used with regression models for calculating the probability of a phenomenon or making a prediction.
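One way to summarize how well the fitted probabilities agree with the survey responses is to turn them into 0/1 predictions and tabulate them against OS. The sketch below does this with a 0.5 cut-off, which is an assumption of this example and not something the analysis above prescribes; comp1 and pred are illustrative names.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
OS         <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)
data <- data.frame(Price, Software, Aesthetics, Brand)

pca   <- princomp(data, cor = TRUE)
comp1 <- pca$scores[, 1]
model <- glm(OS ~ comp1, family = binomial)

# Classify as OS 1 when the fitted probability exceeds 0.5 (assumed cut-off)
pred <- as.integer(fitted(model) > 0.5)
print(table(observed = OS, predicted = pred))
cat("Proportion classified correctly:", mean(pred == OS), "\n")
```

With only 16 participants and the same data used for fitting and checking, this proportion is an optimistic estimate rather than a measure of predictive accuracy on new respondents.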

http://yatani.jp/teaching/doku.php?id=hcistats:logisticregression





Factor Analysis

Introduction

Factor Analysis is another powerful tool to understand what your data mean, particularly when you have many variables. What Factor Analysis does is to try to find hidden variables which explain the behavior of your observed variables. Our interests here also lie in reducing the number of variables. So, we hope that we can find a smaller number of new variables which explain your data well. In this sense, it sounds very similar to PCA. Although the outcome is very similar in terms of reducing the number of variables, the approach to reduce the number of variables is different. I will explain this in the next section.

If you are a little more knowledgeable, you may have heard of terms like Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). EFA means that you don't really know what hidden variables (or factors) exist and how many there are. So you are trying to find them. CFA means that you already have some guesses or models for your hidden variables (or factors) and you want to check whether your models are correct. In many cases, your Factor Analysis is EFA, and I explain it in this page.

We are going to use an example similar to the one in PCA. Let's say you have some data like this from your survey about what is important when they decide which computer to buy.

Participant  Price  Software  Aesthetics  Brand  Friend  Family
P1           6      5         3           4      7       6
P2           7      3         2           2      2       3
P3           6      4         4           5      5       4
P4           5      7         1           3      6       7
P5           7      7         5           5      2       1
P6           6      4         2           3      4       5
P7           5      7         2           1      1       4
P8           6      5         4           4      7       5
P9           3      5         6           7      3       4
P10          1      3         7           5      2       4
P11          2      6         6           7      6       5
P12          5      7         7           6      7       7
P13          2      4         5           6      6       2
P14          3      5         6           5      2       3
P15          1      6         5           5      4       5
P16          2      3         7           7      5       6

http://yatani.jp/teaching/doku.php?id=hcistats:pca

Price: A new computer is cheap to you (1: strongly disagree - 7: strongly agree),

Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree - 7: strongly agree),

Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree - 7: strongly agree),

Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree - 7: strongly agree),

Friend: Your friends' opinions are important to you (1: strongly disagree - 7: strongly agree),

Family: Your family's opinions are important to you (1: strongly disagree - 7: strongly agree).

For successfully doing Factor Analysis, we need more data than this example. If you want to find n factors, you want to have roughly 3n - 4n dimensions of data and 5n - 10n samples. And Factor Analysis assumes the normality of the data, so it is not a great tool for ordinal data. However, in practice we can use Factor Analysis on ordinal data if the scale is 5 or more and the data can be treated as interval data.

Through Factor Analysis, you want to find hidden variables (common factors) which may explain the responses you gained. Before looking at how to do Factor Analysis in R, I would like to briefly explain the difference between PCA and FA.

Difference between Factor Analysis and PCA

The intuition of Principal Component Analysis is to find new combinations of variables which form larger variances. Why are larger variances important? This is a similar concept to entropy in information theory. Let's say you have two variables. One of them (Var 1) forms N(1, 0.01) and the other (Var 2) forms N(1, 1). Which variable do you think has more information? Var 1 is always pretty much 1, whereas Var 2 can take a wider range of values, like 0 or 2. Thus, Var 2 has more chances to have various values than Var 1, which means Var 2's entropy is larger than Var 1's. Thus, we can say Var 2 contains more information than Var 1.

Although the example above just looks at one variable at a time, PCA tries to find linear combinations of the variables which contain much information by looking at the variance. This is why the standard deviation is one of the important metrics to determine the number of new variables in PCA. Another interesting aspect of the new variables derived by PCA is that all new variables are orthogonal. You can think that PCA is rotating and translating the data such that the first axis contains the most information, the second has the second most information, and so forth.
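The two-variable thought experiment is easy to simulate. The sketch below assumes that the second parameter of N(1, 0.01) and N(1, 1) is the variance, and uses an arbitrary seed and sample size; it then runs an unscaled PCA to show that the first axis lines up with the high-variance variable and that the axes are orthogonal.

```r
set.seed(1)                                    # arbitrary seed, for reproducibility only
n <- 1000
var1 <- rnorm(n, mean = 1, sd = sqrt(0.01))    # assumes 0.01 is the variance
var2 <- rnorm(n, mean = 1, sd = 1)
print(c(var1 = var(var1), var2 = var(var2)))   # var2 spreads much more widely

pc <- prcomp(cbind(var1, var2))                # covariance-based PCA, no scaling
print(round(pc$rotation, 3))                   # first axis is almost entirely var2
print(round(crossprod(pc$rotation), 10))       # identity matrix: the new axes are orthogonal
```

If the two variables were standardized first, both would have variance 1 and this ordering by "information" would disappear, which is why the choice between covariance and correlation matters.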

The intuition of Factor Analysis is to find hidden variables which affect your observed variables by looking at the correlation. If one variable is correlated with another variable, we can say that these two variables are generated from one hidden variable, so we can explain the phenomena with that one hidden variable instead of the two variables. Let's take a look at the correlation matrix of the data we have (see the code example below to create the data frame) before doing Factor Analysis.

cor(data)

And you get the correlation matrix.

                 Price   Software  Aesthetics       Brand     Friend      Family
Price       1.00000000  0.1856123 -0.63202219 -0.58026680 0.03082006 -0.06183118
Software    0.18561230  1.0000000 -0.14621516 -0.11858645 0.10096774  0.17657236
Aesthetics -0.63202219 -0.1462152  1.00000000  0.85285436 0.03989799 -0.06977360
Brand      -0.58026680 -0.1185864  0.85285436  1.00000000 0.33316719  0.02662389
Friend      0.03082006  0.1009677  0.03989799  0.33316719 1.00000000  0.60727189
Family     -0.06183118  0.1765724 -0.06977360  0.02662389 0.60727189  1.00000000

So, it looks like Price has strong negative correlations with Aesthetics and Brand, and Friend has a strong correlation with Family. This means that we can expect that we will have two common factors: one will be related to Price, Aesthetics, and Brand, and the other will be related to Friend and Family. Let's move on to Factor Analysis and see what will happen.





R code e!ample

En the @ollo/ing ode e9ample E skipped some details suh as using varima9 rotation or proma9rotation ( uses varima9 rotation by de@ault). E@ you /ant to kno/ more details E reommend

you to read other books or re@erenes @or no/. E may add these details later but not sure;

Nirst /e prepare the data.

Price <- c(>,7,>,*,7,>,*,>,/,3,0,*,0,/,3,0 oftware <-

c(*,/,?,7,7,?,7,*,*,/,>,7,?,*,>,/ Aesthetics <- c(/,0,?,3,*,0,0,?,>,7,>,7,*,>,*,7

Drand <- c(?,0,*,/,*,/,3,?,7,*,7,>,>,*,*,7 .riend <-

c(7,0,*,>,0,?,3,7,/,0,>,7,>,0,?,* .amily <- c(>,/,?,7,3,*,?,*,?,?,*,7,0,/,*,> data <-

data2frame(Price, oftware, Aesthetics, Drand, .riend, .amily

Nator $nalysis is easy to do in . #etHs do Nator $nalysis assuming that the number o@ the

hidden variables is 1.

fa <- factanal(data, factors = 1)
fa

And you get the result.

Call:
factanal(x = data, factors = 1)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family
     0.567      0.977      0.126      0.167      0.974      1.000

Loadings:
           Factor1
Price      -0.658
Software   -0.152
Aesthetics  0.935
Brand       0.912
Friend      0.161
Family

               Factor1
SS loadings      2.190
Proportion Var   0.365

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 12.79 on 9 degrees of freedom.
The p-value is 0.172

Here the factor analysis is doing a null hypothesis test in which the null hypothesis is that the model described by the factor we have found predicts the data well. So we have the chi-square goodness-of-fit, which is 12.8, and the p value is 0.17. This means we cannot reject the null hypothesis, so the factor predicts the data well from the statistics perspective. This is why the result says "Test of the hypothesis that 1 factor is sufficient." Let's take a look at the Factor Analysis with two factors.

fa <- factanal(data, factors = 2)
fa

Call:
factanal(x = data, factors = 2)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family
     0.559      0.960      0.126      0.080      0.005      0.609

Loadings:
           Factor1 Factor2
Price      -0.657
Software   -0.161   0.119
Aesthetics  0.933
Brand       0.928   0.242
Friend      0.100   0.992
Family              0.620

               Factor1 Factor2
SS loadings      2.207   1.453
Proportion Var   0.368   0.242
Cumulative Var   0.368   0.610

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.16 on 4 degrees of freedom.
The p-value is 0.706

The p value gets larger, and the cumulative proportion of variance becomes 0.61 (with one factor it is 0.37). So the model seems to be improved. The loadings are the weights that link each observed variable to the hidden variables.
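The comparison between the one-factor and two-factor fits can also be made without reading the printed summaries. The sketch below pulls the test results out of the fitted objects through the `STATISTIC`, `dof`, and `PVAL` components that `factanal` returns; the names `fa1`, `fa2`, and `fit` are only illustrative.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa1 <- factanal(data, factors = 1)
fa2 <- factanal(data, factors = 2)

# Collect the goodness-of-fit results side by side
fit <- data.frame(
  factors   = c(1, 2),
  statistic = unname(c(fa1$STATISTIC, fa2$STATISTIC)),
  dof       = c(fa1$dof, fa2$dof),
  p_value   = unname(c(fa1$PVAL, fa2$PVAL))
)
print(round(fit, 3))

# The p value is the upper tail of a chi-square distribution
# evaluated at the test statistic with the model's degrees of freedom
p_manual <- pchisq(fa1$STATISTIC, df = fa1$dof, lower.tail = FALSE)
```

A larger p value here means weaker evidence against the hypothesis that the chosen number of factors is sufficient, which is how the two-factor model above was judged to be an improvement.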



But obviously, the model gets improved if you have more hidden variables, which shows the trade-off between the number of hidden variables and the accuracy of the model. So how should we decide how many factors we should pick? This is the topic of the next section.

How many factors should we use?

We found the two factors in the example, which are:

            Factor 1  Factor 2
Price        -0.657
Software     -0.161     0.119
Aesthetics    0.933
Brand         0.928     0.242
Friend        0.100     0.992
Family                  0.620

In the results of FA, some coefficients are missing, but this only means that these coefficients are too small to be printed, not necessarily equal to zero. You can see all the coefficients with more precision by doing something like fa$loadings[,1].
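As a small sketch of how the hidden coefficients can be shown for all factors at once: the print method for loadings takes a `cutoff` argument (0.1 by default) below which values are left blank, so setting it to 0 prints everything. The check at the end relies on the fact that, for standardized variables, a variable's uniqueness is one minus the sum of its squared loadings; the two columns should agree closely once the fit has converged.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa <- factanal(data, factors = 2)

# Print every loading, including those below the default cutoff of 0.1
print(fa$loadings, digits = 4, cutoff = 0)

# Or drop the "loadings" class to get a plain numeric matrix
L <- unclass(fa$loadings)

# Compare the reported uniquenesses with 1 - communality
check <- cbind(uniqueness = fa$uniquenesses,
               from_loadings = 1 - rowSums(L^2))
print(round(check, 3))
```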

Although the goodness-of-fit tells you whether the current number of factors is sufficient or not, it does not tell you whether the number of factors is large enough to describe the information that the original data have. For instance, why don't we try three factors instead of one or two factors? There are a few ways to answer this question.



Comprehensibility

This means whether you can explain your new variables in a sensible way. For example, Factor 1 has large weights on Price, Aesthetics, and Brand, which may indicate whether people want practical aspects or fashionable aspects in their computers. Factor 2 has large weights on Friend and Family, which seems to mean that people around users have some effect on the computer purchase. Thus, both factors seem to have some meaning, and that's why we should keep them.

This is not really a mathematical way to determine the number of factors, but it is a standard way to do it. Because we want to find factors which explain something, we can just ignore factors which don't really make sense. This is probably intuitive, but I know you may argue that it is too subjective. So we have more mathematical ways to determine the number of factors.

Cumulative variance

Similar to PCA, you can look at the cumulative proportion of variance, and if that reaches some number, you can stop adding more factors. Deciding the threshold for the cumulative proportion is kind of heuristic. It can be 80%, similar to PCA. If your focus is on reducing the number of variables, it can be 50 - 60%.
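The cumulative proportion printed by `factanal` can be recomputed from the loadings, which makes it easy to test against a threshold in code. This is a sketch only; the 0.6 threshold is an assumed value taken from the upper end of the 50 - 60% range mentioned above.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa <- factanal(data, factors = 2)
L  <- unclass(fa$loadings)

ss     <- colSums(L^2)      # "SS loadings" row of the printed result
prop   <- ss / nrow(L)      # "Proportion Var": divide by the number of variables
cumvar <- cumsum(prop)      # "Cumulative Var"
print(round(rbind(ss, prop, cumvar), 3))

threshold <- 0.6            # assumed cutoff for illustration
enough <- cumvar[length(cumvar)] >= threshold
```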

Kaiser criterion

The Kaiser rule is to discard components whose eigenvalues are below 1.0. This is also used in SPSS. You can easily calculate the eigenvalues from the correlation matrix.

ev <- eigen(cor(data))
ev$values
[1] 2.45701130 1.68900056 0.89157047 0.60583326 0.27285334 0.08373107

So we can determine that the number of factors should be 2. One problem of the Kaiser rule is that it often becomes too strict.
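Counting the eigenvalues above 1.0 can be done in one line rather than by eye. A minimal sketch, where `n_factors` is just an illustrative name:

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

ev <- eigen(cor(data))

# Kaiser rule: keep as many factors as there are eigenvalues above 1.0
n_factors <- sum(ev$values > 1)
print(n_factors)
```

Because the eigenvalues of a correlation matrix add up to the number of variables, an eigenvalue above 1.0 means that component carries more variance than a single original variable does.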



Scree plot

Another way to determine the number of factors is to use a Scree plot. You plot the components on the X axis and the eigenvalues on the Y axis, and connect them with lines. You then try to find the spot where the slope of the line becomes less steep. So, how exactly should we find a spot like that? Again, it is kind of heuristic. In some cases (particularly when the number of your original variables is small, like the example above), you can't find a clear spot like that (try to make a plot by using the following code). Nonetheless, it is good to know how to make a Scree plot.

The following procedure to make a Scree plot is based on the statmethods.net webpage linked after the code. You also need the nFactors package.

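# Eigenvalues of the correlation matrix
ev <- eigen(cor(data))
library(nFactors)
# Parallel analysis on random data with the same number of rows and columns
ap <- parallel(subject = nrow(data), var = ncol(data), rep = 100, cent = 0.05)
# Combine the eigenvalues with the parallel-analysis quantiles and plot
nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
plotnScree(nS)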

http://www.statmethods.net/advstats/factor.html
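If the nFactors package is not available, a plain Scree plot can be drawn with base R graphics alone. This is a sketch of an alternative, not the procedure from the webpage above; the dashed line at 1.0 is added only as a visual reference for the Kaiser rule, and `drops` is an illustrative name for the decrease between consecutive eigenvalues.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

ev <- eigen(cor(data))

# Components on the X axis, eigenvalues on the Y axis, connected with lines
plot(ev$values, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)   # reference line for the Kaiser rule

# Size of each step down; a sharp fall in these values marks the "elbow"
drops <- -diff(ev$values)
print(round(drops, 3))
```

With only six variables the steps do not show one obvious elbow, which matches the caveat above about small numbers of original variables.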






Sampling Techniques: Nonprobability Sampling

Nonprobability sampling is a sampling technique that does not give every element or member of the population an equal chance of being selected for the sample. Nonprobability sampling techniques include systematic sampling, quota sampling, incidental sampling, purposive sampling, saturated sampling, and snowball sampling.

1. Systematic sampling

Systematic sampling selects the sample according to the order of population members that have been given serial numbers. For example, suppose the population consists of 100 people, all numbered from 1 to 100. The sample can be drawn by taking only the odd numbers, only the even numbers, or the multiples of a chosen number, for example multiples of five. In that case the sample consists of numbers 1, 5, 10, 15, 20, and so on up to 100.

2. Quota sampling

Quota sampling determines the sample from a population with certain characteristics until the desired quota is reached. For example, in a study of dental caries with a set sample size of 500 people, the study is considered unfinished as long as data collection has not met that quota of 500. If the data are collected by a group of 5 data collectors, each member of the group must reach 100 sample members, so that together the 5 collectors obtain data from 500 sample members.

3. Incidental sampling

Incidental sampling determines the sample by chance: anyone who happens to meet the researcher can be used as a sample, provided that the person is considered suitable as a data source.

4. Purposive sampling

Purposive sampling determines the sample based on specific considerations. For example, in a study of food quality, the data sources are people who are food experts. This kind of sample is better suited to qualitative research or to research that does not generalize.

5. Saturated sampling (census)

Saturated sampling uses all members of the population as the sample. This is often done when the population is relatively small, fewer than 30 people, or when the research aims to generalize with very small error.

6. Snowball sampling

Snowball sampling determines a sample that starts small and then grows, like a snowball that rolls and gradually gets bigger. First, one or two people are selected, but because the data given by these two people do not yet feel complete, the researcher looks for other people who are considered to know more and who can complete the data. This continues, so the sample keeps growing. Qualitative research makes heavy use of purposive and snowball samples. For example, a study of who provoked a riot is well suited to purposive sampling and snowball sampling.

Sampling with Probability Sampling

Four sampling techniques belong to probability sampling: random, stratified, cluster, and systematic.

1. Random sampling

This technique goes by several names, including random sampling. Whatever the name, it is very popular and widely recommended in research. In random sampling, in theory, every member of the population has the same probability or chance of being selected for the sample. To obtain the respondents who will form the sample, one important point is that the researcher needs to know the number of respondents in the population.

Random selection can be done either manually (the traditional way) or with a table of random numbers.

a. The traditional way

The traditional way can be seen in a group of women drawing lots at an arisan (a rotating savings gathering). It follows these steps: determine the size of the accessible population; list all members of the population and put them in a box that has a drawing hole; shake the box and draw through the hole; the member numbers that come out are those designated as the research sample; continue until the desired number is reached.

b. Using a random number table

In this second way, subjects are selected using a computer-generated table whose usefulness is recognized in research theory. The table generally consists of columns of five-digit numbers generated randomly by computer. The numbers in the table are used to select the sample in the following steps:

1. Identify the total size of the population.
2. Determine the desired sample size.
3. List all members of the population.
4. Give every member a code number of the required form, for example 000-299 for a population of 300 people, or 00-99 for a population of 100 people.
5. Pick at random (for example with eyes closed), using a pointer on the numbers in the table.
6. For the numbers picked, look only at the right number of digits. If the population is 500, use only the last 3 digits. If the population has fewer than 100 members, only the last two digits are needed.
7. If the number matches the code of an individual in the population, that individual joins the sample. For example, if the population is 500, the picked number 375 is included in the sample. Conversely, if the population is only 300, the picked number 375 is not included.
8. Move the pointer to another column or number.
9. Repeat step 8 until the desired sample size is reached.

Once the desired sample size has been reached, the next step is to divide the sample into a control group and a treatment group according to the research design.

Example: selecting a sample with random sampling

A school principal wants to study the students in the school. The population of students at the SMK (vocational high school) turns out to be 600. The desired sample is 10% of the population. To use random sampling, the principal follows these steps. The population of 600 people is identified. The desired sample is 10% x 600 = 60 people. The population is listed with codes from 000 to 599. A table of random numbers is used to select data by moving along a column or row of the table. Suppose the following sequence of numbers is obtained: 058 710 859 942 634 278 708 899. Because the population is 600 people, two of these numbers are selected for the sample: 058 and 278. Repeat step d until all 60 respondents are obtained.

2. Stratified sampling

In educational and other social research, the population often consists of several layers or groups of individuals with different characteristics. In a school, for example, there are first, second, and third grade classes. Respondents can also be distinguished by sex into male and female groups. In society, the population may consist of community groups such as farmers, traders, civil servants, private employees, and so on. With a population like this, random sampling would be inappropriate and unrepresentative, because one group might be over-selected while another group is not represented at all because it never came up in the selection. The most appropriate and most accurate technique is stratified sampling. Stratification must be used from the start, as soon as the researcher knows that the population consists of members in strata that differ from one another. The accuracy of stratification can be improved further by making it proportional: the size of each stratum's share of the sample is determined by the number of population members in that stratum. Like random sampling, stratified sampling has steps for determining the desired sample.

The steps are as follows:

1. Identify the total size of the population.
2. Determine the desired sample size.
3. List all members of the population.
4. Separate the population members according to the characteristics of their stratum.
5. Select the sample using the random principle, as in the random technique above.
6. Carry out the selection in every stratum.
7. Continue until the sample size is reached.

Example: determining a sample with stratified sampling

A researcher wants to study a population of 900 SMK teachers, and the desired sample is 30% of the population. The population contains three strata of teachers: those in rank (golongan) two, rank three, and rank four. The researcher wants to select the sample with stratified sampling. Explain the steps for drawing the sample. The answer is as follows:

1. The total population is 900 people.
2. List all members of the population with numbers 000-899.
3. Divide the population into three strata, each consisting of 300 people.
4. The desired sample is 30% x 900 = 270 people.
5. Each stratum contributes 90 people.
6. For the first stratum, move the pointer (a pencil) in the random number table.
7. From the numbers picked, take those with values not exceeding 899 until 90 subjects are obtained.
8. Repeat steps 6 and 7 for the second and third strata until a total sample of 270 people is obtained.

3. Cluster sampling

Cluster sampling is another technique for selecting a sample using the probability principle. It differs slightly from the two techniques discussed above. Cluster sampling selects the sample not on the basis of individuals, but on the basis of groups, areas, or sets of subjects that naturally occur together. It is often used by field researchers whose study area may be large. With cluster sampling they can save cost and effort in meeting the respondents who are the subjects or objects of the research. Selecting a sample with cluster sampling involves the following steps:

a. Identify the population to be used in the study.
b. Determine the desired sample size.
c. Determine the logical basis for defining clusters.
d. Estimate the average number of subjects in each cluster.
e. List all subjects in each cluster, and obtain the number of clusters to select by dividing the sample size by the average number of subjects per cluster.
f. Randomly select the required clusters. The sample size is the number of selected clusters multiplied by the number of population members per cluster.

Example: selecting a sample with cluster sampling

Suppose a researcher wants to study a population of 4,000 teachers in 100 schools. The desired sample is 400 people. The technique is cluster sampling, with the school as the logical basis for the clusters. How is the sample determined? The answer is as follows. The total population is 4,000 people. The desired sample size is 400 people. The logical basis for the clusters is the school, of which there are 100. In the population, each school has 4,000/100 = 40 teachers. The number of clusters needed is 400/40 = 10. Therefore, 10 of the 100 schools are selected at random. All teachers in those 10 schools together make up the desired sample.

4. Systematic sampling

The fourth technique is systematic sampling. It uses a proportional principle: a sample member is chosen at every 1/k, where k is a predetermined divisor, for example 5, 6, or 10. The requirement researchers need to meet is a list of all members of the population. Systematic sampling can also be applied to a population listed in alphabetical order, although names that begin with the same prefix, such as "su" or "sri" in Indonesian, will cluster together in the list. Proportional systematic selection with k can still select well.

Field observation techniques for research at mining sites: research data collection

This technique is carried out by making direct observations in the field. Observing means not only seeing, but also recording, counting, measuring, and noting the events in the field. There are two kinds: direct observation (participant observation), in which data are collected by observing and recording phenomena on the object directly at the place of the event, and indirect observation (non-participant observation), in which data are collected by observing and recording phenomena on the object without being directly in the field. Common ways of carrying out observation are as follows:

1) Keeping anecdotal records, that is, informal notes used during observation. These notes record the phenomena or events that occur during the observation.
2) Making a checklist, that is, a list that records each factor systematically. The checklist is usually prepared before the observation and in line with its purpose.
3) Making a rating scale, that is, a scale used to assign graded ratings when observing the condition of the data qualitatively.
4) Recording with a mechanical device, that is, recording made through observation with equipment such as slides, a camera, a computer, or a sound recorder.

Such observation can range from formal to informal data collection activities. Observational evidence is often useful for providing additional information about the topic under study. Observation can add new dimensions for understanding the context or the phenomenon under study. It can be so valuable that the researcher may take photographs at the case study site to increase the validity of the research (Dabbs, 1996: 113).
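Returning to the four probability sampling techniques described in this section, the worked examples can be sketched in R with the base function `sample`. The population sizes mirror the examples in the text (600 students, 900 teachers in three strata, 4,000 teachers in 100 schools); the seed, the ID codes, the rank labels, and all object names are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Simple random sampling: 10% of 600 students coded 000-599
students <- sprintf("%03d", 0:599)
srs <- sample(students, size = round(0.10 * length(students)))

# Stratified sampling: 900 teachers in three strata of 300, 30% from each
teachers <- data.frame(id   = sprintf("%03d", 0:899),
                       rank = rep(c("II", "III", "IV"), each = 300))
strat <- do.call(rbind, lapply(split(teachers, teachers$rank), function(s) {
  s[sample(nrow(s), round(0.30 * nrow(s))), ]   # random draw inside one stratum
}))

# Cluster sampling: 4,000 teachers in 100 schools of 40 teachers each;
# a sample of 400 needs 400 / 40 = 10 whole schools
schools <- data.frame(teacher = 1:4000, school = rep(1:100, each = 40))
chosen  <- sample(unique(schools$school), 400 / 40)
clus    <- schools[schools$school %in% chosen, ]

# Systematic sampling: every k-th member of an ordered list of 100,
# starting from a random position within the first k
k   <- 5
sys <- seq(from = sample(k, 1), to = 100, by = k)

sizes <- c(random = length(srs), stratified = nrow(strat),
           cluster = nrow(clus), systematic = length(sys))
print(sizes)
```

Sampling whole schools rather than individual teachers is what distinguishes the cluster design: only the choice of the 10 schools is random, and every teacher in a chosen school is included.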
