
library(aaRon)
library(psych)
library(lavaan)

aaRon

aaRon is a package created to consolidate the self-defined functions that I use in my daily work. The functions in this package are nothing ground-breaking, but they do make daily life a little easier.

To install the package directly from R/RStudio:

devtools::install_github("Aaron0696/aaRon")

As I am psychology-trained, most of these functions relate to common analyses and manipulations in psychology. E.g., code.reverse() makes reverse-coding items easier, and aggregateScale() is a single function to calculate the sum/mean while ensuring that respondents at or above a certain percentage of missing items are coded as NA.

Some of these functions are built on existing functions from other packages such as:

  • lavaan
  • knitr
  • htmltools
  • tidyr
  • psych

This document is meant to be a showcase of these functions, coupled with short explanations on their usage. This will be updated when more functions are added. More details about the arguments of each function can be found in the help manual when the package is installed.

?aggregateScale

use.style()

When knitting HTML files from R Markdown, this function loads a CSS stylesheet from aaRon which styles the knitted HTML document. This document is styled this way.

There are two options to choose from, work and casual. The main difference between the two is the colors: casual uses flashier blues and greens compared to work.

aaRon::use.style(mode = "casual")
# aaRon::use.style(mode = "work")

fct2num()

A function that transforms a factor vector into a numeric vector. The advantages over simply using as.numeric() are:

  1. There is an additional argument start that can be used to set the numeric value of the lowest level. In psychology, we frequently have scales that start from zero, whereas as.numeric() always recodes the lowest level as 1 (see the sketch after these examples).
a <- factor(c("Agree","Disagree","Agree","Agree","Disagree"))
fct2num(a)
## Levels: Agree, Disagree
## [1] 1 2 1 1 2
fct2num(a, start = 0) # if likert scales begin at 0
## Levels: Agree, Disagree
## [1] 0 1 0 0 1
  2. It prints out the order of the factor levels. We might mistakenly use as.numeric() on a factor without checking its levels, and end up coding Neutral as 3, Disagree as 2 and Agree as 1 (the alphabetical order) when we intended otherwise. Printing out the level order alerts us to any such occurrences.
b <- factor(c("Neutral","Disagree","Agree","Agree","Disagree"))
as.numeric(b) # as.numeric is silent, so this mistake might go undetected
## [1] 3 2 1 1 2
fct2num(b) # noisy, tells you that you are wrong
## Levels: Agree, Disagree, Neutral
## [1] 3 2 1 1 2
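
Conceptually, the function just shifts the factor's integer codes; a minimal sketch of the idea (assumed, not the package's actual source):

# minimal sketch (assumed): print the level order, then shift codes by `start`
fct2num_sketch <- function(f, start = 1) {
  message("Levels: ", paste(levels(f), collapse = ", ")) # announce level order
  as.numeric(f) - 1 + start
}
fct2num_sketch(a, start = 0)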

efa.diag()

This function bundles the calculation of the KMO and Bartlett’s Test of Sphericity into one. The code for the function was adapted from a research methods module conducted by Professor Mike Cheung from the National University of Singapore.

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy estimates the proportion of variance among the variables that may be common variance, i.e. attributable to underlying factors. Higher values (>0.5) indicate that the data are suitable for factor analysis.

Bartlett’s Test of Sphericity is a statistical test of the independence of the variables; the null hypothesis is that all variables are uncorrelated. A p value below 0.05 rejects the null hypothesis, suggesting that the variables are correlated and that factor analysis is meaningful.

efa.diag(HolzingerSwineford1939[,7:15])
## KMO:  0.75224 
## Bartlett's Test Of Sphericity:  904.1 , df =  36 , pvalue =  1.9121e-166
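
For reference, both statistics can be obtained directly from psych; a minimal sketch of what efa.diag() presumably bundles (assumed, I have not checked the source):

# KMO measure of sampling adequacy (the $MSA element is the overall value)
psych::KMO(HolzingerSwineford1939[, 7:15])$MSA
# Bartlett's test takes a correlation matrix and the sample size
psych::cortest.bartlett(cor(HolzingerSwineford1939[, 7:15]),
                        n = nrow(HolzingerSwineford1939))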

tall.loadings()

Factor analysis functions such as psych::fa() produce factor loading matrices in wide format. Tall versions of the loading matrix complement the wide matrices and allow us to better inspect the items and loadings of each individual factor. This function automates the conversion to tall format, removing factor loadings below a certain threshold and sorting the output by factor or item.

myefa <- fa(Harman74.cor$cov, 4, fm = "wls")
## Loading required namespace: GPArotation
tall.loadings(myefa$Structure) # structure matrix
## # A tibble: 85 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 VisualPerception      WLS1  0.36 
##  2 Cubes                 WLS1  0.24 
##  3 PaperFormBoard        WLS1  0.290
##  4 Flags                 WLS1  0.37 
##  5 GeneralInformation    WLS1  0.79 
##  6 PargraphComprehension WLS1  0.82 
##  7 SentenceCompletion    WLS1  0.84 
##  8 WordClassification    WLS1  0.68 
##  9 WordMeaning           WLS1  0.86 
## 10 Addition              WLS1  0.28 
## # ... with 75 more rows
tall.loadings(myefa$loadings) # pattern matrix
## # A tibble: 38 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 GeneralInformation    WLS1  0.76 
##  2 PargraphComprehension WLS1  0.8  
##  3 SentenceCompletion    WLS1  0.87 
##  4 WordClassification    WLS1  0.56 
##  5 WordMeaning           WLS1  0.86 
##  6 Deduction             WLS1  0.33 
##  7 ProblemReasoning      WLS1  0.31 
##  8 SeriesCompletion      WLS1  0.3  
##  9 ArithmeticProblems    WLS1  0.290
## 10 Addition              WLS2  0.86 
## # ... with 28 more rows
# define another threshold for factor loadings
tall.loadings(myefa$loadings, cut = 0.3)
## # A tibble: 30 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 GeneralInformation    WLS1   0.76
##  2 PargraphComprehension WLS1   0.8 
##  3 SentenceCompletion    WLS1   0.87
##  4 WordClassification    WLS1   0.56
##  5 WordMeaning           WLS1   0.86
##  6 Deduction             WLS1   0.33
##  7 ProblemReasoning      WLS1   0.31
##  8 SeriesCompletion      WLS1   0.3 
##  9 Addition              WLS2   0.86
## 10 Code                  WLS2   0.48
## # ... with 20 more rows
# plug it into a datatable for easy sorting and filtering
DT::datatable(tall.loadings(myefa$loadings),
              filter = "top")
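
The conversion itself is essentially a wide-to-long reshape; a rough sketch with tidyr (assumed logic; the 0.2 cut-off is my guess at the default, not confirmed):

# rough sketch (assumed): melt the loadings matrix, cut small loadings, sort
mat  <- unclass(myefa$loadings)                  # strip the "loadings" class
tall <- tidyr::pivot_longer(data.frame(ITEM = rownames(mat), mat),
                            -ITEM, names_to = "NAME", values_to = "VALUE")
tall <- tall[abs(tall$VALUE) >= 0.2, ]           # drop loadings below the cut-off
tall[order(tall$NAME, -abs(tall$VALUE)), ]       # sort by factor, then loading size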

aggregateScale()

Calculates the mean or sum of each row of a dataframe using rowMeans() or rowSums(), aggregating only rows where row-wise missingness is below a certain threshold.

  • Some scales stipulate that if a participant answers fewer than half of the items, their total score should be coded as missing. This can be achieved by setting NA.threshold = 0.5 (the default), which calculates scores only for rows with less than 0.5 missingness.
  • This function bundles that missingness check with rowMeans() or rowSums() (see the sketch after the example below).
mydata <- data.frame(a = c(1:8,NA,NA), b = 1:10, c = c(1:2,rep(NA,8)))
mydata
##     a  b  c
## 1   1  1  1
## 2   2  2  2
## 3   3  3 NA
## 4   4  4 NA
## 5   5  5 NA
## 6   6  6 NA
## 7   7  7 NA
## 8   8  8 NA
## 9  NA  9 NA
## 10 NA 10 NA
aggregateScale(mydata, aggregate = "MEAN", NA.threshold = 0.5)
##  [1]  1  2  3  4  5  6  7  8 NA NA
# only calculate sum score for rows with less than 0.2 missingness
aggregateScale(mydata, aggregate = "SUM", NA.threshold = 0.2)
##  [1]  3  6 NA NA NA NA NA NA NA NA
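
The underlying logic is roughly as follows; a sketch of my understanding, not the package's exact code:

# sketch (assumed): aggregate rows, then NA-out rows at or above the threshold
aggregateScale_sketch <- function(df, aggregate = "MEAN", NA.threshold = 0.5) {
  miss <- rowMeans(is.na(df))           # proportion of missing items per row
  out <- if (aggregate == "MEAN") {
    rowMeans(df, na.rm = TRUE)
  } else {
    rowSums(df, na.rm = TRUE)
  }
  out[miss >= NA.threshold] <- NA       # code heavily-missing rows as NA
  out
}
aggregateScale_sketch(mydata, "SUM", 0.2)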

vlookup()

We are familiar with the VLOOKUP function from Excel, but doing the same in R is a little less straightforward. I use merge() to replicate the behavior of VLOOKUP.

However, the arguments to merge() do not map neatly onto the VLOOKUP arguments we know from Excel, so this function wraps merge() in an interface that resembles VLOOKUP.

mydata <- data.frame(Type = rep(1:3,5))
mydata
##    Type
## 1     1
## 2     2
## 3     3
## 4     1
## 5     2
## 6     3
## 7     1
## 8     2
## 9     3
## 10    1
## 11    2
## 12    3
## 13    1
## 14    2
## 15    3
mytable <- data.frame(Type2 = 1:3, Magnitude = c("Low","Medium","High"))
mytable
##   Type2 Magnitude
## 1     1       Low
## 2     2    Medium
## 3     3      High
# create a new column in mydata that indicates whether
# the magnitude was low, medium or high 
# according to the type as indicated in mytable
vlookup(data = mydata, # the dataframe that will have the column appended
        lookup_table = mytable, # the table that contains the information to be added
        data_col = "Type", # which column in "data" should we match on
        lookup_col = "Type2", # which column in "lookup_table" should we match on
        lookup_value = "Magnitude") # which column in "lookup_table" should we extract the values from
##    Type Magnitude
## 1     1       Low
## 6     2    Medium
## 11    3      High
## 4     1       Low
## 9     2    Medium
## 14    3      High
## 3     1       Low
## 8     2    Medium
## 13    3      High
## 2     1       Low
## 7     2    Medium
## 12    3      High
## 5     1       Low
## 10    2    Medium
## 15    3      High
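
Internally, a single merge() call can do the heavy lifting; a hedged sketch (the actual function likely handles more edge cases):

# sketch (assumed): wrap merge() behind VLOOKUP-style argument names
vlookup_sketch <- function(data, lookup_table, data_col, lookup_col, lookup_value) {
  merge(data, lookup_table[, c(lookup_col, lookup_value)],
        by.x = data_col, by.y = lookup_col, all.x = TRUE)
}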

auto.class()

Sometimes we just want to look through a dataset quickly, but our data may have some variables that are wrongly classed, e.g. Age stored as a factor. This function can be used to quickly loop through an entire dataset and change the class of each variable accordingly.

It takes a vector as input, coerces it into a specific class using as.numeric(), factor() or as.character(), and returns the coerced vector. The output class is determined by the following conditions.

  • If the vector contains non-numeric elements and has fewer than X unique values, classify as factor.
  • If the vector contains non-numeric elements and has more than X unique values, classify as character.
  • If the vector contains only numeric elements and has more than X unique values, classify as numeric.
  • If the vector contains only numeric elements and has fewer than X unique values, classify as factor.

X can be set by the user through the unique.thres.fac argument; it defaults to 20. A sketch of these rules appears after the examples below.

# return numeric
auto.class(as.character(1:100))
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
# return factor
auto.class(rep(1:7,10))
##  [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
## [39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7
## Levels: 1 2 3 4 5 6 7
# return character
auto.class(c(1:100,"hello"))
##   [1] "1"     "2"     "3"     "4"     "5"     "6"     "7"     "8"     "9"    
##  [10] "10"    "11"    "12"    "13"    "14"    "15"    "16"    "17"    "18"   
##  [19] "19"    "20"    "21"    "22"    "23"    "24"    "25"    "26"    "27"   
##  [28] "28"    "29"    "30"    "31"    "32"    "33"    "34"    "35"    "36"   
##  [37] "37"    "38"    "39"    "40"    "41"    "42"    "43"    "44"    "45"   
##  [46] "46"    "47"    "48"    "49"    "50"    "51"    "52"    "53"    "54"   
##  [55] "55"    "56"    "57"    "58"    "59"    "60"    "61"    "62"    "63"   
##  [64] "64"    "65"    "66"    "67"    "68"    "69"    "70"    "71"    "72"   
##  [73] "73"    "74"    "75"    "76"    "77"    "78"    "79"    "80"    "81"   
##  [82] "82"    "83"    "84"    "85"    "86"    "87"    "88"    "89"    "90"   
##  [91] "91"    "92"    "93"    "94"    "95"    "96"    "97"    "98"    "99"   
## [100] "100"   "hello"
# return factor
auto.class(c("Strongly Agree","Agree","Disagree","Strongly Disagree"))
## [1] Strongly Agree    Agree             Disagree          Strongly Disagree
## Levels: Agree Disagree Strongly Agree Strongly Disagree
# use with lapply to apply across entire dataframe
mydata <- data.frame(A = 1:100, B = c(1:99,"hello"))
mydata <- data.frame(lapply(mydata, FUN = auto.class))
str(mydata)
## 'data.frame':    100 obs. of  2 variables:
##  $ A: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ B: chr  "1" "2" "3" "4" ...
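
A sketch of the decision rules described above (assumed; the exact behaviour at exactly X unique values is my guess):

# sketch (assumed): classify a vector as numeric, factor or character
auto_class_sketch <- function(x, unique.thres.fac = 20) {
  all_num <- !anyNA(suppressWarnings(as.numeric(x))) # all elements parse as numeric?
  few     <- length(unique(x)) < unique.thres.fac    # fewer than X unique values?
  if (all_num && !few) return(as.numeric(x))
  if (!all_num && !few) return(as.character(x))
  factor(x)                                          # few unique values -> factor
}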

code.reverse()

Reverse-codes items using factor(). One advantage over psych::reverse.code() is that the input can be non-numeric.

mydata <- data.frame(myscale = c("Disagree", "Neutral", "Agree", "Neutral","Agree"))
mydata$myscale
## [1] "Disagree" "Neutral"  "Agree"    "Neutral"  "Agree"
code.reverse(vector = mydata$myscale, 
             original_levels = c("Disagree","Neutral","Agree"))
## [1] Agree    Neutral  Disagree Neutral  Disagree
## Levels: Disagree Neutral Agree
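
The trick is a single factor() call that maps each level onto its mirror image while keeping the original level order; a minimal sketch (assuming this mirrors the actual implementation):

# sketch (assumed): relabel reversed levels, preserving the original level order
code_reverse_sketch <- function(vector, original_levels) {
  factor(vector, levels = rev(original_levels), labels = original_levels)
}
code_reverse_sketch(mydata$myscale, c("Disagree", "Neutral", "Agree"))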

prettyalpha()

I found myself frequently using for loops and results = "asis" to automate the creation of various data reports. But results from psych::alpha() get messed up when alpha() is called within an R Markdown chunk with the option results = "asis". This function takes the output of alpha(), formats it, and feeds it to kable() to print out prettier tables.

myalpha <- alpha(HolzingerSwineford1939[,7:15])
# set chunk option to results = "asis"
prettyalpha(myalpha)

Cronbach Alpha:

RAW_ALPHA  STD.ALPHA  G6(SMC)  AVERAGE_R   S/N   ASE  MEAN    SD  MEDIAN_R
     0.76       0.76     0.81       0.26  3.17  0.02  4.22  0.66      0.21

Alpha Values If Certain Items Were Dropped:

    RAW_ALPHA  STD.ALPHA  G6(SMC)  AVERAGE_R   S/N  ALPHA SE  VAR.R  MED.R
x1       0.72       0.73     0.78       0.25  2.64      0.02   0.04   0.19
x2       0.76       0.76     0.81       0.29  3.22      0.02   0.04   0.22
x3       0.75       0.75     0.80       0.27  2.97      0.02   0.04   0.21
x4       0.72       0.72     0.76       0.24  2.55      0.03   0.03   0.21
x5       0.72       0.73     0.76       0.25  2.64      0.02   0.03   0.21
x6       0.71       0.72     0.76       0.24  2.53      0.03   0.03   0.21
x7       0.77       0.76     0.80       0.29  3.25      0.02   0.03   0.22
x8       0.75       0.75     0.79       0.27  2.95      0.02   0.04   0.21
x9       0.73       0.73     0.78       0.25  2.68      0.02   0.04   0.18

Item-Level Statistics:

     N  RAW.R  STD.R  R.COR  R.DROP  MEAN    SD
x1 301   0.66   0.65   0.59    0.52  4.94  1.17
x2 301   0.45   0.44   0.32    0.28  6.09  1.18
x3 301   0.53   0.53   0.44    0.37  2.25  1.13
x4 301   0.70   0.69   0.68    0.58  3.06  1.16
x5 301   0.67   0.65   0.65    0.53  4.34  1.29
x6 301   0.71   0.69   0.69    0.60  2.19  1.10
x7 301   0.41   0.43   0.34    0.25  4.19  1.09
x8 301   0.51   0.54   0.46    0.37  5.53  1.01
x9 301   0.62   0.64   0.57    0.49  5.37  1.01
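
The formatting step essentially pulls the component data frames out of the alpha object and hands each to kable(); a sketch (assumed, not the actual implementation):

# sketch (assumed): the alpha object stores its tables as data frames
knitr::kable(round(myalpha$total, 2))       # overall Cronbach alpha
knitr::kable(round(myalpha$alpha.drop, 2))  # alpha if an item were dropped
knitr::kable(round(myalpha$item.stats, 2))  # item-level statistics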

prettylavaan()

Similar to alpha() above, results from lavaan get messed up when summary(fit) is called within a for loop in an R Markdown chunk with the option results = "asis". This function is an alternative to summary() that prints a pretty output when results = "asis".

# the famous Holzinger and Swineford (1939) example
HS.model <-  "visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9"
fit <- cfa(HS.model, data = HolzingerSwineford1939)
prettylavaan(fit, output_format = "asis")

Estimator: ML

Converged: TRUE

Iterations: 35

Original Sample Size: 301

Effective Sample Size: 301

Fit Indices:

        NPAR   CHISQ  DF  PVALUE    CFI    TLI   NNFI    NFI  RMSEA   SRMR     AIC     BIC
Values    21  85.306  24       0  0.931  0.896  0.896  0.907  0.092  0.065  7517.5  7595.3

Parameter Estimates:

LHS      OP  RHS      STD.ALL    EST     SE       Z  PVALUE
visual   =~  x1         0.772  1.000  0.000      NA      NA
visual   =~  x2         0.424  0.554  0.100   5.554       0
visual   =~  x3         0.581  0.729  0.109   6.685       0
textual  =~  x4         0.852  1.000  0.000      NA      NA
textual  =~  x5         0.855  1.113  0.065  17.014       0
textual  =~  x6         0.838  0.926  0.055  16.703       0
speed    =~  x7         0.570  1.000  0.000      NA      NA
speed    =~  x8         0.723  1.180  0.165   7.152       0
speed    =~  x9         0.665  1.082  0.151   7.155       0
x1       ~~  x1         0.404  0.549  0.114   4.833       0
x2       ~~  x2         0.821  1.134  0.102  11.146       0
x3       ~~  x3         0.662  0.844  0.091   9.317       0
x4       ~~  x4         0.275  0.371  0.048   7.779       0
x5       ~~  x5         0.269  0.446  0.058   7.642       0
x6       ~~  x6         0.298  0.356  0.043   8.277       0
x7       ~~  x7         0.676  0.799  0.081   9.823       0
x8       ~~  x8         0.477  0.488  0.074   6.573       0
x9       ~~  x9         0.558  0.566  0.071   8.003       0
visual   ~~  visual     1.000  0.809  0.145   5.564       0
textual  ~~  textual    1.000  0.979  0.112   8.737       0
speed    ~~  speed      1.000  0.384  0.086   4.451       0
visual   ~~  textual    0.459  0.408  0.074   5.552       0
visual   ~~  speed      0.471  0.262  0.056   4.660       0
textual  ~~  speed      0.283  0.173  0.049   3.518       0
# using robust estimators
robustfit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
# request for robust fit indices
prettylavaan(robustfit, output_format = "asis", robust = TRUE)

Estimator: MLM

Converged: TRUE

Iterations: 35

Original Sample Size: 301

Effective Sample Size: 301

Fit Indices:

         NPAR   CHISQ  DF  PVALUE    CFI    TLI   NNFI    NFI  RMSEA   SRMR     AIC     BIC
Naive      21  85.306  24       0  0.931  0.896  0.896  0.907  0.092  0.065  7517.5  7595.3
.Scaled    21  80.872  24       0  0.925  0.887  0.887  0.898  0.089     NA      NA      NA
.Robust    21      NA  NA      NA  0.932  0.897  0.897     NA  0.091     NA      NA      NA

Parameter Estimates:

LHS      OP  RHS      STD.ALL    EST     SE       Z  PVALUE
visual   =~  x1         0.772  1.000  0.000      NA      NA
visual   =~  x2         0.424  0.554  0.103   5.359   0.000
visual   =~  x3         0.581  0.729  0.115   6.367   0.000
textual  =~  x4         0.852  1.000  0.000      NA      NA
textual  =~  x5         0.855  1.113  0.066  16.762   0.000
textual  =~  x6         0.838  0.926  0.060  15.497   0.000
speed    =~  x7         0.570  1.000  0.000      NA      NA
speed    =~  x8         0.723  1.180  0.152   7.758   0.000
speed    =~  x9         0.665  1.082  0.132   8.169   0.000
x1       ~~  x1         0.404  0.549  0.138   3.968   0.000
x2       ~~  x2         0.821  1.134  0.107  10.554   0.000
x3       ~~  x3         0.662  0.844  0.085   9.985   0.000
x4       ~~  x4         0.275  0.371  0.050   7.423   0.000
x5       ~~  x5         0.269  0.446  0.058   7.688   0.000
x6       ~~  x6         0.298  0.356  0.046   7.700   0.000
x7       ~~  x7         0.676  0.799  0.079  10.168   0.000
x8       ~~  x8         0.477  0.488  0.074   6.567   0.000
x9       ~~  x9         0.558  0.566  0.068   8.332   0.000
visual   ~~  visual     1.000  0.809  0.167   4.837   0.000
textual  ~~  textual    1.000  0.979  0.121   8.109   0.000
speed    ~~  speed      1.000  0.384  0.083   4.635   0.000
visual   ~~  textual    0.459  0.408  0.082   4.966   0.000
visual   ~~  speed      0.471  0.262  0.055   4.762   0.000
textual  ~~  speed      0.283  0.173  0.055   3.139   0.002

There is also an option to output a datatable for the parameters by specifying output_format = "datatable".

prettylavaan(robustfit, output_format = "datatable", robust = TRUE)
## Estimator: MLM 
## Converged: TRUE 
## Iterations: 35 
## Original Sample Size: 301 
## Effective Sample Size: 301
## 
## Fit Indices:
##         NPAR  CHISQ DF PVALUE   CFI   TLI  NNFI   NFI RMSEA  SRMR    AIC    BIC
## Naive     21 85.306 24      0 0.931 0.896 0.896 0.907 0.092 0.065 7517.5 7595.3
## .Scaled   21 80.872 24      0 0.925 0.887 0.887 0.898 0.089    NA     NA     NA
## .Robust   21     NA NA     NA 0.932 0.897 0.897    NA 0.091    NA     NA     NA
## 
## 
## 
## Parameter Estimates:
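
For reference, the pieces prettylavaan() assembles can be pulled straight from lavaan; a sketch of roughly equivalent calls (assumed, not the actual source):

# sketch (assumed): fit indices and parameter estimates straight from lavaan
fits <- lavaan::fitMeasures(fit, c("npar", "chisq", "df", "pvalue", "cfi", "tli",
                                   "nfi", "rmsea", "srmr", "aic", "bic"))
knitr::kable(t(round(fits, 3)))
params <- lavaan::parameterEstimates(fit, standardized = TRUE)
knitr::kable(params[, c("lhs", "op", "rhs", "std.all", "est", "se", "z", "pvalue")],
             digits = 3)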

prettytable()

Automatically appends relative proportions/percentages to the raw counts calculated by table(). Currently only works for one-way and two-way frequency tables.

mydata <- data.frame(x1 = rep(c("Male","Female","Male"), 100),
                     x2 = factor(rep(c("High","Medium","Low","Medium"),75),
                                 levels = c("Low","Medium","High")))
# two way frequency table
a <- table(mydata$x1,mydata$x2)
prettytable(a)
##         
##          Low      Medium    High    
##   Female 25 (25%) 50 (50%)  25 (25%)
##   Male   50 (25%) 100 (50%) 50 (25%)
# feed it into kable for a prettier table
knitr::kable(prettytable(a), align = "c")
         Low       Medium     High
Female   25 (25%)  50 (50%)   25 (25%)
Male     50 (25%)  100 (50%)  50 (25%)
# one way frequency table
a <- table(mydata$x1)
prettytable(a)
## 
##    Female      Male 
## 100 (33%) 200 (67%)
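
The idea reduces to prop.table(); a one-way sketch (assumed; a two-way version would use prop.table(tab, margin = 1) for row percentages, as in the output above):

# sketch (assumed): paste percentages onto the raw counts of a one-way table
prettytable_sketch <- function(tab) {
  pct <- round(100 * prop.table(tab))  # convert counts to percentages
  out <- paste0(tab, " (", pct, "%)")
  names(out) <- names(tab)
  out
}
prettytable_sketch(table(mydata$x1))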