
library(aaRon)
library(psych)
library(lavaan)

aaRon

aaRon is a package created to consolidate the self-defined functions that I use in my daily work. The functions in this package are nothing ground-breaking, but they do make daily life a little easier.

To install the package directly from R/RStudio:

devtools::install_github("Aaron0696/aaRon")

As I am psychology-trained, most of these functions relate to common analyses and manipulations in psychology. E.g., code.reverse() makes reverse-coding items easier, and aggregateScale() is a single function to calculate the sum/mean while ensuring that respondents at or above a certain percentage of missing items are coded as NA.

Some of these functions are built on existing functions from other packages such as:

  • lavaan
  • knitr
  • htmltools
  • tidyr
  • psych

This document is meant to be a showcase of these functions, coupled with short explanations on their usage. This will be updated when more functions are added. More details about the arguments of each function can be found in the help manual when the package is installed.

?aggregateScale

use.style()

When knitting HTML files from R Markdown, this function loads a CSS stylesheet from aaRon which styles the knitted HTML document. This document is styled this way.

There are two options to choose from, work and casual. The main difference between the two is the colors: casual uses flashier blues and greens compared to work.

aaRon::use.style(mode = "casual")
# aaRon::use.style(mode = "work")

fct2num()

A function that transforms a factor vector into a numeric vector. The advantages over simply using as.numeric() are:

  1. There is an additional argument start that can be used to set the numeric value of the lowest level. In psychology, we frequently have scales that start from zero, whereas as.numeric() always recodes the lowest level as 1 (see the sketch after these examples).
a <- factor(c("Agree","Disagree","Agree","Agree","Disagree"))
fct2num(a)
## Levels: Agree, Disagree
## [1] 1 2 1 1 2
fct2num(a, start = 0) # if likert scales begin at 0
## Levels: Agree, Disagree
## [1] 0 1 0 0 1
  2. It prints out the order of the factor levels. We might mistakenly use as.numeric() on a factor without checking its levels, and end up coding Neutral as 3, Disagree as 2 and Agree as 1 (the alphabetical order) when we intended otherwise. Printing out the level order alerts us to any such occurrences.
b <- factor(c("Neutral","Disagree","Agree","Agree","Disagree"))
as.numeric(b) # as.numeric is silent, so this mistake might go undetected
## [1] 3 2 1 1 2
fct2num(b) # noisy, tells you that you are wrong
## Levels: Agree, Disagree, Neutral
## [1] 3 2 1 1 2
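
Conceptually, the function just shifts the factor's integer codes; a minimal sketch of the idea (assumed, not the package's actual source):

# minimal sketch (assumed): print the level order, then shift codes by `start`
fct2num_sketch <- function(f, start = 1) {
  message("Levels: ", paste(levels(f), collapse = ", ")) # announce level order
  as.numeric(f) - 1 + start
}
fct2num_sketch(a, start = 0)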

efa.diag()

This function bundles the calculation of the KMO and Bartlett’s Test of Sphericity into one. The code for the function was adapted from a research methods module conducted by Professor Mike Cheung from the National University of Singapore.

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy estimates the proportion of variance among the variables that may be common variance, i.e. attributable to underlying factors. Higher values (>0.5) indicate that the data are suitable for factor analysis.

Bartlett’s Test of Sphericity is a statistical test of the independence of the variables; the null hypothesis is that all variables are uncorrelated. A p value below 0.05 rejects the null hypothesis, suggesting that the variables are correlated and that factor analysis is meaningful.

efa.diag(HolzingerSwineford1939[,7:15])
## KMO:  0.75224 
## Bartlett's Test Of Sphericity:  904.1 , df =  36 , pvalue =  1.9121e-166
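
For reference, both statistics can be obtained directly from psych; a minimal sketch of what efa.diag() presumably bundles (assumed, I have not checked the source):

# KMO measure of sampling adequacy (the $MSA element is the overall value)
psych::KMO(HolzingerSwineford1939[, 7:15])$MSA
# Bartlett's test takes a correlation matrix and the sample size
psych::cortest.bartlett(cor(HolzingerSwineford1939[, 7:15]),
                        n = nrow(HolzingerSwineford1939))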

tall.loadings()

Factor analysis functions such as psych::fa() produce factor loading matrices in wide format. Tall versions of the loading matrix complement the wide matrices and allow us to better inspect the items and loadings of each individual factor. This function automates the conversion to tall format, removing factor loadings below a certain threshold and sorting the output by factor or item.

myefa <- fa(Harman74.cor$cov, 4, fm = "wls")
## Loading required namespace: GPArotation
tall.loadings(myefa$Structure) # structure matrix
## # A tibble: 85 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 VisualPerception      WLS1  0.36 
##  2 Cubes                 WLS1  0.24 
##  3 PaperFormBoard        WLS1  0.290
##  4 Flags                 WLS1  0.37 
##  5 GeneralInformation    WLS1  0.79 
##  6 PargraphComprehension WLS1  0.82 
##  7 SentenceCompletion    WLS1  0.84 
##  8 WordClassification    WLS1  0.68 
##  9 WordMeaning           WLS1  0.86 
## 10 Addition              WLS1  0.28 
## # ... with 75 more rows
tall.loadings(myefa$loadings) # pattern matrix
## # A tibble: 38 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 GeneralInformation    WLS1  0.76 
##  2 PargraphComprehension WLS1  0.8  
##  3 SentenceCompletion    WLS1  0.87 
##  4 WordClassification    WLS1  0.56 
##  5 WordMeaning           WLS1  0.86 
##  6 Deduction             WLS1  0.33 
##  7 ProblemReasoning      WLS1  0.31 
##  8 SeriesCompletion      WLS1  0.3  
##  9 ArithmeticProblems    WLS1  0.290
## 10 Addition              WLS2  0.86 
## # ... with 28 more rows
# define another threshold for factor loadings
tall.loadings(myefa$loadings, cut = 0.3)
## # A tibble: 30 x 3
##    ITEM                  NAME  VALUE
##    <chr>                 <chr> <dbl>
##  1 GeneralInformation    WLS1   0.76
##  2 PargraphComprehension WLS1   0.8 
##  3 SentenceCompletion    WLS1   0.87
##  4 WordClassification    WLS1   0.56
##  5 WordMeaning           WLS1   0.86
##  6 Deduction             WLS1   0.33
##  7 ProblemReasoning      WLS1   0.31
##  8 SeriesCompletion      WLS1   0.3 
##  9 Addition              WLS2   0.86
## 10 Code                  WLS2   0.48
## # ... with 20 more rows
# plug it into a datatable for easy sorting and filtering
DT::datatable(tall.loadings(myefa$loadings),
              filter = "top")
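
The conversion itself is essentially a wide-to-long reshape; a rough sketch with tidyr (assumed logic; the 0.2 cut-off is my guess at the default, not confirmed):

# rough sketch (assumed): melt the loadings matrix, cut small loadings, sort
mat  <- unclass(myefa$loadings)                  # strip the "loadings" class
tall <- tidyr::pivot_longer(data.frame(ITEM = rownames(mat), mat),
                            -ITEM, names_to = "NAME", values_to = "VALUE")
tall <- tall[abs(tall$VALUE) >= 0.2, ]           # drop loadings below the cut-off
tall[order(tall$NAME, -abs(tall$VALUE)), ]       # sort by factor, then loading size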

aggregateScale()

Calculates the mean or sum of each row of a dataframe using rowMeans() or rowSums(), aggregating only rows where row-wise missingness is below a certain threshold.

  • Some scales stipulate that if a participant answers fewer than half of the items, their total score should be coded as missing. This can be achieved by setting NA.threshold = 0.5 (the default), which calculates scores only for rows with less than 0.5 missingness.
  • This function bundles that missingness check with rowMeans() or rowSums() (see the sketch after the example below).
mydata <- data.frame(a = c(1:8,NA,NA), b = 1:10, c = c(1:2,rep(NA,8)))
mydata
##     a  b  c
## 1   1  1  1
## 2   2  2  2
## 3   3  3 NA
## 4   4  4 NA
## 5   5  5 NA
## 6   6  6 NA
## 7   7  7 NA
## 8   8  8 NA
## 9  NA  9 NA
## 10 NA 10 NA
aggregateScale(mydata, aggregate = "MEAN", NA.threshold = 0.5)
##  [1]  1  2  3  4  5  6  7  8 NA NA
# only calculate sum score for rows with less than 0.2 missingness
aggregateScale(mydata, aggregate = "SUM", NA.threshold = 0.2)
##  [1]  3  6 NA NA NA NA NA NA NA NA
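
The underlying logic is roughly as follows; a sketch of my understanding, not the package's exact code:

# sketch (assumed): aggregate rows, then NA-out rows at or above the threshold
aggregateScale_sketch <- function(df, aggregate = "MEAN", NA.threshold = 0.5) {
  miss <- rowMeans(is.na(df))           # proportion of missing items per row
  out <- if (aggregate == "MEAN") {
    rowMeans(df, na.rm = TRUE)
  } else {
    rowSums(df, na.rm = TRUE)
  }
  out[miss >= NA.threshold] <- NA       # code heavily-missing rows as NA
  out
}
aggregateScale_sketch(mydata, "SUM", 0.2)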

vlookup()

We are familiar with the VLOOKUP function from Excel, but doing the same in R is a little less straightforward. I use merge() to replicate the behavior of VLOOKUP.

However, the arguments to merge() do not map neatly onto the VLOOKUP arguments we know from Excel, so this function wraps merge() in an interface that resembles VLOOKUP.

mydata <- data.frame(Type = rep(1:3,5))
mydata
##    Type
## 1     1
## 2     2
## 3     3
## 4     1
## 5     2
## 6     3
## 7     1
## 8     2
## 9     3
## 10    1
## 11    2
## 12    3
## 13    1
## 14    2
## 15    3
mytable <- data.frame(Type2 = 1:3, Magnitude = c("Low","Medium","High"))
mytable
##   Type2 Magnitude
## 1     1       Low
## 2     2    Medium
## 3     3      High
# create a new column in mydata that indicates whether
# the magnitude was low, medium or high 
# according to the type as indicated in mytable
vlookup(data = mydata, # the dataframe that will have the column appended
        lookup_table = mytable, # the table that contains the information to be added
        data_col = "Type", # which column in "data" should we match on
        lookup_col = "Type2", # which column in "lookup_table" should we match on
        lookup_value = "Magnitude") # which column in "lookup_table" should we extract the values from
##    Type Magnitude
## 1     1       Low
## 6     2    Medium
## 11    3      High
## 4     1       Low
## 9     2    Medium
## 14    3      High
## 3     1       Low
## 8     2    Medium
## 13    3      High
## 2     1       Low
## 7     2    Medium
## 12    3      High
## 5     1       Low
## 10    2    Medium
## 15    3      High
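
Internally, a single merge() call can do the heavy lifting; a hedged sketch (the actual function likely handles more edge cases):

# sketch (assumed): wrap merge() behind VLOOKUP-style argument names
vlookup_sketch <- function(data, lookup_table, data_col, lookup_col, lookup_value) {
  merge(data, lookup_table[, c(lookup_col, lookup_value)],
        by.x = data_col, by.y = lookup_col, all.x = TRUE)
}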

auto.class()

Sometimes we just want to look through a dataset quickly, but our data may have some variables that are wrongly classed, e.g. Age stored as a factor. This function can be used to quickly loop through an entire dataset and change the class of each variable accordingly.

It takes a vector as input, coerces it into a specific class using as.numeric(), factor() or as.character(), and returns the coerced vector. The output class is determined by the following conditions.

  • If the vector contains non-numeric elements and has fewer than X unique values, classify as factor.
  • If the vector contains non-numeric elements and has more than X unique values, classify as character.
  • If the vector contains only numeric elements and has more than X unique values, classify as numeric.
  • If the vector contains only numeric elements and has fewer than X unique values, classify as factor.

X can be set by the user through the unique.thres.fac argument; it defaults to 20. A sketch of these rules appears after the examples below.

# return numeric
auto.class(as.character(1:100))
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
# return factor
auto.class(rep(1:7,10))
##  [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
## [39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7
## Levels: 1 2 3 4 5 6 7
# return character
auto.class(c(1:100,"hello"))
##   [1] "1"     "2"     "3"     "4"     "5"     "6"     "7"     "8"     "9"    
##  [10] "10"    "11"    "12"    "13"    "14"    "15"    "16"    "17"    "18"   
##  [19] "19"    "20"    "21"    "22"    "23"    "24"    "25"    "26"    "27"   
##  [28] "28"    "29"    "30"    "31"    "32"    "33"    "34"    "35"    "36"   
##  [37] "37"    "38"    "39"    "40"    "41"    "42"    "43"    "44"    "45"   
##  [46] "46"    "47"    "48"    "49"    "50"    "51"    "52"    "53"    "54"   
##  [55] "55"    "56"    "57"    "58"    "59"    "60"    "61"    "62"    "63"   
##  [64] "64"    "65"    "66"    "67"    "68"    "69"    "70"    "71"    "72"   
##  [73] "73"    "74"    "75"    "76"    "77"    "78"    "79"    "80"    "81"   
##  [82] "82"    "83"    "84"    "85"    "86"    "87"    "88"    "89"    "90"   
##  [91] "91"    "92"    "93"    "94"    "95"    "96"    "97"    "98"    "99"   
## [100] "100"   "hello"
# return factor
auto.class(c("Strongly Agree","Agree","Disagree","Strongly Disagree"))
## [1] Strongly Agree    Agree             Disagree          Strongly Disagree
## Levels: Agree Disagree Strongly Agree Strongly Disagree
# use with lapply to apply across entire dataframe
mydata <- data.frame(A = 1:100, B = c(1:99,"hello"))
mydata <- data.frame(lapply(mydata, FUN = auto.class))
str(mydata)
## 'data.frame':    100 obs. of  2 variables:
##  $ A: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ B: chr  "1" "2" "3" "4" ...
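
A sketch of the decision rules described above (assumed; the exact behaviour at exactly X unique values is my guess):

# sketch (assumed): classify a vector as numeric, factor or character
auto_class_sketch <- function(x, unique.thres.fac = 20) {
  all_num <- !anyNA(suppressWarnings(as.numeric(x))) # all elements parse as numeric?
  few     <- length(unique(x)) < unique.thres.fac    # fewer than X unique values?
  if (all_num && !few) return(as.numeric(x))
  if (!all_num && !few) return(as.character(x))
  factor(x)                                          # few unique values -> factor
}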

code.reverse()

Reverse-codes items using factor(). One advantage over psych::reverse.code() is that the input can be non-numeric.

mydata <- data.frame(myscale = c("Disagree", "Neutral", "Agree", "Neutral","Agree"))
mydata$myscale
## [1] "Disagree" "Neutral"  "Agree"    "Neutral"  "Agree"
code.reverse(vector = mydata$myscale, 
             original_levels = c("Disagree","Neutral","Agree"))
## [1] Agree    Neutral  Disagree Neutral  Disagree
## Levels: Disagree Neutral Agree
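
The trick is a single factor() call that maps each level onto its mirror image while keeping the original level order; a minimal sketch (assuming this mirrors the actual implementation):

# sketch (assumed): relabel reversed levels, preserving the original level order
code_reverse_sketch <- function(vector, original_levels) {
  factor(vector, levels = rev(original_levels), labels = original_levels)
}
code_reverse_sketch(mydata$myscale, c("Disagree", "Neutral", "Agree"))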

prettyalpha()

I found myself frequently using for loops and results = "asis" to automate the creation of various data reports. But results from psych::alpha() get messed up when alpha() is called within an R Markdown chunk with the option results = "asis". This function takes the output of alpha(), formats it, and feeds it to kable() to print out prettier tables.

myalpha <- alpha(HolzingerSwineford1939[,7:15])
# set chunk option to results = "asis"
prettyalpha(myalpha)

Cronbach Alpha:

RAW_ALPHA  STD.ALPHA  G6(SMC)  AVERAGE_R   S/N   ASE  MEAN    SD  MEDIAN_R
     0.76       0.76     0.81       0.26  3.17  0.02  4.22  0.66      0.21

Alpha Values If Certain Items Were Dropped:

    RAW_ALPHA  STD.ALPHA  G6(SMC)  AVERAGE_R   S/N  ALPHA SE  VAR.R  MED.R
x1       0.72       0.73     0.78       0.25  2.64      0.02   0.04   0.19
x2       0.76       0.76     0.81       0.29  3.22      0.02   0.04   0.22
x3       0.75       0.75     0.80       0.27  2.97      0.02   0.04   0.21
x4       0.72       0.72     0.76       0.24  2.55      0.03   0.03   0.21
x5       0.72       0.73     0.76       0.25  2.64      0.02   0.03   0.21
x6       0.71       0.72     0.76       0.24  2.53      0.03   0.03   0.21
x7       0.77       0.76     0.80       0.29  3.25      0.02   0.03   0.22
x8       0.75       0.75     0.79       0.27  2.95      0.02   0.04   0.21
x9       0.73       0.73     0.78       0.25  2.68      0.02   0.04   0.18

Item-Level Statistics:

     N  RAW.R  STD.R  R.COR  R.DROP  MEAN    SD
x1 301   0.66   0.65   0.59    0.52  4.94  1.17
x2 301   0.45   0.44   0.32    0.28  6.09  1.18
x3 301   0.53   0.53   0.44    0.37  2.25  1.13
x4 301   0.70   0.69   0.68    0.58  3.06  1.16
x5 301   0.67   0.65   0.65    0.53  4.34  1.29
x6 301   0.71   0.69   0.69    0.60  2.19  1.10
x7 301   0.41   0.43   0.34    0.25  4.19  1.09
x8 301   0.51   0.54   0.46    0.37  5.53  1.01
x9 301   0.62   0.64   0.57    0.49  5.37  1.01
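
The formatting step essentially pulls the component data frames out of the alpha object and hands each to kable(); a sketch (assumed, not the actual implementation):

# sketch (assumed): the alpha object stores its tables as data frames
knitr::kable(round(myalpha$total, 2))       # overall Cronbach alpha
knitr::kable(round(myalpha$alpha.drop, 2))  # alpha if an item were dropped
knitr::kable(round(myalpha$item.stats, 2))  # item-level statistics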

prettylavaan()

Similar to alpha() above, results from lavaan get messed up when summary(fit) is called within a for loop in an R Markdown chunk with the option results = "asis". This function is an alternative to summary() that prints a pretty output when results = "asis".

# the famous Holzinger and Swineford (1939) example
HS.model <-  "visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9"
fit <- cfa(HS.model, data = HolzingerSwineford1939)
prettylavaan(fit, output_format = "asis")

Estimator: ML

Converged: TRUE

Iterations: 35

Original Sample Size: 301

Effective Sample Size: 301

Fit Indices:

        NPAR   CHISQ  DF  PVALUE    CFI    TLI   NNFI    NFI  RMSEA   SRMR     AIC     BIC
Values    21  85.306  24       0  0.931  0.896  0.896  0.907  0.092  0.065  7517.5  7595.3

Parameter Estimates:

LHS      OP  RHS      STD.ALL    EST     SE       Z  PVALUE
visual   =~  x1         0.772  1.000  0.000      NA      NA
visual   =~  x2         0.424  0.554  0.100   5.554       0
visual   =~  x3         0.581  0.729  0.109   6.685       0
textual  =~  x4         0.852  1.000  0.000      NA      NA
textual  =~  x5         0.855  1.113  0.065  17.014       0
textual  =~  x6         0.838  0.926  0.055  16.703       0
speed    =~  x7         0.570  1.000  0.000      NA      NA
speed    =~  x8         0.723  1.180  0.165   7.152       0
speed    =~  x9         0.665  1.082  0.151   7.155       0
x1       ~~  x1         0.404  0.549  0.114   4.833       0
x2       ~~  x2         0.821  1.134  0.102  11.146       0
x3       ~~  x3         0.662  0.844  0.091   9.317       0
x4       ~~  x4         0.275  0.371  0.048   7.779       0
x5       ~~  x5         0.269  0.446  0.058   7.642       0
x6       ~~  x6         0.298  0.356  0.043   8.277       0
x7       ~~  x7         0.676  0.799  0.081   9.823       0
x8       ~~  x8         0.477  0.488  0.074   6.573       0
x9       ~~  x9         0.558  0.566  0.071   8.003       0
visual   ~~  visual     1.000  0.809  0.145   5.564       0
textual  ~~  textual    1.000  0.979  0.112   8.737       0
speed    ~~  speed      1.000  0.384  0.086   4.451       0
visual   ~~  textual    0.459  0.408  0.074   5.552       0
visual   ~~  speed      0.471  0.262  0.056   4.660       0
textual  ~~  speed      0.283  0.173  0.049   3.518       0
# using robust estimators
robustfit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
# request for robust fit indices
prettylavaan(robustfit, output_format = "asis", robust = TRUE)

Estimator: MLM

Converged: TRUE

Iterations: 35

Original Sample Size: 301

Effective Sample Size: 301

Fit Indices:

         NPAR   CHISQ  DF  PVALUE    CFI    TLI   NNFI    NFI  RMSEA   SRMR     AIC     BIC
Naive      21  85.306  24       0  0.931  0.896  0.896  0.907  0.092  0.065  7517.5  7595.3
.Scaled    21  80.872  24       0  0.925  0.887  0.887  0.898  0.089     NA      NA      NA
.Robust    21      NA  NA      NA  0.932  0.897  0.897     NA  0.091     NA      NA      NA

Parameter Estimates:

LHS      OP  RHS      STD.ALL    EST     SE       Z  PVALUE
visual   =~  x1         0.772  1.000  0.000      NA      NA
visual   =~  x2         0.424  0.554  0.103   5.359   0.000
visual   =~  x3         0.581  0.729  0.115   6.367   0.000
textual  =~  x4         0.852  1.000  0.000      NA      NA
textual  =~  x5         0.855  1.113  0.066  16.762   0.000
textual  =~  x6         0.838  0.926  0.060  15.497   0.000
speed    =~  x7         0.570  1.000  0.000      NA      NA
speed    =~  x8         0.723  1.180  0.152   7.758   0.000
speed    =~  x9         0.665  1.082  0.132   8.169   0.000
x1       ~~  x1         0.404  0.549  0.138   3.968   0.000
x2       ~~  x2         0.821  1.134  0.107  10.554   0.000
x3       ~~  x3         0.662  0.844  0.085   9.985   0.000
x4       ~~  x4         0.275  0.371  0.050   7.423   0.000
x5       ~~  x5         0.269  0.446  0.058   7.688   0.000
x6       ~~  x6         0.298  0.356  0.046   7.700   0.000
x7       ~~  x7         0.676  0.799  0.079  10.168   0.000
x8       ~~  x8         0.477  0.488  0.074   6.567   0.000
x9       ~~  x9         0.558  0.566  0.068   8.332   0.000
visual   ~~  visual     1.000  0.809  0.167   4.837   0.000
textual  ~~  textual    1.000  0.979  0.121   8.109   0.000
speed    ~~  speed      1.000  0.384  0.083   4.635   0.000
visual   ~~  textual    0.459  0.408  0.082   4.966   0.000
visual   ~~  speed      0.471  0.262  0.055   4.762   0.000
textual  ~~  speed      0.283  0.173  0.055   3.139   0.002

There is also an option to output a datatable for the parameters by specifying output_format = "datatable".

prettylavaan(robustfit, output_format = "datatable", robust = TRUE)
## Estimator: MLM 
## Converged: TRUE 
## Iterations: 35 
## Original Sample Size: 301 
## Effective Sample Size: 301
## 
## Fit Indices:
##         NPAR  CHISQ DF PVALUE   CFI   TLI  NNFI   NFI RMSEA  SRMR    AIC    BIC
## Naive     21 85.306 24      0 0.931 0.896 0.896 0.907 0.092 0.065 7517.5 7595.3
## .Scaled   21 80.872 24      0 0.925 0.887 0.887 0.898 0.089    NA     NA     NA
## .Robust   21     NA NA     NA 0.932 0.897 0.897    NA 0.091    NA     NA     NA
## 
## 
## 
## Parameter Estimates:
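
For reference, the pieces prettylavaan() assembles can be pulled straight from lavaan; a sketch of roughly equivalent calls (assumed, not the actual source):

# sketch (assumed): fit indices and parameter estimates straight from lavaan
fits <- lavaan::fitMeasures(fit, c("npar", "chisq", "df", "pvalue", "cfi", "tli",
                                   "nfi", "rmsea", "srmr", "aic", "bic"))
knitr::kable(t(round(fits, 3)))
params <- lavaan::parameterEstimates(fit, standardized = TRUE)
knitr::kable(params[, c("lhs", "op", "rhs", "std.all", "est", "se", "z", "pvalue")],
             digits = 3)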

prettytable()

Automatically appends relative proportions/percentages to the raw counts calculated by table(). Currently only works for one-way and two-way frequency tables.

mydata <- data.frame(x1 = rep(c("Male","Female","Male"), 100),
                     x2 = factor(rep(c("High","Medium","Low","Medium"),75),
                                 levels = c("Low","Medium","High")))
# two way frequency table
a <- table(mydata$x1,mydata$x2)
prettytable(a)
##         
##          Low      Medium    High    
##   Female 25 (25%) 50 (50%)  25 (25%)
##   Male   50 (25%) 100 (50%) 50 (25%)
# feed it into kable for a prettier table
knitr::kable(prettytable(a), align = "c")
         Low       Medium     High
Female   25 (25%)  50 (50%)   25 (25%)
Male     50 (25%)  100 (50%)  50 (25%)
# one way frequency table
a <- table(mydata$x1)
prettytable(a)
## 
##    Female      Male 
## 100 (33%) 200 (67%)
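
The idea reduces to prop.table(); a one-way sketch (assumed; a two-way version would use prop.table(tab, margin = 1) for row percentages, as in the output above):

# sketch (assumed): paste percentages onto the raw counts of a one-way table
prettytable_sketch <- function(tab) {
  pct <- round(100 * prop.table(tab))  # convert counts to percentages
  out <- paste0(tab, " (", pct, "%)")
  names(out) <- names(tab)
  out
}
prettytable_sketch(table(mydata$x1))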