aaRon is a package created to consolidate the self-defined functions that I use in my daily work. The functions in this package are nothing ground-breaking, but they do make daily life a little easier.
To install the package directly from R/RStudio:
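A sketch of the usual devtools workflow, where "<username>/aaRon" is a placeholder, not the actual repository path:

# install.packages("devtools") # if devtools is not yet installed
# "<username>/aaRon" is a placeholder repository path
devtools::install_github("<username>/aaRon")
library(aaRon)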
As I am psychology-trained, most of these functions are related to common analyses and manipulations in psychology. For example, code.reverse() makes reverse-coding items easier, and aggregateScale() is a single function to calculate the sum/mean while ensuring that respondents with more than a certain percentage of missing items are coded as NA.
Some of these functions are built on existing functions from other packages such as:
lavaan
knitr
htmltools
tidyr
psych
This document is meant to be a showcase of these functions, coupled with short explanations of their usage. It will be updated as more functions are added. More details about each function's arguments can be found in the help manual once the package is installed.
When knitting html files from R Markdown, this function calls a CSS from aaRon which styles the knitted html document. This document is styled this way. There are two options to choose from, work and casual. The main difference between the two is the colors: casual uses flashier blue/green compared to work.
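As a sketch of the mechanism (the css/work.css path is an assumption about the package's internal layout, not its documented API), a stylesheet shipped inside a package can be injected into a knitted document like so:

# assumption: a stylesheet shipped at css/work.css inside the installed package
htmltools::includeCSS(system.file("css", "work.css", package = "aaRon"))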
A function that transforms a factor vector into a numeric vector. The advantages over simply using as.numeric() are:

There is a start argument that determines the numeric value of the lowest level. In psychology, we frequently have scales that start from zero, but as.numeric() will always recode the lowest level as 1.

## Levels: Agree, Disagree
## [1] 1 2 1 1 2
## Levels: Agree, Disagree
## [1] 0 1 0 0 1

The order of the factor levels is printed as a check. We sometimes call as.numeric() on a factor without checking the levels of the factor, so we might end up mistakenly coding Neutral as 3, Disagree as 2 and Agree as 1. Printing out the order of the factor alerts us to any such occurrences.

b <- factor(c("Neutral","Disagree","Agree","Agree","Disagree"))
as.numeric(b) # as.numeric is silent, so this mistake might go undetected
## [1] 3 2 1 1 2
## Levels: Agree, Disagree, Neutral
## [1] 3 2 1 1 2
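For intuition, here is a minimal sketch of the start logic (fac_to_num is a hypothetical name, not the package implementation):

# hypothetical sketch: convert a factor to numeric, offset by "start"
fac_to_num <- function(x, start = 1) {
  print(levels(x)) # print the level order as a sanity check
  as.numeric(x) - 1 + start
}
a <- factor(c("Agree","Disagree","Agree","Agree","Disagree"))
fac_to_num(a, start = 0) # returns 0 1 0 0 1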
This function bundles the calculation of the KMO and Bartlett's Test of Sphericity into one. The code for the function was adapted from a research methods module conducted by Professor Mike Cheung of the National University of Singapore.
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy estimates the proportion of variance in the variables that may be explained by underlying factors. Higher values (>0.5) indicate that the data is suitable for factor analysis.
Bartlett's Test of Sphericity is a statistical test of the independence of the variables; the null hypothesis is that all variables are independent. Thus, a p value below 0.05 rejects the null hypothesis, suggesting that the variables are correlated and that factor analysis is meaningful.
## KMO: 0.75224
## Bartlett's Test Of Sphericity: 904.1 , df = 36 , pvalue = 1.9121e-166
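For reference, the same two statistics can also be obtained directly from psych (a sketch; the nine illustrative items are taken from lavaan's HolzingerSwineford1939 data):

library(psych)
library(lavaan)
dat <- HolzingerSwineford1939[, 7:15] # items x1 to x9
KMO(cor(dat)) # Kaiser-Meyer-Olkin measure
cortest.bartlett(cor(dat), n = nrow(dat)) # Bartlett's Test of Sphericity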
Factor analysis functions such as psych::fa() produce factor loading matrices in wide format. Tall versions of the loading matrix complement the wide matrices and allow us to better inspect the items and loadings of each individual factor. This function automates the conversion to tall format, eliminating factor loadings below a certain threshold and sorting the output by factors or items.
## Loading required namespace: GPArotation
## # A tibble: 85 x 3
## ITEM NAME VALUE
## <chr> <chr> <dbl>
## 1 VisualPerception WLS1 0.36
## 2 Cubes WLS1 0.24
## 3 PaperFormBoard WLS1 0.290
## 4 Flags WLS1 0.37
## 5 GeneralInformation WLS1 0.79
## 6 PargraphComprehension WLS1 0.82
## 7 SentenceCompletion WLS1 0.84
## 8 WordClassification WLS1 0.68
## 9 WordMeaning WLS1 0.86
## 10 Addition WLS1 0.28
## # ... with 75 more rows
## # A tibble: 38 x 3
## ITEM NAME VALUE
## <chr> <chr> <dbl>
## 1 GeneralInformation WLS1 0.76
## 2 PargraphComprehension WLS1 0.8
## 3 SentenceCompletion WLS1 0.87
## 4 WordClassification WLS1 0.56
## 5 WordMeaning WLS1 0.86
## 6 Deduction WLS1 0.33
## 7 ProblemReasoning WLS1 0.31
## 8 SeriesCompletion WLS1 0.3
## 9 ArithmeticProblems WLS1 0.290
## 10 Addition WLS2 0.86
## # ... with 28 more rows
## # A tibble: 30 x 3
## ITEM NAME VALUE
## <chr> <chr> <dbl>
## 1 GeneralInformation WLS1 0.76
## 2 PargraphComprehension WLS1 0.8
## 3 SentenceCompletion WLS1 0.87
## 4 WordClassification WLS1 0.56
## 5 WordMeaning WLS1 0.86
## 6 Deduction WLS1 0.33
## 7 ProblemReasoning WLS1 0.31
## 8 SeriesCompletion WLS1 0.3
## 9 Addition WLS2 0.86
## 10 Code WLS2 0.48
## # ... with 20 more rows
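The conversion itself is essentially a pivot from wide to tall; a rough sketch with tidyr (the fa() call is illustrative, and the real function additionally handles thresholding and sorting):

library(psych)
library(tidyr)
fit <- fa(Harman74.cor$cov, nfactors = 4, fm = "wls") # illustrative factor model
wide <- as.data.frame(unclass(fit$loadings)) # wide loading matrix
wide$ITEM <- rownames(wide)
pivot_longer(wide, cols = -ITEM, names_to = "NAME", values_to = "VALUE")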
Calculates the mean or sum of each row of a dataframe using rowMeans() or rowSums(), only aggregating rows where row-wise missingness is below a certain threshold. With NA.threshold = 0.5 (the default), scores are only calculated for rows with less than 0.5 missingness; all other rows are coded as NA.
## a b c
## 1 1 1 1
## 2 2 2 2
## 3 3 3 NA
## 4 4 4 NA
## 5 5 5 NA
## 6 6 6 NA
## 7 7 7 NA
## 8 8 8 NA
## 9 NA 9 NA
## 10 NA 10 NA
## [1] 1 2 3 4 5 6 7 8 NA NA
# only calculate sum score for rows with less than 0.2 missingness
aggregateScale(mydata, aggregate = "SUM", NA.threshold = 0.2)
## [1] 3 6 NA NA NA NA NA NA NA NA
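Under the hood, the behaviour is roughly the following (a sketch, not the package implementation):

# sketch of the logic for the mean, with NA.threshold = 0.5
prop_missing <- rowMeans(is.na(mydata)) # proportion of missing items per row
scores <- rowMeans(mydata, na.rm = TRUE) # aggregate over the available items
scores[prop_missing >= 0.5] <- NA # code rows with too much missingness as NA
scores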
We are familiar with the vlookup function from Excel, but doing it in R is a little less straightforward. I use merge() to replicate the behavior of vlookup. However, the arguments to merge() do not map neatly onto the vlookup arguments that we know from Excel, so this function wraps merge() in an interface that resembles vlookup.
## Type
## 1 1
## 2 2
## 3 3
## 4 1
## 5 2
## 6 3
## 7 1
## 8 2
## 9 3
## 10 1
## 11 2
## 12 3
## 13 1
## 14 2
## 15 3
## Type2 Magnitude
## 1 1 Low
## 2 2 Medium
## 3 3 High
# create a new column in mydata that indicates whether
# the magnitude was low, medium or high
# according to the type as indicated in mytable
vlookup(data = mydata, # the dataframe that will have the column appended
lookup_table = mytable, # the table that contains the information to be added
data_col = "Type", # which column in "data" should we match on
lookup_col = "Type2", # which column in "lookup_table" should we match on
lookup_value = "Magnitude") # which column in "lookup_table" should we extract the values from
## Type Magnitude
## 1 1 Low
## 6 2 Medium
## 11 3 High
## 4 1 Low
## 9 2 Medium
## 14 3 High
## 3 1 Low
## 8 2 Medium
## 13 3 High
## 2 1 Low
## 7 2 Medium
## 12 3 High
## 5 1 Low
## 10 2 Medium
## 15 3 High
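At its core, such a function is a single merge() call; a minimal sketch (vlookup_sketch is hypothetical and omits the real function's handling of details such as row order):

vlookup_sketch <- function(data, lookup_table, data_col, lookup_col, lookup_value) {
  # left-join the lookup value onto the data by the matching columns
  merge(data, lookup_table[, c(lookup_col, lookup_value)],
        by.x = data_col, by.y = lookup_col, all.x = TRUE)
}
vlookup_sketch(mydata, mytable, "Type", "Type2", "Magnitude")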
Sometimes we just want to quickly look through a dataset, but our data may have some variables which are wrongly classed, e.g. Age as a factor. This is a function that can be used to quickly loop through an entire dataset and change the class of each variable accordingly.
Takes in a vector as input and coerces the vector into a specific class using as.numeric(), factor() or as.character(). The coerced vector is returned as output. The output class is determined by the following conditions:

If the vector is numeric and has more than X unique values, classify as numeric.
If the vector is numeric and has X or fewer unique values, classify as factor.
If the vector is not fully numeric and has more than X unique values, classify as character.
If the vector is not fully numeric and has X or fewer unique values, classify as factor.

X can be manually defined by the user through the unique.thres.fac argument; it defaults to 20.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
## [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
## [39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7
## Levels: 1 2 3 4 5 6 7
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9"
## [10] "10" "11" "12" "13" "14" "15" "16" "17" "18"
## [19] "19" "20" "21" "22" "23" "24" "25" "26" "27"
## [28] "28" "29" "30" "31" "32" "33" "34" "35" "36"
## [37] "37" "38" "39" "40" "41" "42" "43" "44" "45"
## [46] "46" "47" "48" "49" "50" "51" "52" "53" "54"
## [55] "55" "56" "57" "58" "59" "60" "61" "62" "63"
## [64] "64" "65" "66" "67" "68" "69" "70" "71" "72"
## [73] "73" "74" "75" "76" "77" "78" "79" "80" "81"
## [82] "82" "83" "84" "85" "86" "87" "88" "89" "90"
## [91] "91" "92" "93" "94" "95" "96" "97" "98" "99"
## [100] "100" "hello"
## [1] Strongly Agree Agree Disagree Strongly Disagree
## Levels: Agree Disagree Strongly Agree Strongly Disagree
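For intuition, a rough sketch of the decision rule (auto_class_sketch is hypothetical, not the package implementation):

auto_class_sketch <- function(x, unique.thres.fac = 20) {
  n_unique <- length(unique(x))
  as_num <- suppressWarnings(as.numeric(as.character(x)))
  if (any(is.na(as_num) & !is.na(x))) {
    # not fully coercible to numeric: character if many unique values, else factor
    if (n_unique > unique.thres.fac) as.character(x) else factor(x)
  } else {
    # coercible to numeric: numeric if many unique values, else factor
    if (n_unique > unique.thres.fac) as_num else factor(x)
  }
}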
# use with lapply to apply across entire dataframe
mydata <- data.frame(A = 1:100, B = c(1:99,"hello"))
mydata <- data.frame(lapply(mydata, FUN = auto.class))
str(mydata)
## 'data.frame': 100 obs. of 2 variables:
## $ A: num 1 2 3 4 5 6 7 8 9 10 ...
## $ B: chr "1" "2" "3" "4" ...
Reverse-coding of items using factor(). One advantage over psych::reverse.code() is that the input can be non-numeric.
## [1] "Disagree" "Neutral" "Agree" "Neutral" "Agree"
## [1] Agree Neutral Disagree Neutral Disagree
## Levels: Disagree Neutral Agree
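The underlying trick is index arithmetic on the level order; a minimal sketch (not the package implementation):

x <- c("Disagree","Neutral","Agree","Neutral","Agree")
lv <- c("Disagree","Neutral","Agree") # levels from low to high
# map each response to its mirror-image level, keeping the original level order
factor(rev(lv)[as.integer(factor(x, levels = lv))], levels = lv)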
I found myself frequently using for loops and results = "asis" to automate the creation of various data reports. But results from psych::alpha() get messed up when alpha() is called within an R Markdown chunk with the option results = "asis". This function takes the output of alpha(), formats it, and feeds it to kable() to print out prettier tables.
myalpha <- alpha(HolzingerSwineford1939[,7:15])
# set chunk option to results = "asis"
prettyalpha(myalpha)
Cronbach Alpha:
| RAW_ALPHA | STD.ALPHA | G6(SMC) | AVERAGE_R | S/N | ASE | MEAN | SD | MEDIAN_R |
|---|---|---|---|---|---|---|---|---|
| 0.76 | 0.76 | 0.81 | 0.26 | 3.17 | 0.02 | 4.22 | 0.66 | 0.21 |
Alpha Values If Certain Items Were Dropped:
|  | RAW_ALPHA | STD.ALPHA | G6(SMC) | AVERAGE_R | S/N | ALPHA SE | VAR.R | MED.R |
|---|---|---|---|---|---|---|---|---|
| x1 | 0.72 | 0.73 | 0.78 | 0.25 | 2.64 | 0.02 | 0.04 | 0.19 |
| x2 | 0.76 | 0.76 | 0.81 | 0.29 | 3.22 | 0.02 | 0.04 | 0.22 |
| x3 | 0.75 | 0.75 | 0.80 | 0.27 | 2.97 | 0.02 | 0.04 | 0.21 |
| x4 | 0.72 | 0.72 | 0.76 | 0.24 | 2.55 | 0.03 | 0.03 | 0.21 |
| x5 | 0.72 | 0.73 | 0.76 | 0.25 | 2.64 | 0.02 | 0.03 | 0.21 |
| x6 | 0.71 | 0.72 | 0.76 | 0.24 | 2.53 | 0.03 | 0.03 | 0.21 |
| x7 | 0.77 | 0.76 | 0.80 | 0.29 | 3.25 | 0.02 | 0.03 | 0.22 |
| x8 | 0.75 | 0.75 | 0.79 | 0.27 | 2.95 | 0.02 | 0.04 | 0.21 |
| x9 | 0.73 | 0.73 | 0.78 | 0.25 | 2.68 | 0.02 | 0.04 | 0.18 |
Item-Level Statistics:
|  | N | RAW.R | STD.R | R.COR | R.DROP | MEAN | SD |
|---|---|---|---|---|---|---|---|
| x1 | 301 | 0.66 | 0.65 | 0.59 | 0.52 | 4.94 | 1.17 |
| x2 | 301 | 0.45 | 0.44 | 0.32 | 0.28 | 6.09 | 1.18 |
| x3 | 301 | 0.53 | 0.53 | 0.44 | 0.37 | 2.25 | 1.13 |
| x4 | 301 | 0.70 | 0.69 | 0.68 | 0.58 | 3.06 | 1.16 |
| x5 | 301 | 0.67 | 0.65 | 0.65 | 0.53 | 4.34 | 1.29 |
| x6 | 301 | 0.71 | 0.69 | 0.69 | 0.60 | 2.19 | 1.10 |
| x7 | 301 | 0.41 | 0.43 | 0.34 | 0.25 | 4.19 | 1.09 |
| x8 | 301 | 0.51 | 0.54 | 0.46 | 0.37 | 5.53 | 1.01 |
| x9 | 301 | 0.62 | 0.64 | 0.57 | 0.49 | 5.37 | 1.01 |
Similar to alpha() above, results from lavaan get messed up when summary(fit) is called within a for loop inside an R Markdown chunk with the option results = "asis". This function is an alternative to summary() that prints a pretty output when results = "asis".
# the famous Holzinger and Swineford (1939) example
HS.model <- "visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9"
fit <- cfa(HS.model, data = HolzingerSwineford1939)
prettylavaan(fit, output_format = "asis")
Estimator: ML
Converged: TRUE
Iterations: 35
Original Sample Size: 301
Effective Sample Size: 301
Fit Indices:
|  | NPAR | CHISQ | DF | PVALUE | CFI | TLI | NNFI | NFI | RMSEA | SRMR | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Values | 21 | 85.306 | 24 | 0 | 0.931 | 0.896 | 0.896 | 0.907 | 0.092 | 0.065 | 7517.5 | 7595.3 |
Parameter Estimates:
| LHS | OP | RHS | STD.ALL | EST | SE | Z | PVALUE |
|---|---|---|---|---|---|---|---|
| visual | =~ | x1 | 0.772 | 1.000 | 0.000 | NA | NA |
| visual | =~ | x2 | 0.424 | 0.554 | 0.100 | 5.554 | 0 |
| visual | =~ | x3 | 0.581 | 0.729 | 0.109 | 6.685 | 0 |
| textual | =~ | x4 | 0.852 | 1.000 | 0.000 | NA | NA |
| textual | =~ | x5 | 0.855 | 1.113 | 0.065 | 17.014 | 0 |
| textual | =~ | x6 | 0.838 | 0.926 | 0.055 | 16.703 | 0 |
| speed | =~ | x7 | 0.570 | 1.000 | 0.000 | NA | NA |
| speed | =~ | x8 | 0.723 | 1.180 | 0.165 | 7.152 | 0 |
| speed | =~ | x9 | 0.665 | 1.082 | 0.151 | 7.155 | 0 |
| x1 | ~~ | x1 | 0.404 | 0.549 | 0.114 | 4.833 | 0 |
| x2 | ~~ | x2 | 0.821 | 1.134 | 0.102 | 11.146 | 0 |
| x3 | ~~ | x3 | 0.662 | 0.844 | 0.091 | 9.317 | 0 |
| x4 | ~~ | x4 | 0.275 | 0.371 | 0.048 | 7.779 | 0 |
| x5 | ~~ | x5 | 0.269 | 0.446 | 0.058 | 7.642 | 0 |
| x6 | ~~ | x6 | 0.298 | 0.356 | 0.043 | 8.277 | 0 |
| x7 | ~~ | x7 | 0.676 | 0.799 | 0.081 | 9.823 | 0 |
| x8 | ~~ | x8 | 0.477 | 0.488 | 0.074 | 6.573 | 0 |
| x9 | ~~ | x9 | 0.558 | 0.566 | 0.071 | 8.003 | 0 |
| visual | ~~ | visual | 1.000 | 0.809 | 0.145 | 5.564 | 0 |
| textual | ~~ | textual | 1.000 | 0.979 | 0.112 | 8.737 | 0 |
| speed | ~~ | speed | 1.000 | 0.384 | 0.086 | 4.451 | 0 |
| visual | ~~ | textual | 0.459 | 0.408 | 0.074 | 5.552 | 0 |
| visual | ~~ | speed | 0.471 | 0.262 | 0.056 | 4.660 | 0 |
| textual | ~~ | speed | 0.283 | 0.173 | 0.049 | 3.518 | 0 |
# using robust estimators
robustfit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
# request for robust fit indices
prettylavaan(robustfit, output_format = "asis", robust = TRUE)
Estimator: MLM
Converged: TRUE
Iterations: 35
Original Sample Size: 301
Effective Sample Size: 301
Fit Indices:
|  | NPAR | CHISQ | DF | PVALUE | CFI | TLI | NNFI | NFI | RMSEA | SRMR | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naive | 21 | 85.306 | 24 | 0 | 0.931 | 0.896 | 0.896 | 0.907 | 0.092 | 0.065 | 7517.5 | 7595.3 |
| .Scaled | 21 | 80.872 | 24 | 0 | 0.925 | 0.887 | 0.887 | 0.898 | 0.089 | NA | NA | NA |
| .Robust | 21 | NA | NA | NA | 0.932 | 0.897 | 0.897 | NA | 0.091 | NA | NA | NA |
Parameter Estimates:
| LHS | OP | RHS | STD.ALL | EST | SE | Z | PVALUE |
|---|---|---|---|---|---|---|---|
| visual | =~ | x1 | 0.772 | 1.000 | 0.000 | NA | NA |
| visual | =~ | x2 | 0.424 | 0.554 | 0.103 | 5.359 | 0.000 |
| visual | =~ | x3 | 0.581 | 0.729 | 0.115 | 6.367 | 0.000 |
| textual | =~ | x4 | 0.852 | 1.000 | 0.000 | NA | NA |
| textual | =~ | x5 | 0.855 | 1.113 | 0.066 | 16.762 | 0.000 |
| textual | =~ | x6 | 0.838 | 0.926 | 0.060 | 15.497 | 0.000 |
| speed | =~ | x7 | 0.570 | 1.000 | 0.000 | NA | NA |
| speed | =~ | x8 | 0.723 | 1.180 | 0.152 | 7.758 | 0.000 |
| speed | =~ | x9 | 0.665 | 1.082 | 0.132 | 8.169 | 0.000 |
| x1 | ~~ | x1 | 0.404 | 0.549 | 0.138 | 3.968 | 0.000 |
| x2 | ~~ | x2 | 0.821 | 1.134 | 0.107 | 10.554 | 0.000 |
| x3 | ~~ | x3 | 0.662 | 0.844 | 0.085 | 9.985 | 0.000 |
| x4 | ~~ | x4 | 0.275 | 0.371 | 0.050 | 7.423 | 0.000 |
| x5 | ~~ | x5 | 0.269 | 0.446 | 0.058 | 7.688 | 0.000 |
| x6 | ~~ | x6 | 0.298 | 0.356 | 0.046 | 7.700 | 0.000 |
| x7 | ~~ | x7 | 0.676 | 0.799 | 0.079 | 10.168 | 0.000 |
| x8 | ~~ | x8 | 0.477 | 0.488 | 0.074 | 6.567 | 0.000 |
| x9 | ~~ | x9 | 0.558 | 0.566 | 0.068 | 8.332 | 0.000 |
| visual | ~~ | visual | 1.000 | 0.809 | 0.167 | 4.837 | 0.000 |
| textual | ~~ | textual | 1.000 | 0.979 | 0.121 | 8.109 | 0.000 |
| speed | ~~ | speed | 1.000 | 0.384 | 0.083 | 4.635 | 0.000 |
| visual | ~~ | textual | 0.459 | 0.408 | 0.082 | 4.966 | 0.000 |
| visual | ~~ | speed | 0.471 | 0.262 | 0.055 | 4.762 | 0.000 |
| textual | ~~ | speed | 0.283 | 0.173 | 0.055 | 3.139 | 0.002 |
There is also an option to output a datatable for the parameters by specifying output_format = "datatable".
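The call would look something like this:

prettylavaan(robustfit, output_format = "datatable", robust = TRUE)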
## Estimator: MLM
## Converged: TRUE
## Iterations: 35
## Original Sample Size: 301
## Effective Sample Size: 301
##
## Fit Indices:
## NPAR CHISQ DF PVALUE CFI TLI NNFI NFI RMSEA SRMR AIC BIC
## Naive 21 85.306 24 0 0.931 0.896 0.896 0.907 0.092 0.065 7517.5 7595.3
## .Scaled 21 80.872 24 0 0.925 0.887 0.887 0.898 0.089 NA NA NA
## .Robust 21 NA NA NA 0.932 0.897 0.897 NA 0.091 NA NA NA
##
##
##
## Parameter Estimates:
Automatically appends relative proportions/percentages to the raw counts calculated from table(). Currently only works for 1-way and 2-way frequency tables.
mydata <- data.frame(x1 = rep(c("Male","Female","Male"), 100),
x2 = factor(rep(c("High","Medium","Low","Medium"),75),
levels = c("Low","Medium","High")))
# two way frequency table
a <- table(mydata$x1,mydata$x2)
prettytable(a)
##
## Low Medium High
## Female 25 (25%) 50 (50%) 25 (25%)
## Male 50 (25%) 100 (50%) 50 (25%)
|  | Low | Medium | High |
|---|---|---|---|
| Female | 25 (25%) | 50 (50%) | 25 (25%) |
| Male | 50 (25%) | 100 (50%) | 50 (25%) |
##
## Female Male
## 100 (33%) 200 (67%)