Find correlations with a specific variable via CalculateAllCorrelations
What is this?
In healthcare (as in other fields) it's often helpful to understand the relationships between the variables in one's dataset. This provides that functionality by finding the correlations for all numeric columns in a particular dataset.
Why is it helpful?
You can quickly see the relationships present in your data.
So, how do we do it?
- First, we'll load healthcareai, create a fake dataset on which to work, and look at it:
library(healthcareai)
df <- data.frame(a=c(1,2,3,4,5,6),
b=c(6,5,4,3,2,1),
c=c(3,4,2,1,3,5),
d=c('M','F','F','F','M','F')) #<- is ignored
head(df)
- Next, we'll find the correlations between all numeric columns in the dataset represented by
df.
res <- calculateAllCorrelations(df)
res
Function specs for CalculateAllCorrelations
- Return: a data frame of same length as input data frame, but three columns wide.
- Arguments:
- df: a data frame. This dataset contains at least two numeric columns.
We use the Pearson correlation coefficient.
Full example code
library(healthcareai)
df <- data.frame(a=c(1,2,3,4,5,6),
b=c(6,5,4,3,2,1),
c=c(3,4,2,1,3,5),
d=c('M','F','F','F','M','F')) #<- is ignored
head(df)
res <- calculateAllCorrelations(df)
res