Using the correlation coefficient to measure the linear relationship between two variables
The correlation coefficient (more precisely, the Pearson correlation coefficient) is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It is denoted by the symbol "r" and ranges from -1 to 1. Values close to -1 indicate a strong negative correlation (inverse relationship), values close to 1 indicate a strong positive correlation (direct relationship), and values close to 0 indicate little or no linear correlation.
The correlation coefficient is calculated using the formula:
r = (n * sum(xy) - sum(x) * sum(y)) / sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
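where n is the number of paired observations, x and y are the two variables being correlated, and sum() denotes the sum of the enclosed quantity over all n observations. Note the difference between sum(x^2), the sum of the squared values, and sum(x)^2, the square of the summed values.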
To calculate the correlation coefficient using this formula, we first need five quantities: the sum of the products of the paired values (xy), the sum of the values of each variable (x and y), and the sum of the squares of each variable (x^2 and y^2).
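The calculation described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name pearson_r, the variable names, and the small data set (hours and scores) are hypothetical and chosen only for the example. The check for a zero denominator is an added safeguard, since r is undefined when either variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient computed from the raw-sums formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(xs)
    sum_x = sum(xs)                               # sum(x)
    sum_y = sum(ys)                               # sum(y)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum(xy)
    sum_x2 = sum(x * x for x in xs)               # sum(x^2)
    sum_y2 = sum(y * y for y in ys)               # sum(y^2)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Added safeguard: a constant variable makes r undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical example data, not taken from any real study.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 3))
```

For this data the five sums are sum(x) = 15, sum(y) = 20, sum(xy) = 66, sum(x^2) = 55 and sum(y^2) = 86, so r = 30 / sqrt(50 * 30), which is approximately 0.775.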
Once we have calculated the correlation coefficient, we can interpret its value to understand the relationship between the two variables. A value of -1 indicates a perfect negative correlation: the data points lie exactly on a straight line along which one variable decreases as the other increases. A value of 1 indicates a perfect positive correlation: the points lie exactly on a straight line along which both variables increase or decrease together. A value of 0 indicates no linear relationship between the variables, although a nonlinear relationship may still exist.
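The three reference values can be illustrated with small, made-up data sets. The sketch below uses a compact, hypothetical helper (pearson_r) based on the same raw-sums formula. The third data set follows y = x^2, a case where the variables are clearly related but not linearly, so r comes out as 0.

```python
import math


def pearson_r(xs, ys):
    """Compact raw-sums version of the correlation formula (illustrative)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


# Made-up data sets chosen so the results are exact.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])        # y = 2x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])        # y = 10 - 2x
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2

print(r_positive, r_negative, r_curved)  # prints: 1.0 -1.0 0.0
```

Real data rarely produce exactly -1, 0, or 1; intermediate values are read as stronger or weaker linear association depending on how close they are to the extremes.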