The CORREL function is used to find the correlation between two arrays. Because CORREL ignores text, you can use an array formula as one of the arguments of CORREL to limit it to a subset of the data.

To demonstrate, we’ll need some data. I fired up QuickSampleData to make fifty rows with a continent name in the first column and a two-digit number in the second.

Next I created the second array using RANDBETWEEN to give me data that had a relatively high correlation.

The formula needed to limit the correlation to Africa, with “Africa” in B1, is:

Enter that with Control+Shift+Enter, not just enter and Excel will put the curly braces in for you. I only needed to turn one of the arguments to text to get CORREL to ignore that row.

How to do regression or correlation on data subsets is a question that comes up fairly often in forums. Since Linest and Trend will just produce errors if there is any non-numeric data they cannot be used easily for this. The suggestions offered usually require convoluted workarounds to redimension the arrays or writing custom UDFs.

However, as shown by the example in this post, other functions such as Correl, Slope, Intercept and Forecast will ignore rows containing text or missing data which allows one to use simple array formulas for this purpose. And if you don’t want to use the annoying ctrl+shift+enter keystroke you can replace A3:A52=B1 with Mmult((A3:A52=B1)*1,1) in the posted formula.

To check results, one can filter data on the required subset and plot a linear trendline to see if Correl^2=R^2.

Btw, what’s the difference with Correl and Pearson? They look completely identical as far as I can tell and if so, why did MS create two separate functions to do the same thing?

What about correlation with two conditionals? How to do regression or correlation on data subsets with two conditionals?.

Thanks

Casuarz

What about correlation with two conditionals? 2

:)

Thank You for posting this!