Intro to Statistics - Part 3: Correlation Coefficients
Statistics
Algorithms
Data Science
If you want to actually test our knowledge of `is_correlation_causal`, you can't just put a return value in the initial code that already passes the tests: it doesn't teach anything if we don't even need to touch it in the first place. At least it should be something like `pass`.
Even the linked Wikipedia article contains multiple definitions of correlation coefficient, so it's unclear which one is expected.
In general, what would really make the description and examples clearer is making the DIMENSIONS of the intermediate values clearly visible. Some things are arrays equal in dimension to the original data, others are just scalars. Also, some values are calculated separately for each source data sequence, and some combine data from both sequences. Making these things clear for m, d, v, cd, cv, pd, cc would really help.
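To illustrate the kind of shape annotation being asked for, here is a minimal sketch using the standard textbook (population) definitions of the Pearson correlation coefficient. This is an assumption: the kata's own definitions may differ, the function names are made up for illustration, and `pd` is left out because the thread does not say what it stands for.

```python
from math import sqrt

def mean(xs):
    # m: one scalar per sequence
    return sum(xs) / len(xs)

def deviations(xs):
    # d: one value per element, same length as the sequence
    m = mean(xs)
    return [x - m for x in xs]

def variance(xs):
    # v: one scalar per sequence (mean of squared deviations)
    return sum(d * d for d in deviations(xs)) / len(xs)

def co_deviations(xs, ys):
    # cd: one value per (x, y) pair, combines both sequences
    return [dx * dy for dx, dy in zip(deviations(xs), deviations(ys))]

def covariance(xs, ys):
    # cv: one scalar, combines both sequences (mean of co-deviations)
    return sum(co_deviations(xs, ys)) / len(xs)

def correlation(xs, ys):
    # cc: one scalar, combines both sequences
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))
```

With `xs = [1, 2, 3, 4, 5]` and `ys = [2, 4, 6, 8, 10]`, the deviations and co-deviations are lists of length 5, while the mean, variance, covariance and correlation are single numbers.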
I've tried to add more detail without actually supplying the algorithm.
Any better?
Better, but still strange. It seems that either I'm reading your words wrong, or there's a flaw in the description.
Here's a sample of data from my calculation, which is performed (or so it seems to me) according to your description, and yet the results are clearly wrong:
This comment has been hidden.
Any joy?
Hmm, the trouble is I don't see your previous 'spoiler' comment that is probably the answer to my 'spoiler' comment. :(
Could you please 'unspoiler' it for some time for me to read?
Update: oh, now I see it. It's pretty clear, thanks.
Well, in the kata description you say: "Variance [v]: A squared deviation. One per value". If in fact it should be per-variable, could you please correct it?
Oh, I thought I had, my mistake, wait a moment!
OK, all done now!
And the Co-variance definition also seems wrong therefore: "The product of the variance of two variables. One per pair"
In the examples there are also a couple of strange lines:
What do all those x[0] and y[0] mean?
And also
If co-variance is a product of variances, it's a single value. So what is a SUM of co-varianceS?
OK, I think we are there now! :)
This comment has been hidden.
This line in the example is really unclear:
cd(1, [1, 2, 3, 4, 5], 5, [1, 2, 3, 4, 5]) = -4
In the description it is said:
Co-deviation [cd]: The product of the deviation of two variables
Which seems to mean that if the deviations for two variables are (da1, da2, ..., dan) and (db1, db2, ..., dbn) respectively, then the co-deviation is (da1 * db1, da2 * db2, ..., dan * dbn). I don't see how this relates to the `cd` example above.
What are the parameters of the `cd` function? Why is the same [1, 2, 3, 4, 5] passed there twice? What are the 1 and 5 that are passed additionally?
OK, I've simplified this and made it clear it's two different values from two different sample sets.
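One reading that reproduces the -4 from the quoted example is sketched below. It assumes `cd(x, xs, y, ys)` takes one value `x` from sample `xs` and one value `y` from sample `ys`, and multiplies their deviations from their own sample means. The parameter names and the `mean` helper are hypothetical, and this is not confirmed to be the kata's reference implementation.

```python
def mean(values):
    # Arithmetic mean of one sample set.
    return sum(values) / len(values)

def cd(x, xs, y, ys):
    # Assumed reading: deviation of x within xs, times deviation of y within ys.
    return (x - mean(xs)) * (y - mean(ys))

# 1 deviates from the mean 3 by -2, and 5 deviates from the mean 3 by +2.
print(cd(1, [1, 2, 3, 4, 5], 5, [1, 2, 3, 4, 5]))  # -4.0
```

Under this reading, the element-wise list (da1 * db1, ..., dan * dbn) described above is just `cd` applied to each pair of values at the same position in the two samples.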