Minimum exact p
Description:
Different lists?
Here are two lists. They each represent a sample of observations of some value taken from some larger population of values:
g1 = [124, 118, 78, 123, 124, 124]
g2 = [127, 125, 125, 125, 172, 125]
Did these come from the same population? It's hard to say. We don't know what they are. Maybe the number of people enrolled in an introductory statistics class? Maybe systolic blood pressure measurements? Maybe my golf scores?
Some things to notice
You'll notice the smallest value in g2 is larger than the biggest in g1, but the variance of each sample is pretty big, so again, it's hard to say. Let's focus on the mean values: the mean of g1 is about 115.17, for g2 it's about 133.17, and the absolute difference is 18.
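To check those summary numbers yourself, here is a minimal sketch using only the standard library. The variable names are just for illustration.

```python
from statistics import mean

g1 = [124, 118, 78, 123, 124, 124]
g2 = [127, 125, 125, 125, 172, 125]

# Sample means of each group
mean_g1 = mean(g1)
mean_g2 = mean(g2)

# The statistic used in the rest of this description:
# the absolute difference between the two sample means
observed_diff = abs(mean_g1 - mean_g2)

print(round(mean_g1, 2), round(mean_g2, 2), round(observed_diff, 2))
```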
Don't do it this way
We could compute a two-sided t-test on the sample means, but we'd be out on a limb with that one, since, again, we can't say anything about the distribution of the values in the population they come from. Let's do that anyway (DON'T DO THAT): assuming equal variances (they're about equal), we get a p-value of 0.1265. So they look a little different, but really not that different, and not different enough to satisfy the good folks at Guinness in 1908.
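If you're curious what goes into that t-test, here is a rough sketch of the equal-variance (pooled) t statistic, standard library only. The helper name `pooled_t_statistic` is made up for this example. Turning the statistic into a p-value needs the t distribution with 10 degrees of freedom, which the standard library doesn't provide, so that step is left out here.

```python
from math import sqrt
from statistics import mean, variance

g1 = [124, 118, 78, 123, 124, 124]
g2 = [127, 125, 125, 125, 172, 125]

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (hypothetical helper)."""
    n_a, n_b = len(a), len(b)
    # Pooled sample variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error

t = pooled_t_statistic(g1, g2)
degrees_of_freedom = len(g1) + len(g2) - 2
print(t, degrees_of_freedom)
```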
Do it this way
Let's try this: if they are from the same population, i.e., if the measurement is independent of the group assignment, then we can assign measurement values to whichever group we'd like. It's like probability with counting. We can partition the 12 measurements above, 6 in group g1 and 6 in g2, in all the different ways possible, and then count how many times the sample statistic (the absolute difference between sample means) is equal to or greater than the value of that statistic in the observation we were given to begin with. This is what is known as a permutation test. Do that. Make it exact.
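The counting argument above can be sketched as a brute-force enumeration. This is one possible reading of the description, not a reference solution: the name `permutation_p` is made up, and partitions are taken to be every choice of which positions in the pooled data go to the first group.

```python
from itertools import combinations

def permutation_p(g1, g2):
    """Brute-force permutation p-value for the absolute difference in means."""
    pooled = list(g1) + list(g2)
    n = len(g1)
    m = len(pooled) - n
    total = sum(pooled)
    observed = abs(sum(g1) / n - sum(g2) / m)

    hits = 0
    count = 0
    # Choose which positions of the pooled data form the first group;
    # the remaining positions form the second group.
    for idx in combinations(range(len(pooled)), n):
        s1 = sum(pooled[i] for i in idx)
        diff = abs(s1 / n - (total - s1) / m)
        count += 1
        # Small tolerance guards against floating-point noise in the comparison
        if diff >= observed - 1e-12:
            hits += 1
    return hits / count

print(permutation_p([124, 118, 78, 123, 124, 124],
                    [127, 125, 125, 125, 172, 125]))
```

With samples of length 9 this loops over 48620 partitions, which is still quick to enumerate.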
Task:
Write a function, exact_p(g1, g2), that takes two samples g1 and g2 and computes a p-value for the difference in sample means using a permutation test.
Some things to remember:
- Given all possible partitions of the data into equal size groups, p is the proportion of those partitions with an ABSOLUTE difference in sample means EQUAL or GREATER than the original partition.
- As implied by item 1, 0 < p < 1
- As with the example, samples g1 and g2 will always have the same length, have non-overlapping ranges, and you can assume equal variances.
- Your function will be tested with samples of length 2 to 9 (no empty lists, no lists of length 1)
- Results are rounded to 4 decimal places when tested, so I suppose you have a chance with a Monte Carlo approach.
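For what the Monte Carlo route mentioned in the last item might look like, here is a rough sketch. The name `monte_carlo_p`, the default trial count, and the fixed seed are all assumptions made for the example. It also uses the common (hits + 1) / (trials + 1) correction so the estimate can never be exactly 0; that is a choice made here, not something the kata specifies.

```python
import random

def monte_carlo_p(g1, g2, trials=20000, seed=0):
    """Estimate the permutation p-value by random reshuffling (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    pooled = list(g1) + list(g2)
    n = len(g1)
    m = len(pooled) - n
    observed = abs(sum(g1) / n - sum(g2) / m)

    hits = 0
    for _ in range(trials):
        # A random shuffle followed by a split is a random partition
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / m)
        if diff >= observed - 1e-12:
            hits += 1
    # Count the observed partition itself so the estimate stays above 0
    return (hits + 1) / (trials + 1)

estimate = monte_carlo_p([12705, 12264, 12003, 12536],
                         [13524, 13478, 12845, 13351])
print(estimate)
```

At 20000 trials the estimate for this example is only good to roughly two or three decimal places, so matching a test that rounds to 4 decimal places would take many more trials and some luck.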
Examples
exact_p([124, 118, 78, 123, 124, 124], [127, 125, 125, 125, 172, 125]) => 0.0021645021645021645
exact_p([12705, 12264, 12003, 12536], [13524, 13478, 12845, 13351]) => 0.02857142857142857
Note: because results are rounded, some random tests may expect a value of 0. That doesn't mean the test expects your p value to be 0; it just expects it to be less than 0.00005. The p value of an exact permutation test can never be 0. Understanding that will help you solve this kata!
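One way to see why the p value can't be 0: the original partition always counts toward the total. Going a step further, and this is my own reading rather than anything stated in the kata, if the two samples have the same length n and strictly non-overlapping ranges, then only the original partition and its mirror image (the groups swapped) reach the observed difference, which gives 2 out of C(2n, n) partitions. A sketch, with the made-up name `minimum_exact_p`:

```python
from math import comb

def minimum_exact_p(n):
    """Smallest possible permutation p-value for two groups of size n.

    Assumes equal group sizes and strictly non-overlapping ranges, so only
    the observed partition and its mirror image are at least as extreme.
    """
    return 2 / comb(2 * n, n)

print(minimum_exact_p(6))            # compare with the first example
print(minimum_exact_p(4))            # compare with the second example
print(round(minimum_exact_p(9), 4))  # small enough to round down to 0.0
```

Both values agree with the two examples listed above, and for n = 9 the result is about 0.000041, which is below 0.00005 and so rounds to 0 at 4 decimal places.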
If you want to try finding an exact p for two groups with overlapping ranges, try this one: Exact p
Stats:
Created | Sep 13, 2017 |
Published | Sep 13, 2017 |