5 kyu
Scraping: Codewars Top 500 Users
88 of 87510XL
Fundamentals
I liked this kata. The sample tests are out of date, however. The 1-based indexing threw me for a while.
Thanks for your comment. The sample tests are static/hard-coded, so they'll always become outdated. And sorry about the indexing :)
Is this kata still completable?
I tried doing it on JS, and I got this error:
Thanks for your comment. I've updated the JS version and it should be completable now.
Description mentions "the ith ranked User" but does not specify how to rank users. Since there is a rank inside the html, and this rank isn't used, this is a real issue.
It's mentioned to parse the html in the first sentence of the description. I've added a note to only use the Overall leaderboard table
You don't specify how to rank users. There is honor, a rank inside the html, and the ordinal position of the results.
Read the description. You're supposed to scrape the html. Why are you being vindictive because of your own frustration? Stop opening false issues.
"the ith ranked User" -> but you don't specify how to rank .. I'm going to have to log an issue about this
@10XL: there are actually a bazillion leaderboards on that page now (overall, completed kata, translations, rank, and rank per every single language option). So I'd say it's a good idea to add to the description that the user is supposed to work with the overall leaderboard.
hi B4B. Thanks for letting me know. I've added that to the description.
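For readers wondering what "only use the Overall leaderboard table" can look like in practice, here is a rough sketch using only Python's standard library. The markup in `SAMPLE_HTML`, the `overall` id, and the `TableRows` helper are all made up for illustration; the real page's structure is not shown anywhere in this thread and has changed over time.

```python
from html.parser import HTMLParser

# Hypothetical markup: two leaderboards on one page, only one of which we want.
SAMPLE_HTML = """
<div class="leaderboard" id="overall">
  <table>
    <tr><th>Position</th><th>User</th><th>Clan</th><th>Honor</th></tr>
    <tr><td>#1</td><td>alice</td><td>Clan A</td><td>1,200</td></tr>
    <tr><td>#2</td><td>bob</td><td></td><td>900</td></tr>
  </table>
</div>
<div class="leaderboard" id="completed">
  <table>
    <tr><td>#1</td><td>carol</td><td></td><td>50</td></tr>
  </table>
</div>
"""


class TableRows(HTMLParser):
    """Collect the <td> text of each row inside the element with a given id.

    Limitation: the depth counter assumes every start tag has an end tag,
    so void elements such as <br> inside the target would throw it off.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # 0 means "outside the target element"
        self.in_cell = False
        self.current = None   # cells of the row being read, if any
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
        if not self.depth:
            return
        if tag == "tr":
            self.current = []
        elif tag == "td" and self.current is not None:
            self.in_cell = True
            self.current.append("")

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current is not None:
            if self.current:  # header rows contain only <th>, so they stay empty
                self.rows.append(self.current)
            self.current = None
        self.depth -= 1

    def handle_data(self, data):
        if self.depth and self.in_cell:
            self.current[-1] += data


parser = TableRows("overall")
parser.feed(SAMPLE_HTML)
print(parser.rows)
```

Rows from the second (hypothetical) leaderboard are ignored because the parser only collects cells while it is inside the element whose id matches.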
You don't specify the type of "position". It could be interpreted as array, not as object, as intended.
That's a "learning element" of this kata. It's also hinted at in the description and in the comments. It would be easier for people to find out if you don't keep spamming issues.
How are these tests even made? I pass random tests all of a sudden, without passing sample tests.
sample tests are hardcoded and not aligned with real results, you might as well have no sample tests at all :/
this is one of the many reasons CW should NOT allow code scraping katas
Not an issue. The sample tests are there as a template. TDD is a part of CW.
Not an issue? Remind me to never waste any more time on your katas. You have 0 affinity with user experience or maintainability of katas.
Users can have the same rank:
vintrom #209 Valefar #209
How do we determine rank ??????
so we are to ignore the rank as stipulated in the scraped content, and just use the order in the html elements as is :/
This comment has been hidden.
I strongly suggest you stay away from authoring katas, you do more harm than good
so position is an object, not an array, figures..
user error, not an issue
bad specification, you don't mention the type of "position", it could be either array or object
This kata in JavaScript can't be done using Cheerio only. I've done it with Puppeteer but, unfortunately, I wasn't allowed to import Puppeteer here in order to submit my solution.
I'm also unable to solve this kata in JS. Not sure it is still possible.
You can use Axios to get the data. But anyway, this kata should have never been published. How on earth can a kata like this be maintained, it relies solely on the hopes the html structure of the web page does not change. Complete rubbish kata
This comment has been hidden.
I got the following error in this kata: /usr/local/lib/ruby/2.5.0/net/protocol.rb:44:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (unspecified certificate verification error) (OpenSSL::SSL::SSLError)
I took a look at your code and the monkey patching of Array#size breaks things. There are some comments below on how to handle the length issue.
This comment has been hidden.
This comment has been hidden.
This kata was extremely efficient and useful! Thanks!!
Excellent Kata.
Markdown formatting is broken.
Stuck with that 1 based index for Ruby man. Need help on that :(
Great Kata @10XL! A Kata that covers various parts of the language, modules and also data structures XD
I don't understand, I can only pass all the cases on the attempt if I return a leaderboard with size 501 but I get an out of bounds error with size 500. How do I return a list of size 500 with 500 objects but the indexes have to start at 1?
Also, I'm getting this error:
The code that caused this warning is on line 52 of the file main.py. To get rid of this warning, change code that looks like this:
BeautifulSoup(YOUR_MARKUP})
to this:
BeautifulSoup(YOUR_MARKUP, "html.parser")
markup_type=markup_type))
my code looks like this: soup = BeautifulSoup(page.content, "html.parser") why is that wrong?
Thanks
Check this comment thread. Hint: maybe there's another data structure you can try using.
In case somebody reads the comments before trying this Kata - the test files reference out-dated stats and thus make this impossible to pass!!!
I can't update sample tests every second to keep the stats up to date. You can fill these out yourself or run the random tests via the attempt button.
Then don't publish such a kata :/
I did this Kata using only built-in python modules. When attempting the Kata, I ran into very strange encoding issues. On some attempts, everything would match up fine and others, I would end up with values like "\xe4\xb8\xad\xe5\x9b\xbd". I ended up hardcoding a few of the non standard characters which allowed the Kata to complete successfully after a few random attempts. I would be interested to know if anyone knows why the random encoding/decoding occurs?
https://www.codewars.com/kata/reviews/5880131582f49ba990000df5/groups/5ed1c5fae88cf00001481d66 https://www.codewars.com/kata/reviews/5880131582f49ba990000df5/groups/5ed1c61d8792a0000199162e
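One possible explanation, offered as an assumption rather than a confirmed diagnosis: a value like "\xe4\xb8\xad\xe5\x9b\xbd" is what UTF-8 encoded bytes look like before they are decoded to text. The sketch below uses only the byte string quoted in the comment above and compares decoding it as UTF-8 with decoding it using a single-byte codec.

```python
# The six bytes quoted in the comment above.
raw = b"\xe4\xb8\xad\xe5\x9b\xbd"

# Decoded as UTF-8, the six bytes become two characters.
clan = raw.decode("utf-8")
print(clan)       # 中国
print(len(clan))  # 2

# Decoded with a single-byte codec, every byte becomes its own character,
# which is one way garbled clan names can appear.
garbled = raw.decode("latin-1")
print(len(garbled))  # 6
```

If the response body is read as bytes and only sometimes decoded (or decoded with a guessed codec), results could differ between runs, which would match the intermittent behaviour described.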
It's really hard to have this in JS because you can't rely on the user resolving the promise.
This comment has been hidden.
No response... I'm going to say that's a cheating problem and not necessarily an issue with the kata.
Sorry, I don't get notifications for spoiler comments. They wouldn't pass without the expect functions because then no tests are done. In that case, the CW runner calls it failed.
The test is running green but Attempt gives me this: main.rb:65:in `process_data_sol': undefined method `map' for nil:NilClass (NoMethodError) from main.rb:74:in `build_leaderboard' from main.rb:81:in
It wasn't anything on your side, it looks like the reference solution broke due to the leaderboard page's html changing. It should be working now.
Can the issue be closed then?
Yay! Thanks! :) nice one by the way thanks
obsolete. ranking has changed. Tests failed :(
Those are the example tests. In the "Attempt", it will fetch the values on the fly. So click that instead of running the sample tests.
You're right, now it passed. Thank you for your response.
This comment has been hidden.
This comment has been hidden.
This isn't really an issue of the kata, but an issue of the library.
Also, why aren't people marking this thread as spoilers?
If you're asking us to scrape 500 users from the leaderboard, then you should test all of them, not 150 (in the best case scenario).
It was like that at first but the chances of the leaderboard entries(on CW) being different increase, leading to playing more of the "keep resubmitting" game of this kata. The amount of tested users was reduced to avoid having to spam resubmit "too many" times.
len(res) == 500 check after making the actual tests not garbage.
If you're not competent to fix the issues, it doesn't mean they have to be ignored. And the fact that you're downvoting critical comments only makes you miserable.
I'm not the one downvoting you. I closed them because I don't see those as issues. These points have been discussed before and I've already addressed them. Check the other comments.
The leaderboard is a list. It makes sense to use a list. Now, why on earth are you forbidding its use by forcing 1-indexing and asking for the answer length to be 500? I realized that I could have used a dict instead after seeing other people's solutions, but that's overkill in this case.
Is g964 #1 or #0? Leaderboards start with 1... and I can't remember why else I chose to do that. I get your point about it being overkill but it's not really an issue.
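A minimal sketch of the dict idea mentioned above, in Python. The user names are placeholders and the three-item list stands in for the 500 scraped rows; none of it is taken from the kata's tests.

```python
# Placeholder rows standing in for the scraped leaderboard
# (assumption: already in leaderboard order, best first).
scraped = ["first_user", "second_user", "third_user"]

# enumerate(..., start=1) yields 1-based ranks, so the dict has exactly
# len(scraped) entries and needs no dummy entry at key 0.
position = {rank: name for rank, name in enumerate(scraped, start=1)}

print(len(position))  # 3
print(position[1])    # first_user
print(0 in position)  # False
```

With a list, index 1 would be the second element and a 500th index would need 501 slots; with a mapping keyed by rank, the length and the 1-based lookup no longer conflict.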
Sample tests don't correspond to the actual state of the leaderboard.
They'll always eventually become out of sync since leaderboard changes and sample tests are fixed. There's a comment suggesting to run the full tests if a user doesn't want to update the sample.
This comment has been hidden.
You don't need request-promise. Resolve a native Promise with your leaderboard object.
This comment has been hidden.
This comment has been hidden.
What is your solution method returning? It should return a promise. The first issue is that you're not returning the axios.get().then() chain. The second issue is that, inside axios.get().then(), you're logging leaderboard when you should be returning leaderboard. With those aside, you're formatting leaderboard incorrectly; this will become evident once you resolve the 2 issues above and can debug the failed tests.
So:
return axios.get(...).then(...)
return leaderboard instead of console.log(leaderboard)
in Fixed Tests, if there is no clan, you must output 'None', and in Randomized Tests, if there is no clan, instead of 'None' show
None should equal ''
or am I misunderstanding something?
Hi,
Yes, what you're currently missing is that g964 is a bit of a joker and actually set his clan name to the string 'None'. Meaning he actually has a clan. ;) (Look at the current state of the leaderboard; the field isn't empty there.)
thanks for the hint, it was a cool Kata!
Python:
Indexing on 1... Give me something to google please!? I am starting to feel like the code challenge is less about the scraping (which is simple and takes like 5 min) and more about that. Or am I way overthinking this?
Nevermind, hadn't had my coffee yet
do I have to create a custom List class with a modified getitem method?
use rather something more appropriate... ;)
edit
.
My solution (in Jupyter notebooks in my local environment) used lxml. But when I try it in the Codewars environment I get an import error:
ImportError: No module named 'lxml'
Any ideas for how I get round this?
https://github.com/Codewars/codewars.com/wiki/Language-Python Here you can see the supported packages.
lxml is not available on CW, you can use BeautifulSoup.
Well, and unlike most of the comments here, I worked on this kata to practice OOP as I just learned about it 3 days ago!😁
I submitted my code time and time again. 10 minutes later, I finished it. So you need to wait.
Good kata. It was interesting, but the test cases are very uncomfortable. At first, I turned them off so that I could see my console.log() output.
This comment has been hidden.
Thanks for bringing this to my attention. The commas on CW leaderboard page are new. I updated the test cases to account for commas, previous solutions are invalidated.
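As a small illustration of handling the commas, assuming the honor value is wanted as an integer (the thread does not spell out the expected type for every language), a hypothetical helper could look like this:

```python
def parse_honor(text):
    """Convert a displayed honor value such as "114,116" to an int.

    Hypothetical helper for illustration; it only strips surrounding
    whitespace and thousands separators before converting.
    """
    return int(text.strip().replace(",", ""))


print(parse_honor("114,116"))  # 114116
print(parse_honor("900"))      # 900
```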
The leaderboard has changed!
On my end the leaderboard page hasn't changed. If you're talking about the sample test cases, that obviously will be the case since it's hard coded. You should be updating the sample test cases, or you can just run the final tests.
The forced-in hash access for position[n] is really annoying; why not make it a method, position(n)? I basically had to wrap Array in a class that subtracts 1 from the index. And why test against the live site? A frozen copy of the leaderboard would work just as well.
This comment has been hidden.
Thanks, that array hack is a bit cleaner than mine.
The Array/Hash accessor [key] is very specific to those data structures. If I were to read that code in a year, I'd have a hard time seeing at a glance what my Users or your Position is for. It just seemed to force a particular implementation and restrict freedom in solutions a little.
Very funny! I ran the script from my box and all was working as expected, then I tried to run it through the website and it failed. After 10 minutes of troubleshooting the issue, it turns out that the Codewars LDB page just now decided to give up the ghost!
This comment has been hidden.
Encoding should not be ascii.
Bravo!!!! Well done, this was a great kata. I learned a lot, had never done any web scraping, so a bit of a rite of passage. If you are wondering whether to do this Kata, I could not recommend it more. Also, looking at the Kata test cases after solving it is well worth your time, some great code.
Thanks, this was fun! Regards, suic
@10XL, awesome kata! Big thanks for mentoring and test cases fixes. Cheers!)
This comment has been hidden.
Try BeautifulSoup(txt) or BeautifulSoup(txt, 'html.parser'). The tests use Python's built-in parser.
damn, so simple... x)
thx
This comment has been hidden.
No known compatibility issue, I've sent you a PM on gitter.
Is this right way for Ruby?
I have got a lot of errors like below, however locally everything works okay:
Thanks for the help!
You don't necessarily need to use such classes, just make sure that you return an object with a position attribute/property. I don't know about those particular errors; could you PM me your code on gitter, as it's possible to peek at comments even if they're marked as spoilers.
I looked into this and I think it's related to https://github.com/Codewars/codewars-runner-cli/issues/484; commenting out the it block displays the error correctly. Unfortunately, you can't alter the real tests.
@10XL, thank you so much for your quick answer!!! I will try it;)
#484 is already fixed ;-)
JS translation is broken/unsupported.
Switched to cheerio.js instead of deprecated jsdom
This comment has been hidden.
Attempting implementation without understanding the requirements is a no-win situation. If something is unclear, post what it is so I can try to clarify.
Lol, thanks for the link. I'm just not really sure how to make the tests work with Ruby. They pass if I replace the "leaderboard" local variable with my Scraper class in the tests... my code properly scrapes the top 500. I just don't know how to make it so that leaderboard.position returns anything. Even when I set them up as hashes in Ruby, I still can't get it to access the proper data via dot notation. Even tried requiring 'hash_dot'.
leaderboard just points to the solution method.
I know. But my scraper returns an array of Ruby objects (e.g.
[#<Scraper:0x007f9e8b9600b0 @name="myjinxin2015", @position=1, @clan="ä¸å\u009B½ é\u0095¿å\u009E£", @honor="114116">, #<Scraper:0x007f9e8b939348 @name="g964", @position=2, @clan="None", @honor="112800">, #<Scraper:0x007f9e8b8aadf0 @name="Voile", @position=3, @clan="Gensokyo", @honor="58788">...
] etc.
I can't really think of how to get it so that calling .position on the return value of my method gives me access to an array. If it was accessed via [:hash] or ['hash'] (or I knew how to scrape in JavaScript lol) I could work around it.
The answer to your problem is in the description and it's much simpler.
Sample tests for Ruby are invalid. Tests are failing because open(url) performs the actual network request, getting the most recent data, while the tests are based on old data. Also, if you try to submit a solution (by pressing the Attempt button) even while the sample tests are failing, it will be accepted (if it is working properly); this is very confusing.
I think the solution should be to stub network requests or use hidden tests (I'm not sure, however, whether it is possible to hide the body of process_data_sol and build_leaderboard from the user until they solve this kata).
There are comments in the sample tests that explain this.
Sample tests are static; there is no network request in the sample tests. I can't update them dynamically without exposing a solution to the user, so the user must update them manually or ignore them and use the real tests (Attempt button). The real tests aren't static, so they do check against current leaderboard data, allowing you to pass if your code is correct. You can look at the test fixtures after you've passed.
IMO it's better to comment out all sample tests (to give an example of how a test could be implemented) and leave the decision to write them or not to the user, as is done in most katas.
Still, when you press Run Sample Tests the tests will not be meaningful, so I shall take the path of least resistance. Plus, I'm also hesitant to republish and trigger revalidation/python 2/3 issues.
IIRC changing the sample tests will not trigger re-validation. Only changing the actual tests does (which is why it's locked after 500 solves).
Though I guess you should also blame me for stirring up the top 5 leaderboards ;-)
Lol I literally only just figured out that they don't want actual Ruby objects (which doesn't make much sense). No idea how to make dot notation work for Ruby with hashes..
Can someone point me to a good online resource for using jsdom as required in this kata?
The current readme of the jsdom repo states:
From checking the commit history, here is the last readme that is still relevant for the version of jsdom that Codewars uses, which is v9.12.0 (it's listed as a dev dependency in /runner/package.json). From there, you can check the Human Contact section, specifically jsdom.env.
Thank you! I'll dig into that and see how far I can get with it :-)
This comment has been hidden.
Sorry for the late response, CW didn't send a notification.
leaderboard is not exposed to resolve()'s scope. You define leaderboard in jsdom.env's callback but call resolve() outside of the callback. The color of leaderboard in resolve(leaderboard) in the CW editor is white instead of blue.
Another issue is that even if you define leaderboard in the same scope where resolve() is called, the promise will get resolved with the 'empty' leaderboard before jsdom.env's callback has a chance to build it.
Some references:
This comment has been hidden.
This comment has been hidden.
For my taste it seems too similar to be the focal point of another kata but looks appropriate for a kumite/fork of a solution. For example, this is what it takes in ruby to get a map of clans and the associated users.
This kata was published before the kata/authored leaderboards were added. Another idea is to factor in data from the kata/authored leaderboards for each user on the overall leaderboard, and maybe even throw in user ranks. Giving a user something like: honor.total, honor.kata, honor.completed. Then you'd have a little more data to make charts or graphs(D3.js kata? :D ) of clans' or individuals' kata/authored honor. Ex: pie charts of a user's honor make up ...or ALL the honor from the leaderboard, myjinxin2015 and g964 are probably 20% of it LOL.
Aside from those possibilities, getting and processing html in JS might be more involved than in Ruby/Python but it's what I intended to be the main component. Once someone knows these basics, they have a clearer path to an implementation of their ideas. I feel that I'd be stretching the material thin if I made a series of katas, I did it just for fun, not necessarily for points.
Those are some intriguing ideas. I was also thinking about writing something that would enable me to search through kata solutions for a given feature of the language, just to see how CW users have used that feature. I could look at all of my own past solutions, or choose a particular kata and search through everyone's solution for the feature in question.
In any event, this has really opened my eyes to how easy it is to scrape a page if you're not going cross-origin. Now it's up to me to learn the HTML DOM better, so that I'm not writing about 50 lines of code where I would only need about 5 :D
How can I handle the 1-based indexing? The second element of the position list has to be the first on the leaderboard, and the 500th element must be the 501st on the list but there can only be 500 in the list? I don't see any way around this
This is only true for lists/arrays. Here's a map of python data structures.
Yeah, I'm stuck here now, too. I didn't think it was going to be a problem because simply shifting the first scraped element worked for the sample tests... but it doesn't work for the main tests. I'm passing all but the one test, which is the 500th user's name. It is in there, but it's not at the 500th index for the reasons apostonaut mentioned...
Good kata, really enjoyed it
JS: I have the same problem using an array, either have 501 length or fail the 500 user name. Any pointers?
I am also having this problem with JavaScript. I put an empty object at the leaderboard#position[0] position and I pass every test except the 500 length. Mine is 501. I am going to try to remove a random user and hope to cheat my way past the tests. However, I am wondering if there is something I am missing or not aware of that will allow me to conform to the desired results. Other than that, cool kata - I had no clue that libraries like cheerio could be brought in.
benjaminadk, the link I posted in my comment above is also relevant for JS and Ruby.
I think that comment is blocked from me. Sorry I am still getting familiar with how this site works.
unmarked it as spoiler
The link posted by 10XL is not relevant for JS so i wouldn't waste my time there.
Python:
How can I deal with characters in clans like
(╯°□°)╯︵ ┻━┻
except doing like the emoji suggests? :D
Use textContent or an equivalent.
I've specified Python, not javascript.
Are you using beautifulsoup?
yes
Alright, then just use beautifulsoup's equivalent of textContent to get the clan/name and don't do any processing on it.
that's not really the issue here. I get:
[UPDATE] SOLVED THIS ISSUE: better to not use print...
This comment has been hidden.
Hi 10XL,
I think there may be another issue with your tests:
"Founders & Coders" is how this particular clan name displays on the leaderboard page. This is preventing me from completing the kata.
Or is this intentional?
This comment has been hidden.
Great, thank you!
Thanks for this novel kata. For JavaScript, cheerio would be nice to have as an alternative to jsdom.
I agree, it would be nice to have but it's not installed, so we can't use it. You can see for yourself what happens when you try to load cheerio.
I really enjoyed this kata. Thank you =]
Wow, great Kata! I've learned a lot! Thanks :)
What data structure should the python code return?
The instructions say "Return a 'Leaderboard' object with a position property", so when I just did it I just created my own Leaderboard class for it to return.
I have done that now. Think returning a simple list of dicts would be better for python.
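For anyone else unsure about the shape, here is a rough Python sketch of "an object with a position property". The `User` and `Leaderboard` class names and the name/clan/honor fields follow what is discussed in this thread; the constructor signatures and the sample rows are assumptions made for illustration and may not match what the tests expect.

```python
class User:
    """One leaderboard entry; field names follow the discussion above."""

    def __init__(self, name, clan, honor):
        self.name = name
        self.clan = clan
        self.honor = honor


class Leaderboard:
    """Holds a `position` mapping from 1-based rank to User."""

    def __init__(self, users):
        # Assumption: `users` is already in leaderboard order, best first.
        self.position = {rank: user for rank, user in enumerate(users, start=1)}


# Placeholder rows standing in for the scraped (name, clan, honor) data.
rows = [("first_user", "Some Clan", 1200), ("second_user", "", 900)]
leaderboard = Leaderboard([User(*row) for row in rows])

print(leaderboard.position[1].name)   # first_user
print(leaderboard.position[2].honor)  # 900
print(len(leaderboard.position))      # 2
```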
Really cool stuff, I hadn't looked at Nokogiri's API properly before so I'm really glad you made this!
Hi 10XL,
Any idea about the following error within #build_leaderboard? It appears to be on your end within the test suite.
This comment has been hidden.
Would it be possible to add parameter signatures for User (name [string], clan [string], honor [integer]) and Leaderboard (leaderboard [Hash[rank => User]]) initialization to the instructions? My use of an options hash in User#initialize rather than a sequence of parameters (name, clan, honor) caused the discrepancy with the test suite.
Fun challenge, nice job writing it.
This comment has been hidden.
Marking as resolved.
There are other cases too which don't work. My list contains exactly 500 entries (as long as the leaderboard), but some are not in the same place compared with your list (it's difficult to recognize the exact places, because your object is missing position entries visible to tests). One actual example is:
If i open the leaderboard i only see an entry of 1353, but no entry contains honor points of 1352 (what you expect). So it's not possible to pass the tests. By the way codewars sometimes shows older boards, some minutes later refreshed boards - it changes sometimes (hope you know what i mean)...
could you post your solution? The leaderboard object is generated at the start of the test but this can happen because of the volatility of the leaderboard.
This comment has been hidden.
lol... I was checking against the preview.codewars leaderboard while giving a url to the plain leaderboard in the initial solution. The tests will use the normal url now.
Ok, thanks for checking;-)! So later on I will look again into my ugly code (just weekend evening, so less time):-)... Marked it as resolved...
Ok, no great corrections were necessary, ugly code works too, sorry only just posted nearly same code:-). Little bit lazy today... Good kata;-)!
Thanks! :)
This comment has been hidden.
Hi.
I forgot to change the name of the expected users/leaderboard. It was being created with the user defined objects instead of mine. Now that error shouldn't happen as long as user/leaderboard objects don't share the same names, which I doubt will occur from now. Thanks!