CAPTCHAFORUM
Administrator
The ruCaptcha.com service solves, as already mentioned, a wide variety of captchas, from reCaptcha of all types and versions to keyCaptcha, hCaptcha and FunCaptcha. As the basis for the experiment we will take what is probably the most popular solution on the web right now, reCaptcha v2, and specifically the official demo from Google:
URL_RECAPTCHA = 'https://www.google.com/recaptcha/api2/demo'
By the way, instead of google.com we could just as easily substitute the address of another page, for example my blog, built on the Ruby on Rails framework, where reCaptcha works through the ambethia/recaptcha gem; everything would turn out the same. I will not claim that it is always like this: for example, reCaptcha on the pages of the blog you are reading now is invoked by the scripts of K2, a Joomla component, where things are probably not so straightforward. But more often than not the described scenario will work, which we will now check.
We read the ruCaptcha.com API documentation: first of all, we need the data-sitekey value from the captcha page. OK, let's parse the HTML and quickly find what we need:
Code:
require 'open-uri'
require 'nokogiri'

def data_sitekey
  # Fetch the captcha page and return the value of its data-sitekey attribute
  html = URI.open(URL_RECAPTCHA)
  doc = Nokogiri::HTML(html)
  doc.at_xpath('//@data-sitekey')&.value
end
Let me explain for those who are just taking their first steps in OOP (object-oriented programming): a call to the data_sitekey method now returns the desired data-sitekey value, which you can also find yourself in the source code of the page containing reCaptcha. Not difficult at all, is it? That is the magic of Nokogiri.
Now, having obtained the reCaptcha data-sitekey, we can form the first request to the ruCaptcha.com API; in response we will receive the identifier of the task assigned to the ruCaptcha.com service ...
Code:
def first_request
  target = 'https://rucaptcha.com/in.php'
  params = {
    key: APIKEY,
    method: 'userrecaptcha',
    googlekey: data_sitekey,
    pageurl: URL_RECAPTCHA
  }
  request(target, params)
end
... which we implement with another method, which I have called request, passing it target and params. The composition of params should now be quite transparent: the access key issued by the service at registration, the method we are calling (described in the documentation), the data-sitekey value we have just obtained, and the address of the captcha page.
Something like this:
Code:
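require 'open-uri' # provides URI#open

def request(target, params)
  # Send a GET request with params in the query string and return the response body
  uri = URI.parse(target)
  uri.query = URI.encode_www_form(params)
  uri.open.read
end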
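To see what request actually sends, here is a minimal sketch that builds the same URL without making the HTTP call. The helper name build_uri and the placeholder key and sitekey values are assumptions made purely for illustration.

```ruby
require 'uri'

# Hypothetical helper: builds the URL that `request` would fetch,
# without touching the network.
def build_uri(target, params)
  uri = URI.parse(target)
  uri.query = URI.encode_www_form(params)
  uri
end

# Placeholder values, not real credentials.
params = {
  key: 'YOUR_API_KEY',
  method: 'userrecaptcha',
  googlekey: 'SITEKEY_FROM_PAGE',
  pageurl: 'https://www.google.com/recaptcha/api2/demo'
}

puts build_uri('https://rucaptcha.com/in.php', params)
```

Note that URI.encode_www_form percent-encodes the pageurl value, so the colon and slashes of the captcha page address do not break the query string.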
Having received the ruCaptcha.com API response, which contains the ID and thus confirms that the task has been accepted for work, we wait 10 seconds and send the second request. If the answer is not an OK response (most often CAPCHA_NOT_READY is returned, meaning that we need to wait a little longer), we repeat the request at the same interval, over and over, until the token we are looking for is finally returned:
Code:
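target = 'https://rucaptcha.com/res.php'
params = {
  key: APIKEY,
  action: 'get',
  # answer is assumed to hold the first_request reply, "OK|<task id>"
  id: answer.delete_prefix('OK|')
}
begin
  sleep 10
  response = request(target, params)
  # Anything other than "OK|<token>" (usually CAPCHA_NOT_READY) triggers another attempt
  raise unless response.start_with?('OK|')
rescue StandardError
  retry
end
token = response.delete_prefix('OK|')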
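The loop above retries forever, so here is a hedged sketch of the same polling idea with an upper bound on attempts. The names ok_payload and poll_for_token, the max_attempts parameter and the stubbed replies are all assumptions for illustration; in real use the fetch callable would wrap request(target, params).

```ruby
# Hypothetical helper: returns the part after "OK|",
# or nil when the reply is anything else.
def ok_payload(reply)
  reply.start_with?('OK|') ? reply.delete_prefix('OK|') : nil
end

# Polls until an "OK|..." reply arrives or max_attempts is exhausted.
# `fetch` is any callable that returns the raw reply string.
def poll_for_token(fetch, interval: 10, max_attempts: 12)
  max_attempts.times do
    sleep interval
    token = ok_payload(fetch.call)
    return token if token
  end
  nil # gave up: no token within max_attempts
end

# Stubbed fetcher (no network): two "not ready" replies, then a token.
replies = ['CAPCHA_NOT_READY', 'CAPCHA_NOT_READY', 'OK|dummy-token']
fetch = -> { replies.shift }
puts poll_for_token(fetch, interval: 0)
```

Passing the fetcher in as a callable keeps the waiting logic separate from the HTTP call, so it can be tried out without spending balance on the service.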
What to do with the received answer? Well, that is a matter of taste. One option is to put it into the hidden field on the page with the id g-recaptcha-response, which is what solving reCaptcha implies. But that is a completely different story ... drop by again, and we will continue.
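As a rough illustration of the hidden-field idea, the sketch below only builds a form-encoded body that carries the token in g-recaptcha-response. The helper name form_body and the dummy token are assumptions; which other fields a particular form expects, and where it should be posted, depends on the page and is not covered here.

```ruby
require 'uri'

# Hypothetical helper: builds a form-encoded body with the token in the
# g-recaptcha-response field, plus any other fields the form needs.
def form_body(token, extra_fields = {})
  URI.encode_www_form(extra_fields.merge('g-recaptcha-response' => token))
end

puts form_body('dummy-token')
```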
Documentation https://github.com/cmirnow/rucaptcha-client