Simple Captcha Solving

VRSnik · Oct 1, 2021

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some sample captchas are:

I tried to remove the noisy dots with some filters:

Code:

import cv2
import numpy as np
import pytesseract

image_path = 'captcha.png'  # placeholder: path to the captcha image

img = cv2.imread(image_path)
# Binarize with a fixed threshold of 127
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# Morphological opening with a 4x4 kernel
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
# Four passes of a 3x3 median filter
for _ in range(4):
    img = cv2.medianBlur(img, 3)
# Smooth the result with a 5x5 Gaussian kernel
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))

The resulting transformed images are:

Unfortunately, pytesseract recognizes only the first captcha correctly. Is there a better transformation?

Final Update:

As @Neil suggested, I tried to remove the noise by detecting connected pixels. To find them, I used OpenCV's connectedComponentsWithStats function, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing those with a small number of pixels, I got better overall detection accuracy with pytesseract.

And here are the new resulting images:
