Simple Captcha Solving

VRSnik

New member
#1
I'm trying to solve some simple captcha using OpenCV and pytesseract. Some of captcha samples are:



I tried to the remove the noisy dots with some filters:
Code:
import cv2
import numpy as np
import pytesseract

img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
Resulting tranformed images are:



Unfortunately pytesseract just recognizes first captcha correctly. Any other better transformation?

Final Update:

As @Neil suggested, I tried to remove noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, whichs detect connected pixels and assigns group (component) a label. By finding connected components and removing the ones with small number of pixels, I managed to get better overall detection accuracy with pytesseract.

And here are the new resulting images: