VRSnik
I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
![](https://i.stack.imgur.com/Bojlj.png)
![](https://i.stack.imgur.com/Fs7Hz.png)
![](https://i.stack.imgur.com/YcJLY.png)
![](https://i.stack.imgur.com/Q1Tm1.png)
I tried to remove the noisy dots with some filters:
```python
import cv2
import numpy as np
import pytesseract

image_path = 'captcha.png'  # placeholder: path to one captcha sample

# Read the image and binarize it with a fixed threshold
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Morphological opening with a 4x4 kernel
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)

# Four passes of a 3x3 median blur, then a Gaussian blur to smooth the edges
for _ in range(4):
    img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)

# Save the result and run OCR on it
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
```
The resulting transformed images are:
![](https://i.stack.imgur.com/XKg2q.png)
![](https://i.stack.imgur.com/eLNxR.png)
![](https://i.stack.imgur.com/uqJA6.png)
![](https://i.stack.imgur.com/qAus2.png)
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there a better transformation?
Final Update:
As @Neil suggested, I tried to remove the noise by detecting connected pixels. To find connected pixels, I found a function named `connectedComponentsWithStats`, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.

And here are the new resulting images:
![](https://i.stack.imgur.com/katT8.png)
![](https://i.stack.imgur.com/Xcb0u.png)
![](https://i.stack.imgur.com/DgmsE.png)
![](https://i.stack.imgur.com/eJVTI.png)