I use GIMP 2.4.7 and Acrobat Reader 9 for Linux. I had to convert a scanned PDF document to text, and your how-to helped me a lot. Thanks for that. I did a couple of things a little differently:

1. I selected and copied the text image in Acrobat, then pasted it into a new GIMP image that I had created beforehand under Advanced Options, specifying Grayscale and 200 ppi.
2. I used Edit > Paste and then scaled the layer with Layer > Scale Layer until I could see the letters clearly. I also cleaned dots and other specks out of the picture, as you recommended.
3. I followed your directions for Image > Mode > Indexed, but I skipped the threshold step and used Colors > Brightness-Contrast instead, with Brightness at -127 and Contrast at 127 (the best values depend on how clear your scan is).
4. File > Save As... Here I added .tif to the end of my chosen file name. GIMP then asks you to export the image, and you have to choose Flatten Image.
5. Then, as you indicated, run tesseract, which writes the recognized text to result.txt:

$ tesseract filename.tif result

I just want to thank you again.
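If a document has many pages, step 5 can be repeated for every TIFF in a folder. Below is a minimal sketch, assuming each page was already prepared in GIMP and saved as a flattened .tif as described above. The function name `ocr_all` is hypothetical and only the plain `tesseract input outputbase` form from step 5 is used; tesseract appends .txt to the output base, so page1.tif produces page1.txt.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original how-to):
# run tesseract on every .tif file in a directory.
# Assumes the TIFFs were prepared in GIMP (grayscale, cleaned,
# saved as .tif with Flatten Image).
ocr_all() {
    dir=${1:-.}
    for tif in "$dir"/*.tif; do
        # If no .tif files exist, the glob stays literal; skip it.
        [ -e "$tif" ] || continue
        # Strip the .tif extension to get the output base name.
        base=${tif%.tif}
        # Same form as step 5: tesseract INPUT OUTPUTBASE.
        # tesseract adds .txt, so page1.tif -> page1.txt.
        tesseract "$tif" "$base"
    done
}
```

For example, `ocr_all ~/scans` would leave one .txt file next to each .tif in ~/scans.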