I use Gimp 2.4.7 and Acrobat Reader 9 for Linux. I had to convert a scanned PDF document to text, so your howto helped me a lot. Thanks for that. Here are a few things I did a little differently:

1. I selected and copied the text image in Acrobat and pasted it into a new Gimp image, which I had created beforehand using Advanced Options, where I specified Grayscale and 200 ppi.
2. I used Edit > Paste and then scaled the layer with Layer > Scale Layer until I could see the letters clearly, and I cleaned the dots and other specks out of the picture as you recommended.
3. I followed your directions for Image > Mode > Indexed, but I skipped the threshold step and used Colors > Brightness-Contrast instead. There I set Brightness to -127 and Contrast to 127 (of course this depends on how well you can see the image).
4. File > Save As... Here I put .tif at the end of my chosen name. Gimp then asks to export, and you have to choose Flatten Image.
5. Then, as you indicated:

   $ tesseract filename.tif result

I just want to thank you again.
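
In case anyone wants to script the last step for several pages, here is a minimal Python sketch of the tesseract call above. The function names build_tesseract_command and run_ocr are my own, made up for illustration; the only thing taken from the howto is the command line itself (tesseract, the .tif file, and the output base name). It assumes tesseract is on the PATH and that the output lands in a .txt file named after the base name.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for: tesseract <image> <output_base>."""
    # Hypothetical helper: it only mirrors the command line from the comment.
    return ["tesseract", str(image_path), str(output_base)]


def run_ocr(image_path, output_base):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Assumption: tesseract writes its text output to <output_base>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only attempt the OCR run when both the tool and the file are present.
    if shutil.which("tesseract") and Path("filename.tif").exists():
        print(run_ocr("filename.tif", "result"))
```

To process a whole scanned document, you could call run_ocr once per saved .tif page and join the returned strings.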
