
Converting PDF to .txt File


badben
7th December 2005, 21:14
Does anybody know if it is possible to convert a PDF file to a plain text file using PHP so that my site's search engine can index it?

I can't seem to find anything.

falko
7th December 2005, 21:24
You can use pdftotext, which is part of xpdf, to extract text from PDF files: http://www.foolabs.com/xpdf/
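A minimal shell sketch of the pdftotext route, assuming pdftotext is installed and on the PATH. The function names `txt_name_for` and `pdf_to_txt` are hypothetical, chosen only for illustration. pdftotext's documented form is `pdftotext [options] <PDF-file> [<text-file>]`, and without a second argument it already writes file.txt next to file.pdf; the output name is passed explicitly here just to make it obvious.

```shell
#!/bin/sh
# Hypothetical helper: print the .txt name for a PDF, e.g. docs/a.pdf -> docs/a.txt
txt_name_for () {
  printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical wrapper around pdftotext (xpdf): extract the text of PDF $1
# into a .txt file next to it so the site search engine can index it.
pdf_to_txt () {
  # Fail early on a missing or unreadable input file.
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  # Fail early if pdftotext is not installed.
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found in PATH" >&2
    return 1
  fi
  # pdftotext [options] <PDF-file> [<text-file>]
  pdftotext "$1" "$(txt_name_for "$1")"
}
```

For example, `pdf_to_txt uploads/manual.pdf` would write uploads/manual.txt. From PHP, the same command could be run with `shell_exec()`, passing the file name through `escapeshellarg()`.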

till
7th December 2005, 21:38
Or you can use pdf2ps (http://www.csit.fsu.edu/~burkardt/g_src/pdf2ps/pdf2ps.html) to convert the PDF to a PostScript file and then ps2ascii to extract the text (http://annys.eines.info/cgi-bin/man/man2html?ps2ascii+1).
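A rough sketch of that two-step route, assuming the Ghostscript versions of pdf2ps (`pdf2ps input.pdf [output.ps]`) and ps2ascii (`ps2ascii [input.ps [output.txt]]`) are installed. The function name `pdf_to_txt_via_ps` is hypothetical, and the intermediate PostScript file goes to a temporary file that is deleted afterwards.

```shell
#!/bin/sh
# Hypothetical wrapper: PDF -> PostScript (pdf2ps) -> plain text (ps2ascii).
# Usage: pdf_to_txt_via_ps input.pdf output.txt
pdf_to_txt_via_ps () {
  # Fail early on a missing or unreadable input file.
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  # Temporary file for the intermediate PostScript output.
  ps=$(mktemp) || return 1
  # Both steps must succeed for the conversion to count as successful.
  if pdf2ps "$1" "$ps" && ps2ascii "$ps" "$2"; then
    status=0
  else
    status=1
  fi
  # Always clean up the intermediate file.
  rm -f "$ps"
  return "$status"
}
```

Compared with pdftotext this needs two external programs and a temporary file, so it is mainly useful where Ghostscript is already installed and xpdf is not.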

sbovisjb1
25th March 2006, 17:10
Well, I don't know if this would work, but here goes:
# Usage: globmatches [-q] string globpattern
# Does $1 match the glob expression $2?
# -q flag    = set the return status to 0 (true) or 1 (false)
# no -q flag = echo "1" (true) or "0" (false)
# Unfortunately, the return status is the opposite of the echoed string
globmatches () {
  if [ "$1" = "-q" ]; then
    shift
    case "$1" in
      $2 ) true ;;    # $2 is left unquoted so it is used as a glob pattern
      * ) false ;;
    esac
  else
    case "$1" in
      $2 ) echo 1 ; true ;;
      * ) echo 0 ; false ;;
    esac
  fi
}

# $file is assumed to hold the name of the file being checked
if globmatches -q "$file" "*.txt" ; then
  echo "Found a txt file"
elif globmatches -q "$file" "*.pdf" ; then
  echo "Found a pdf file"
fi
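For comparison, the file-type check that globmatches performs can also be written as a single plain `case` statement and hooked up to the conversion step. This is a sketch under assumptions: `classify` is a hypothetical name, the files to check are passed as script arguments, and pdftotext is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: print "txt", "pdf" or "other" for a file name.
# A plain case statement does the same glob matching that globmatches wraps.
classify () {
  case "$1" in
    *.txt ) echo "txt" ;;
    *.pdf ) echo "pdf" ;;
    * )     echo "other" ;;
  esac
}

# Walk the files given on the command line and convert PDFs to text.
for file in "$@"; do
  case "$(classify "$file")" in
    txt ) echo "Found a txt file: $file" ;;
    pdf ) echo "Found a pdf file: $file"
          # With no output name, pdftotext writes file.txt next to file.pdf.
          pdftotext "$file" ;;
  esac
done
```

Run it with the file names as arguments, for example `sh prepare.sh docs/*` (the script name is arbitrary); files that are neither .txt nor .pdf are skipped.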