Does anybody know if it is possible to convert a PDF file to a plain text file using PHP so that my site's search engine can index it? I can't seem to find anything.
Or you can use pdf2ps (http://www.csit.fsu.edu/~burkardt/g_src/pdf2ps/pdf2ps.html) to convert the PDF to a PostScript file, and then ps2ascii (http://annys.eines.info/cgi-bin/man/man2html?ps2ascii+1) to extract the text from it.
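For what it's worth, the two-step conversion could be wrapped up roughly like this. It is only a sketch: it assumes the Ghostscript wrappers pdf2ps and ps2ascii are installed and on the PATH, and the function names text_name_for and pdf_to_text are made up for illustration.

```shell
#!/bin/sh
# Sketch: convert a PDF to plain text via PostScript, using the Ghostscript
# wrappers pdf2ps and ps2ascii. The function names are hypothetical.

# Derive the output .txt path from a .pdf path (report.pdf -> report.txt).
text_name_for () {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert one PDF to text; returns non-zero on any failure.
pdf_to_text () {
    pdf=$1
    txt=$(text_name_for "$pdf")
    ps="${pdf%.pdf}.ps"    # intermediate PostScript file, removed afterwards

    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    command -v pdf2ps >/dev/null 2>&1 || { echo "pdf2ps not found" >&2; return 127; }
    command -v ps2ascii >/dev/null 2>&1 || { echo "ps2ascii not found" >&2; return 127; }

    # Step 1: PDF -> PostScript
    pdf2ps "$pdf" "$ps" || return 1

    # Step 2: PostScript -> plain text
    ps2ascii "$ps" "$txt"
    status=$?

    rm -f "$ps"
    return $status
}
```

Calling pdf_to_text manual.pdf should leave a manual.txt next to the PDF, which the search indexer can then read. From PHP, the script could presumably be run with exec() or shell_exec(), provided the host allows those functions.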
Here is a script to index the stuff. Well, I don't know if this would work... oh well.

    # Ex: globmatches [ -q ] string globpattern
    # Does $1 match the glob expr $2 ?
    # -q flag = set return status to 0 (true) or 1 (false)
    # no -q flag = echo "1" (true) or "0" (false)
    # Unfortunately, the return status is opposite from the echo'ed string
    globmatches () {
        if [ "$1" = "-q" ]; then
            shift
            case "$1" in
                $2 ) true ;;
                * ) false ;;
            esac
        else
            case "$1" in
                $2 ) echo 1 ; true ;;
                * ) echo 0 ; false ;;
            esac
        fi
    }

    if globmatches -q "$file" "*.txt" ; then
        echo "Found a txt file"
    elif globmatches -q "$file" "*pdf" ; then
        echo "Found a pdf file"
        if