Xpdf
davidbenpinchas
I am using the xpdf code to extract text from PDF files, and I am running into problems with some of the files.
1) In some files, numbers are not extracted. When I copy the text from the PDF into MS Word, squares appear in place of the digits. The digits become visible if I change the font in MS Word to Adobe Jenson Pro, Adobe Caslon Pro, or Adobe Caslon Pro Bold, so it seems that these fonts are not supported by xpdf.
Also, "Th" is not extracted. E.g. instead of "The", " e" is extracted. If I copy the text to MS Word, "> e" is displayed.
2) In other files, some passages are extracted without the blank spaces between words, like
"The impact of John F. Kennedy's assassination caused memories to remainattheforefrontofeveryKennedyDetailagent'smindastheycontinued on with their careers and responsibilities."
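To illustrate what I think is going on in problem 2: PDF content often positions glyphs without any explicit space characters, so a text extractor generally has to infer word breaks from the horizontal gap between adjacent glyphs. Below is a minimal sketch of that heuristic. It is not xpdf's actual code; the `Glyph` structure, the `join_glyphs` function, and the `min_gap_ratio` parameter are all hypothetical names made up for illustration. In tightly set or justified lines the gaps can fall below the threshold, and the words then run together.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: one character plus its horizontal placement."""
    char: str
    x: float      # left edge of the glyph
    width: float  # advance width of the glyph


def join_glyphs(glyphs, font_size, min_gap_ratio=0.1):
    """Rebuild a line of text, inserting a space wherever the gap between
    two neighbouring glyphs exceeds min_gap_ratio * font_size.

    The threshold value is an assumption for this sketch, not xpdf's.
    """
    out = []
    prev_right = None
    for g in glyphs:
        if prev_right is not None and g.x - prev_right > min_gap_ratio * font_size:
            out.append(" ")
        out.append(g.char)
        prev_right = g.x + g.width
    return "".join(out)


# "at" and "the" separated by a narrow 0.5 unit gap at font size 10
line = [
    Glyph("a", 0.0, 5.0),
    Glyph("t", 5.0, 3.0),
    Glyph("t", 8.5, 3.0),
    Glyph("h", 11.5, 5.0),
    Glyph("e", 16.5, 4.0),
]

tight = join_glyphs(line, font_size=10.0)                       # threshold 1.0
loose = join_glyphs(line, font_size=10.0, min_gap_ratio=0.03)   # threshold 0.3
```

With the larger threshold the narrow gap is ignored and the words are merged; with the smaller threshold the same gap is treated as a word break.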

I would be very grateful if somebody could help me solve at least some of these problems. If that is not possible, please let me know that as well.

Thank you very much.