Here are a few tools that may help in some cases. Roughly, the idea is to extract from the PDF file the current glyph<->Unicode mapping (if present) and the runs of text present (as character ...
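As a rough illustration of the first step, here is a minimal sketch in Python that reads the glyph-code-to-Unicode pairs out of a ToUnicode CMap that has already been extracted and decompressed from the PDF. The sample CMap text and the function name `parse_bfchar` are made up for this example, and the sketch only handles `beginbfchar` sections (not `beginbfrange`), so treat it as a starting point rather than a complete parser.

```python
import re

# Hypothetical sample of a decoded ToUnicode CMap stream (not from a real file).
SAMPLE_CMAP = """\
/CIDInit /ProcSet findresource begin
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfchar
<0003> <0020>
<0024> <0041>
<0057> <00660069>
endbfchar
endcmap
"""


def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} for every bfchar entry.

    Assumption: the destination of each entry is a hex string holding
    UTF-16BE text, which is how ToUnicode CMaps encode their targets.
    """
    mapping = {}
    # Each bfchar section sits between 'beginbfchar' and 'endbfchar'.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Entries are pairs of hex strings: <source code> <destination text>.
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            code = int(src, 16)
            text = bytes.fromhex(dst).decode("utf-16-be")
            mapping[code] = text
    return mapping


if __name__ == "__main__":
    for code, text in sorted(parse_bfchar(SAMPLE_CMAP).items()):
        print(f"glyph code {code:#06x} -> {text!r}")
```

A glyph code that maps to more than one character (the third entry above, a ligature) comes back as a multi-character string, which is why the values are strings rather than single code points.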
In some cases a font may lack a glyph that is essential for its use in your application. Arabic fonts present special issues here, because the shape of the glyph depends not only on its position in ...
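To make the positional dependence concrete, the following sketch uses Python's standard `unicodedata` module to list the Unicode compatibility code points for the positional forms of a base Arabic letter. The function name `positional_forms` is invented for this example. It is only an illustration: many fonts select positional glyphs through their own shaping tables rather than through these code points, so a glyph can still be missing even when the code points exist.

```python
import unicodedata

# Decomposition tags Unicode uses for positional presentation forms.
POSITION_TAGS = ("<isolated>", "<initial>", "<medial>", "<final>")


def positional_forms(base_char):
    """Map position name -> presentation-form character for one base letter.

    Scans the Arabic Presentation Forms-B block (U+FE70..U+FEFF) and keeps
    the characters whose compatibility decomposition is exactly base_char.
    """
    wanted = f"{ord(base_char):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp:
            continue  # unassigned, or no decomposition
        tag, *parts = decomp.split()
        if tag in POSITION_TAGS and parts == [wanted]:
            forms[tag.strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    # U+0628 ARABIC LETTER BEH joins on both sides, so it has four forms.
    for position, ch in positional_forms("\u0628").items():
        print(f"{position:9s} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Comparing the result against the glyphs a font actually provides is one way to spot which positional shapes are missing.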