AssertionError: ('Unhandled', 6), only with some PDFs on some pages #994
Replies: 2 comments 3 replies
-
Thanks for flagging. Two questions:
-
Hello jsvine, thanks for your reply. I tried, but for the life of me I can't get it to find Ghostscript :/ Unfortunately, I cannot share a PDF because the files are sensitive. I will try to make a dummy PDF that I could share, but the chances are slim and it will take some time. If I manage to create dummy PDFs, I can also try it on my personal PC, where I have admin rights.
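The thread doesn't preserve exactly what the earlier reply asked to run with Ghostscript, so the following is only a hedged diagnostic: a minimal sketch that checks whether the Python process can see a Ghostscript executable on its PATH. It assumes the usual executable names (`gs` on Linux/macOS, the console builds `gswin64c`/`gswin32c` on Windows); `find_ghostscript` is a hypothetical helper, not part of pdfplumber.

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names: "gs" on Linux/macOS,
# "gswin64c"/"gswin32c" (console builds) on Windows.
GS_CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(
    candidates: Sequence[str] = GS_CANDIDATES,
    search_path: Optional[str] = None,
) -> Optional[str]:
    """Return the full path of the first Ghostscript executable found, else None.

    search_path=None means "use this process's PATH", as shutil.which does.
    On Windows, shutil.which also tries the PATHEXT extensions (e.g. ".exe").
    """
    for name in candidates:
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_ghostscript())
```

If this prints `None`, the Ghostscript `bin` directory is most likely not on the PATH that Python sees. On Windows, a user-level PATH entry can be added without admin rights; the terminal or IDE has to be restarted before the change takes effect.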
-
Hello,
I am trying to read the text in some PDFs. Usually pdfplumber works very well, and this is my code:
```python
import pdfplumber

# path points to one of the PDFs being read
temp_pdf = pdfplumber.open(path)
temp_page = temp_pdf.pages[0]             # first page
temp_content = temp_page.extract_words()  # list of word dicts
```
Now I'm running into an AssertionError: ('Unhandled', 6) on a subset of PDFs.
This happens with most methods, namely extract_words, extract_table, extract_tables and extract_text, as well as with to_csv, to_dict and to_json.
The call `im = temp_page.to_image()` works and renders a correct image of the PDF! Everything else seems to fail :/
Only the first page is unreadable; the other 5+ pages work.
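To map exactly which pages and which methods fail without stopping at the first exception, a minimal sketch like the following could be used. `find_failing_pages` is a hypothetical helper, not a pdfplumber API; it only assumes that each page object exposes the listed methods and that they can be called with no arguments, which is true for the four `extract_*` methods mentioned above.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Methods reported as failing in this thread; each is called with no arguments.
DEFAULT_METHODS = ("extract_words", "extract_text", "extract_table", "extract_tables")


def find_failing_pages(
    pages: Iterable[object],
    methods: Sequence[str] = DEFAULT_METHODS,
) -> Dict[int, List[Tuple[str, str]]]:
    """Call every method on every page and record the calls that raise.

    Returns {page_number: [(method_name, repr_of_error), ...]} containing
    only the pages that had at least one failure. Page numbers are 1-based.
    """
    failures: Dict[int, List[Tuple[str, str]]] = {}
    for page_number, page in enumerate(pages, start=1):
        for name in methods:
            try:
                getattr(page, name)()
            except Exception as exc:  # AssertionError is a subclass of Exception
                failures.setdefault(page_number, []).append((name, repr(exc)))
    return failures
```

With pdfplumber this would be called as `find_failing_pages(pdf.pages)` inside a `with pdfplumber.open(path) as pdf:` block; going by the description above, one would expect only page 1 to show up for the affected files.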
The PDF opens fine in Adobe etc., and the text can be copied without problems.
I also can't find any difference between the working and non-working PDFs; all are created with the same software and have the same layout.
Maybe someone can help me understand why these PDFs are not working. What could I do to a "working" PDF to make it throw the same error?
Any help is welcome :)