Inconsistent table detection due to colours / layout #1119
jameskohjunwei started this conversation in Ask for help with specific PDFs
Happy to hear that
Hey @jsvine ,
Thank you for creating this library. I've been a user for a while, and I'm here to ask a specific question about a PDF I came across.
I've got a PDF where rows are visually segmented by color (a faint purple on every alternate row). I assume pdfplumber uses these colored bands to identify rows for extraction. The issue arises when a page starts with a row that has no purple shading: in those cases the row is not detected and does not get extracted. Is there a solution to this without explicitly hardcoding horizontal lines?
PDF here: jpy_statement1-redacted_removed.pdf
Any assistance would be greatly appreciated. Thank you!
Image 1: example of a page whose rows start without a purple row; the first row is missed by the detection.
Image 2: example of a page whose rows start with a purple row; the rows are picked up successfully.
My code below:
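A possible direction for the question above (a sketch, not the code from the original post): instead of relying on the shaded bands, the row boundaries could be derived from the text itself. The helper below is hypothetical and not part of pdfplumber. It assumes each word is a dict with numeric `top` and `bottom` keys, which is the shape `page.extract_words()` returns, and the `tolerance` value is a guess that would need tuning for this statement.

```python
def row_boundaries_from_words(words, tolerance=3.0):
    """Group words into text rows by their `top` coordinate and return
    y-positions that separate consecutive rows.

    Hypothetical helper: `words` is a list of dicts with numeric "top"
    and "bottom" keys (the shape of pdfplumber's extract_words() output).
    """
    rows = []  # each entry is [top of row, bottom of row]
    for word in sorted(words, key=lambda w: w["top"]):
        if rows and word["top"] - rows[-1][0] <= tolerance:
            # Same text row: extend its bottom edge if this word is taller.
            rows[-1][1] = max(rows[-1][1], word["bottom"])
        else:
            # Far enough below the current row: start a new one.
            rows.append([word["top"], word["bottom"]])
    if not rows:
        return []
    # One line above the first row, one below the last row, and one
    # midway through each gap between neighbouring rows.
    lines = [rows[0][0]]
    for (_, prev_bottom), (next_top, _) in zip(rows, rows[1:]):
        lines.append((prev_bottom + next_top) / 2)
    lines.append(rows[-1][1])
    return lines


# Small synthetic example: two words on the first row, then two more rows.
sample_words = [
    {"top": 10, "bottom": 20},
    {"top": 11, "bottom": 21},
    {"top": 30, "bottom": 40},
    {"top": 50, "bottom": 60},
]
boundaries = row_boundaries_from_words(sample_words)
```

With a real page, the resulting list could be passed to `page.extract_table()` through the `explicit_horizontal_lines` table setting together with `"horizontal_strategy": "explicit"`, ideally after cropping the page to the table area so that header and footer text does not add spurious rows. A simpler built-in option to try first is `"horizontal_strategy": "text"`. Neither approach has been checked against the attached PDF.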