How do I fine-tune the extraction of the table from this PDF? #448
Hello again. I am still not sure how to fine-tune the table settings when I don't get the "table" structure I want.

I have this PDF: https://cjp-rbi-icis.s3.eu-west-1.amazonaws.com/wp-content/uploads/sites/7/2021/03/18154706/2021-ICIS-Publishing_v5.pdf

I want to extract the information from page 3 (as an example). If I use the default settings, only the header ("When we publish") is detected as a table. If I use:
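When tuning settings like this, it can help to see where pdfplumber thinks each table is, not just the rows it returns. The sketch prints each detected table's bounding box next to its rows, so you can tell whether only the header region was found. It assumes a recent pdfplumber release and uses its Page.find_tables, Table.bbox and Table.extract; the helper names text_lines_settings and describe_tables are hypothetical, and the setting names beyond vertical_strategy and horizontal_strategy are whatever you choose to pass in.

```python
from typing import Any, Dict, List, Optional, Tuple

# One extracted table: rows of cell strings (None where a cell is empty).
Rows = List[List[Optional[str]]]
# A table's bounding box on the page: (x0, top, x1, bottom) in PDF points.
BBox = Tuple[float, float, float, float]


def text_lines_settings(**overrides: Any) -> Dict[str, Any]:
    """Text-aligned columns plus ruled-line rows, with optional overrides."""
    settings: Dict[str, Any] = {
        "vertical_strategy": "text",
        "horizontal_strategy": "lines",
    }
    settings.update(overrides)  # e.g. intersection_y_tolerance=...
    return settings


def describe_tables(
    pdf_path: str, page_number: int, settings: Optional[Dict[str, Any]] = None
) -> List[Tuple[BBox, Rows]]:
    """Return (bbox, rows) for every table pdfplumber finds on a 1-based page.

    settings=None means no overrides, i.e. pdfplumber's default settings.
    """
    # Imported here so text_lines_settings() can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]
        tables = page.find_tables(table_settings=settings or {})
        # Extract while the file is still open.
        return [(table.bbox, table.extract()) for table in tables]
```

For example, comparing describe_tables("2021-ICIS-Publishing_v5.pdf", 3) with describe_tables("2021-ICIS-Publishing_v5.pdf", 3, text_lines_settings(intersection_y_tolerance=10)) shows whether the detected bounding box grows beyond the header. The file name is hypothetical (a local download of the linked PDF), the value 10 is an arbitrary placeholder rather than one taken from this thread, and the question's keep_blank_chars option is left out only to keep the sketch minimal.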
Hi @jakobdo,

For the first case, where you are using the extraction strategy

{
    "vertical_strategy": "text",
    "horizontal_strategy": "lines",
    "keep_blank_chars": True
}

but are failing to get the last row, you may use the intersection_y_tolerance setting as:

It will give you the result as:

You may find all the available table extraction settings at https://github.com/jsvine/pdfplumber#table-extraction-settings.

Unfortunately, I have not been able to come up with a table extraction strategy that works for this table without using the explicit_vertical_…
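One plausible reason intersection_y_tolerance recovers a missing last row: with the "text" vertical strategy, vertical edges are inferred from where words line up, so they can stop short of a ruling line drawn some distance below the last row of text. If that line is never counted as intersecting them, no cell closes off the last row. The sketch is a simplified, standalone model of such an intersection test (not pdfplumber's own code), assuming edges in PDF points with y growing downward; VEdge, HEdge and intersections are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VEdge:
    """A vertical edge at x, spanning from top to bottom (y grows downward)."""
    x: float
    top: float
    bottom: float


@dataclass
class HEdge:
    """A horizontal edge at height y, spanning from x0 to x1."""
    y: float
    x0: float
    x1: float


def intersections(
    v_edges: List[VEdge],
    h_edges: List[HEdge],
    x_tolerance: float = 3.0,
    y_tolerance: float = 3.0,
) -> List[Tuple[float, float]]:
    """Return the (x, y) points where a vertical and a horizontal edge meet.

    A pair counts as intersecting when the horizontal edge's y falls inside the
    vertical edge's span widened by y_tolerance, and the vertical edge's x falls
    inside the horizontal edge's span widened by x_tolerance.
    """
    points: List[Tuple[float, float]] = []
    for v in v_edges:
        for h in h_edges:
            reaches_y = v.top - y_tolerance <= h.y <= v.bottom + y_tolerance
            reaches_x = h.x0 - x_tolerance <= v.x <= h.x1 + x_tolerance
            if reaches_y and reaches_x:
                points.append((v.x, h.y))
    return points
```

For instance, with vertical edges at x = 50 and x = 200 spanning y = 100 to 180, and horizontal lines at y = 100, 140 and 190, a y_tolerance of 3 finds no corners on the line at y = 190 (180 + 3 falls short of 190), while a y_tolerance of 15 does. Whether page 3 of this PDF actually has that kind of gap is an assumption that has not been checked.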