Merge pull request #936 from jsvine/develop
v0.10.0

Showing 29 changed files with 516 additions and 203 deletions.

# Colors

In the PDF specification, as well as in `pdfplumber`, most graphical objects can have two color attributes:

- `stroking_color`: The color of the object's outline
- `non_stroking_color`: The color of the object's interior, or "fill"

In the PDF specification, colors have both a "color space" and a "color value".

## Color Spaces

Valid color spaces are grouped into three categories:

- Device color spaces
  - `DeviceGray`
  - `DeviceRGB`
  - `DeviceCMYK`
- CIE-based color spaces
  - `CalGray`
  - `CalRGB`
  - `Lab`
  - `ICCBased`
- Special color spaces
  - `Indexed`
  - `Pattern`
  - `Separation`
  - `DeviceN`

For details on the differences between these color spaces, see section 4.5 of the [PDF Reference, version 1.7](https://ghostscript.com/~robin/pdf_reference17.pdf).
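
To make the three categories concrete, here is a minimal sketch that maps a color-space name string such as `"DeviceRGB"` to its category. The lookup table, the function name `color_space_family`, and the `"unknown"` fallback are assumptions made for illustration; none of them are part of `pdfplumber`.

```python
from typing import Optional

# Hypothetical lookup (not part of pdfplumber): the category of each
# color space listed in the PDF specification.
COLOR_SPACE_FAMILIES = {
    "DeviceGray": "device",
    "DeviceRGB": "device",
    "DeviceCMYK": "device",
    "CalGray": "cie",
    "CalRGB": "cie",
    "Lab": "cie",
    "ICCBased": "cie",
    "Indexed": "special",
    "Pattern": "special",
    "Separation": "special",
    "DeviceN": "special",
}


def color_space_family(name: Optional[str]) -> str:
    """Return "device", "cie", or "special" for a color-space name.

    Anything unrecognized (including None) is reported as "unknown".
    """
    if name is None:
        return "unknown"
    # Tolerate names written in raw PDF syntax, e.g. "/DeviceRGB".
    return COLOR_SPACE_FAMILIES.get(name.lstrip("/"), "unknown")


print(color_space_family("DeviceCMYK"))  # device
print(color_space_family("ICCBased"))    # cie
```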

`pdfplumber` aims to expose those color spaces as `scs` (stroking color space) and `ncs` (non-stroking color space), each represented as a __string__.

__Caveat__: The only information `pdfplumber` can __currently__ expose is the non-stroking color space for `char` objects. Exposing the rest (the stroking color space for `char` objects, and either color space for other types of objects) will require a pull request to `pdfminer.six`.
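
Given that caveat, code reading these attributes should tolerate their absence. The sketch below tallies fill color spaces across a list of objects. It assumes each object is a `dict` shaped like the entries of `page.chars`, uses `.get()` so a missing `ncs` is counted under `None` instead of raising, and the helper name `count_fill_color_spaces` is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_fill_color_spaces(objs: Iterable[Mapping[str, Any]]) -> Counter:
    """Tally the non-stroking color space ("ncs") of each object.

    Objects whose "ncs" is missing or None are counted under None
    rather than raising a KeyError.
    """
    return Counter(obj.get("ncs") for obj in objs)


# Hand-made dicts shaped like pdfplumber chars; with a real PDF you would
# pass something like `page.chars` instead.
sample_chars = [
    {"text": "H", "ncs": "DeviceRGB"},
    {"text": "i", "ncs": "DeviceRGB"},
    {"text": "!", "ncs": "DeviceGray"},
    {"text": "?"},  # no color space information available
]
print(count_fill_color_spaces(sample_chars))
```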

## Color Values

The color value determines *what specific color* in the color space should be used. With the exception of the "special color spaces," these color values are specified as a series of numbers. For `DeviceRGB`, for example, the color value consists of three numbers, representing the intensities of red, green, and blue.
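
The number of components depends on the color space: a `DeviceGray` value has one, `DeviceRGB` three, and `DeviceCMYK` four, each normally between 0 and 1. The hypothetical helper below converts device color values to an approximate RGB triple using the simple textbook formulas. It ignores ICC profiles and rendering intents, so treat its output as an approximation, not as what a PDF viewer would necessarily display.

```python
from typing import Sequence, Tuple


def device_color_to_rgb(space: str, value: Sequence[float]) -> Tuple[float, float, float]:
    """Approximate an RGB triple (components 0-1) for a device color value.

    Hypothetical helper: uses naive, non-color-managed conversions.
    Tuple unpacking raises ValueError if `value` has the wrong length.
    """
    if space == "DeviceGray":
        (gray,) = value  # one component: 0 = black, 1 = white
        return (gray, gray, gray)
    if space == "DeviceRGB":
        r, g, b = value  # three components: red, green, blue intensities
        return (r, g, b)
    if space == "DeviceCMYK":
        c, m, y, k = value  # four components: cyan, magenta, yellow, black
        return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    raise ValueError(f"not a device color space: {space!r}")


print(device_color_to_rgb("DeviceCMYK", (0, 1, 1, 0)))  # (1, 0, 0)
```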

In `pdfplumber`, those color values are exposed as `stroking_color` and `non_stroking_color`, represented as a __tuple of numbers__.
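
Because the values are plain tuples, filtering objects by color comes down to comparing tuples. Values read from a PDF may not be exact (a "red" could be stored as something like `(0.996, 0, 0)`), so the sketch below compares each component within a tolerance and treats tuples of different lengths (different color spaces) as non-matching. The function names and the default tolerance of `0.01` are assumptions for illustration.

```python
import math
from typing import Any, Iterable, List, Mapping, Optional, Sequence


def colors_match(
    color: Optional[Sequence[float]], target: Sequence[float], tol: float = 0.01
) -> bool:
    """True if `color` has as many components as `target` and each
    component is within `tol` of the corresponding target component."""
    if color is None or len(color) != len(target):
        # Missing color, or a different color space (e.g. gray vs. RGB).
        return False
    return all(math.isclose(c, t, abs_tol=tol) for c, t in zip(color, target))


def objects_with_fill(
    objs: Iterable[Mapping[str, Any]], target: Sequence[float], tol: float = 0.01
) -> List[Mapping[str, Any]]:
    """Return objects whose "non_stroking_color" approximately equals `target`."""
    return [o for o in objs if colors_match(o.get("non_stroking_color"), target, tol)]


# Hand-made dicts shaped like pdfplumber chars; in practice, pass `page.chars`.
chars = [
    {"text": "A", "non_stroking_color": (0.996, 0, 0)},  # nearly pure red (RGB)
    {"text": "B", "non_stroking_color": (0, 0, 0)},      # black (RGB)
    {"text": "C", "non_stroking_color": (0,)},           # black (gray)
    {"text": "D", "non_stroking_color": None},
]
red_text = "".join(o["text"] for o in objects_with_fill(chars, (1, 0, 0)))
print(red_text)  # A
```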

The pattern specified by the `Pattern` color space is exposed via the `non_stroking_pattern` and `stroking_pattern` attributes.

# Repairing Malformed PDFs

Many parsing issues can be traced back to malformed PDFs.

Malformed PDFs can often be [fixed via Ghostscript](https://superuser.com/questions/278562/how-can-i-fix-repair-a-corrupted-pdf-file).
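
A Ghostscript repair of this kind typically rewrites the file through the `pdfwrite` device. The sketch below only builds such a command line without running it. The `gs` executable name, the `-dPDFSETTINGS=/prepress` preset, and the helper name are assumptions based on common Ghostscript usage; this is not necessarily the exact command `pdfplumber` runs internally.

```python
import shlex
from typing import List


def ghostscript_repair_command(infile: str, outfile: str, gs: str = "gs") -> List[str]:
    """Build (but do not run) a Ghostscript command that rewrites `infile`
    through the pdfwrite device, which often repairs structural damage."""
    return [
        gs,
        "-o", outfile,              # output file; also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",        # re-emit the document as a fresh PDF
        "-dPDFSETTINGS=/prepress",  # high-quality output preset
        infile,
    ]


cmd = ghostscript_repair_command("broken.pdf", "repaired.pdf")
print(shlex.join(cmd))
# To actually run it (requires Ghostscript on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```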

`pdfplumber` can run those repairs for you automatically, in several ways:

- `pdfplumber.open(..., repair=True)` will repair your PDF on the fly (but not save the repaired version to disk).
- `pdfplumber.repair(path_to_pdf)` will return a `BytesIO` object holding the bytes of a repaired version of the original file.
- `pdfplumber.repair(path_to_pdf, outfile="path/to/repaired.pdf")` will write a repaired version of the original file to the indicated `outfile` path.
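
One way to use `repair=True` selectively is to try a normal open first and fall back to repairing only when that fails, so Ghostscript runs only for files that need it. The helper below is a hypothetical sketch: its name, the broad `except Exception`, and the injectable `open_fn` parameter (which lets it be exercised without a real PDF) are assumptions. Note that some malformed files may only raise errors later, when pages are accessed, which this sketch does not catch.

```python
from typing import Any, Callable, Optional


def open_with_repair_fallback(path: str, open_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Open `path` normally; if that raises, retry once with repair=True."""
    if open_fn is None:
        import pdfplumber  # deferred so the helper can be tested without pdfplumber

        open_fn = pdfplumber.open
    try:
        return open_fn(path)
    except Exception:
        # Parsing failed: retry with repair enabled.
        # A second failure propagates to the caller.
        return open_fn(path, repair=True)


# Typical use with a real file:
# with open_with_repair_fallback("path/to/file.pdf") as pdf:
#     print(len(pdf.pages))
```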

```diff
@@ -1,2 +1,2 @@
-version_info = (0, 9, 0)
+version_info = (0, 10, 0)
 __version__ = ".".join(map(str, version_info))
```
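
Keeping `version_info` as a tuple makes version checks straightforward: tuples compare element by element, whereas the dotted strings compare character by character, so `"0.10.0"` sorts before `"0.9.0"`. The sketch below parses a version string back into a tuple; the helper name `parse_version` is hypothetical, and it assumes purely numeric components like the ones above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as "0.10.0" into (0, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


# String comparison gets this wrong; tuple comparison gets it right.
print("0.10.0" > "0.9.0")                                # False
print(parse_version("0.10.0") > parse_version("0.9.0"))  # True

# Hypothetical feature gate:
# import pdfplumber
# if parse_version(pdfplumber.__version__) >= (0, 10, 0): ...
```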