Releases: unidoc/unipdf
v3.5.0
Version 3.5.0 adds initial support for rendering, which makes it possible to convert PDF pages to image formats. In addition, there are a few bug fixes for fonts, encodings and the indexed colorspace.
Pull requests included:
- #273 Parse ttf encoding subtable 31 after subtable 10 (@adrg)
- #272 Add basic glyph metrics support for Type 0 CID fonts (@adrg)
- #266 Add basic image rendering support (@adrg)
- #259 Fixed PdfColorspaceSpecialIndexed.ImageToRGB() (@peterwilliams97)
- #257 Add PdfFont text encoding methods (@adrg)
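The indexed colorspace fix in #259 concerns converting palette-based image data to RGB. As a rough illustration of that kind of conversion, here is a minimal sketch using only the standard library. It is not the unipdf implementation of PdfColorspaceSpecialIndexed.ImageToRGB(); the function name indexedToRGB and the assumptions of 8-bit indices and an RGB base colorspace are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// indexedToRGB expands 8-bit palette indices into interleaved RGB bytes.
// palette holds 3 bytes (R, G, B) per entry. An index beyond the palette is
// reported as an error instead of being read out of bounds.
// Hypothetical helper, not part of unipdf.
func indexedToRGB(indices, palette []byte) ([]byte, error) {
	if len(palette)%3 != 0 {
		return nil, errors.New("palette length is not a multiple of 3")
	}
	entries := len(palette) / 3
	rgb := make([]byte, 0, len(indices)*3)
	for i, idx := range indices {
		if int(idx) >= entries {
			return nil, fmt.Errorf("sample %d: index %d outside palette of %d entries", i, idx, entries)
		}
		off := int(idx) * 3
		rgb = append(rgb, palette[off:off+3]...)
	}
	return rgb, nil
}

func main() {
	palette := []byte{0, 0, 0, 255, 0, 0, 0, 0, 255} // black, red, blue
	rgb, err := indexedToRGB([]byte{1, 2, 0}, palette)
	fmt.Println(rgb, err)
}
```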
v3.4.1
Version 3.4.1 contains an important fix that avoids an infinite loop when parsing PDF files with no EOF marker, as well as a fix for inline image parsing.
Bugfixes
v3.4.0
Version 3.4.0 adds support for predefined CMaps, which significantly improves text extraction for many languages, notably those using Asian fonts. In addition, there are many fixes and improvements.
Pull requests included:
- #246 Add predefined CMaps for Type 0 composite fonts (@adrg)
- #247 Ignore optional inline image field parameters (@gunnsth)
- #238 Add tolerance for seeking EOF marker between read data buffers (@adrg)
- #233 Improve outline destination parsing (@adrg)
- #232 Extend buffer used for searching EOF marker (@adrg)
- #231 Follow object indirections in PdfPage.GetMediaBox (@samuel)
- #229 Improve outline extraction (@adrg)
- #228 Improve traversal of outline item nodes in the GetOutlinesFlattened method (@adrg)
- #216 Unify and optimize number parsing (@samuel, @gunnsth, @adrg)
- #225 Attempt decryption for invalid crypt filter dictionary type (@adrg)
- #224 Fix Chapter component SetShowNumbering method (@adrg)
- #223 Use utf-16 encoding for serializing outline item titles (@adrg)
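Items #238 and #232 deal with locating the EOF marker when the file tail is read in separate buffers. One common way to handle a marker that straddles two buffers is to scan backwards in chunks that overlap by the marker length minus one. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not the code from those pull requests, and the names findLastMarker and lastEOFOffset are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// findLastMarker returns the offset of the last occurrence of marker in r,
// scanning backwards in chunks of chunkSize bytes. Consecutive chunks overlap
// by len(marker)-1 bytes, so a marker split across a chunk boundary is still
// found. It returns -1 if the marker is absent or empty.
func findLastMarker(r io.ReaderAt, size int64, marker []byte, chunkSize int) (int64, error) {
	if len(marker) == 0 {
		return -1, nil
	}
	overlap := len(marker) - 1
	if chunkSize <= overlap {
		chunkSize = overlap + 1 // every step must make progress
	}
	buf := make([]byte, chunkSize)
	end := size
	for end > 0 {
		start := end - int64(chunkSize)
		if start < 0 {
			start = 0
		}
		chunk := buf[:end-start]
		if _, err := r.ReadAt(chunk, start); err != nil && err != io.EOF {
			return -1, err
		}
		if i := bytes.LastIndex(chunk, marker); i >= 0 {
			return start + int64(i), nil
		}
		if start == 0 {
			break
		}
		// Re-read the first overlap bytes of this chunk as the tail of the next one.
		end = start + int64(overlap)
	}
	return -1, nil
}

// lastEOFOffset is a convenience wrapper for in-memory data.
func lastEOFOffset(data []byte, chunkSize int) (int64, error) {
	return findLastMarker(bytes.NewReader(data), int64(len(data)), []byte("%%EOF"), chunkSize)
}

func main() {
	off, err := lastEOFOffset([]byte("%PDF-1.7\n1 0 obj\nendobj\n%%EOF\n"), 8)
	fmt.Println(off, err)
}
```

Returning -1 instead of looping lets the caller decide how to treat a file with no marker at all.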
v3.3.1
Version 3.3.1 contains a few notable fixes and improvements.
Bug fixes and enhancements
- #215 Prevent recursive parsing of invalid circular outlines (@adrg)
- #203 Add basic support for UTF-16 text encodings (@adrg)
- #202 Preserve TOC line components style properties when setting links (@adrg)
- #199 Add NewPdfFontFromTTF(io.ReadSeeker) function (@gabriel-vasile)
- #198 Make the Finalize method of the creator public (@adrg)
- #196 Prevent extractor panic for invalid PDF text objects (@adrg)
- #194 Changes to make the lazy reader work on the PaperCut corpus (@peterwilliams97)
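For context on the UTF-16 support in #203: PDF text strings that use UTF-16 are stored big-endian and start with the byte order mark 0xFE 0xFF. The following sketch round-trips a string through that form using only the standard library. The function names are hypothetical and this is not how unipdf exposes the feature.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16BE converts s to UTF-16BE, prefixed with the byte order mark
// 0xFE 0xFF that identifies a PDF text string as UTF-16BE.
func encodeUTF16BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2+2*len(units))
	out = append(out, 0xFE, 0xFF)
	for _, u := range units {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// decodeUTF16BE reverses encodeUTF16BE. It reports false when the byte order
// mark is missing or the input has an odd length.
func decodeUTF16BE(b []byte) (string, bool) {
	if len(b) < 2 || b[0] != 0xFE || b[1] != 0xFF || len(b)%2 != 0 {
		return "", false
	}
	units := make([]uint16, 0, (len(b)-2)/2)
	for i := 2; i < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	return string(utf16.Decode(units)), true
}

func main() {
	enc := encodeUTF16BE("Aé")
	fmt.Printf("% X\n", enc)
	s, ok := decodeUTF16BE(enc)
	fmt.Println(s, ok)
}
```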
v3.3.0
Version 3.3.0 contains a few fixes as well as new features.
Bug fixes and enhancements
- #188 Allows overwriting default line columns (@inoda)
- #181 Improve text chunk component (@adrg)
- #176 Copy action of annotation (@becoded)
- #175 Parse signature certificate arrays on signature validation (@adrg)
- #173 Fix TOC page numbering for chapters containing tables (@adrg)
- #172 Prevent recursion when parsing outlines (@adrg)
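Outline trees in malformed PDFs can contain nodes that point back to an ancestor or an earlier sibling, which is the situation #172 guards against. A usual defence is to remember visited nodes during traversal. The sketch below illustrates that approach. The outlineItem type and flattenOutline function are hypothetical stand-ins, not unipdf types.

```go
package main

import "fmt"

// outlineItem is a hypothetical stand-in for a parsed outline node.
type outlineItem struct {
	Title string
	First *outlineItem // first child
	Next  *outlineItem // next sibling
}

// flattenOutline walks the tree depth-first and returns the titles in order.
// The visited set stops the walk at any node reached a second time, so
// circular First/Next links cannot cause endless traversal.
func flattenOutline(root *outlineItem) []string {
	var titles []string
	visited := make(map[*outlineItem]bool)
	var walk func(n *outlineItem)
	walk = func(n *outlineItem) {
		for ; n != nil && !visited[n]; n = n.Next {
			visited[n] = true
			titles = append(titles, n.Title)
			walk(n.First)
		}
	}
	walk(root)
	return titles
}

func main() {
	a := &outlineItem{Title: "A"}
	b := &outlineItem{Title: "B"}
	c := &outlineItem{Title: "C"}
	a.First = b
	b.Next = c
	c.Next = b  // sibling cycle
	c.First = a // child pointing back to the root
	fmt.Println(flattenOutline(a))
}
```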
v3.2.0
Highlights of version 3.2.0 are large image memory optimizations, action support, and multiple fixes.
Bug fixes and enhancements
- #164 Fix panic when loading composite fonts (@adrg)
- #162 Add Travis CI integration (@adrg)
- #159 Take decode arrays into account when processing grayscale images (@adrg)
- #161 Action support (@becoded)
- #148 Fix issue #144: change JBIG2 integer variable types (@kucjac)
- #153 Make PageText.sortPosition() sort order deterministic (@peterwilliams97)
- #156 Fix drawing creator pages with no rendered blocks (@adrg)
- #149 Image memory optimizations (@adrg)
- #146 Add extract images test case, with memory profiling (@adrg)
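Item #159 refers to image Decode arrays. In the PDF image model, a Decode array [Dmin Dmax] maps a raw sample x with n bits per component to Dmin + x * (Dmax - Dmin) / (2^n - 1), so [1 0] inverts a grayscale image. The sketch below applies that formula to 8-bit samples. It is an illustration only, with a hypothetical function name, and makes no claim about how unipdf handles other bit depths.

```go
package main

import (
	"fmt"
	"math"
)

// applyGrayDecode maps 8-bit gray samples through a Decode array [dmin dmax]
// and returns 8-bit output. Values are clamped to the 0..1 component range
// before scaling back to 0..255.
func applyGrayDecode(samples []byte, dmin, dmax float64) []byte {
	out := make([]byte, len(samples))
	for i, s := range samples {
		v := dmin + float64(s)*(dmax-dmin)/255.0
		v = math.Max(0, math.Min(1, v))
		out[i] = byte(math.Round(v * 255))
	}
	return out
}

func main() {
	samples := []byte{0, 64, 255}
	fmt.Println(applyGrayDecode(samples, 0, 1)) // default Decode array
	fmt.Println(applyGrayDecode(samples, 1, 0)) // inverted
}
```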
v3.1.1
Version 3.1.1 has significant performance improvements in lazy loading and the creator package as well as multiple other enhancements. An example for tabular data extraction has also been introduced in unipdf-examples: pdf_to_csv.go extracts tabular data and outputs it in CSV format.
Bug fixes and enhancements
- #139 Append function to TextMarkArray for processing and grouping text (@gunnsth)
- #138 Table styled paragraph links (@adrg)
- #133 Inherit rotation when creating block from page using the creator (@adrg)
- #136 Creator optimize drawing blocks (@adrg)
- #135 Check for missing resource dict when generating field appearance (@adrg)
- #131 Lazy loading improvements (@gunnsth)
- #128 Add option to generate appearance dicts when filling form fields (@adrg)
v3.1.0
Version 3.1.0 adds support for JBIG2 decoding as well as vectorized text extraction. In addition, there are various bug fixes and enhancements.
Example using vectorized text to mark up text locations: pdf_text_locations.go.
JBIG2Decoder can be demonstrated, for example, by extracting images from PDFs that use JBIG2: pdf_extract_images.go.
New features
- #67 JBIG2Decoder implementation (@kucjac)
- #109 Finding bounding boxes of substrings of extracted text (@peterwilliams97)
Bug fixes and enhancements
- #118 Optimize: Use original if smaller than "compressed" (@gunnsth)
- #117 Avoid unnecessary allocations when converting gray scale image to RGB (@adrg)
- #115 Resolve references when adding page to writer from a lazy reader (@adrg)
- #110 Resolve page parents when adding page to writer (@adrg)
- #106 Allow adding an external outline tree to the creator (@adrg)
- #103 Fixed CalRGB -> RGB image conversion (@peterwilliams97)
- #104 Add resources of blocks created from pages to the output page resources (@adrg)
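The optimization in #118 keeps the original data when the "compressed" form turns out to be no smaller, which happens with very short or already compressed input. A minimal sketch of that decision with the standard library follows. It assumes zlib (the format used by FlateDecode streams) and a hypothetical function name, and is not taken from the unipdf optimizer.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressIfSmaller zlib-compresses data and returns the compressed bytes
// only when they are strictly smaller than the input. The boolean reports
// whether compression was applied.
func compressIfSmaller(data []byte) ([]byte, bool, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, false, err
	}
	if err := w.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() < len(data) {
		return buf.Bytes(), true, nil
	}
	return data, false, nil
}

func main() {
	for _, in := range [][]byte{[]byte("abc"), make([]byte, 4096)} {
		out, compressed, err := compressIfSmaller(in)
		fmt.Println(len(in), len(out), compressed, err)
	}
}
```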
v3.0.3
Bug fixes
- #101 Skip invalid outline nodes (@adrg)
- #100 Update go dep file (@adrg)
- #93 Fix annotation flatten when AcroForm does not exist (@adrg)
- #98 Add FDF merge test case for form filling and flattening with change detection (@gunnsth)
- #97 Resolve page Resources references on writer page add, if page reader is lazy (@adrg)
- #91 Attempt to parse invalid beginning lines of xref table subsections (@adrg)
- #90 Skip invalid Metadata stream in form field Kids array (@adrg)
- #89 Handle improper usage of the array ending marker (@adrg)
v3.0.2
Bug fixes
- #86 Parse EOF markers missing the F character (@adrg)
- #85 Attempt to identify Pages nodes without Type (@adrg)
- #84 Skip loading outlines on invalid outline root node (@adrg)
- #77 Take references with negative object numbers into account (@adrg)
- #80 Consider files not encrypted when Encrypt object is null (@adrg)
- #78 Fix page resources not being loaded from parent nodes (@adrg)
- #73 Fix parsing names containing the # character (@adrg)
- #72 Check for empty encoded byte buffer on Flate decode (@adrg)
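As background for #73: in PDF name objects, the # character introduces a two-digit hexadecimal escape, so the name body A#42 denotes AB. The sketch below decodes such escapes strictly. The function name decodeName is hypothetical, and a real parser such as unipdf's may choose to be more tolerant of malformed escapes than this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeName expands #xx escapes in the body of a PDF name (the part after
// the leading slash). A '#' not followed by two hex digits is an error.
func decodeName(raw string) (string, error) {
	out := make([]byte, 0, len(raw))
	for i := 0; i < len(raw); i++ {
		if raw[i] != '#' {
			out = append(out, raw[i])
			continue
		}
		if i+2 >= len(raw) {
			return "", fmt.Errorf("truncated escape at offset %d", i)
		}
		b, err := strconv.ParseUint(raw[i+1:i+3], 16, 8)
		if err != nil {
			return "", fmt.Errorf("invalid escape at offset %d: %v", i, err)
		}
		out = append(out, byte(b))
		i += 2 // skip the two hex digits
	}
	return string(out), nil
}

func main() {
	for _, n := range []string{"Lime#20Green", "The_Key_of_F#23_Minor", "bad#4"} {
		s, err := decodeName(n)
		fmt.Printf("%q %v\n", s, err)
	}
}
```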