
This repository contains the source code of the Ballerina PDFBox library package


DharshiBalasubramaniyam/module-pdfbox


module-pdfbox



Overview

This module offers two core APIs: one converts PDF documents into images, and the other extracts text from PDF documents. Each API accepts the PDF as a file path, a URL, or a byte array.

Usage

Converting PDF documents into images

import ballerina/io;
import xlibb/pdfbox;

public function main() returns error? {

    // Convert the PDF located at a file path into an array of Base64-encoded images.
    string[] base64ImagesForFilePath = check pdfbox:toImagesFromFile("C://path/to/file/myfile.pdf");

    // Convert the PDF available at a URL into an array of Base64-encoded images.
    string[] base64ImagesForURL = check pdfbox:toImagesFromURL("https://url/to/file/myfile.pdf");

    // Convert the PDF represented as a byte array into an array of Base64-encoded images.
    // Here the bytes are read from a local file; any byte[] holding PDF content works.
    byte[] pdfBytes = check io:fileReadBytes("C://path/to/file/myfile.pdf");
    string[] base64ImagesForByteArr = check pdfbox:toImagesFromBytes(pdfBytes);
}
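Each image API returns one Base64-encoded string per rendered page. The following minimal sketch decodes those strings and writes each page to disk using only `ballerina/io` and `ballerina/lang.array`. It assumes the images are PNG-encoded (the output format is not stated here), and the helper name `saveImages` and the output file naming scheme are hypothetical.

```ballerina
import ballerina/io;
import ballerina/lang.array;
import xlibb/pdfbox;

// Hypothetical helper: decodes Base64-encoded page images and writes one file per page.
// Assumption: the images are PNG-encoded; change the extension if your output differs.
function saveImages(string[] base64Images, string outputPrefix) returns string[]|error {
    string[] writtenPaths = [];
    foreach int i in 0 ..< base64Images.length() {
        // Decode the Base64 string back into raw image bytes.
        byte[] imageBytes = check array:fromBase64(base64Images[i]);
        // Pages are numbered from 1 in the file name.
        string path = string `${outputPrefix}-page-${i + 1}.png`;
        check io:fileWriteBytes(path, imageBytes);
        writtenPaths.push(path);
    }
    return writtenPaths;
}

public function main() returns error? {
    string[] images = check pdfbox:toImagesFromFile("C://path/to/file/myfile.pdf");
    string[] paths = check saveImages(images, "myfile");
    io:println(string `Wrote ${paths.length()} page image(s)`);
}
```

Keeping the decoding in a separate function means the same helper works for the output of `toImagesFromFile`, `toImagesFromURL`, and `toImagesFromBytes`.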

Extracting text from PDF documents

import ballerina/io;
import xlibb/pdfbox;

public function main() returns error? {

    // Extract text from the PDF located at a file path.
    string[] textForFilePath = check pdfbox:toTextFromFile("C://path/to/file/myfile.pdf");

    // Extract text from the PDF available at a URL.
    string[] textForURL = check pdfbox:toTextFromURL("https://url/to/file/myfile.pdf");

    // Extract text from the PDF represented as a byte array.
    // Here the bytes are read from a local file; any byte[] holding PDF content works.
    byte[] pdfBytes = check io:fileReadBytes("C://path/to/file/myfile.pdf");
    string[] textForByteArr = check pdfbox:toTextFromBytes(pdfBytes);
}
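The text APIs return a `string[]`. The following sketch assumes each element holds the text of one page (not confirmed here) and merges the pages into a single string with a marker before each page, then saves it as a `.txt` file. The helper name `joinPages`, the page-marker format, and the output file name are hypothetical.

```ballerina
import ballerina/io;
import xlibb/pdfbox;

// Hypothetical helper: merges per-page text into one string with a marker before each page.
// Assumption: each array element holds the text of a single page.
function joinPages(string[] pages) returns string {
    string[] sections = [];
    foreach int i in 0 ..< pages.length() {
        sections.push(string `--- Page ${i + 1} ---`);
        // Trim leading and trailing whitespace from each page's text.
        sections.push(pages[i].trim());
    }
    return string:'join("\n", ...sections);
}

public function main() returns error? {
    string[] pages = check pdfbox:toTextFromFile("C://path/to/file/myfile.pdf");
    check io:fileWriteString("myfile.txt", joinPages(pages));
}
```

The page markers make it easy to map extracted text back to its source page when the result is later split or searched.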

Examples

The pdfbox library provides examples that illustrate its use in the following scenarios:

  1. PDF to text.

  2. PDF-JSON Converter.

Build from the source

Setting up the prerequisites

  1. Download and install Java SE Development Kit (JDK) version 17. You can download it from either Oracle or OpenJDK.

    Note: After installation, remember to set the JAVA_HOME environment variable to the directory where the JDK was installed.

  2. Download and install Ballerina Swan Lake.

  3. Download and install Docker.

    Note: Ensure that the Docker daemon is running before executing any tests.

  4. Export your GitHub personal access token with read package permissions as follows:

    export packageUser=<Username>
    export packagePAT=<Personal access token>

Build options

Execute the commands below to build from the source.

  1. To build the package:

    ./gradlew clean build
  2. To run the tests:

    ./gradlew clean test
  3. To build the package without the tests:

    ./gradlew clean build -x test
  4. To run tests against different environments:

    ./gradlew clean test -Pgroups=<Comma separated groups/test cases>
  5. To debug the package with a remote debugger:

    ./gradlew clean build -Pdebug=<port>
  6. To debug with the Ballerina language:

    ./gradlew clean build -PbalJavaDebug=<port>
  7. To publish the generated artifacts to the local Ballerina Central repository:

    ./gradlew clean build -PpublishToLocalCentral=true
  8. To publish the generated artifacts to the Ballerina Central repository:

    ./gradlew clean build -PpublishToCentral=true

Contribute to Ballerina

As an open-source project, Ballerina welcomes contributions from the community.

For more information, go to the contribution guidelines.

Code of conduct

All the contributors are encouraged to read the Ballerina Code of Conduct.

About


Languages

  • Ballerina 53.1%
  • Java 46.4%
  • Dockerfile 0.5%