Ospp/new llm extract text #19725
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #18664
What this PR does / why we need it:
As part of our document LLM support, we are introducing the `LLM_EXTRACT_TEXT` function. It extracts text from a PDF file and writes the extracted text to a specified text file; the third argument selects the extractor type.
Usage:
llm_extract_text(<input PDF datalink>, <output text file datalink>, <extractor type string>);
Return Value: A boolean indicating whether the extraction and writing process was successful.
Note:
Example SQL:
Example return:
PR Type
Enhancement, Tests
Description
- `LLMExtractText` function to extract text from PDF files and write it to a specified text file.
- Tests for `LLMExtractText`, including both valid and invalid test cases.
- `github.com/ledongthuc/pdf` package for PDF processing.

Changes walkthrough 📝
| File | Path | Change | Details |
|---|---|---|---|
| func_llm.go | pkg/sql/plan/function/func_llm.go | Implement `LLMExtractText` function for PDF text extraction | `LLMExtractText` function to extract text from PDF files; `pdf` package for reading PDF content |
| function_id.go | pkg/sql/plan/function/function_id.go | Register `LLM_EXTRACT_TEXT` function ID | `LLM_EXTRACT_TEXT` function ID |
| list_builtIn.go | pkg/sql/plan/function/list_builtIn.go | Add `LLM_EXTRACT_TEXT` to built-in functions | `LLM_EXTRACT_TEXT` added to supported built-in functions |
| func_llm_test.go | pkg/sql/plan/function/func_llm_test.go | Add unit tests for `LLMExtractText` function | Unit tests for the `LLMExtractText` function; `testify` for assertions |
| func_llm_extract_file.result | test/distributed/cases/function/func_llm_extract_file.result | Add test results for `LLMExtractText` | Test results for the `LLMExtractText` function |
| func_llm_extract_file.sql | test/distributed/cases/function/func_llm_extract_file.sql | Add SQL test cases for `LLMExtractText` | SQL test cases for the `LLMExtractText` function |
| go.mod | go.mod | Add PDF package dependency | `github.com/ledongthuc/pdf` dependency |
| go.sum | go.sum | Update checksum for PDF package | Checksum for `github.com/ledongthuc/pdf` |