Add Tutorials for Amazon Rekognition #3290
base: main
Signed-off-by: Mingshi Liu <[email protected]>
This is really cool, left some minor comments on wording.
{
  "parameters": {
    "response_filter": "$.TextDetections.*.DetectedText",
    "image_bytes": ""
I think it would be nice if you added a mini explanation of what the image was.
- [2] Invokes the Amazon Rekognition DetectText API providing the image_bytes parameter.
- [3] Extracts values from the DetectText API response with JSON path.
- [4] Inserts the extracted values into the text field.
- [5] Removes the original image field.
"Removes the original image field" may lead users to think the ML Inference processor can do that. Maybe refactor it to something like this:
Step 6: Create ingest pipeline
Explain that the pipeline has two processors:
- ML Inference processor:
  - Extracts values from the image field and passes them to the image_bytes parameter.
  - Invokes the Amazon Rekognition DetectText API, providing the image_bytes parameter.
  - Extracts values from the DetectText API response using a JSON path.
  - Inserts the extracted values into a new field within the same document.
- Remove processor:
  - Removes the base64 string field from the original document for clarity.
Just an idea.
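The two-processor layout proposed above can be sketched as a pipeline body. This is a minimal sketch rather than the tutorial's final text: the helper name build_pipeline, the document field names image and text, and the placeholder model ID are assumptions made for illustration. The input_map and output_map entries follow the list-of-mappings form of the ML inference processor.

```python
import json


def build_pipeline(model_id, image_field="image", text_field="text"):
    """Build an ingest pipeline body with an ml_inference and a remove processor.

    The field names are hypothetical defaults chosen for this sketch.
    """
    return {
        "description": "Detect text in an image, then drop the base64 source field",
        "processors": [
            {
                "ml_inference": {
                    "model_id": model_id,
                    # Model input parameter <- document field
                    "input_map": [{"image_bytes": image_field}],
                    # New document field <- JSON path into the DetectText response
                    "output_map": [{text_field: "$.TextDetections.*.DetectedText"}],
                }
            },
            # The removal is a separate processor, not part of ml_inference
            {"remove": {"field": image_field}},
        ],
    }


pipeline = build_pipeline("your_model_id")
print(json.dumps(pipeline, indent=2))
```

The printed JSON would serve as the body of a PUT _ingest/pipeline/your_pipeline_name request, where your_pipeline_name is a placeholder.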
Sample response: | ||
```json | ||
{ | ||
"connector_id": "o52l5pMB6Ebhud5_ypxu" |
Looking at the other tutorials in the same folder, I see they follow the format of showing the connector ID and model ID. Maybe we should make them placeholders, such as your_connector_id?
What do you think?
## Detect text with DetectText API

This tutorial demonstrates how to create an OpenSearch connector for an Amazon Rekognition model that detects text in images using the DetectText API.
You wrote that this is for the blueprint but also added the ML Inference processor. Maybe add that here too? Other tutorials just mention how to deploy and then run inference so this important info could get lost.
Good idea. I will add this part.
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      "^https://rekognition\\..*[a-z0-9-]\\.amazonaws\\.com$"
Can you cut a PR to add this to the trusted URL setting?
Adding this to my queue. I raised an issue to track it: #3308
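For reference, the trusted-endpoint pattern from the hunk above can be checked locally. This is a rough sketch using Python's re module rather than the Java regex engine that OpenSearch runs on, on the assumption that the two agree for a pattern this simple. The helper name is_trusted and the sample URLs are illustrative only.

```python
import re

# Regex from the cluster setting with JSON escaping removed ("\\." in JSON is "\." here).
TRUSTED_ENDPOINT = re.compile(r"^https://rekognition\..*[a-z0-9-]\.amazonaws\.com$")


def is_trusted(url):
    """Return True when the URL matches the trusted-endpoint pattern."""
    return TRUSTED_ENDPOINT.match(url) is not None


print(is_trusted("https://rekognition.us-west-2.amazonaws.com"))  # True
print(is_trusted("http://rekognition.us-west-2.amazonaws.com"))   # False: not https
print(is_trusted("https://comprehend.us-west-2.amazonaws.com"))   # False: other service
```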
"session_token": "your_session_token" | ||
}, | ||
"parameters": { | ||
"region": "us-west-2", |
"region": "us-west-2", | |
"region": "your_aws_region_like_us-west-2", |
POST _plugins/_ml/models/pp2n5pMB6Ebhud5_oJwF/_predict
{
  "parameters": {
    "response_filter": "$.TextDetections.*.DetectedText",
I suggest leaving out response_filter at first so that the raw model response is shown. Then introduce how to use response_filter to filter the target parts.
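The raw-first, filtered-second ordering suggested above could be illustrated with two request bodies that differ only in response_filter. This is a sketch under assumptions: the helper name build_predict_body and the stand-in image bytes are hypothetical, and image_bytes is assumed to carry the base64-encoded image.

```python
import base64
import json


def build_predict_body(image_bytes, response_filter=None):
    """Build a _predict request body; omit response_filter to get the raw response."""
    parameters = {"image_bytes": base64.b64encode(image_bytes).decode("ascii")}
    if response_filter is not None:
        parameters["response_filter"] = response_filter
    return {"parameters": parameters}


# Stand-in bytes; a real call would read an image file instead.
sample = b"not-a-real-image"

raw_body = build_predict_body(sample)
filtered_body = build_predict_body(sample, "$.TextDetections.*.DetectedText")

print(json.dumps(raw_body))
print(json.dumps(filtered_body))
```

Either body would be sent to the model's _predict endpoint. Only the second one narrows the response to the detected text values.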
"Platform", | ||
"Search", | ||
"for", | ||
"anything" |
These four results are detected words. They duplicate the first two results, which are detected lines. Can we return only the detected lines?
I was looking at the documentation example, and it looks like the API returns both lines and words: lines represent the text as we read it (search for anything), and words are the individual parts (search, for, anything).
I asked ChatGPT whether it is possible to filter on the fly (keeping only LINE). I'm not sure whether this works, but it is worth a try: $..TextDetections[?(@.Type=='LINE')].DetectedText
Rekognition returns each detection as either a word or a line of text. Even though the words seem duplicated, the individual words can serve as tokens in search, so they can be handy.
I would prefer keeping it this way and suggesting that users refer to the Rekognition API if they want to do further filtering.
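For readers who do want only lines, the line-versus-word distinction discussed in this thread can be sketched as a client-side filter on the Type field of each detection. This does not verify the JSON path expression suggested earlier in the thread. The helper name detected_text and the sample response are made up for illustration, trimmed to the fields used here.

```python
def detected_text(response, detection_type=None):
    """Return DetectedText values, optionally restricted to "LINE" or "WORD"."""
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if detection_type is None or d.get("Type") == detection_type
    ]


# Made-up response for illustration, trimmed to the fields used here.
sample_response = {
    "TextDetections": [
        {"DetectedText": "Search for anything", "Type": "LINE", "Id": 0},
        {"DetectedText": "Search", "Type": "WORD", "Id": 1, "ParentId": 0},
        {"DetectedText": "for", "Type": "WORD", "Id": 2, "ParentId": 0},
        {"DetectedText": "anything", "Type": "WORD", "Id": 3, "ParentId": 0},
    ]
}

print(detected_text(sample_response, "LINE"))  # ['Search for anything']
print(detected_text(sample_response))  # all four values, the line first, then its words
```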
Description
Add Tutorials for Amazon Rekognition
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.