[HUDI-8858]Support ShowRecordLocationProcedure #12599

Open
wants to merge 1 commit into
base: master

houyuting
Contributor

Change Logs

Add a show record location procedure. This PR aims to locate the bucket ID, and then the HDFS data file, from the record key. This is faster than running select ${record_key} from table.
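
For readers unfamiliar with the idea, here is a minimal sketch of how a record key could be mapped to a bucket and then to candidate data files. It is not the code in this PR and not Hudi's own hashing (which may hash the index key fields differently); the object name, the plain `hashCode`-based hash, and the assumption that file names start with a zero-padded 8-digit bucket ID are all illustrative.

```scala
object BucketLocationSketch {
  // Simplified stand-in for a bucket-index hash (assumption, not Hudi's exact logic):
  // take a non-negative hash of the record key, modulo the number of buckets.
  def bucketId(recordKey: String, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    (recordKey.hashCode & Int.MaxValue) % numBuckets
  }

  // Assumed naming convention: file names begin with the zero-padded 8-digit bucket ID.
  def bucketIdPrefix(id: Int): String = f"$id%08d"

  // Keep only the file names in one partition whose name starts with the bucket prefix.
  def candidateFiles(fileNames: Seq[String], recordKey: String, numBuckets: Int): Seq[String] = {
    val prefix = bucketIdPrefix(bucketId(recordKey, numBuckets))
    fileNames.filter(_.startsWith(prefix))
  }
}
```

Because only a file listing of one partition is filtered, no data file has to be scanned, which is where the speed-up over a full select would come from.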

Impact

none

Risk level (write none, low, medium or high below)

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@houyuting houyuting changed the title Support ShowRecordLocationProcedure [MINOR]Support ShowRecordLocationProcedure Jan 8, 2025
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 8, 2025
@houyuting houyuting marked this pull request as ready for review January 8, 2025 04:21
@hudi-bot

hudi-bot commented Jan 8, 2025

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yuzhaojing
Contributor

Thanks for your contribution. I left some comments.

import scala.collection.JavaConverters._
import scala.util.{Failure, Success, Try}

class ShowRecordLocationProcedure extends BaseProcedure with ProcedureBuilder with Logging {
Contributor

Is this procedure only applicable to the bucket index? Can we extend it to all index types? Alternatively, we can add a bucket-index check in this pull request and handle the remaining index types in follow-up pull requests.
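
A minimal sketch of the bucket-index check suggested above, under stated assumptions: the index type is read from a plain `Map[String, String]` of table properties (a hypothetical input, not the PR's actual config access), using the `hoodie.index.type` key. The object and method names are illustrative only.

```scala
object IndexTypeCheckSketch {
  // Returns Right(()) when the configured index type is BUCKET, otherwise Left with a message.
  // `tableConfig` is a hypothetical stand-in for however the procedure reads its configuration.
  def requireBucketIndex(tableConfig: Map[String, String]): Either[String, Unit] =
    tableConfig.get("hoodie.index.type").map(_.trim.toUpperCase) match {
      case Some("BUCKET") => Right(())
      case Some(other) =>
        Left(s"ShowRecordLocationProcedure currently supports only the bucket index, found: $other")
      case None =>
        Left("hoodie.index.type is not set; cannot confirm a bucket index")
    }
}
```

Failing fast this way would leave room to add other index types later without changing the behavior for bucket-indexed tables.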

private val PARAMETERS = Array[ProcedureParameter](
  ProcedureParameter.required(0, "table", DataTypes.StringType),
  ProcedureParameter.required(1, "partition_path", DataTypes.StringType),
  ProcedureParameter.required(2, "record_key", DataTypes.StringType),
Contributor

Can it support multiple record keys?
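
One possible shape for multi-key support, sketched under assumptions: the `record_key` argument carries several keys separated by `;` (a hypothetical delimiter, chosen here because composite record keys may themselves contain commas), and keys are grouped by bucket so each bucket's file is resolved once. The hash is the same simplified stand-in as a plain `hashCode` modulo, not Hudi's own bucket hashing.

```scala
object MultiKeySketch {
  // Split a delimited argument into distinct, non-empty record keys, preserving first-seen order.
  def parseRecordKeys(arg: String, delimiter: String = ";"): Seq[String] =
    arg.split(java.util.regex.Pattern.quote(delimiter))
      .map(_.trim)
      .filter(_.nonEmpty)
      .distinct
      .toSeq

  // Group keys by an illustrative bucket ID so each bucket is looked up only once.
  def groupByBucket(keys: Seq[String], numBuckets: Int): Map[Int, Seq[String]] = {
    require(numBuckets > 0, "numBuckets must be positive")
    keys.groupBy(k => (k.hashCode & Int.MaxValue) % numBuckets)
  }
}
```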

@danny0405
Contributor

This task is not minor. Can you create a JIRA ticket to track it?

@TheR1sing3un
Member

TheR1sing3un commented Jan 9, 2025

Nice work!
But could we implement a more general procedure instead of one that only works for bucket indexes? Instead of passing in the record_key, we would pass in the primary key/index columns and then tag the record using the table's own index to find the corresponding file group, or even the file name.
Thanks for your contribution. This procedure can later be combined with CLI tools, which would greatly improve our daily troubleshooting efficiency.

@houyuting houyuting changed the title [MINOR]Support ShowRecordLocationProcedure [HUDI-8858]Support ShowRecordLocationProcedure Jan 13, 2025
Labels
size:M PR with lines of changes in (100, 300]
5 participants