Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-32946 Capture and report lookahead timings for hash distributor #19272

Merged
merged 1 commit into from
Jan 20, 2025

Conversation

shamser
Copy link
Contributor

@shamser shamser commented Nov 5, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

github-actions bot commented Nov 5, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32946

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@shamser shamser marked this pull request as draft November 6, 2024 12:11
@shamser shamser marked this pull request as ready for review November 6, 2024 12:15
@shamser shamser requested a review from jakesmith November 6, 2024 12:15
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right, the stream it's counting lookahead time may be the input that you want to count look ahead time, but createHashDistributor could be distributing anything, it's a general utility.

It would better to count the time it spent reading it's input, then let the calling activity gather it's time if it needed to.

@shamser shamser force-pushed the issue32946 branch 3 times, most recently from 49a68f7 to 683cc20 Compare November 13, 2024 11:31
@shamser shamser changed the title HPCC-32946 Capture and report lookahead timings for hash distribute HPCC-32946 Capture and report lookahead timings for hash distributer Nov 13, 2024
@shamser shamser requested a review from jakesmith November 13, 2024 11:36
@shamser
Copy link
Contributor Author

shamser commented Nov 13, 2024

@jakesmith None of the code from the earlier PR remains, so I haven't retained the earlier commit in this PR.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - please see comments

thorlcr/activities/diskread/thdiskreadslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/fetch/thfetchslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/hashdistrib/thhashdistribslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/hashdistrib/thhashdistribslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/hashdistrib/thhashdistribslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/indexread/thindexreadslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/lookupjoin/thlookupjoinslave.cpp Outdated Show resolved Hide resolved
@jakesmith jakesmith changed the title HPCC-32946 Capture and report lookahead timings for hash distributer HPCC-32946 Capture and report lookahead timings for hash distributor Dec 6, 2024
@shamser shamser requested a review from jakesmith December 9, 2024 15:07
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good. Please squash

@shamser shamser force-pushed the issue32946 branch 2 times, most recently from 509e2b5 to d72519f Compare December 20, 2024 10:16
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser
As far as I can see distributor->queryLookAheadCycles() (and rhsDistributor) is never merged into the slaveTimerStats.
I think it means the local time will be adjusted correctly, but they will not be reported for the activity.
I am not sure of the best solution. Should local stats be calculated inside gatherActiveStats and should that sync distributed lookup stats before calculating?
@jakesmith any thoughts, especially as a general pattern.

@jakesmith
Copy link
Member

@shamser As far as I can see distributor->queryLookAheadCycles() (and rhsDistributor) is never merged into the slaveTimerStats. I think it means the local time will be adjusted correctly, but they will not be reported for the activity. I am not sure of the best solution. Should local stats be calculated inside gatherActiveStats and should that sync distributed lookup stats before calculating? @jakesmith any thoughts, especially as a general pattern.

I see what you mean, the time is not folded into slaveTimerStats's ActivityTimeAccumulator::lookAheadCycles.
That is also true of totalCycles I think (e.g. CThorStrandedActivity::queryTotalCycles() const).

but they will not be reported for the activity

but the reporting (via CSlaveActivity::serializeStats) uses the virtual (that calls through to the base COuputTiming::queryLookAheadCycles()), it doesn't read slaveTimerStats.lookAheadCycles directly.
What am I missing?

@shamser
Copy link
Contributor Author

shamser commented Jan 7, 2025

hat you mean, the time is not folded into slaveTimerStats's ActivityTimeAccumulator::lookAheadCycles. That is also true of totalCycles I think (e.g. CThorStrandedActivity::queryTotalCycles() const).

but they will not be reported for the activity

but the reporting (via CSlaveActivity::serializeStats) uses the virtual (that calls through to the base COuputTiming::queryLookAheadCycles()), it doesn't read slaveTimerStats.lookAheadCycles directly. What am I missing?

Also, the value of lookAheadCycles is needed to calculate local execute time (see CSlaveActivity::queryLocalCycles). If it's merged to slaveTimerStats, there will need to be a lot more refactoring and it will not be a trivial code change.

@ghalliday
Copy link
Member

PR https://github.com/hpcc-systems/HPCC-Platform/pull/19345/files changed that.

That change will need regressing as far as lookahead cycles goes for this to work.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash.

I think the clean way to implement this is

  • restore the code to add the lookahead stats from thorcommon
  • Add lookahead stats for distributors etc in gatherActiveStats() for the activity
  • After all stats have been gathered and added, calculate the localExecuteTime stat from serializedStats and the inputs.

That way all lookahead stats does not need special casing - all active stats are handled in the same way.

@shamser
Copy link
Contributor Author

shamser commented Jan 15, 2025

That way all lookahead stats does not need special casing - all active stats are handled in the same way.

There are other activities that also handles lookahead like in this PR, so I've
I've created a jira to do this separately: https://hpccsystems.atlassian.net/browse/HPCC-33241

@ghalliday ghalliday changed the base branch from master to candidate-9.10.x January 20, 2025 11:46
@ghalliday ghalliday merged commit b49e7e0 into hpcc-systems:candidate-9.10.x Jan 20, 2025
48 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants