Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Minor update to DictionaryVector toString #12025

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peterenescu
Copy link

Summary:
Fixes error raised in Github issue 10594 (#10594)

Error details blindspot in toString function for lazy vectors where they are able to be logged; however they still show as unitialized as setInternalState was never executed.

I propose that function loadedVector be called to start the method as it will quit early for initialized vectors and serve virtually no additional overhead, while unloaded vectors will be properly initialized

Differential Revision: D67870420

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 6, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67870420

Copy link

netlify bot commented Jan 6, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 4c4d106
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/678197a95f3bc20008405fa4

@peterenescu peterenescu changed the title Minor update to DictionaryVector toString fix: minor update to DictionaryVector toString Jan 6, 2025
@peterenescu peterenescu changed the title fix: minor update to DictionaryVector toString fix: Minor update to DictionaryVector toString Jan 6, 2025
Comment on lines 4960 to 4963
// log vector
for (const auto& child : rowVector->children()) {
LOG(INFO) << child->toString(/*recursive=*/true) << std::endl;
}
for (vector_size_t i = 0; i < rowVector->size(); ++i) {
LOG(INFO) << rowVector->toString(i) << std::endl;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this code from an existing utility method, so it calls toString of both child and rowVector. In your unit test, you probably only need one of these loops to trigger the error.

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 7, 2025
Summary:

Fixes error raised in Github issue 10594 (facebookincubator#10594)

Error details blindspot in toString function for lazy vectors where they are able to be logged; however they still show as unitialized as setInternalState was never executed.

I propose that function loadedVector be called to start the method as it will quit early for initialized vectors and serve virtually no additional overhead, while unloaded vectors will be properly initialized

Differential Revision: D67870420
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67870420

rowVector->toString(i);
}

// verify outer dictionary layers are properly initialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: verify -> Verify. Also, please add a period at the end of the comment.

@@ -186,6 +186,7 @@ class DictionaryVector : public SimpleVector<T> {
}

std::string toString(vector_size_t index) const override {
loadedVector(); // load lazy vector if needed
Copy link
Contributor

@kagamiori kagamiori Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please move the comment to a separate line above the code and capitalize the first character and add a period at the end (because comments are supposed to be sentences).

@bikramSingh91, could you take a look too? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterenescu while this would work, it will have the side effect of loading the vector which can be a cost when used in production code or can change the conditions of a unit test rendering it ineffective. In that case, we would have to make sure that any test that might rely on these conditions retains them if it is also using toString() in any capacity (like in ExpressionVerifier).
I discussed this offline with Wei, and we agreed that it would be better to return an exception if toString() is called on an unloaded vector. Since the primary use case is to use toString() for debugging, this would make it obvious to the user what is happening here instead of silently loading the vector.

@@ -4942,6 +4942,30 @@ TEST_F(ExprTest, disableMemoization) {
ASSERT_EQ(stats["plus"].numProcessedRows, 3 * flatSize);
}

TEST_F(
ExprTest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the change is limited to DictionaryVector and the effects can be easily testes via its API, testing it via expression eval would be an overkill. I would recommend writing a test in VectorTest instead

@@ -186,6 +186,7 @@ class DictionaryVector : public SimpleVector<T> {
}

std::string toString(vector_size_t index) const override {
loadedVector(); // load lazy vector if needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterenescu while this would work, it will have the side effect of loading the vector which can be a cost when used in production code or can change the conditions of a unit test rendering it ineffective. In that case, we would have to make sure that any test that might rely on these conditions retains them if it is also using toString() in any capacity (like in ExpressionVerifier).
I discussed this offline with Wei, and we agreed that it would be better to return an exception if toString() is called on an unloaded vector. Since the primary use case is to use toString() for debugging, this would make it obvious to the user what is happening here instead of silently loading the vector.

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 10, 2025
Summary:

Fixes error raised in Github issue 10594 (facebookincubator#10594)

Error details blindspot in toString function for lazy vectors where they are able to be logged; however they still show as unitialized as setInternalState was never executed.

I propose that function loadedVector be called to start the method as it will quit early for initialized vectors and serve virtually no additional overhead, while unloaded vectors will be properly initialized

Differential Revision: D67870420
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67870420

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 10, 2025
…ng uninitialized lazy loaded vectors (facebookincubator#12025)

Summary:
Problem: There exists a blindspot in DictionaryVector's toString function, where lazy-loaded or otherwise unitialized vectors will be properly logged when calling toString, but may fail downstream elsewhere like during an expression eval because, although the logging suggests it was loaded, it was never properly intialized.

Solution: After discussing with Wei and Bikram, we should add logic to confirm the vector is loaded before and error out with a detailed error message before any string creating done. This will notify the user the vector is not-loaded and should be properly loaded before any logging is done.

Added additional test function to VectorTest class to cover this case.

(fixes facebookincubator#10594)

Differential Revision: D67870420
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67870420

…ng uninitialized lazy loaded vectors (facebookincubator#12025)

Summary:
Problem: There exists a blindspot in DictionaryVector's toString function, where lazy-loaded or otherwise unitialized vectors will be properly logged when calling toString, but may fail downstream elsewhere like during an expression eval because, although the logging suggests it was loaded, it was never properly initialized.

Solution: After discussing with Wei and Bikram, we should add logic to confirm the vector is loaded before and error out with a detailed error message before any string creating done. This will notify the user the vector is not-loaded and should be properly loaded before any logging is done.

Added additional test function to VectorTest class to cover this case.

(fixes facebookincubator#10594)

Differential Revision: D67870420
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67870420

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants