Support CPU path for from_utc_timestamp function with timezone #9689
Signed-off-by: Ferdinand Xu <[email protected]>
build
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/TimeZoneDB.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala
build
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala
build
As discussed, I introduced a configuration that hides the CPU-based kernel. @NVnavkumar please take a look. Thanks!
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala
if (timezone != utc) {
  willNotWorkOnGpu("only timezones equivalent to UTC are supported")
  if (TimeZoneDB.isSupportedTimezone(timezoneShortID)) {
The internal Spark config should actually be wired in here. The logic should be:
If the timezone is UTC -> always stay on the GPU (no-op)
If the timezone is not UTC -> if the config is set to true, use the CPU POC as long as the timezone is supported; otherwise fall back to CPU
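The decision requested above can be sketched as a small pure function. This is only an illustration of the three branches, not the plugin's code: `Placement`, `decide`, and the parameter names are hypothetical, and the real tagging code would call `willNotWorkOnGpu` rather than return a value.

```scala
// Hypothetical sketch of the tagging decision described in the review comment.
// None of these names come from the spark-rapids code base.
object TimezoneTaggingSketch {
  sealed trait Placement
  case object StayOnGpu extends Placement      // UTC: nothing to convert
  case object UseCpuPoc extends Placement      // non-UTC, config on, timezone supported
  case object FallBackToCpu extends Placement  // everything else

  def decide(
      isUtc: Boolean,
      cpuPocEnabled: Boolean,        // assumed to mirror the internal test config
      isSupportedTimezone: Boolean   // assumed result of a TimeZoneDB-style lookup
  ): Placement = {
    if (isUtc) {
      StayOnGpu
    } else if (cpuPocEnabled && isSupportedTimezone) {
      UseCpuPoc
    } else {
      FallBackToCpu
    }
  }
}
```

Keeping the config check at the call site, as in this sketch, leaves the timezone lookup free of configuration concerns, which is the separation the reviewer asks for.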
The internal Spark config (spark.rapids.test.CPU.timezone) was wired inside the isSupportedTimezone method.
I still think it's better to wire the config directly here. I find putting it in the logic of isSupportedTimezone a bit confusing to understand, even though it might make the migration process simpler in a way. It's not actually hard to remove the config when we migrate to the GPU timezone DB. Plus, depending on the review, I'm not sure that will make it into 23.12, so I think we should stay safe with this code path.
Updated. Please take a further look.
build
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/TimeZoneDB.scala
build
This closes #9804
PR includes:
from_utc_timestamp using Spark's existing implementation (running on the CPU)
Note: this depends on a GPU kernel implementation from Spark-Rapids-JNI to actually run on the GPU.
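For readers unfamiliar with the function, the following is a minimal sketch of what from_utc_timestamp computes, written with plain java.time. It assumes timestamps are stored as microseconds since the epoch; `FromUtcSketch` and `fromUtcMicros` are hypothetical names, and this is neither the code in this PR nor Spark's own implementation.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical sketch: interpret `micros` as a UTC instant and return the
// microsecond value whose UTC rendering equals the wall-clock time in `zone`.
object FromUtcSketch {
  def fromUtcMicros(micros: Long, zone: ZoneId): Long = {
    // The stored value, read as an instant on the UTC timeline.
    val instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
    // Wall-clock date and time at that instant in the target zone.
    val local = LocalDateTime.ofInstant(instant, zone)
    // Re-encode the wall-clock time as if it were UTC.
    ChronoUnit.MICROS.between(Instant.EPOCH, local.toInstant(ZoneOffset.UTC))
  }
}
```

Under these assumptions, a UTC zone leaves the value unchanged, while `fromUtcMicros(0L, ZoneId.of("Asia/Shanghai"))` shifts it forward by eight hours (28800000000 microseconds).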