[FEA] Implement TableCacheQueryStageExec on GPU #9876

Open
razajafri opened this issue Nov 28, 2023 · 3 comments
Labels: feature request, Spark 3.5+

Comments

@razajafri
Collaborator

We need a GPU version of TableCacheQueryStageExec, which is used by AQE (Adaptive Query Execution).

When running the following on Spark 3.5.0, we fall back to the CPU:

>>> from decimal import Decimal
>>> from datetime import date
>>> from datetime import datetime
>>> from pyspark.sql.types import ByteType, StringType, IntegerType, LongType, FloatType, DoubleType, BooleanType, DateType, TimestampType, DecimalType, ArrayType, StructField, StructType
>>> schema_nested_struct_no_map = StructType([
...     StructField("structF", StructType([
...         StructField("structF11", StructType([
...             StructField("structF111", StructType([
...                 StructField("byteF", ByteType()),
...                 StructField("strF", StringType()),
...                 StructField("intF", IntegerType()),
...                 StructField("longF",LongType()),
...                 StructField("floatF", FloatType()),
...                 StructField("doubleF", DoubleType()),
...                 StructField("boolF", BooleanType()),
...                 StructField("dateF", DateType()),
...                 StructField("timestampF", TimestampType()),
...                 StructField("decimal_precision_8", DecimalType(8, 3)),
...                 StructField("arrayF", ArrayType(StringType()))
...             ]))
...         ])),
...         StructField("structF12", StructType([
...             StructField("byteF12", ByteType()),
...             StructField("strF12", StringType())
...         ]))
...     ]))
... ])
>>> data_nested_struct_no_map = [
...     ((((50, None, 1000, 20000, 123.12, 123.123456, True, date(1990, 10, 12), datetime(2020,2,1,12,1,1),
...         Decimal('12345.123'),["spark", "adobe","microsoft"]),), (60, "amd1234"),),),
...     ((((60, "nvida&^g3", None, 22000, 123.121, 456.123456, False,  date(1991, 10, 12), datetime(2021,2,1,12,1,1),
...         Decimal('12345.456'),["spark456", "adobe","microsoft123"]),), (70, "amd4567"),),),
...     ((((None, "nvida&^g3", 3000, None, 123.122, 78.123456, True,  date(1990, 10, 12), datetime(2020,2,1,12,1,1),
...         Decimal('12345.789'),["spark", "adobe123","microsoft"]),), (70, "amd1234"),),),
...     ((((80, "nvida&^g3", 4000, 20030, None, 123.123456, False,  date(1991, 10, 12), datetime(2020,2,1,12,1,10),
...         Decimal('12345.123'),["spark123", "adobe","microsoft"]),), (60, "amd4567"),),)
... ]
>>> df = spark.createDataFrame(data_nested_struct_no_map, schema_nested_struct_no_map)
>>> df = df.cache()
23/11/28 20:49:44 WARN GpuOverrides: 
! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
  @Expression <AttributeReference> structF#0 could run on GPU

>>> df.count()
23/11/28 20:49:46 WARN GpuOverrides: 
      ! <TableCacheQueryStageExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.adaptive.TableCacheQueryStageExec

23/11/28 20:49:46 WARN GpuOverrides: 
    ! <TableCacheQueryStageExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.adaptive.TableCacheQueryStageExec

4
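When reproducing this, it can help to pull the fallback operators out of the GpuOverrides WARN output above rather than reading it by eye. The following is a minimal sketch that assumes the log keeps the `! <Operator> cannot run on GPU because <reason>` shape seen in this session; `find_cpu_fallbacks` is a hypothetical helper, not part of spark-rapids.

```python
import re

# Matches lines such as
#   ! <TableCacheQueryStageExec> cannot run on GPU because <reason>
# The pattern is an assumption based on the GpuOverrides WARN output shown above.
FALLBACK_RE = re.compile(r"!\s*<(\w+)>\s+cannot run on GPU because\s+(.*)")


def find_cpu_fallbacks(log_text):
    """Map each operator that fell back to the CPU to the first reason logged for it."""
    fallbacks = {}
    for line in log_text.splitlines():
        match = FALLBACK_RE.search(line)
        if match:
            operator, reason = match.group(1), match.group(2).strip()
            # Keep only the first reason if the same operator is reported repeatedly.
            fallbacks.setdefault(operator, reason)
    return fallbacks


log = """
! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
  @Expression <AttributeReference> structF#0 could run on GPU
      ! <TableCacheQueryStageExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.adaptive.TableCacheQueryStageExec
    ! <TableCacheQueryStageExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.adaptive.TableCacheQueryStageExec
"""
print(sorted(find_cpu_fallbacks(log)))  # ['RDDScanExec', 'TableCacheQueryStageExec']
```

Lines reporting that an expression "could run on GPU" are ignored, so only the operators that actually fell back are listed.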
@razajafri added the feature request and ? - Needs Triage labels Nov 28, 2023
@mattahrens added the Spark 3.5+ label Nov 28, 2023
@mattahrens
Collaborator

@razajafri: can you check whether our integration tests for cache pass on Spark 3.5? They should be hitting this same issue.

@mattahrens removed the ? - Needs Triage label Nov 28, 2023
@razajafri
Collaborator Author

I ran the tests but forgot to post here. All of our current cache tests pass, so something else in the plan must be causing TableCacheQueryStageExec to kick in. I will post my findings once I have investigated this further.

@razajafri
Collaborator Author

According to @revans2, this is not a trivial task because Spark internally looks for InMemoryTableScanExec.
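To illustrate why a CPU-specific type check makes this hard, here is a toy Python model. All classes are hypothetical stand-ins, not Spark or spark-rapids APIs, and it assumes, per the comment above, that the query stage insists on wrapping an InMemoryTableScanExec. A GPU scan node that is not that exact type is rejected, so simply swapping in a GPU operator underneath the stage is not enough.

```python
class SparkPlan:
    """Hypothetical stand-in for a physical plan node (not Spark's real class)."""

    def __init__(self, name):
        self.name = name


class InMemoryTableScanExec(SparkPlan):
    """Hypothetical stand-in for the CPU cached-table scan."""

    def __init__(self):
        super().__init__("InMemoryTableScanExec")


class GpuInMemoryTableScanExec(SparkPlan):
    """Hypothetical GPU replacement; deliberately NOT a subclass of the CPU scan."""

    def __init__(self):
        super().__init__("GpuInMemoryTableScanExec")


class TableCacheQueryStageExec(SparkPlan):
    """Toy query stage that, by assumption, only accepts the CPU scan."""

    def __init__(self, plan):
        super().__init__("TableCacheQueryStageExec")
        # Models the assumed CPU-specific check on the wrapped plan.
        if not isinstance(plan, InMemoryTableScanExec):
            raise TypeError(f"wrong plan for table cache stage: {plan.name}")
        self.in_memory_table_scan = plan


# The CPU scan is accepted; the GPU scan raises TypeError.
stage = TableCacheQueryStageExec(InMemoryTableScanExec())
try:
    TableCacheQueryStageExec(GpuInMemoryTableScanExec())
except TypeError as err:
    print(err)
```

In this model, supporting the GPU path would mean either a GPU-aware stage class or a GPU scan that satisfies the check, which is the kind of design decision the issue leaves open.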
