Generated on 2022-08-20
#6081 | [FEA] Update spark2 code for 22.08 |
#5508 | [FEA] collect_set on struct[Array] |
#5222 | [FEA] Support function array_except |
#5228 | [FEA] Support array_union |
#5188 | [FEA] Support arrays_overlap |
#4932 | [FEA] Support ArrayIntersect on at least Arrays of String |
#4005 | [FEA] Support First() in windowing context with Integer type |
#5061 | [FEA] Support last in windowing context for Integer type. |
#6059 | [FEA] Add SQL table to Qualification's app-details view |
#5617 | [FEA] Qualification tool support parsing expressions (part 1) |
#4719 | [FEA] GpuStringSplit: Add support for line and string anchors in regular expressions |
#5502 | [FEA] Qualification tool should use SQL ID of each Application ID like profiling tool |
#5524 | [FEA] Automatically adjust spark.rapids.sql.format.parquet.multiThreadedRead.numThreads to the same as spark.executor.cores |
#4817 | [FEA] Support Iceberg batch reads |
#5510 | [FEA] Support Iceberg for data INSERT, DELETE operations |
#5890 | [FEA] Mount the alluxio buckets/paths on the fly when the query is being executed |
#6018 | [FEA] Support Spark 3.2.2 |
#5417 | [FEA] Fully support reading parquet binary as string |
#4283 | [FEA] Implement regexp_extract_all on GPU for idx > 0 |
#4353 | [FEA] Implement regexp_extract_all on GPU for idx = 0 |
#5813 | [FEA] Set sql.json.read.double.enabled and sql.csv.read.double.enabled to true by default |
#4720 | [FEA] GpuStringSplit: Add support for limit = 0 and limit =1 |
#5953 | [FEA] Support Rocky Linux release |
#5204 | [FEA] Support Key vectors for GetMapValue and ElementAt for maps. |
#4323 | [FEA] Profiling tool add option to filter based on filesystem date |
#5846 | [FEA] Support null characters in regular expressions |
#5904 | [FEA] Add support for negated POSIX character classes in regular expressions |
#5702 | [FEA] Set spark.rapids.sql.explain=NOT_ON_GPU by default |
#5867 | [FEA] Add shim for Spark 3.3.1 |
#5628 | [FEA] Enable Application detailed view in Qualification UI |
#5831 | [FEA] Update default speedup factors used for qualification tool |
#4519 | [FEA] Add regular expression support for Form Feed, Alert, and Escape control characters |
#4040 | [FEA] Support spark.sql.parquet.binaryAsString=true |
#5797 | [FEA] Support RoundCeil and RoundFloor when scale is zero |
#4468 | [FEA] Support repetition quantifiers ? and * with regexp_replace |
#5679 | [FEA] Support MMyyyy date/timestamp format |
#4413 | [FEA] Add support for POSIX characters in regular expressions |
#4289 | [FEA] Regexp: Add support for word and non-word boundaries in regexp pattern |
#4517 | [FEA] Add support for word boundaries \b and \B in regular expressions |
#6060 | [FEA] Add experimental multi-threaded BypassMergeSortShuffleWriter |
#5453 | [FEA] Support runtime filters for BatchScanExec |
#5075 | Performance can be very slow when reading just a few columns out of many on parquet |
#5624 | [FEA] Let CPU handle Delta table's metadata related queries |
#4837 | [FEA] Optimize JSON reading of floating-point values |
#6112 | [BUG] UCX ubuntu dockerfile build failed |
#6146 | [BUG] intermittent orc test_read_round_trip failed due to /tmp/hive location |
#6281 | [BUG] Reading binary columns from nested types does not work. |
#6282 | [BUG] Missing CPU fallback for GetMapValue on scalar map, vector key |
#6208 | [BUG] test_array_intersect failed in databricks 10.4 runtime and Spark 3.3+ |
#6249 | [BUG] test_array_union_before_spark313 failed in UCX job |
#6232 | [BUG] Query failed with java.lang.NullPointerException when doing GpuSubqueryBroadcastExec |
#6230 | [BUG] AQE does not respect entirePlanWillNotWork |
#6131 | [BUG] count() in avro failed when reader_types is coalescing |
#6220 | [BUG] Host buffer leak occurred when executing count with Avro multi-threaded reader |
#6160 | [BUG] When Hive table's actual data has varchar, but the DDL is string, then query fails to do varchar to string conversion |
#6183 | [BUG] Qualification UI uses single precision floating point |
#6005 | [BUG] When old Hive partition has different schema than new partition& Hive Schema, read old partition fails with "Found no metadata for schema index" |
#6158 | [BUG] AQE being used on Databricks even when its disabled |
#6179 | [BUG] Qualfication tool per sql output --num-output-rows option broken |
#6157 | [BUG] Pandas UDF hang in Databricks |
#6167 | [BUG] iceberg_test failed in nightly |
#6128 | [BUG] Can not ansi cast decimal type to long type while fetching decimal column from data table |
#6029 | [BUG] Query failed if reading a Hive partition table with partition key column is a Boolean data type, and if spark.rapids.alluxio.pathsToReplace is set |
#6054 | [BUG] Test Parquet nested unsigned int: uint8, uint16, uint32 FAILED in spark 320+ |
#6086 | [BUG] checkValue does not work in RapidsConf |
#6127 | [BUG] regex_test failed in nightly |
#6026 | [BUG] Failed to cast value false to BooleanType for partition column k1 |
#5984 | [BUG] DATABRICKS: NullPointerException: format is null in 22.08 (works fine with 22.06) |
#6089 | [BUG] orc_test is failing on Spark 3.2+ |
#5892 | [BUG] When using Alluxio+Spark RAPIDS, if the S3 bucket is not mounted, then query will return nothing |
#6056 | [BUG] zstd integration tests failed for orc on Cloudera |
#5957 | [BUG] Exception calling collect() when partitioning using with arrays with null values using array_union(...) |
#6017 | [BUG] test_parquet_read_round_trip hanging forever in spark 32x standalone mode |
#6035 | [BUG] cache tests throws ClassCastException on Databricks |
#6032 | [BUG] Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec failure |
#6028 | [BUG] regexp_test is failing in nightly tests |
#3677 | [BUG] PCBS does not fully follow the pattern for public classes |
#6022 | [BUG] test_iceberg_fallback_not_unsafe_row failed in databricks 10.4 runtime |
#109 | [BUG] GPU degreees function does not overflow |
#5959 | [BUG] test_parquet_read_encryption fails |
#5493 | [BUG] test_parquet_read_merge_schema failed w/ TITAN V |
#5521 | [BUG] Investigate regexp failures with unicode input |
#5629 | [BUG] regexp unicode tests require LANG=en_US.UTF-8 to pass |
#5448 | [BUG] partitioned writes require single batches and sorting, causing gpu OOM in some cases |
#6003 | [BUG] join_test failed in integration tests |
#5979 | [BUG] executors shutdown intermittently during integrations test parallel run |
#5948 | [BUG] GPU ORC reading fails when positional schema is enabled and more columns are required. |
#5909 | [BUG] Null characters do not work in regular expression character classes |
#5956 | [BUG] Warnings in build for GpuRegExpUtils with group_index |
#4676 | [BUG] Research associating MemoryCleaner to Spark's ShutdownHookManager |
#5854 | [BUG] Memory leaked in some test cases |
#5937 | [BUG] test_get_map_value_string_col_keys_ansi_fail in databricks321 runtime |
#5891 | [BUG] GpuShuffleCoalesce op time metric doesn't include concat batch time |
#5896 | [BUG] Profiling tool on taking a really long time for integration tests |
#5939 | [BUG] Qualification tool UI. Read Schema column is broken |
#5711 | [BUG] regexp: Build fails on CI when more characters added to fuzzer but not locally |
#5929 | [BUG] test_sorted_groupby_first_last failed in nightly tests |
#5914 | [BUG] test_parquet_compress_read_round_trip tests failed in spark320+ |
#5859 | [BUG] Qualification tools csv order is not in sync |
#5648 | [BUG] compile-time references to classes potentially unavailable at run time |
#5838 | [BUG] Qualification ui output goes to wrong folder |
#5855 | [BUG] MortgageSparkSuite.scala set spark.rapids.sql.explain as true, which is invalid |
#5630 | [BUG] Qualification UI cannot render long strings |
#5732 | [BUG] fix estimated speed-up for not-applicable apps in Qualification results |
#5788 | [BUG] Qualification UI Sanitize template content |
#5836 | [BUG] string_test.py::test_re_replace_repetition failed IT |
#5837 | [BUG] test_parquet_read_round_trip_binary_as_string failures on YARN and Dataproc |
#5726 | [BUG] CastChecks.sparkIntegralSig has BINARY in it twice |
#5775 | [BUG] TimestampSuite is run on Spark 3.3.0 only |
#5678 | [BUG] Inconsistency between the time zone in the fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId |
#5688 | [BUG] AnsiCast is merged into Cast in Spark 340, failing the 340 build |
#5480 | [BUG] Some arithmetic tests are failing on Spark 3.4.0 |
#5777 | [BUG] repeated runs of mvn package without clean lead to missing spark-rapids-jni-version-info.properties in dist jar |
#5456 | [BUG] Handle regexp_replace inconsistency from https://issues.apache.org/jira/browse/SPARK-39107 |
#5683 | [BUG] test_cast_neg_to_decimal_err failed in recent 22.08 tests |
#5525 | [BUG] Investigate more edge cases in regexp support |
#5744 | [BUG] Compile failure with Spark 3.2.2 |
#5707 | [BUG] Fix shim-related bugs |
#6367 | Revert "Enable Strings as a supported type for GpuColumnarToRow transitions" |
#6354 | Update 22.08 changelog to latest [skip ci] |
#6348 | Update plugin jni version to released 22.08.0 |
#6234 | [Doc] Add 22.08 docs' links [skip ci] |
#6288 | CPU fallback for Map scalars with key vectors |
#6292 | Fix parquet binary reads to do the transformation in the plugin |
#6257 | Fallback to CPU for Parquet reads with _databricks_internal columns |
#6274 | Use schema instead of row field count during columnar conversion |
#6268 | Apply BroadcastMode key projections before interpreting key expressions in subqueries |
#6250 | Fix bug where AQE does not respect entirePlanWillNotWork |
#6248 | Fix some issues with reading binary from parquet |
#6239 | Add rocky Dockerfiles and refine docker documentation |
#6079 | Add support for nested types to collect_set(...) on the GPU |
#6215 | Update Spark2 Explain API code for 22.08 |
#6161 | Added binary read support for Parquet [Databricks] |
#6222 | Init 22.08 changelog [skip ci] |
#6225 | Fix count() in avro failed when reader_types is coalescing |
#6216 | [Doc] Update 22.08 documentation |
#6223 | Temporary fix for test_array_intersect failures on Spark 3.3.0 |
#6221 | Release host buffers when Avro read schema is empty |
#6132 | [DOC]update outofdate mortgage notebooks and update docs for xgboost161 jar[skip ci] |
#6188 | Allow ORC conversion from VARCHAR to STRING |
#6013 | Add fixed issues to regex fuzzer |
#5958 | Add set based operations for arrays: array_intersect , array_union , array_except , and arrays_overlap for running on GPU |
#6189 | Qualification UI change floating precision [skip ci] |
#6063 | Fix Parquet schema evolution when missing column is in a nested type |
#6159 | Workaround for Databricks using AQE even when disabled |
#6181 | Fix the qualification tool per sql number output rows option |
#6166 | Update the configs used to choose the Python runner for flat-map Pandas UDF |
#6169 | Fix IcebergProvider classname in unshim exceptions |
#6103 | Fix crash when casting decimals to long |
#6071 | Update test_add_overflow_with_ansi_enabled and test_subtraction_overflow_with_ansi_enabled to check the exception type for Integral case. |
#6136 | Fix Alluxio inferring partitions for BooleanType with Hive |
#6027 | Re-enable "transpile complex regex 2" scala test |
#6140 | Update profile names in unit tests docs [skip ci] |
#6141 | Fixes threaded shuffle writer test mocks for spark 3.3.0+ |
#6147 | Revert "Temporarily disable Parquet unsigned int test in ParquetScanS… |
#6133 | [DOC]update getting started guide doc for aws-emr670 release[skip ci] |
#6007 | Add doc for parsing expressions in qualification tool [skip ci] |
#6125 | Add SQL table to Qualification's app-details view [skip ci] |
#6116 | Fix: check validity before setting the default value |
#6120 | Qualification Tool add test for SQL Description escaping commas for csv |
#6106 | Qualification tool: Parse expressions in WindowExec |
#6040 | Enable anchors in regexp string split |
#6052 | Multi-threaded shuffle writer for RapidsShuffleManager |
#5998 | Enable Strings as a supported type for GpuColumnarToRow transitions |
#6092 | Qualification tool output recommendations on a per sql query basis |
#6104 | Revert to only supporting Apache Iceberg 0.13.x |
#6111 | Fix missed gnupg2 in ucx example dockerfiles [skip ci] |
#6107 | Disable snapshot shims build in 22.08 |
#6016 | Automatically adjust spark.rapids.sql.multiThreadedRead.numThreads to the same as spark.executor.cores |
#6098 | Support Apache Iceberg 0.14.0 |
#6097 | Fix 3.3 shim to include castTo handling AnyTimestampType and minor spacing |
#6057 | Tag GpuWindow child expressions for GPU execution |
#6090 | Add missing is_spark_321cdh import in orc_test |
#6048 | Port whole parsePartitions method from Spark3.3 to Gpu side |
#5941 | GPU accelerate Apache Iceberg reads |
#5925 | Add Alluxio auto mount feature |
#6004 | Check the existence of alluxio path |
#6082 | Enable auto-merge from branch-22.08 to branch-22.10 [skip ci] |
#6058 | Disable zstd orc tests in cdh |
#6078 | Temporarily disable Parquet unsigned int test in ParquetScanSuite |
#6049 | Fix test hang caused by parquet hadoop test jar log4j file |
#6042 | Qualification tool: Parse expressions in Aggregates and Sort execs. |
#6041 | Improve check for UTF-8 in integration tests by testing from the JVM |
#5970 | Address feedback in "Improve regular expression error messages" PR |
#6000 | Support nth_value, first and last in window context |
#6031 | Update spark322shim dependency to released lib |
#6033 | Refactor: Fix PCBS does not fully follow the pattern for public classes |
#6019 | Update the interval division to throw same type exceptions as Spark |
#6030 | Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers |
#5988 | [FEA] Add a progress bar in Qualification tool when it is running |
#6020 | Unify test modes in databricks test script |
#6025 | Skip Iceberg tests on Databricks |
#5983 | Adding AUTO native parquet support and legacy tests |
#6010 | Update docs to better explain limitations of Dataset support |
#5996 | Fix GPU degrees function does not overflow |
#5994 | Skip Parquet encryption read tests if Parquet version is less than 1.12 |
#5776 | Enable regular expression support based on whether UTF-8 is in the current locale |
#6009 | Fix issue where spark-tests was producing an unintended error code |
#5903 | Avoid requiring single batch when using out-of-core sort |
#6008 | Rename test modes in spark-tests.sh [skip ci] |
#5991 | Enable zstd integration tests for parquet and orc |
#5997 | support testing parquet encryption |
#5968 | Add support for regexp_extract_all on GPU |
#5995 | Fix a minor potential issue when rebatching for GpuArrowEvalPythonExec |
#5960 | Set up the framework of type casting for ORC reading |
#5987 | Document how to check if finalized plan on GPU from user code / REPLs [skip ci] |
#5982 | Use the new native parquet footer API instead of the old one |
#5972 | [DOC] add app-details to qualification tools doc [skip ci] |
#5976 | Enable null in regex character classes |
#5974 | Remove scaladoc warning |
#5912 | Fall back to CPU for Delta Lake metadata queries |
#5955 | Fix fake memory leaks in some test cases |
#5915 | Make the error message of changing decimal type the same as Spark's |
#5971 | Append new authorized user to blossom-ci whitelist [skip ci] |
#5967 | [Doc]In Databricks doc, disable DPP config[skip ci] |
#5871 | Improve regular expression error messages |
#5952 | Qualification tool: Parse expressions in ProjectExec |
#5961 | Don't set spark.sql.ansi.strictIndexOperator to false for array subscript test |
#5935 | Enable reading double values on GPU when reading CSV and JSON |
#5950 | Fix GpuShuffleCoalesce op time metric doesn't include concat batch time |
#5932 | Add string split support for limit = 0 and limit =1 |
#5951 | Fix issue with Profiling tool taking a long time due to finding stage ids that maps to sql nodes |
#5954 | Add IT dockerfile for rockylinux8 [skip ci] |
#5949 | Update GpuAdd and GpuSubtract to throw same type exception as Spark |
#5878 | Fix misleading documentation for approx_percentile and some other functions |
#5913 | Update gcp cluster init option [skip ci] |
#5940 | Qualification tool UI. fix Read-Schema column broken [skip ci] |
#5938 | Fix leaks in the test cases of CachedBatchWriterSuite |
#5934 | Add underscore to regexp fuzzer |
#5936 | [BUG] Fix databricks test report location |
#5883 | Add support for element_at and GetMapValue |
#5918 | Filter profiling tool based on start time. |
#5926 | Collect databricks test report |
#5924 | Changes made to the Audit process for prioritizing the commits [skip-ci] |
#5834 | Add support for null characters in regular expressions |
#5930 | Make first/last test for sorted deterministic |
#5917 | Improve sort removal heuristic for sort aggregate |
#5916 | Revert "Enable testing zstd for spark releases 3.2.0 and later (#5898)" |
#5686 | Add GpuMapConcat support for nested-type values |
#5905 | Add support for negated POSIX character classes \P |
#5898 | Enable testing parquet with zstd for spark releases 3.2.0 and later |
#5900 | Optimize some common if/else cases |
#5869 | Qualification: fix sorting and add unit-tests script |
#5819 | Modify the default value of spark.rapids.sql.explain as NOT_ON_GPU |
#5723 | Dynamically load hive and avro using reflection to avoid potential class not found exception |
#5886 | Avoid serializing plan in GpuCoalesceBatches, GpuHashAggregateExec, and GpuTopN |
#5897 | GpuBatchScanExec partitions should be marked transient |
#5894 | [Doc]fix a typo with double "("[skip ci] |
#5880 | Qualification tool: Parse expressions in FilterExec |
#5885 | [Doc] Fix alluxio doc link issue[skip ci] |
#5879 | Avoid duplicate sanitization step when reading JSON floats |
#5877 | Add Apache Spark 3.3.1-SNAPSHOT Shims |
#5783 | assertMinValueOverflow should throw same type of exception as Spark |
#5875 | Qualification ui output goes to wrong folder |
#5870 | Use a common thread pool across formats for multithreaded reads |
#5868 | Profiling tool add wholestagecodegen to execs mapping, sql to stage info and job end time |
#5873 | Correct the value of spark.rapids.sql.explain |
#5695 | Verify DPP over LIKE ANY/ALL expression |
#5856 | Update unit test doc |
#5866 | Fix CsvScanForIntervalSuite leak issues |
#5810 | Qualification UI - add application details view |
#5860 | [Doc]Add Spark3.3 support in doc[skip ci] |
#5858 | Remove SNAPSHOT support from Spark 3.3.0 shim |
#5857 | Remove user sperlingxx[skip ci] |
#5841 | Enable regexp empty string short circuit on shim version 3.1.3 |
#5853 | Fix auto merge conflict 5850 |
#5845 | Update Parquet binaryAsString integration to use a static parquet file |
#5842 | Update default speedup factors for qualification tool |
#5829 | Add regexp support for Alert, and Escape control characters |
#5833 | Add test for GpuCast canonicalization with timezone |
#5822 | Configure log4j version 2.x for test cases |
#5830 | Enable the spark.sql.parquet.binaryAsString=true configuration option on the GPU |
#5805 | [Issue 5726] Removing duplicate BINARY keyword |
#5828 | Update tools module to latest Hadoop version |
#5809 | Disable Spark 3.4.0 premerge for 22.08 and enable for 22.10 |
#5767 | Fix the time zone check issue |
#5814 | Fix auto merge conflict 5812 [skip ci] |
#5804 | Support RoundCeil and RoundFloor when scale is zero |
#5696 | Support Parquet field IDs |
#5749 | Add shims for AnsiCast |
#5780 | Append new authorized user to blossom-ci whitelist [skip ci] |
#5350 | Halt Spark executor when encountering unrecoverable CUDA errors |
#5779 | Fix repeated runs mvn package without clean lead to missing spark-rapids spark-rapids-jni-version-info.properties in dist jar |
#5800 | Fix auto merge conflict 5799 |
#5794 | Fix auto merge conflict 5789 |
#5740 | Handle regexp_replace inconsistency with empty strings and zero-repetition patterns |
#5790 | Fix auto merge conflict 5789 |
#5690 | Update the error checking of test_cast_neg_to_decimal_err |
#5774 | Fix merge conflict with branch-22.06 |
#5768 | Support MMyyyy date/timestamp format |
#5692 | Add support for POSIX predefined character classes |
#5762 | Fix auto merge conflict 5759 |
#5754 | Fix auto merge conflict 5752 |
#5450 | Handle ? , * , {0,} and {0,n} based repetitions in regexp_replace on the GPU |
#5479 | Add support for word boundaries \b and \B |
#5745 | Move RapidsErrorUtils to org.apache.spark.sql.shims package |
#5610 | Fall back to CPU for unsupported regular expression edge cases with end of line/string anchors and newlines |
#5725 | Fix auto merge conflict 5724 |
#5687 | Minor: Clean up GpuConcat |
#5710 | Fix auto merge conflict 5709 |
#5708 | Fix shim-related bugs |
#5700 | Fix auto merge conflict 5699 |
#5675 | Update the error messages for the failing arithmetic tests. |
#5689 | Disable 340 for premerge and nightly |
#5603 | Skip unshim and dedup of external spark-rapids-jni and jucx |
#5472 | Add shims for Spark 3.4.0 |
#5647 | Init version 22.08.0-SNAPSHOT |
#5451 | [FEA] Update Spark2 explain code for 22.06 |
#5261 | [FEA] Create MIG with Cgroups on YARN Dataproc scripts |
#5476 | [FEA] extend concat on arrays to all nested types. |
#5113 | [FEA] ANSI mode: Support CAST between types |
#5112 | [FEA] ANSI mode: allow casting between numeric type and timestamp type |
#5323 | [FEA] Enable floating point by default |
#4518 | [FEA] Add support for escaped unicode hex in regular expressions |
#5405 | [FEA] Support map_concat function |
#5547 | [FEA] Regexp: Can we transpile \W and \D to Java's definition so we can support on GPU? |
#5512 | [FEA] Qualification tool, hook up final output and output execs table |
#5507 | [FEA] Support GpuRaiseError |
#5325 | [FEA] Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#3682 | [FEA] Use conventional jar layout in dist jar if there is only one input shim |
#1556 | [FEA] Implement ANSI mode tests for string to timestamp functions |
#4425 | [FEA] Support line anchor $ and string anchors \z and \Z in regexp_replace |
#5176 | [FEA] Qualification tool UI |
#5111 | [FEA] ANSI mode: CAST between ANSI intervals and IntegralType |
#4605 | [FEA] Add regular expression support for new character classes introduced in Java 8 |
#5273 | [FEA] Support map_filter |
#1557 | [FEA] Enable ANSI mode for CAST string to date |
#5446 | [FEA] Remove hasNans check for array_contains |
#5445 | [FEA] Support reading Int as Byte/Short/Date from parquet |
#5449 | [FEA] QualificationTool. Add speedup information to AppSummaryInfo |
#5322 | [FEA] remove hasNans for Pivot |
#4800 | [FEA] Enable support for more regular expressions with \A and \Z |
#5404 | [FEA] Add Shim for the Spark version shipped with Cloudera CDH 7.1.7 |
#5226 | [FEA] Support array_repeat |
#5229 | [FEA] Support arrays_zip |
#5119 | [FEA] Support ANSI mode for SQL functions/operators |
#4532 | [FEA] Re-enable support for \Z in regular expressions |
#3985 | [FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown |
#5034 | [FEA] Implement ExistenceJoin for BroadcastNestedLoopJoin Exec |
#4533 | [FEA] Re-enable support for $ in regular expressions |
#5263 | [FEA] Write out operator mapping from plugin to CSV file for use in qualification tool |
#5095 | [FEA] Support collect_set on struct in reduction context |
#4811 | [FEA] Support ANSI intervals for Cast and Sample |
#2062 | [FEA] support collect aggregations |
#5060 | [FEA] Support Count on Struct of [ Struct of [String, Map(String,String)], Array(String), Map(String,String) ] |
#4528 | [FEA] Add support for regular expressions containing \s and \S |
#4557 | [FEA] Add support for regexp_replace with back-references |
#5148 | Add the MULTI-THREADED reading support for avro |
#5304 | [FEA] Optimize remote Avro reading for a PartitionFile |
#5257 | [FEA][Audit] - [SPARK-34863][SQL] Support complex types for Parquet vectorized reader |
#5149 | Add the COALESCING reading support for avro |
#5769 | [BUG] arithmetic ops tests failing on Spark 3.3.0 |
#5785 | [BUG] Tests module build failed in OrcEncryptionSuite for 321cdh |
#5765 | [BUG] Container decimal overflow when casting float/double to decimal |
#5246 | Verify Parquet columnar encryption is handled safely |
#5770 | [BUG] test_buckets failed |
#5733 | [BUG] Integration test test_orc_write_encryption_fallback fail |
#5719 | [BUG] test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark330 |
#5739 | [BUG] Spark 3.3 build failure - QueryExecutionErrors package scope changed |
#5670 | [BUG] Job failed when parsing "java.lang.reflect.InvocationTargetException: org.apache.spark.sql.catalyst.parser.ParseException:" |
#4860 | [BUG] GPU writing ORC columns statistics |
#5717 | [BUG] div_by_zero test is failing on Spark 330 on 22.06 |
#5632 | [BUG] udf_cudf tests failed: EOFException DataInputStream.readInt(DataInputStream.java:392) |
#5672 | [BUG] Read exception occurs when clipped schema is empty |
#5694 | [BUG] Inconsistent behavior with Spark when reading a non-existent column from Parquet |
#5562 | [BUG] read ORC file with various file schemas |
#5654 | [BUG] Transpiler produces regex pattern that cuDF cannot compile |
#5655 | [BUG] Regular expression pattern [&&1] produces incorrect results on GPU |
#4862 | [FEA] Add support for regular expressions containing octal digits inside character classes , eg[\0177] |
#5615 | [BUG] GpuBatchScanExec only reports output row metrics |
#4505 | [BUG] RegExp parse fails to parse character ranges containing escaped characters |
#4865 | [BUG] Add support for regular expressions containing hexadecimal digits inside character classes, eg [\x7f] |
#5513 | [BUG] NoClassDefFoundError with caller classloader off in GpuShuffleCoalesceIterator in local-cluster |
#5530 | [BUG] regexp: \d , \w inconsistencies with non-latin unicode input |
#5594 | [BUG] 3.3 test_div_overflow_exception_when_ansi test failures |
#5596 | [BUG] Shim service provider failure when using jar built with -DallowConventionalDistJar |
#5582 | [BUG] Nightly CI failed with : 'dist/target/rapids-4-spark_2.12-22.06.0-SNAPSHOT.jar' not exists |
#5577 | [BUG] test_cast_neg_to_decimal_err failing in databricks |
#5557 | [BUG] dist jar does not contain reduced pom, creates an unnecessary jar |
#5474 | [BUG] Spark 3.2.1 arithmetic_ops_test failures |
#5497 | [BUG] 3 tests in IntervalSuite are faling on 330 |
#5544 | [BUG] GpuCreateMap needs to set hasSideEffects in some cases |
#5469 | [BUG] NPE during serialization for shuffle in array-aggregation-with-limit query |
#5496 | [BUG] avg literals bools is failing on 330 |
#5511 | [BUG] orc_test failures on 321cdh |
#5439 | [BUG] Encrypted Parquet writes are being replaced with a GPU unencrypted write |
#5108 | [BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists |
#5492 | [BUG] com.nvidia.spark.rapids.RegexCharacterClass cannot be cast to com.nvidia.spark.rapids.RegexCharacterClassComponent |
#4818 | [BUG] ASYNC: the spill store needs to synchronize on spills against the allocating stream |
#5481 | [BUG] test_parquet_check_schema_compatibility failed in databricks runtimes |
#5482 | [BUG] test_cast_string_date_invalid_ansi_before_320 failed in databricks runtime |
#5457 | [BUG] 330 AnsiCastOpSuite Unit tests failed 22 cases |
#5098 | [BUG] Harden calls to RapidsBuffer.free |
#5464 | [BUG] Query failure with java.lang.AssertionError when using partitioned Iceberg tables |
#4746 | [FEA] Add support for regular expressions containing octal digits in range \200 to 377 |
#5200 | [BUG] More detailed logs to show which parquet file and which data type has mismatch. |
#4866 | [BUG] Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5140 | [BUG] NPE on array_max of transformed empty array |
#5444 | [BUG] build failed on Databricks |
#5357 | [BUG] Spark 3.3 cache_test test_passing_gpuExpr_as_Expr[failures |
#5429 | [BUG] test_cache_expand_exec fails on Spark 3.3 |
#5312 | [BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced. |
#5415 | [BUG] Regular Expressions: matching the dot . doesn't fully exclude all unicode line terminator characters |
#5413 | [BUG] Databricks 321 build fails - not found: type OrcShims320untilAllBase |
#5286 | [BUG] assert failed test_struct_self_join and test_computation_in_grpby_columns |
#5351 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5260 | [BUG] map_test failures on Spark 3.3.0 |
#5189 | [BUG] Reading from iceberg table will fail. |
#5130 | [BUG] string_split does not respect spark.rapids.sql.regexp.enabled config |
#5267 | [BUG] markdown link check failed issue |
#5295 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5264 | [BUG] Delete unused generic type. |
#5275 | [BUG] rlike cannot run on GPU because invalid or unsupported escape character ']' near index 14 |
#5278 | [BUG] build 311cdh failed: unable to find valid certification path to requested target |
#5211 | [BUG] csv_test:test_basic_csv_read FAILED |
#5244 | [BUG] Spark 3.3 integration test failures logic_test.py::test_logical_with_side_effect |
#5041 | [BUG] Implement hasSideEffects for all expressions that have side-effects |
#4980 | [BUG] window_function_test FAILED on PASCAL GPU |
#5240 | [BUG] EGX integration test_collect_list_reductions failures |
#5242 | [BUG] Executor falls back to cudaMalloc if the pool can't be initialized |
#5215 | [BUG] Coalescing reading is not working for v2 parquet/orc datasource |
#5104 | [BUG] Unconditional warning in UDF Plugin "The compiler is disabled by default" |
#5099 | [BUG] Profiling tool should not sum gettingResultTime |
#5182 | [BUG] Spark 3.3 integration tests arithmetic_ops_test.py::test_div_overflow_exception_when_ansi failures |
#5147 | [BUG] object LZ4Compressor is not a member of package ai.rapids.cudf.nvcomp |
#4695 | [BUG] Segfault with UCX and ASYNC allocator |
#5138 | [BUG] xgboost job failed if we enable PCBS |
#5135 | [BUG] GpuRegExExtract is not align with RegExExtract |
#5084 | [BUG] GpuWriteTaskStatsTracker complains for all writes in local mode |
#5123 | [BUG] Compile error for Spark330 because of VectorizedColumnReader constructor added a new parameter. |
#5133 | [BUG] Compile error for Spark330 because of Spark changed the method signature: QueryExecutionErrors.mapKeyNotExistError |
#4959 | [BUG] Test case in OpcodeSuite failed on Spark 3.3.0 |
#5863 | Update 22.06 changelog to include new commits [skip ci] |
#5861 | [Doc]Add Spark3.3 support in doc for 22.06 branch[skip ci] |
#5851 | Update 22.06 changelog to include new commits [skip ci] |
#5848 | Update spark330shim to use released lib |
#5840 | [DOC] Updated RapidsConf to reflect the default value of spark.rapids.sql.improvedFloatOps.enabled [skip ci] |
#5816 | Update 22.06.0 changelog to latest [skip ci] |
#5795 | Update FAQ to include local jar deployment via extraClassPath [skip ci] |
#5802 | Update spark-rapids-jni.version to release 22.06.0 |
#5798 | Fall back to CPU for RoundCeil and RoundFloor expressions |
#5791 | Remove ORC encryption test from 321cdh |
#5766 | Fix the overflow of container type when casting floats to decimal |
#5786 | Fix rounds over decimal in Spark 330+ |
#5761 | Throw an exception when attempting to read columnar encrypted Parquet files on the GPU |
#5784 | Update the error string for test_cast_neg_to_decimal_err on 330 |
#5781 | Correct the exception string for test_mod_pmod_by_zero on Spark 3.3.0 |
#5764 | Add test for encrypted ORC write |
#5760 | Enable avrotest in nightly tests [skip ci] |
#5746 | Init 22.06 changelog [skip ci] |
#5716 | Disable Avro support when spark-avro classes not loadable by Shim classloader |
#5737 | Remove the ORC encryption tests |
#5753 | [DOC] Update regexp compatibility for 22.06 [skip ci] |
#5738 | Update Spark2 explain code for 22.06 |
#5731 | Throw SparkDateTimeException for InvalidInput while casting in ANSI mode |
#5742 | Spark-3.3 build fix - Move QueryExecutionErrors to sql package |
#5641 | [Doc]Update 22.06 documentation[skip ci] |
#5701 | Update docs for qualification tool to reflect recommendations and UI [skip ci] |
#5283 | Add documentation for MIG on Dataproc [skip ci] |
#5728 | Qualification tool: Add test for stage failures |
#5681 | Branch 22.06 nvcomp notice binary [skip ci] |
#5713 | Fix GpuCast losing the timezoneId during canonicalization |
#5715 | Update GPU ORC statistics write support |
#5718 | Update the error message for div_by_zero test |
#5604 | ORC encrypted write should fallback to CPU |
#5674 | Fix reading ORC/PARQUET over empty clipped schema |
#5676 | Fix ORC reading over different schemas |
#5693 | Temporarily allow 3.3.1 for 3.3.0 shims. |
#5591 | Enable regular expressions by default |
#5664 | Fix edge case where one side of regexp choice ends in duplicate string anchors |
#5542 | Support arrays of arrays and structs for concat on arrays |
#5677 | Qualification tool Enable UI by default |
#5575 | Regexp: Transpile \D , \W to Java's definitions |
#5668 | Add user as CI owner [skip ci] |
#5627 | Install locales and generate en_US.UTF-8 |
#5514 | ANSI mode: allow casting between numeric type and timestamp type |
#5600 | Qualification tool UI cosmetics and CSV output changes |
#5658 | Fallback to CPU when && found in character class |
#5644 | Qualification tool: Enable UDF reporting in potential problems |
#5645 | Add support for octal digits in character classes |
#5643 | Fix missing GpuBatchScanExec metrics in SQL UI |
#5441 | Enable optional float confs and update docs mentioning them |
#5532 | Support hex digits in character classes and escaped characters in character class ranges |
#5625 | [DOC]update links for 2206 release[skip ci] |
#5623 | Handle duplicates in negated character classes |
#5533 | Support GpuMapConcat |
#5614 | Move HostConcatResultUtil out of unshimmed classes |
#5612 | Qualification tool: update SQL Df value used and look at jobs in SQL |
#5526 | Fix whitespace \s and \S tests |
#5541 | Regexp: Transpile \d , \w to Java's definitions |
#5598 | Qualification tool: Update RunningQualificationApp tests |
#5601 | Update test_div_overflow_exception_when_ansi test for Spark-3.3 |
#5588 | Update Databricks build scripts |
#5599 | Move ShimServiceProvider file re-init/truncate |
#5531 | Filter rows with null keys when coalescing due to reaching cuDF row limits |
#5550 | Qualification tool hook up final output based on per exec analysis |
#5540 | Support RaiseError |
#5505 | Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#5583 | Disable spark snapshot shims build for pre-merge |
#5584 | Enable automerge from branch-22.06 to 22.08 [skip ci] |
#5581 | nightly CI to install and deploy cuda11 classifier dist jar [skip ci] |
#5579 | Update test_cast_neg_to_decimal_err to work with Databricks 10.4 where exception is different |
#5578 | Fix unfiltered partitions being used to create GpuBatchScanExec RDD |
#5560 | Minor: Clean up the tests of concat_list |
#5528 | Enable build and test with JDK11 |
#5571 | Update array_min and array_max to use new cudf operations |
#5558 | Fix target file for update from extra-resources in dist module |
#5556 | Move FsInput creation into AvroFileReader |
#5483 | Don't distinguish between types of ArithmeticException for Spark 3.2.x |
#5539 | Fix IntervalSuite cases failure |
#5421 | Support multi-threaded reading for avro |
#5538 | Add tests for string to timestamp functions in ANSI mode |
#5546 | Set hasSideEffects correctly for GpuCreateMap |
#5529 | Fix failing bool agg test in Spark 3.3 |
#5500 | Fallback parquet reading with merged schema and native footer reader |
#5534 | MVN_OPT to last, as it is empty in most cases |
#5523 | Enable forcePositionEvolution for 321cdh |
#5501 | Build against specified spark-rapids-jni snapshot jar [skip ci] |
#5489 | Fallback to the CPU if Parquet encryption keys are set |
#5527 | Fix bug with character class immediately following a string anchor |
#5506 | Fix ClassCastException in regular expression transpiler |
#5519 | Address feedback in "string anchors regexp replace" PR |
#5520 | [DOC] Remove Spark from our naming of Tools [skip ci] |
#5491 | Enables $ , \z , and \Z in REGEXP_REPLACE on the GPU |
#5470 | Qualification tool support UI code generation |
#5353 | Supports casting between ANSI interval types and integral types |
#5487 | Add limited support for captured vars and athrow |
#5499 | [DOC]update doc for emr6.6[skip ci] |
#5485 | Add cudaStreamSynchronize when a new device buffer is added to the spill framework |
#5477 | Add support for \h , \H , \v , \V , and \R character classes |
#5490 | Qualification tool: Update speedup factor for few operators |
#5494 | Fix databrick Shim to support Ansi mode when casting from string to date |
#5498 | Enable 330 unit tests for nightly |
#5504 | Fix printing of split information when dumping debug data |
#5486 | Fix regression in AnsiCastOpSuite with Spark 3.3.0 |
#5436 | Support map_filter operator |
#5471 | Add implicit safeFree for RapidsBuffer |
#5465 | Fix query planning issue when Iceberg is used with DPP and AQE |
#5459 | Add test cases for casting string to date in ANSI mode |
#5443 | Add support for regular expressions containing octal digits greater than \200 |
#5468 | Qualification tool: Add support for join, pandas, aggregate execs |
#5473 | Remove hasNan check over array_contains |
#5434 | Check schema compatibility when building parquet readers |
#5442 | Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5466 | [Doc] Change the picture of the query plan to text format. [skip ci] |
#5310 | Use C++ to parse and filter parquet footers. |
#5454 | QualificationTool. Add speedup information to AppSummaryInfo |
#5455 | Moved ShimCurrentBatchIterator so it's visible to db312 and db321 |
#5354 | Plugin should throw same arithmetic exceptions as Spark part1 |
#5440 | Qualification tool support for read and write execs and more, add mapping stage times to sql execs |
#5431 | [DOC] Update the ubuntu repo key [skip ci] |
#5425 | Handle readBatch changes for Spark 3.3.0 |
#5438 | Add tests for all-null data for array_max |
#5428 | Make the sync marker uniform for the Avro coalescing reader |
#5432 | Test case insensitive reading for Parquet and CSV |
#5433 | [DOC] Removed mention of 30x from shims.md [skip ci] |
#5424 | Exclude all unicode line terminator characters from matching dot |
#5426 | Qualification tool: Parsing Execs to get the ExecInfo #2 |
#5427 | Workaround to fix cuda repo key rotation in ubuntu images [skip ci] |
#5419 | Append my id to blossom-ci whitelist [skip ci] |
#5422 | xfail tests for spark 3.3.0 due to changes in readBatch |
#5420 | Qualification tool: Parsing Execs to get the ExecInfo #1 |
#5418 | Add GpuEqualToNoNans and update GpuPivotFirst to use to handle PivotFirst with NaN support enabled on GPU |
#5306 | Support coalescing reading for avro |
#5410 | Update docs for removal of 311cdh |
#5414 | Add 320+-noncdh to Databricks to fix 321db build |
#5349 | Enable some repetitions for \A and \Z |
#5346 | ADD 321cdh shim to rapids and remove 311cdh shim |
#5408 | [DOC] Add rebase mode notes for databricks doc [skip ci] |
#5348 | Qualification tool: Skip GPU event logs |
#5400 | Restore test_computation_in_grpby_columns and test_struct_self_join |
#5399 | Update New Issue template to recommend a Discussion or Question [skip ci] |
#5293 | Support array_repeat |
#5359 | Qualification tool base plan parsing infrastructure |
#5360 | Revert "skip failing tests for Spark 3.3.0 (#5313)" |
#5326 | Update GCP doc and scripts [skip ci] |
#5352 | Fix spark330 build due to mapKeyNotExistError changed |
#5317 | Support arrays_zip |
#5316 | Support ANSI mode for ToUnixTimestamp, UnixTimestamp, GetTimestamp, DateAddInterval |
#5319 | Re-enable support for \Z in regular expressions on the GPU |
#5315 | Simplify conditional catalyst expressions generated by udf-compiler |
#5301 | Support existence join type for broadcast nested loop join |
#5313 | skip failing tests for Spark 3.3.0 |
#5311 | Add information about the discussion board to the README and FAQ [skip ci] |
#5308 | Remove unused ColumnViewUtil |
#5289 | Re-enable dollar ($) line anchor in regular expressions in find mode |
#5274 | Perform explicit UnsafeRow projection in ColumnarToRow transition |
#5297 | GpuStringSplit now honors thespark.rapids.sql.regexp.enabled configuration option |
#5307 | Remove compatibility guide reference to issue #4060 |
#5298 | Qualification tool: Operator mapping from plugin to CSV file |
#5266 | Update Outdated GCP getting started guide[skip ci] |
#5300 | Fix DIST_JAR PATH in coverage-report [skip ci] |
#5290 | Add documentation about reporting security issues [skip ci] |
#5277 | Support multiple datatypes in TypeSig.withPsNote() |
#5296 | Fix spark330 build due to removal of isElementAt parameter from mapKeyNotExistError |
#5291 | fix dead links in shims.md [skip ci] |
#5276 | fix markdown check issue[skip ci] |
#5270 | Include dependency of common jar in tools jar |
#5265 | Remove unused generic types |
#5288 | Temporarily xfail tests to restore premerge builds |
#5287 | Fix nightly scripts to deploy w/ classifier correctly [skip ci] |
#5134 | Support division on ANSI interval types |
#5279 | Add test case for ANSI pmod and ANSI Remainder |
#5284 | Enable support for escaping the right square bracket |
#5280 | [BUG] Fix incorrect plugin nightly deployment and release [skip ci] |
#5249 | Use a bundled spark-rapids-jni dependency instead of external cudf dependency |
#5268 | [BUG] When ASYNC is enabled GDS needs to handle cudaMalloced bounce buffers |
#5230 | Update csv float tests to reflect changes in precision in cuDF |
#5001 | Add fuzzing test for JSON reader |
#5155 | Support casting between day-time interval and string |
#5247 | Fix test failure caused by change in Spark 3.3 exception |
#5254 | Fix the integration test of collect_list_reduction |
#5243 | Throw again after logging that RMM could not intialize |
#5105 | Support multiplication on ANSI interval types |
#5171 | Fix the bug COALESCING reading does not work for v2 parquet/orc datasource |
#5157 | Update the log warning of UDF compiler |
#5213 | Support sample on ANSI interval types |
#5218 | XFAIL tests that are failing due to issue 5211 |
#5202 | Profiling tool: Remove gettingResultTime from stages & jobs aggregation |
#5201 | Fix merge conflict from branch-22.04 |
#5195 | Refactor Spark33XShims to avoid code duplication |
#5185 | Fix test failure with Spark 3.3 by looking for less specific error message |
#4992 | Support Collect-like Reduction Aggregations |
#5193 | Fix auto merge conflict 5192 [skip ci] |
#5020 | Support arithmetic operators on ANSI interval types |
#5174 | Fix auto merge conflict 5173 [skip ci] |
#5168 | Fix auto merge conflict 5166 |
#5151 | Remove NvcompLZ4CompressionCodec single-buffer APIs |
#5132 | Add count support for all types |
#5141 | Upgrade to UCX 1.12.1 for 22.06 |
#5143 | Fix merge conflict with branch-22.04 |
#5144 | Adapt to storage-partitioned join additions in SPARK-37377 |
#5139 | Make mvn-verify check name more descriptive [skip ci] |
#5136 | Fix GpuRegExExtract about inconsistent to Spark |
#5107 | Fix GpuFileFormatDataWriter failing to stat file after commit |
#5124 | Fix ShimVectorizedColumnReader construction for recent Spark 3.3.0 changes |
#5047 | Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode |
#5089 | Enable regular expressions containing \s and \S |
#5087 | Add support for regexp_replace with back-references |
#5110 | Appending my id (mattahrens) to the blossom-ci whitelist [skip ci] |
#5090 | Add nvtx ranges around pre, agg, and post steps in hash aggregate |
#5092 | Remove single-buffer compression codec APIs |
#5093 | Fix leak when GDS buffer store closes |
#5067 | Premerge databricks CI autotrigger [skip ci] |
#5083 | Remove EMRShimVersion |
#5076 | Unshim cache serializer and other 311+-all code |
#5074 | Make ASYNC the default allocator for 22.06 |
#5073 | Add in nvtx ranges for parquet filterBlocks |
#5077 | Change Scala style continuation indentation to be 2 spaces to match guide [skip ci] |
#5070 | Fix merge from 22.04 to 22.06 |
#5046 | Init 22.06.0-SNAPSHOT |
#5059 | Fix merge from 22.04 to 22.06 |
#5036 | Unshim many expressions |
#4993 | PCBS and Parquet support ANSI year month interval type |
#5031 | Unshim many SparkShim interfaces |
#5027 | Fix merge of branch-22.04 to branch-22.06 |
#5022 | Unshim many Pandas execs |
#5013 | Unshim GpuRowBasedScalaUDF |
#5012 | Unshim GpuOrcScan and GpuParquetScan |
#5010 | Unshim GpuSumDefaults |
#5007 | Remove schema utils, case class copying, file partition, and legacy statistical aggregate shims |
#4999 | Enable automerge from branch-22.04 to branch-22.06 [skip ci] |
#4734 | [FEA] Support approx_percentile in reduction context |
#1922 | [FEA] Support ORC forced positional evolution |
#123 | [FEA] add in support for dayfirst formats in the CSV parser |
#4863 | [FEA] Improve timestamp support in JSON and CSV readers |
#4935 | [FEA] Support reading Avro: primitive types |
#4915 | [FEA] Drop support for Spark 3.0.1, 3.0.2, 3.0.3, Databricks 7.3 ML LTS |
#4815 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ArrayExists |
#3245 | [FEA] GpuGetMapValue should support all valid value data types and non-complex key types |
#4914 | [FEA] Support for Databricks 10.4 ML LTS |
#4945 | [FEA] Support filter and comparisons on ANSI day time interval type |
#4004 | [FEA] Add support for percent_rank |
#1111 | [FEA] support spark.sql.legacy.timeParserPolicy when parsing CSV files |
#4849 | [FEA] Support parsing dates in JSON reader |
#4789 | [FEA] Add Spark 3.1.4 shim |
#4646 | [FEA] Make JSON parsing of NaN and Infinity values fully compatible with Spark |
#4824 | [FEA] Support reading decimals from JSON and CSV |
#4814 | [FEA] Support element_at with non-literal index |
#4816 | [FEA] Support org.apache.spark.sql.catalyst.expressions.GetArrayStructFields |
#3542 | [FEA] Support str_to_map function |
#4721 | [FEA] Support regular expression delimiters for str_to_map |
#4791 | Update Spark 3.1.3 to be released |
#4712 | [FEA] Allow to partition on Decimal 128 when running on the GPU |
#4762 | [FEA] Improve support for reading JSON integer types |
#4696 | [FEA] Support casting map to string |
#1572 | [FEA] Add in decimal support for pmod, remainder and divide |
#4763 | [FEA] Improve support for reading JSON boolean types |
#4003 | [FEA] Add regular expression support to GPU implementation of StringSplit |
#4626 | [FEA] cannot run on GPU because unsupported data types in 'partitionSpec' |
#33 | [FEA] hypot SQL function |
#4515 | [FEA] Set RMM async allocator as default |
#3026 | [FEA] [Audit]: Set the list of read columns in the task configuration to reduce reading of ORC data |
#4895 | Add support for structs in GpuScalarSubquery |
#4393 | [BUG] Columnar to Columnar transfers are very slow |
#589 | [FEA] Support ExistenceJoin |
#4784 | [FEA] Improve copying decimal data from CPU columnar data |
#4685 | [FEA] Avoid regexp cost in string_split for escaped characters |
#4777 | Remove input upcast in GpuExtractChunk32 |
#4722 | Optimize DECIMAL128 average aggregations |
#4645 | [FEA] Investigate ASYNC allocator performance with additional queries |
#4539 | [FEA] semaphore optimization in shuffled hash join |
#2441 | [FEA] Use AST for filter in join APIs |
#5233 | [BUG] rapids-tools v22.04.0 release jar reports maven dependency issue : rapids-4-spark-common_2.12:jar:22.04.0 NOT FOUND |
#5183 | [BUG] UCX EGX integration test array_test.py::test_array_exists failures |
#5180 | [BUG] create_map failed with java.lang.IllegalStateException: This is not supported yet |
#5181 | [BUG] Dataproc tests failing when trying to detect for accelerated row conversions |
#5154 | [BUG] build failed in databricks 10.4 runtime (updated recently) |
#5159 | [BUG] Approx percentile query fails with UnsupportedOperationException |
#5164 | [BUG] Databricks 9.1ML failed with "java.lang.NoSuchMethodError: org.apache.spark.sql.execution.metric.SQLMetrics$.createSizeMetric" |
#5125 | [BUG] GpuCast.hasSideEffects does not check if child expression has side effects |
#5091 | [BUG] Profiling tool fails process custom task accumulators of type CollectionAccumulator |
#5050 | [BUG] Release build of v22.04.0 FAILED on "Execution attach-javadoc failed: NullPointerException" with maven option '-P source-javadoc' |
#5035 | [BUG] Different CSV parsing behavior between 22.04 and 22.02 |
#5065 | [BUG] spark330+ build error due to SPARK-37463 |
#5019 | [BUG] udf compiler failed to translate UDF in spark-shell |
#5048 | [BUG] OOM for q18 of TPC-DS benchmark testing on Spark2a |
#5038 | [BUG] When spark.rapids.sql.regexp.enabled is on in 22.04 snapshot jars, Reading a Delta table in Databricks may cause driver error |
#5023 | [BUG] When+sequence could trigger "Illegal sequence boundaries" error |
#5021 | [BUG] test_cache_reverse_order failed |
#5003 | [BUG] Cloudera 3.1.1 tests fail due to ClouderaShimVersion |
#4960 | [BUG] Spark 3.3 IT cache_test:test_passing_gpuExpr_as_Expr failure |
#4913 | [BUG] Fall back to the CPU if we see a scale on Ceil or Floor |
#4806 | [BUG] When running xgboost training, if PCBS is enabled, it fails with java.lang.AssertionError |
#4542 | [BUG] test_write_round_trip failed Maximum pool size exceeded |
#4911 | [BUG][Audit] [SPARK-38314] - Fail to read parquet files after writing the hidden file metadata |
#4936 | [BUG] databricks nightly window_function_test failures |
#4931 | [BUG] Spark 3.3 IT test cache_test.py::test_passing_gpuExpr_as_Expr fails with IllegalArgumentException |
#4710 | [BUG] cudaErrorIllegalAddress for q95 (3TB) on GCP with ASYNC allocator |
#4918 | [BUG] databricks nightly build failed |
#4826 | [BUG] cache_test failures when testing with 128-bit decimal |
#4855 | [BUG] Shim tests in sql-plugin module are not running |
#4487 | [BUG] regexp_find hangs with some patterns |
#4486 | [BUG] Regular expressions with hex digits not working as expected |
#4879 | [BUG] [SPARK-38237][SQL] ClusteredDistribution clustering keys break build with wrong arguments |
#4883 | [BUG] row-based_udf_test.py::test_hive_empty_* fail nightly tests |
#4876 | [BUG] Nightly build failed on Databricks with "pip: No such file or directory" |
#4739 | [BUG] Plugin will crash with query > 100 columns on pascal GPU |
#4840 | [BUG] test_dpp_via_aggregate_subquery_aqe_off failed with table already exists |
#4841 | [BUG] test_compress_write_round_trip failed on Spark 3.3 |
#4668 | [FEA][Audit] - [SPARK-37750][SQL] ANSI mode: optionally return null result if element not exists in array/map |
#3971 | [BUG] udf-examples dependencies are incorrect |
#4022 | [BUG] Ensure shims.v2.ParquetCachedBatchSerializer and similar classes are at most package-private |
#4526 | [BUG] Short circuit AND/OR in ANSI mode |
#4787 | [BUG] Dataproc notebook IT test failure - NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes |
#4704 | [BUG] Update the premerge and nightly tests after moving the UDF example to external repository |
#4795 | [BUG] Read ORC does not ignoreCorruptFiles |
#4802 | [BUG] GPU CSV read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#4803 | [BUG] GPU JSON read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#1986 | [BUG] CSV reading null inconsistent between spark.rapids.sql.format.csv.enabled=true&false |
#126 | [BUG] CSV parsing large number values overflow |
#4759 | [BUG] Profiling tool can miss datasources when they are GPU reads |
#4798 | [BUG] Integration test builds failing with worker_id not found |
#4727 | [BUG] Read Parquet does not ignoreCorruptFiles |
#4744 | [BUG] test_groupby_std_variance_partial_replace_fallback failed |
#4761 | [BUG] test_simple_partitioned_read failed on Spark 3.3 |
#2071 | [BUG] parsing invalid boolean CSV values return true instead of null |
#4749 | [BUG] test_write_empty_parquet_round_trip failed |
#4730 | [BUG] python UDF tests are leaking |
#4290 | [BUG] Investigate q32 and q67 for decimals potential regression |
#4409 | [BUG] Possible race condition in regular expression support for octal digits |
#4728 | [BUG] test_mixed_compress_read orc_test.py failures |
#4736 | [BUG] buildall --profile=321 fails on missing spark301 rapids-4-spark-sql dependency |
#4702 | [BUG] cache_test.py failed w/ cache.serializer in spark 3.3.0 |
#4031 | [BUG] Spark 3.3.0 test failure: NoSuchMethodError org.apache.orc.TypeDescription.getAttributeValue |
#4664 | [BUG] MortgageAdaptiveSparkSuite failed with duplicate buffer exception |
#4564 | [BUG] map_test ansi failed in spark330 |
#119 | [BUG] LIKE does not work if null chars are in the string |
#124 | [BUG] CSV/JSON Parsing some float values results in overflow |
#4045 | [BUG] q93 failed in this week's NDS runs |
#4488 | [BUG] isCastingStringToNegDecimalScaleSupported seems set wrong for some Spark versions |
#5251 | Update 22.04 changelog to latest [skip ci] |
#5232 | Fix issue in GpuArrayExists where a parent view outlived the child |
#5239 | Fix tools depending on the common jar |
#5205 | Update 22.04 changelog to latest [skip ci] |
#5190 | Fix column->row conversion GPU check: |
#5184 | Fix CPU fallback for Map lookup |
#5191 | Update version-def to use released cudfjni 22.04.0 [skip ci] |
#5167 | Update cudfjni version to released 22.04.0 |
#5169 | Terminate test earlier if pytest ENV issue [skip ci] |
#5160 | Fix approximate percentile reduction UnsupportedOperationException |
#5165 | Update Databricks 10.4 for changes to the QueryStageExec and ClusteredDistribution |
#4997 | Update docs for the 22.04 release[skip ci] |
#5146 | Support env var INTEGRATION_TEST_VERSION to override shim version |
#5103 | Init 22.04 changelog [skip ci] |
#5122 | Disable GPU accelerated row-column transpose for Pascal GPUs: |
#5127 | GpuCast.hasSideEffects now checks to see if the child expression has side-effects |
#5118 | On task failure catch some CUDA exceptions and kill executor |
#5069 | Update for the public release [skip ci] |
#5097 | Implement hasSideEffects for GpuGetArrayItem, GpuElementAt, GpuGetMapValue, GpuUnaryMinus, and GpuAbs |
#5079 | Disable spark snapshot shims pre-merge build in 22.04 |
#5094 | Fix profiling tool reading collectionAccumulator |
#5078 | Disable JSON and CSV floating-point reads by default |
#4961 | Support approx_percentile in reduction context |
#5062 | Update Spark 2.x explain API with changes in 22.04 |
#5066 | Add getOrcSchemaString for OrcShims |
#5030 | Fix regression from 21.12 where udfs defined in repl no longer worked |
#5051 | Revert "Replace ParquetFileReader.readFooter with open() and getFooter " |
#5052 | Work around incompatibility between Databricks Delta loads and GpuRegExpExtract |
#4972 | Add support for ORC forced positional evolution |
#5042 | Implement hasSideEffects for GpuSequence |
#5040 | Fix missing imports for 321db shim |
#5033 | Removed limit from the test |
#4938 | Improve compatibility when reading timestamps from JSON and CSV sources |
#5026 | Update RoCE doc URL [skip ci] |
#4976 | Replace ParquetFileReader.readFooter with open() and getFooter |
#4989 | Use conf.useCompression config to decide if we should be compressing the cache |
#4956 | Add avro reader support |
#5009 | Remove references of shims folder in docs [skip ci] |
#5004 | Add ClouderaShimVersion to unshimmed files |
#4971 | Fall back to the CPU for non-zero scale on Ceil or Floor functions |
#4996 | Fix collect_set on struct type |
#4998 | Added the id back for struct children to make them unique |
#4995 | Include 321db shim in distribution build [skip ci] |
#4981 | Update doc for CSV reading interval |
#4973 | Implement support for ArrayExists expression |
#4988 | Remove support for Spark 3.0.x |
#4955 | Add UDT support to ParquetCachedBatchSerializer (CPU) |
#4994 | Add databricks 10.4 build in pre-merge |
#4990 | Remove 30X permerge support for version 22.04 and above [skip ci] |
#4958 | Add independent mvn verify check [skip ci] |
#4933 | Set OrcConf.INCLUDE_COLUMNS for ORC reading |
#4944 | Support for non-string key-types for GetMapValue and element_at() |
#4974 | Add shim for Databricks 10.4 |
#4907 | Add markdown check action |
#4977 | Add missing 314 to buildall script |
#4927 | Support reading ANSI day time interval type from CSV source |
#4965 | Documentation: add example python api call for ExplainPlan.explainPotentialGpuPlan [skip ci] |
#4957 | Document agg pushdown on ORC file limitation [skip ci] |
#4946 | Support predictors on ANSI day time interval type |
#4952 | Have a fixed GPU memory size for integration tests |
#4954 | Fix of failing to read parquet files after writing the hidden file metadata in |
#4953 | Add Decimal 128 as a supported type in partition by for databricks running window |
#4941 | Use new list reduction API to improve performance |
#4926 | Support DayTimeIntervalType in ParquetCachedBatchSerializer |
#4947 | Fallback to ARENA if ASYNC configured and driver < 11.5.0 |
#4934 | Replace MetadataAttribute with FileSourceMetadataAttribute to follow the update in Spark for 3.3.0+ |
#4942 | Fix window rank integration tests on |
#4928 | Disable regular expressions on GPU by default |
#4923 | Support GpuScalarSubquery on nested types |
#4924 | Implement percent_rank() on GPU |
#4853 | Improve date support in JSON and CSV readers |
#4930 | Add in support for sorting arrays with structs in sort_array |
#4861 | Add Apache Spark 3.1.4-SNAPSHOT Shims |
#4925 | Remove unused Spark322PlusShims |
#4921 | Add DatabricksShimVersion to unshimmed class list |
#4917 | Default some configs to protect against cluster settings in integration tests |
#4922 | Add support for decimal 128 for db and spark 320+ |
#4919 | Case-insensitive PR title check [skip ci] |
#4796 | Implement ExistenceJoin Iterator using an auxiliary left semijoin |
#4857 | Transition to v2 shims [Databricks] |
#4899 | Fixed Decimal 128 bug in ParquetCachedBatchSerializer |
#4810 | Support ANSI intervals to/from Parquet |
#4909 | Make ARENA the default allocator for 22.04 |
#4856 | Enable shim tests in sql-plugin module |
#4880 | Bump hadoop-client dependency to 3.1.4 |
#4825 | Initial support for reading decimal types from JSON and CSV |
#4859 | Fallback to CPU when Spark pushes down Aggregates (Min/Max/Count) for ORC |
#4872 | Speed up copying decimal column from parquet buffer to GPU buffer |
#4904 | Relocate Hive UDF Classes |
#4871 | Minor changes to print revision differences when building shims |
#4882 | Disable write/read Parquet when Parquet field IDs are used |
#4858 | Support non-literal index for GpuElementAt and GpuGetArrayItem |
#4875 | Support running GetArrayStructFields on GPU |
#4885 | Enable fuzz testing for Regular Expression repetitions and move remaining edge cases to CPU |
#4869 | Support for hexadecimal digits in regular expressions on the GPU |
#4854 | Avoid regexp_cost with stringSplit on the GPU using transpilation |
#4888 | Clean up leak detection code |
#4901 | fix a broken link in CONTRIBUTING.md[skip ci] |
#4891 | update getting started doc because aws-emr 6.5.0 released[skip ci] |
#4881 | Fix compilation error caused by ClusteredDistribution parameters |
#4890 | Integration-test tests jar for hive UDF tests |
#4878 | Set conda/mamba default to Python version to 3.8 [skip ci] |
#4874 | Fix spark-tests syntax issue [skip ci] |
#4850 | Also check cuda runtime version when using the ASYNC allocator |
#4851 | Add worker ID to temporary table names in tests |
#4847 | Fix test_compress_write_round_trip failure on Spark 3.3 |
#4848 | Profile tool: fix printing of task failed reason |
#4636 | Support str_to_map |
#4835 | Trim parquet_write_test to reduce integration test runtime |
#4819 | Throw exception if casting from double to datetime |
#4838 | Trim cache tests to improve integration test time |
#4839 | Optionally return null if element not exists map/array |
#4822 | Push decimal workarounds to cuDF |
#4619 | Move the udf-examples module to the external repository spark-rapids-examples |
#4844 | Update spark313 dep to released one |
#4827 | Make InternalExclusiveModeGpuDiscoveryPlugin and ExplainPlanImpl as protected class. |
#4836 | Support WindowExec partitioning by Decimal 128 on the GPU |
#4760 | Short circuit AND/OR in ANSI mode |
#4829 | Make bloopInstall version configurable in buildall |
#4823 | Reduce redundancy of decimal testing |
#4715 | Patterns such (3?)+ should now fall back to CPU |
#4809 | Add ignoreCorruptFiles for ORC readers |
#4790 | Improve JSON and CSV parsing of integer values |
#4812 | Default integration test configs to allow negative decimal scale |
#4805 | Avoid output cast by using unsigned type output for GpuExtractChunk32 |
#4804 | Profiling tool can miss datasources when they are GPU reads |
#4797 | Do not check for metadata during schema comparison |
#4785 | Support casting Map to String |
#4794 | Decimal-128 support for mod and pmod |
#4799 | Fix failure to generate worker_id when xdist is not present |
#4742 | Add ignoreCorruptFiles feature for Parquet reader |
#4792 | Ensure GpuM2 merge aggregation does not produce a null mean or m2 |
#4770 | Improve columnarCopy for HostColumnarToGpu |
#4776 | Improve aggregation performance of average on DECIMAL128 columns |
#4786 | Add shims to compare ORC TypeDescription |
#4780 | Improve JSON and CSV support for boolean values |
#4778 | Decrease chance of random collisions in test temporary paths |
#4782 | Check in host leak detection code |
#4781 | Add Spark properties table to profiling tool output |
#4714 | Add regular expression support to string_split |
#4754 | Close SpillableBatch to avoid leaks |
#4758 | Fix merge conflict with branch-22.02 [skip ci] |
#4694 | Add clarifications and details to integration-tests README [skip ci] |
#4740 | Enable regular expressions on GPU by default |
#4735 | Re-enables partial regex support for octal digits on the GPU |
#4737 | Check for a null compression codec when creating ORC OutStream |
#4738 | Change resume-from to aggregator in buildall [skip ci] |
#4698 | Add tests for few json options |
#4731 | Trim join tests to improve runtime of tests |
#4732 | Fix failing serializer tests on Spark 3.3.0 |
#4709 | Update centos 8 dockerfile to handle EOL issue [skip ci] |
#4724 | Debug dump to Parquet support for DECIMAL128 columns |
#4688 | Optimize DECIMAL128 sum aggregations |
#4692 | Add FAQ entry to discuss executor task concurrency configuration [skip ci] |
#4588 | Optimize semaphore acquisition in GpuShuffledHashJoinExec |
#4697 | Add preliminary test and test framework changes for ExistanceJoin |
#4716 | GpuStringSplit should return an array on not-null elements |
#4611 | Support BitLength and OctetLength |
#4408 | Use the ORC version that corresponds to the Spark version |
#4686 | Fall back to CPU for queries referencing hidden metadata columns |
#4669 | Prevent deadlock between RapidsBufferStore and RapidsBufferBase on close |
#4707 | Fix auto merge conflict 4705 [skip ci] |
#4690 | Fix map_test ANSI failure in Spark 3.3.0 |
#4681 | Reimplement check for non-regexp strings using RegexParser |
#4683 | Fix documentation link, clarify documentation [skip ci] |
#4677 | Make Collect, first and last as deterministic aggregate functions for Spark-3.3 |
#4682 | Enable test for LIKE with embedded null character |
#4673 | Allow GpuWindowExec to partition on structs |
#4637 | Improve support for reading CSV and JSON floating-point values |
#4629 | Remove shims module |
#4648 | Append new authorized user to blossom-ci safelist |
#4623 | Fallback to CPU when aggregate push down used for parquet |
#4606 | Set default RMM pool to ASYNC for cuda 11.2+ |
#4531 | Use libcudf mixed joins for conditional hash semi and anti joins |
#4624 | Enable integration test results report on Jenkins [skip ci] |
#4597 | Update plugin version to 22.04.0-SNAPSHOT |
#4592 | Adds SQL function HYPOT using the GPU |
#4504 | Implement AST-based regular expression fuzz tests |
#4560 | Make shims.v2.ParquetCachedBatchSerializer as protected |
#4305 | [FEA] write nvidia tool wrappers to allow old YARN versions to work with MIG |
#4410 | [FEA] ReplicateRows - Support ReplicateRows for decimal 128 type |
#4360 | [FEA] Add explain api for Spark 2.X |
#3541 | [FEA] Support max on single-level struct in aggregation context |
#4238 | [FEA] Add a Spark 3.X Explain only mode to the plugin |
#3952 | [Audit] [FEA][SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1 |
#4412 | [FEA] Improve support for \A, \Z, and \z in regular expressions |
#3979 | [FEA] Improvements for CPU(Row) based UDF |
#4467 | [FEA] Add support for regular expression with repeated digits (\d+ , \d* , \d? ) |
#4439 | [FEA] Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#3512 | [FEA] Support org.apache.spark.sql.catalyst.expressions.Sequence |
#3475 | [FEA] Spark 3.2.0 reads Parquet unsigned int64(UINT64) as Decimal(20,0) but CUDF does not support it |
#4091 | [FEA] regexp_replace: Improve support for ^ and $ |
#4104 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ReplicateRows |
#4027 | [FEA] Support SubqueryBroadcast on GPU to enable exchange reuse during DPP |
#4284 | [FEA] Support idx = 0 in GpuRegExpExtract |
#4002 | [FEA] Implement regexp_extract on GPU |
#3221 | [FEA] Support GpuFirst and GpuLast on nested types under reduction aggregations |
#3944 | [FEA] Full support for sum with overflow on Decimal 128 |
#4028 | [FEA] support GpuCast from non-nested ArrayType to StringType |
#3250 | [FEA] Make CreateMap duplicate key handling compatible with Spark and enable CreateMap by default |
#4170 | [FEA] Make regular expression behavior with $ and \r consistent with CPU |
#4001 | [FEA] Add regexp support to regexp_replace |
#3962 | [FEA] Support null characters in regular expressions in RLIKE |
#3797 | [FEA] Make RLike support consistent with Apache Spark |
#4392 | [FEA] could the parquet scan code avoid acquiring the semaphore for an empty batch? |
#679 | [FEA] move some deserialization code out of the scope of the gpu-semaphore to increase cpu concurrent |
#4350 | [FEA] Optimize the all-true and all-false cases in GPU If and CaseWhen |
#4309 | [FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition |
#4395 | [FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator |
#4134 | [FEA] Allow EliminateJoinToEmptyRelation in GpuBroadcastExchangeExec |
#4189 | [FEA] understand why between is so expensive |
#4725 | [DOC] Broken links in guide doc |
#4675 | [BUG] Jenkins integration build timed out at 10 hours |
#4665 | [BUG] Spark321Shims.getParquetFilters failed with NoSuchMethodError |
#4635 | [BUG] nvidia-smi wrapper script ignores ENABLE_NON_MIG_GPUS=1 on a heterogeneous multi-GPU machine |
#4500 | [BUG] Build failures against Spark 3.2.1 rc1 and make 3.2.1 non snapshot |
#4631 | [BUG] Release build with mvn option -P source-javadoc FAILED |
#4625 | [BUG] NDS query 5 fails with AdaptiveSparkPlanExec assertion |
#4632 | [BUG] Build failing for Spark 3.3.0 due to deprecated method warnings |
#4599 | [BUG] test_group_apply_udf and test_group_apply_udf_more_types hangs on Databricks 9.1 |
#4600 | [BUG] crash if we have a decimal128 in a struct in an array |
#4581 | [BUG] Build error "GpuOverrides.scala:924: wrong number of arguments" on DB9.1.x spark-3.1.2 |
#4593 | [BUG] dup GpuHashJoin.diff case-folding issue |
#4559 | [BUG] regexp_replace with replacement string containing \ can produce incorrect results |
#4503 | [BUG] regexp_replace with back references produces incorrect results on GPU |
#4567 | [BUG] Profile tool hangs in compare mode |
#4315 | [BUG] test_hash_reduction_decimal_overflow_sum[30] failed OOM in integration tests |
#4551 | [BUG] protobuf-java version changed to 3.x |
#4499 | [BUG]GpuSequence blows up when nulls exist in any of the inputs (start, stop, step) |
#4454 | [BUG] Shade warnings when building the tools artifact |
#4541 | [BUG] Column vector leak in conditionals_test.py |
#4514 | [BUG] test_hash_reduction_pivot_without_nans failed |
#4521 | [BUG] Inconsistencies in handling of newline characters and string and line anchors |
#4548 | [BUG] ai.rapids.cudf.CudaException: an illegal instruction was encountered in databricks 9.1 |
#4475 | [BUG] \D and \W match newline in Spark but not in cuDF |
#1866 | [BUG] GpuFileFormatWriter does not close the data writer |
#4524 | [BUG] RegExp transpiler fails to detect some choice expressions that cuDF cannot compile |
#3226 | [BUG]OOM happened when do cube operations |
#2504 | [BUG] OOM when running NDS queries with UCX and GDS |
#4273 | [BUG] Rounding past the size that can be stored in a type produces incorrect results |
#4060 | [BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently |
#4039 | [BUG] Spark 3.3.0 IT Array test failures |
#3849 | [BUG] In ANSI mode we can fail in cases Spark would not due to conditionals |
#4445 | [BUG] mvn clean prints an error message on a clean dir |
#4421 | [BUG] the driver is trying to load CUDA with latest 22.02 |
#4455 | [BUG] join_test.py::test_struct_self_join[IGNORE_ORDER({'local': True})] failed in spark330 |
#4442 | [BUG] mvn build FAILED with option -P noSnapshotsWithDatabricks |
#4281 | [BUG] q9 regression between 21.10 and 21.12 |
#4280 | [BUG] q88 regression between 21.10 and 21.12 |
#4422 | [BUG] Host column vectors are being leaked during tests |
#4446 | [BUG] GpuCast crashes when casting from Array with unsupportable child type |
#4432 | [BUG] nightly build 3.3.0 failed: HashClusteredDistribution is not a member of org.apache.spark.sql.catalyst.plans.physical |
#4443 | [BUG] SPARK-37705 breaks parquet filters from Spark 3.3.0 and Spark 3.2.2 onwards |
#4316 | [BUG] Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly intermittently |
#4378 | [BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+ |
#4423 | [BUG] Build is failing due to FileScanRDD changes in Spark 3.3.0-SNAPSHOT |
#4401 | [BUG]array_test.py::test_array_contains failures |
#4403 | [BUG] NDS query 72 logs codegen fallback exception and produces incorrect results |
#4386 | [BUG] conditionals_test.py FAILED with side_effects_cast[Integer/Long] on Databricks 9.1 Runtime |
#3934 | [BUG] Dependencies of published integration tests jar are missing |
#4341 | [BUG] GpuCast.scala:nnn warning: discarding unmoored doc comment |
#4356 | [BUG] nightly spark303 deploy pulling spark301 aggregator |
#4347 | [BUG] Dist jar pom lists aggregator jar as dependency |
#4176 | [BUG] ParseDateTimeSuite UT failed |
#4292 | [BUG] no meaningful message is surfaced to maven when binary-dedupe fails |
#4351 | [BUG] Tests FAILED On SPARK-3.2.0, com.nvidia.spark.rapids.SerializedTableColumn cannot be cast to com.nvidia.spark.rapids.GpuColumnVector |
#4346 | [BUG] q73 decimal was twice as slow in weekly results |
#4334 | [BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311 |
#4339 | The parameter dataType is not necessary in resolveColumnVector method. |
#4275 | [BUG] Row-based Hive UDF will fail if arguments contain a foldable expression. |
#4229 | [BUG] regexp_replace [^a] has different behavior between CPU and GPU for multiline strings |
#4294 | [BUG] parquet_write_test.py::test_ts_write_fails_datetime_exception failed in spark 3.1.1 and 3.1.2 |
#4205 | [BUG] Get different results when casting from timestamp to string |
#4277 | [BUG] cudf_udf nightly cudf import rmm failed |
#4246 | [BUG] Regression in CastOpSuite due to cuDF change in parsing NaN |
#4243 | [BUG] test_regexp_replace_null_pattern_fallback[ALLOW_NON_GPU(ProjectExec,RegExpReplace)] failed in databricks |
#4244 | [BUG] Cast from string to float using hand-picked values failed |
#4227 | [BUG] RAPIDS Shuffle Manager doesn't fallback given encryption settings |
#3374 | [BUG] minor deprecation warnings in a 3.2 shim build |
#3613 | [BUG] release312db profile pulls in 311until320-apache |
#4213 | [BUG] unused method with a misleading outdated comment in ShimLoader |
#3609 | [BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging |
#4127 | [BUG] CUDF 22.02 nightly test failure |
#4773 | Update 22.02 changelog to latest [skip ci] |
#4771 | revert cudf api links from legacy to stable[skip ci] |
#4767 | Update 22.02 changelog to latest [skip ci] |
#4750 | Updated doc for decimal support |
#4757 | Update qualification tool to remove DECIMAL 128 as potential problem |
#4755 | Fix databricks doc for limitations.[skip ci] |
#4751 | Fix broken hyperlinks in documentation [skip ci] |
#4706 | Update 22.02 changelog to latest [skip ci] |
#4700 | Update cudfjni version to released 22.02.0 |
#4701 | Decrease nighlty tests upper limitation to 7 [skip ci] |
#4639 | Update changelog for 22.02 and archive info of some older releases [skip ci] |
#4572 | Add download page for 22.02 [skip ci] |
#4672 | Revert "Disable 311cdh build due to missing dependency (#4659)" |
#4662 | Update the deploy script [skip ci] |
#4657 | Upmerge spark2 directory to the latest 22.02 changes |
#4659 | Disable 311cdh build by default because of a missing dependency |
#4508 | Fix Spark 3.2.1 build failures and make it non-snapshot |
#4652 | Remove non-deterministic test order in nightly [skip ci] |
#4643 | Add profile release301 when mvn help:evaluate |
#4630 | Fix the incomplete capture of SubqueryBroadcast |
#4633 | Suppress newTaskTempFile method warnings for Spark 3.3.0 build |
#4618 | [DB31x] Pick the correct Python runner for flatmap-group Pandas UDF |
#4622 | Fallback to CPU when encoding is not supported for JSON reader |
#4470 | Add in HashPartitioning support for decimal 128 |
#4535 | Revert "Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 (#4471)" |
#4583 | Avoid unapply on PromotePrecision |
#4573 | Correct version from 21.12 to 22.02[skip ci] |
#4575 | Correct and update links in UDF doc[skip ci] |
#4501 | Switch and/or to use new cudf binops to improve performance |
#4594 | Resolve case-folding issue [skip ci] |
#4585 | Spark2 module upmerge, deploy script, and updates for Jenkins |
#4589 | Increase premerge databricks IDLE_TIMEOUT to 4 hours [skip ci] |
#4485 | Add json reader support |
#4556 | regexp_replace with back-references should fall back to CPU |
#4569 | Fix infinite loop with Profiling tool compare mode and app with no sql ids |
#4529 | Add support for Spark 2.x Explain Api |
#4577 | Revert "Fix CVE-2021-22569 (#4545)" |
#4520 | GpuSequence refactor |
#4570 | A few quick fixes to try to reduce max memory usage in the tests |
#4477 | Use libcudf mixed joins for conditional hash joins |
#4566 | remove scala-library from combined tools jar |
#4552 | Fix resource leak in GpuCaseWhen |
#4553 | Reenable test_hash_reduction_pivot_without_nans |
#4530 | Fix correctness issues in regexp and add \r and \n to fuzz tests |
#4549 | Fix typos in integration tests README [skip ci] |
#4545 | Fix CVE-2021-22569 |
#4543 | Enable auto-merge from branch-22.02 to branch-22.04 [skip ci] |
#4540 | Remove user kuhushukla |
#4434 | Support max on single-level struct in aggregation context |
#4534 | Temporarily disable integration test - test_hash_reduction_pivot_without_nans |
#4322 | Add an explain only mode to the plugin |
#4497 | Make better use of pinned memory pool |
#4512 | remove hadoop version requirement[skip ci] |
#4527 | Fall back to CPU for regular expressions containing \D or \W |
#4525 | Properly close data writer in GpuFileFormatWriter |
#4502 | Removed the redundant test for element_at and fixed the failing one |
#4523 | Add more integration tests for decimal 128 |
#3762 | Call the right method to convert table from row major <=> col major |
#4482 | Simplified the construction of zero scalar in GpuUnaryMinus |
#4510 | Update copyright in NOTICE [skip ci] |
#4484 | Update GpuFileFormatWriter to stay in sync with recent Spark changes, but still not support writing Hive bucketed table on GPU. |
#4492 | Fall back to CPU for regular expressions containing hex digits |
#4495 | Enable approx_percentile by default |
#4420 | Fix up incorrect results of rounding past the max digits of data type |
#4483 | Update test case of reading nested unsigned parquet file |
#4490 | Remove warning about RMM default allocator |
#4461 | [Audit] Add bucketed scan info in query plan of data source v1 |
#4489 | Add arrays of decimal128 to join tests |
#4476 | Don't acquire the semaphore for empty input while scanning |
#4424 | Improve support for regular expression string anchors \A , \Z , and \z |
#4491 | Skip the test for spark versions 3.1.1, 3.1.2 and 3.2.0 only |
#4459 | Use merge sort for struct types in non-key columns |
#4494 | Append new authorized user to blossom-ci whitelist [skip ci] |
#4400 | Enable approx percentile tests |
#4471 | Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 |
#4462 | Rename DECIMAL_128_FULL and rework usage of TypeSig.gpuNumeric |
#4479 | Change signoff check image to slim-buster [skip ci] |
#4464 | Throw SparkArrayIndexOutOfBoundsException for Spark 3.3.0+ |
#4469 | Support repetition of \d and \D in regexp functions |
#4472 | Modify docs for 22.02 to address issue-4319[skip ci] |
#4440 | Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#4376 | Add sequence support |
#4460 | Abstract the text based PartitionReader |
#4383 | Fix correctness issue with CASE WHEN with expressions that have side-effects |
#4465 | Refactor for shims 320+ |
#4463 | Avoid replacing a hash join if build side is unsupported by the join type |
#4456 | Fix build issues: 1 clean non-exists target dirs; 2 remove duplicated plugin |
#4416 | Unshim join execs |
#4172 | Support String to Decimal 128 |
#4458 | Exclude some metadata operators when checking GPU replacement |
#4451 | Some metrics improvements and timeline reporting |
#4435 | Disable add profile src execution by default to make the build log clean |
#4436 | Print error log to stderr output |
#4155 | Add partial support for line begin and end anchors in regexp_replace |
#4428 | Exhaustively iterate ColumnarToRow iterator to avoid leaks |
#4430 | update pca example link in ml-integration.md[skip ci] |
#4452 | Limit parallelism of nightly tests [skip ci] |
#4449 | Add recursive type checking and fallback tests for casting array with unsupported element types to string |
#4437 | Change logInfo to logWarning |
#4447 | Fix 330 build error and add 322 shims layer |
#4417 | Fix an Intellij debug issue |
#4431 | Add DateType support for AST expressions |
#4433 | Import the right pandas from conda [skip ci] |
#4419 | Import the right pandas from conda |
#4427 | Update getFileScanRDD shim for recent changes in Spark 3.3.0 |
#4397 | Ignore cufile.log |
#4388 | Add support for ReplicateRows |
#4399 | Update docs for Profiling and Qualification tool to change wording |
#4407 | Fix GpuSubqueryBroadcast on multi-fields relation |
#4396 | GpuShuffleCoalesceIterator acquire semaphore after host concat |
#4361 | Accommodate altered semantics of cudf::lists::contains() |
#4394 | Use correct column name in GpuIf test |
#4385 | Add missing GpuSubqueryBroadcast replacement rule for spark31x |
#4387 | Fix auto merge conflict 4384[skip ci] |
#4374 | Fix the IT module depends on the tests module |
#4365 | Not publishing integration_tests jar to Maven Central [skip ci] |
#4358 | Update GpuIf to support expressions with side effects |
#4382 | Remove unused scallop dependency from integration_tests |
#4364 | Replace Scala document with Scala comment for inner functions |
#4373 | Add pytest tags for nightly test parallel run [skip ci] |
#4150 | Support GpuSubqueryBroadcast for DPP |
#4372 | Move casting to string tests from array_test.py and struct_test.py to cast_test.py |
#4371 | Fix typo in skipTestsFor330 calculation [skip ci] |
#4355 | Dedicated deploy-file with reduced pom in nightly build [skip ci] |
#4352 | Revert "Ignore failing string to timestamp tests temporarily (#4197)" |
#4359 | Audit - SPARK-37268 - Remove unused variable in GpuFileScanRDD [Databricks] |
#4327 | Print meaningful message when calling scripts in maven |
#4354 | Fix regression in AQE optimizations |
#4343 | Fix issue with binding to hash agg columns with computation |
#4285 | Add support for regexp_extract on the GPU |
#4349 | Fix PYTHONPATH in pre-merge |
#4269 | The option for the nightly script not deploying jars [skip ci] |
#4335 | Fix the issue of exporting Column RDD |
#4336 | Split expensive pytest files in cases level [skip ci] |
#4328 | Change the explanation of why the operator will not work on GPU |
#4338 | Use scala Int.box instead of Integer constructors |
#4340 | Remove the unnecessary parameter dataType in resolveColumnVector method |
#4256 | Allow returning an EmptyHashedRelation when a broadcast result is empty |
#4333 | Add tests about writing empty table to ORC/PAQUET |
#4337 | Support GpuFirst and GpuLast on nested types under reduction aggregations |
#4331 | Fix parquet options builder calls |
#4310 | Fix typo in shim class name |
#4326 | Fix 4315 decrease concurrentGpuTasks to avoid sum test OOM |
#4266 | Check revisions for all shim jars while build all |
#4282 | Use data type to create an inspector for a foldable GPU expression. |
#3144 | Optimize AQE with Spark 3.2+ to avoid redundant transitions |
#4317 | [BUG] Update nightly test script to dynamically set mem_fraction [skip ci] |
#4206 | Porting GpuRowToColumnar converters to InternalColumnarRDDConverter |
#4272 | Full support for SUM overflow detection on decimal |
#4255 | Make regexp pattern [^a] consistent with Spark for multiline strings |
#4306 | Revert commonizing the int96ParquetRebase* functions |
#4299 | Fix auto merge conflict 4298 [skip ci] |
#4159 | Optimize sample perf |
#4235 | Commonize v2 shim |
#4274 | Add tests for timestamps that overflowed before. |
#4271 | Skip test_regexp_replace_null_pattern_fallback on Spark 3.1.1 and later |
#4278 | Use mamba for cudf conda install [skip ci] |
#4270 | Document exponent differences when casting floating point to string [skip ci] |
#4268 | Fix merge conflict with branch-21.12 |
#4093 | Add tests for regexp() and regexp_like() |
#4259 | fix regression in cast from string to float that caused signed NaN to be considered valid |
#4241 | fix bug in parsing regex character classes that start with ^ and contain an unescaped ] |
#4224 | Support row-based Hive UDFs |
#4221 | GpuCast from ArrayType to StringType |
#4007 | Implement duplicate key handling for GpuCreateMap |
#4251 | Skip test_regexp_replace_null_pattern_fallback on Databricks |
#4247 | Disable failing CastOpSuite test |
#4239 | Make EOL anchor behavior match CPU for strings ending with newline |
#4153 | Regexp: Only transpile once per expression rather than once per batch |
#4230 | Change to build tools module with all the versions by default |
#4223 | Fixes a minor deprecation warning |
#4215 | Rebalance testing load |
#4214 | Fix pre_merge ci_2 [skip ci] |
#4212 | Remove an unused method with its outdated comment |
#4211 | Update test_floor_ceil_overflow to be more lenient on exception type |
#4203 | Move all the GpuShuffleExchangeExec shim v2 classes to org.apache.spark |
#4193 | Rename 311until320-apache to 311until320-noncdh |
#4197 | Ignore failing string to timestamp tests temporarily |
#4160 | Fix merge issues for branch 22.02 |
#4081 | Convert String to DecimalType without casting to FloatType |
#4132 | Fix auto merge conflict 4131 [skip ci] |
#4099 | [REVIEW] Init version 22.02.0 |
#4113 | Fix pre-merge CI 2 conditions [skip ci] |
Changelog of older releases can be found at docs/archives