v1.3.0-preview
Pre-release
What's Changed
- VL Make velox writer queue size configurable @yikf #6341
- VL Remove useless ctx variable @gaoyangxiaozhu #6348
- [1632]CH Daily (20240706) @kyligence-git #6359
- VL fix build bundle package @zhouyuan #6364
- VL Fix process_setup_alinux3 arrow CMakeLists.txt path @liujiayi771 #6363
- VL Daily (2024_07_08) @GlutenPerfBot #6366
- [6262]CH JSON input format ignores key case @KevinyhZou #6263
- [6285]VL Add debian10 vcpkg depends @wenwj0 #6286
- [CELEBORN] CelebornShuffleManager#stop should stop non-null _vanillaCelebornShuffleManager @SteNicholas #6371
- VL Update ubuntu docker to use cmake 3.28 @boneanxs #6373
- [6304]CH Support array_join @KevinyhZou #6305
- VL Daily (2024_07_09) @GlutenPerfBot #6376
- [6378]CH Support delta count optimizer for the MergeTree format @zzcclp #6379
- [6345]CH Deprecate SCALAR_FUNCTIONS in SerializedPlanParser @lgbo-ustc #6347
- [TEST] Use project version rather than Gluten version in gluten-it @ulysses-you #6385
- [6377]CH Support window function `percent_rank` @lgbo-ustc #6386
- VL Minor refactor for ValueStream node construction and usage @Yohahaha #6382
- VL Enable levenshtein function @zhli1142015 #6389
- VL Daily (2024_07_10) @GlutenPerfBot #6384
- [1632]CH Daily (20240710) @kyligence-git #6383
- Test input_file_name, input_file_block_start & input_file_block_length when scan falls back @gaoyangxiaozhu #6318
- [6394]VL Fix the vcpkg package script @weixiuli #6395
- [6288]CH Support BroadcastNestedLoopJoinExec [Part one] @loneylee #6290
- [CELEBORN] Rename CelebornHashBasedColumnarShuffleWriter to CelebornColumnarShuffleWriter @kerwin-zk #6391
- VL Fix E function fallback issue in some conditions @gaoyangxiaozhu #6397
- [CI] Fix centos7 failure @marin-ma #6404
- [1632]CH Daily (20240711) @kyligence-git #6399
- [CELEBORN] Add compression for row-based shuffle @kerwin-zk #6380
- VL Daily (2024_07_11) @GlutenPerfBot #6400
- CORE Remove local sort for TopNRowNumber @ulysses-you #6381
- VL Spark assert_true function support @gaoyangxiaozhu #6329
- VL Add schema validation for all operators @zhli1142015 #6406
- CORE Minor code cleanups against fallback tagging @zhztheplayer #6320
- VL Try to find Arrow libs from Velox bundled path first @PHILO-HE #6413
- VL disable tpch benchmarks on comment/merge @zhouyuan #6402
- [UT] Add a tool to validate any unary expression with all its accepted types @PHILO-HE #6392
- CH Fix a source file name typo @zhztheplayer #6412
- VL Fix Pi function fallback issue in some conditions @gaoyangxiaozhu #6408
- [CELEBORN] VeloxCelebornColumnarBatchSerializer uses the key and default value of SHUFFLE_COMPRESS to check whether to compress shuffle output @SteNicholas #6414
- VL Quick fix for commit conflicts @zhztheplayer #6418
- [Doc] Update new supported spark functions @gaoyangxiaozhu #6423
- VL Add a test to validate substring_index @boneanxs #6393
- VL Fix shuffle spill triggered by evicting buffers during stop @marin-ma #6422
- VL Enable repeat function @zhli1142015 #6419
- VL Accelerate Arrow compile @jinchengchenghh #6426
- [CI]VL Update docker image for CI @zhouyuan #6401
- VL Daily (2024_07_12) @GlutenPerfBot #6417
- VL Daily (2024_07_13) @GlutenPerfBot #6436
- VL Daily (2024_07_14) @GlutenPerfBot #6441
- VL Set Arrow_SOURCE to AUTO to allow using system arrow libs @PHILO-HE #6325
- [CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle in ClickHouse backend @SteNicholas #6432
- VL Make sure the same thrift lib bundled in arrow build is used for building Velox @zhztheplayer #6431
- CORE Make SparkSession transient in HiveTableScanExecTransformer @yikf #6410
- [6176]CH Add tpcds suite from decimal table schema @loneylee #6369
- VL Move dependencies setup ahead @PHILO-HE #6444
- CH[CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle in ClickHouse backend @SteNicholas #6454
- VL Enable right and anti join in smj @JkSelf #6449
- CH[CELEBORN] CHCelebornColumnarBatchSerializer uses AtomicBoolean to identify whether to call close(), avoiding calling close() twice @SteNicholas #6455
- [CI]VL Re-enable a build job running on clean dockers weekly @PHILO-HE #6424
- CORE Update LICENSE, NOTICE, LICENSE-binary, NOTICE-binary @weiting-chen #6443
- CORE Change DISCLAIMER to DISCLAIMER-WIP @weiting-chen #6442
- VL RAS: Minor code cleanup for offloading project @zhztheplayer #6452
- VL Add a way to create static build with docker container and gluten-te @zhztheplayer #6457
- [6467]CH Minor Fix Build @baibaichen #6468
- VL Minor improvements and fixes for gluten-it and gluten-te @zhztheplayer #6471
- CORE Fix fallback for spark sequence function with literal array data as input @gaoyangxiaozhu #6433
- VL Fix offload input_file_name assert error @zml1206 #6390
- VL update docker image for cache-native-lib job @yma11 #6466
- [BUILD] Fix unbound variable @zml1206 #6474
- VL Daily (2024_07_16) @GlutenPerfBot #6460
- [6437][BUILD] Fix vcpkg setup-build-dependens.sh for centos @wecharyu #6438
- [6470]CH Fix Task not serializable error when inserting mergetree data @zzcclp #6473
- [6425]CH Support day time interval @lgbo-ustc #6456
- VL remove redundant code in parquet datasource to avoid memory leakage PR6430 @liujp #6462
- CORE Spark version function support @gaoyangxiaozhu #6469
- VL Daily (2024_07_17) @GlutenPerfBot #6479
- VL Minor improvements on gluten-it / gluten-te toolchains @zhztheplayer #6476
- CH Support merge MergeTree files @liuneng1994 #6472
- [6463]CH Refactor the code of parsing join parameters @lgbo-ustc #6485
- [1632]CH Daily (20240718) @kyligence-git #6491
- VL Daily (2024_07_18) @GlutenPerfBot #6492
- [6495]VL Fix build issue: --build_arrow=ON wipes --build_type= setting silently @PHILO-HE #6498
- VL RAS: Make default rough cost model exhaustively offload computations @zhztheplayer #6493
- VL Print exception early when raised from ManagedReservationListener#unreserve @zhztheplayer #6504
- VL Fix broken GHA CI cache, add cache for Centos 8 build @zhztheplayer #6497
- CORE Prevent CH backend from referencing arrow-gluten Jars @zhztheplayer #6494
- VL Oops, a minor follow-up to #6497 @zhztheplayer #6516
- [1632]CH Daily (20240719) @kyligence-git #6511
- [6067]VL [Part 3-1] Refactor: Rename VeloxColumnarWriteFilesExec to ColumnarWriteFilesExec @baibaichen #6403
- VL Daily (2024_07_19) @GlutenPerfBot #6512
- [DOC] Update documents @PHILO-HE #6344
- VL Ensure sed pattern can be matched when modifying velox setup scripts @PHILO-HE #6505
- [6463]CH Enable cartesian product @lgbo-ustc #6510
- [6523]VL fix: Remove fix for stringop-overflow warning in alinux3 @majetideepak #6522
- [6501]VL Fix the missing fileReadProperties when constructing a LocalFilesNode @kecookier #6503
- VL[DOC] Add uniffle doc @summaryzb #6533
- [6529] Fix build error on macOS caused by ConfigArrow.cmake @xumingming #6530
- [6509] Enable reading Iceberg tables with timestamptz as a partition column @j7nhai #6508
- VL Add thread_safe to several VeloxRuntime classes @FelixYBW #6526
- VL Fix weekly build job @PHILO-HE #6543
- [MINOR] Add ep/_ep/ to .gitignore @wForget #6547
- [6534] VL Fix ObjectStore::stores initialized twice issue @xumingming #6549
- [6499]CH Support soft affinity for mergetree @loneylee #6545
- [6535] Make helper scripts executable @xumingming #6536
- VL Daily (2024_07_23) @zhztheplayer #6552
- VL Fix for centos9 build of Gluten @deepashreeraghu #6183
- [MINOR]VL Remove duplicate mvn packages @wForget #6560
- VL Following #6526, minor fixes and improvements @zhztheplayer #6554
- VL Row based sort shuffle implementation @marin-ma #6475
- [6562]VL Decouple BUILD_BENCHMARKS and BUILD_TESTS build options @NEUpanning #6563
- VL Daily (2024_07_24) @GlutenPerfBot #6567
- VL Add config for showing Velox task metrics when finished @Yohahaha #6573
- [6477]VL Fix occasional dead lock during spilling @zhztheplayer #6515
- VL Move setup-centos7.sh & setup-centos8.sh into Gluten and clean up some script code @PHILO-HE #6559
- VL Daily (2024_07_25) @zhztheplayer #6582
- VL Update to add centos9 for weekly build @deepashreeraghu #6580
- VL Update Velox Version (2024_07_25-1) @zhztheplayer #6584
- VL Enable timestamp in parquet write @JkSelf #6428
- [6067]CH [Part 3-2] Basic support for Native Write in Spark 3.5 @baibaichen #6586
- VL Gluten-it: --data-gen-strategy=once to skip generating data when it already exists @zhztheplayer #6587
- [6589]CH MergeTree supports spark.sql.caseSensitive @loneylee #6592
- [Minor] Move a test from spark-3.2 module to a common test module @PHILO-HE #6585
- [6604]CH Fix mergetree partition with whitespace error @loneylee #6605
- [6195]VL Add unit tests for UDF @NEUpanning #6603
- VL Minor: Remove deprecated GHA jobs @PHILO-HE #6606
- VL Daily (2024_07_26) @GlutenPerfBot #6597
- [1632]CH Daily (20240727) @kyligence-git #6611
- VL Fix std::min params type mismatch in Apple clang 15 @zml1206 #6593
- [6544]CH Support existence join @lgbo-ustc #6548
- VL Improve package scripts @wForget #6569
- VL Enable timestamp and binary type for HLL agg function @zhli1142015 #6619
- VL Expose API `SparkMemoryUtil.dumpMemoryManagerStats(tmm: TaskMemoryManager)` for debugging purposes @zhztheplayer #6617
- VL Row based sort follow-up @marin-ma #6579
- VL Daily (2024_07_29) @zhztheplayer #6616
- VL Daily (2024_07_30) @GlutenPerfBot #6626
- [6583]CH Fix a bug in serializing aggregating keys which are complicated types @lgbo-ustc #6624
- VL Row-based sort shuffle follow-up (minor) @marin-ma #6628
- [MINOR] Reduce unnecessary dependencies @wForget #6608
- VL Enable collect_set, min, max for complex types @zhli1142015 #6629
- VL Spark mask function support @gaoyangxiaozhu #6271
- CH Refactor off heap memory management, clean shuffle write code @liuneng1994 #6558
- [6561]CH Fix incompatible type exception thrown in capture function while processing array literal with `transform` @taiyang-li #6601
- [6632] Bump Celeborn to 0.4.2 and 0.5.1 @SteNicholas #6633
- [1632]CH Daily (20240730) @kyligence-git #6640
- VL Daily (2024_07_31) @GlutenPerfBot #6643
- VL Reduce spill in sort-based shuffle @marin-ma #6639
- [6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable @PHILO-HE #6627
- VL Support Sum(Literal)/Count(Literal) with empty input schema @zml1206 #6631
- [6067]CH[MINOR][UT] Pass backends-clickhouse ut in Spark 3.5 @baibaichen #6623
- VL Support row type and fix subfield in filter push-down @rui-mo #6618
- [6610] Update clickhouse.md due to upgrade to clang-18 @lwz9103 #6654
- [6590]CH Support compact mergetree file on s3 @lwz9103 #6591
- VL Daily (2024_08_01) @GlutenPerfBot #6664
- VL Gluten-it: --auto-cluster-resource to automatically set up CPU cores and memory sizes for local cluster @zhztheplayer #6655
- [6600]VL Support date type in window range frame @zml1206 #6653
- CH Fix some test cases being too slow @liuneng1994 #6659
- [6656][CELEBORN] Fix CelebornColumnarShuffleWriter assertion failed @exmy #6657
- [1632]CH Daily (20240801) @kyligence-git #6665
- CORE Propagate SQLConf to code block of TaskResources.runUnsafe @zhztheplayer #6658
- [INFRA] Support automatically label new pull requests @ulysses-you #6668
- [6483] Support Uniffle 0.9.0 @SteNicholas #6484
- CH Hotfix a configuration bug in shuffle writer @liuneng1994 #6677
- VL Set default validation log level to WARN @yma11 #6676
- [6483]VL[DOC] Upgrade Uniffle version to 0.9.0 in Velox.md @SteNicholas #6680
- CORE Make collectQueryExecutionFallbackSummary a public util method @wForget #6679
- [6557]CH Try to replace sort merge join with hash join when it cannot be offloaded @lgbo-ustc #6570
- VL Daily (2024_08_02) @GlutenPerfBot #6684
- [6645]VL Remove VeloxWriteQueue which may introduce deadlock @WangGuangxin #6646
- VL Recover broken memory-trace option spark.gluten.backtrace.allocation @zhztheplayer #6635
- VL Allow specifying maximum batch size for batch resizing @zhztheplayer #6670
- VL Eliminate pre local sort after offload date type range frame window @zml1206 #6667
- [6701]CH fix: Performance regression at 20240802 daily build @baibaichen #6702
- [1632]CH Daily (20240803) @kyligence-git #6700
- CH Support CACHE DATA command for MergeTree table @liuneng1994 #6621
- [6695]CH Introduce shuffleWallTime in CHMetricsApi to calculate the overall shuffle write time @SteNicholas #6696
- [6588]CH Cast columns if necessary before finally writing to ORC/Parquet files during native inserting @taiyang-li #6691
- VL Remove redundant hash function in substrait function validation @jinchengchenghh #6690
- [6656][UNIFFLE] VeloxUniffleColumnarShuffleWriter should send commit for all ColumnBatch with empty rows @SteNicholas #6698
- CH Fix debug building error @taiyang-li #6710
- VL Daily (2024_08_05) @GlutenPerfBot #6708
- VL Fix out-of-date centos 7 image in velox_docker_cache.yml @zhztheplayer #6719
- VL Daily (2024_08_06) @GlutenPerfBot #6717
- CORE Bump version to 1.3.0-SNAPSHOT @PHILO-HE #6607
- [6669]CH Fix diff of cast string to boolean @exmy #6711
- [6531] Minor polish for metrics code @xumingming #6532
- VL Use conf to control C2R occupied memory @XinShuoWang #5952
- [1632]CH Daily (20240806) @kyligence-git #6718
- [6686]CH Disable percent_rank @lgbo-ustc #6687
- VL Add reader process to shuffle benchmark @marin-ma #6682
- VL Minor follow-ups for PRs @zhztheplayer #6693
- VL Allow udf type conversion @marin-ma #6660
- VL Use the scripts dir in the current path as SCRIPTDIR @liujiayi771 #6729
- [6561]CH Fix exception when mapFromArrays accepts its first argument with type Array(Nullable(T)) @taiyang-li #6721
- VL Doc: Update RSS docs @wForget #6692
- [6589]CH MergeTree supports spark.sql.caseSensitive [Part.2] @loneylee #6733
- VL Skip UTF-8 validation in JSON parsing @PHILO-HE #6661
- CH Fix memory config spill_mem_ratio always zero @liuneng1994 #6743
- VL Fix arrow lib conflict on centos-9 @PHILO-HE #6742
- VL Reduce memory waste in sort based shuffle @marin-ma #6727
- CORE Fix schema mismatch between ReadRelNode and LocalFilesNode @jiangjiangtian #6746
- [6736] Phase 1: Use task-shared lock in ManagedReservationListener @zhztheplayer #6741
- [6569][FOLLOWUP]VL Delete unnecessary gcc9 enable of package script @wForget #6730
- VL Hot fix - mistakenly changed debug log @marin-ma #6751
- [6681] CH Fix array(decimal32) in CH columnar to row @loudongfeng #6722
- [6705] CORE [Part 1] Avoid adding c2r for ColumnarWriteFilesExec, since it outputs neither columnar batch data nor InternalRow @baibaichen #6745
- VL Use Velox's monolithic build @PHILO-HE #6731
- [CELEBORN][FOLLOWUP] Add compression for row-based shuffle @kerwin-zk #6739
- [1632]CH Daily (20240808) @kyligence-git #6755
- VL Daily 08-08 @jinchengchenghh #6752
- VL Daily (2024_08_09) @GlutenPerfBot #6762
- VL Minor class name / package name clean-ups @zhztheplayer #6720
- VL Add a new test case for FlushableHashAggregateRule's coverage @zhztheplayer #6757
- [Minor] Clean up useless code in ParquetFileFormat/OrcFileFormat @PHILO-HE #6663
- [6737] Package Delta into bundle jar when the delta profile is specified @dcoliversun #6738
- [6705] CORE VL CH [Part-2] Rework ColumnarWriteFilesExec @baibaichen #6761
- VL Fix shuffle spill not reported to spark metric @marin-ma #6740
- [6750]CH Fix optimize error if file mappings not loaded @lwz9103 #6753
- [MINOR]CH Rename package of some extension rules @zml1206 #6747
- [MINOR] update repository first in setup-ubuntu envs @wecharyu #6749
- VL RAS: Renew validator instance for each rule applier call @zhztheplayer #6766
- [1632]CH Daily (20240809) @kyligence-git #6764
- [6388]CH Support function format @taiyang-li #6716
- VL RAS: Add a new built-in cost model that avoids offloading trivial projects if their neighbor nodes fall back @zml1206 #6756
- [6768]CH Clear mixed join condition to avoid unnecessary data copy @lgbo-ustc #6769
- VL Fix High Precision Rounding @ArnavBalyan #6707
- [6705]CH Basic Support Delta write @baibaichen #6767
- VL Add Scala 2.13 support @Preetesh2110 #6326
- VL Daily (2024_08_10) @GlutenPerfBot #6771
- VL Daily (2024_08_11) @GlutenPerfBot #6775
- VL Fall back scan if file scheme is not supported by registered file systems @zhli1142015 #6672
- [6778]CH Enable percent_rank again @lgbo-ustc #6779
- [6724]CH Shuffle writer supports compression level configuration for CompressionCodecFactory @SteNicholas #6725
- [GLUTEN-6506]CH Fix ORC reading wrong timestamp value @KevinyhZou #6507
- VL Update a dockerfile used for CI vcpkg build @PHILO-HE #6781
- [6736]VL Phase 2: Minimize lock scope in ListenableArbitrator @zhztheplayer #6783
- [6674]CH Support sort merge join metrics @SteNicholas #6774
- CORE Following #6745, append some minor code cleanups @zhztheplayer #6788
- VL Daily (2024_08_13) @jinchengchenghh #6794
- VL Fix arrow dataset csv scan IncompatibleClassChangeError @jinchengchenghh #6785
- VL Fix parquet write sort spill OOM @jinchengchenghh #6480
- [6148]CORE Simplify JniLibLoader loading mechanism for native libraries @ArnavBalyan #6791
- VL Daily (2024_08_14) @GlutenPerfBot #6821
- VL Enable an integration test case in CI OOM tests @zhztheplayer #6804
- VL Add shuffle writer type to ColumnarExchange display string @marin-ma #6799
- [6768]CH Try to reorder hash join tables based on AQE statistics @lgbo-ustc #6770
- [6600] Fix NPE issue when running window SQL @JkSelf #6803
- CORE Remove fixed 1.8 Java compiler version in module gluten-ut-common @zhztheplayer #6825
- VL Remove lz4 change in modify_velox.patch @jinchengchenghh #6824
- VL Validate binary expressions with their accepted types @PHILO-HE #6521
- [6819]CH Refactor source from Java iter and make casting happen before materializing @taiyang-li #6830
- VL Enable full functionality of split function @rui-mo #4752
- VL No need to obtain old shrunken memory @boneanxs #6847
- [6834]CORE Remove unused DDL plan that doesn't correspond to Substrait spec @EpsilonPrime #6833
- CORE Remove an unused binary file @zhztheplayer #6838
- [6860]CH Minor refactors on expand operator @taiyang-li #6861
- VL Verify empty2null is offloaded when v1writer fallback @Yohahaha #6859
- VL Update document for split and mask functions @gaoyangxiaozhu #6858
- [1632]CH Daily (20240815) @kyligence-git #6848
- VL Add a docker build job and reuse pre-built arrow libs @PHILO-HE #6826
- VL Daily (2024_08_15) @jinchengchenghh #6851
- VL Remove suspend section when spilling Velox task @zhztheplayer #6875
- [6768]CH Try to use multi join on clauses instead of inequal join condition @lgbo-ustc #6787
- [6822]VL Fix wrong maxRowsToInsert and sort time metrics @marin-ma #6832
- VL Fix warning when spark.gluten.sql.columnarToRowMemoryThreshold is not set @zhztheplayer #6866
- VL Fix Arrow ColumnarBatch cannot revoke rowIterator correctly @jinchengchenghh #6797
- [6768]CH Refactor reordering shuffle hash join tables @lgbo-ustc #6854
- [6819]CH HOTFIX variable shadow in source from Java iter @taiyang-li #6885
- [6879]CH Fix partition value diff when it contains blank spaces @taiyang-li #6880
- [6878]CH Avoid name collisions in naming aggregate result @lgbo-ustc #6886
- [6887]VL Daily (2024_08_16) @GlutenPerfBot #6872
- [6067]CH[MINOR][UT] Followup 6623, fix backends-clickhouse ut issue in CI @baibaichen #6891
- [1632]CH Daily (20240817) @kyligence-git #6903
- [6827]VL Add a new test case for Round's coverage @jiangjiangtian #6884
- [6849]VL Call static initializers once in Spark local mode / when session is renewed @zhztheplayer #6855
- [6889]VL Rename test class `TestOperator` to `MiscOperatorSuite` @zhztheplayer #6890
- [6915][MISC] Fix workflow permission issue @weiting-chen #6911
- VL[CI] Change to use push event to trigger docker build workflow @PHILO-HE #6918
- [6368] Redact sensitive configs when calling `gluten::printConfig` @ArnavBalyan #6793
- [6887]VL Daily (2024_08_19) @GlutenPerfBot #6910
- CH A simple job scheduler for merge tree cache sync load @liuneng1994 #6842
- [6915]CORE Fix listComments TypeError @weiting-chen #6919
- VL Add helper function ColumnarBatches.toString and InternalRow toString @jinchengchenghh #6458
- [6887]VL Daily (2024_08_20) @GlutenPerfBot #6928
- CH Duplicate column name case support in broadcast join #6926 @loudongfeng #6927
- [3582]CH Fix bug for decimal and float type @baibaichen #6925
- [6915]CH Follow VL, fix github issue comment @lwz9103 #6922
- [1632]CH Daily (20240820) @kyligence-git #6929
- [6902]VL fix: Update to copy new LICENSE, NOTICE into jar @weiting-chen #6901
- [6882]CORE Move Spark / columnar rule list to backend code @zhztheplayer #6931
- [6864]VL Set a Velox gflag to allow growing buffer created in another Velox task @zhztheplayer #6932
- [6893]VL Change to using native libs generated by vcpkg build in Gluten scala tests @PHILO-HE #6894
- [6935]CH Query fails when setting session-level join_algorithm to… @loudongfeng #6944
- [6887]VL Daily (2024_08_21) @GlutenPerfBot #6946
- CH Added cleanup logic for expiration mergetree part cache @liuneng1994 #6955
- [6840]CH Enable cache files for hdfs @loneylee #6841
- VL Print memory statistics during task ending when leak is found @zhztheplayer #6959
- [6923]CH `total_bytes_written` is not updated in celeborn partition writers @lgbo-ustc #6939
- [6950]CORE Move specific rules into backend modules @zhztheplayer #6953
- [6908]VL Fix error when getting output from a Velox task that is under spilling by background thread @zhztheplayer #6934
- VL Malformed CI job name @zhztheplayer #6956
- [6893]VL Fix wrong github workflows path for hashing and minor code refactor @PHILO-HE #6952
- [5936]VL Add more types in function type validation and document Cast function @PHILO-HE #6963
- CH Ignore cache file with hdfs suite @loneylee #6969
- [6938]CH Fix core dump when range partition include literal @baibaichen #6964
- [6957]VL Fix missing mvn when CI cache is hit @PHILO-HE #6966
- [6957]VL Fix mvn not found in cache job @PHILO-HE #6974
- [6887]VL Daily (2024_08_22) @GlutenPerfBot #6967
- [1632]CH Daily (20240822) @kyligence-git #6968
- [6980]CORE In shim poms, use Scala Maven compiler configuration inherited from parent pom @zhztheplayer #6972
- VL Following #6959, leak memory dump is not correctly printed @zhztheplayer #6985
- [6987][DOC] fix: add shell newline character after spark-shell @dcoliversun #6986
- [1632]CH Daily (20240823) @kyligence-git #6984
- VL Add wallnanos for WriteFiles @Yohahaha #6976
- VL Remove config `a.g.s.c.extended.columnar.transform.rules` and `a.g.s.c.extended.columnar.post.rules` from Velox backend @zhztheplayer #6991
- [6887]VL Daily (2024_08_23) @GlutenPerfBot #6983
- [6981]CH Not supported operator TakeOrderedAndProjectExecTransformer for BroadcastRelation @loudongfeng #6982
- [6877]CH Support anti/semi join with inequal join condition @lgbo-ustc #6913
- VL Support create temporary function for native hive udf @marin-ma #6829
- [6877]CH[UT] HotFix: Exclude unstable merge join q72 @baibaichen #7006
- [7008]VL Report spill metrics from Velox operators to Spark task @zhztheplayer #7009
- [6960]VL Limit Velox untracked global memory manager's usage @zhztheplayer #6988
- [6951]CORE CH Move CustomerExpressionTransformer to CH backend @zhztheplayer #6993
- VL Following #6988, move a warning from core to Velox backend @zhztheplayer #7010
- VL[uniffle] Correct the write wait duration log @zuston #6994
- [6887]VL Daily (2024_08_26) @GlutenPerfBot #7002
- [6997]VL Ignore a test: cleanup file if job failed @PHILO-HE #6965
- [6887]VL Daily (2024_08_27) @GlutenPerfBot #7018
- CORE Rename OASPackageBridges @zhztheplayer #7022
- CH Enable more uts in GlutenOrcV1SchemaPruningSuite @taiyang-li #6895
- VL Add write IO metrics for WriteFiles @Yohahaha #7011
- [7035]VL Use first line of `ls-remote`'s output as build's target commit @wForget #7036
- [6977]CH Remove concat function parser @taiyang-li #6978
- [6995]CORE Limit soft affinity duplicate reading detection max cache items @zhli1142015 #7003
- [5471]VL feat: Support reading Hudi COW tables @yma11 #6049
- CORE Fix incorrect precision of decimal literal @jiangjiangtian #6954
- VL Set Spark memory overhead automatically according to off-heap size when it's not explicitly configured @zhztheplayer #7045
- VL Remove including xsimd headers coming from velox build path @PHILO-HE #7044
- [7037]VL Add dwarf dependency to folly when building with vcpkg @Z1Wu #7038
- [7033]VL Improve vcpkg docker file @wForget #7030
- [4724]CH Support function array_except @taiyang-li #7039
- [1632]CH Daily (20240828) @kyligence-git #7040
- CORE Fix a variable name typo @ychris78 #7053
- [7049]VL Install lib stemmer through vcpkg @PHILO-HE #7050
- [6989]CH Support RTrim with const source column @lwz9103 #6992
- [7024]VL Skip call collectMetrics when the task does not call next() @kecookier #7025
- [7014]CH Fix: different results from `get_json_object` @lgbo-ustc #7034
- [7031]CORE Initialize new module structure gluten-core / gluten-substrait @zhztheplayer #7057
- [6961]VL[feat] Add decimal write support for ArrowWritableColumnVector @jinchengchenghh #6962
- [1632]CH Daily (20240830) @kyligence-git #7062
- CH Add GlutenJsonExpressionsSuite @exmy #7064
- [6887]VL Daily (2024_08_28) @GlutenPerfBot #7041
- [6887]VL Daily (2024_08_31) @GlutenPerfBot #7070
- [6887]VL Daily (2024_09_01) @GlutenPerfBot #7073
- [341]CH Support BHJ + isNullAwareAntiJoin for the CH backend @zzcclp #7072
- [6887]VL Daily (2024_09_03) @GlutenPerfBot #7085
- VL Gluten-it: Remove an IDE-generated maven module name @zhztheplayer #7091
- [7031] Move task lifecycle management / memory consumer facilities to gluten-core @zhztheplayer #7088
- [7090]VL fix: Number of sorting keys must be greater than zero @dcoliversun #7089
- [7054]CH Fix cse alias issues @taiyang-li #7084
- [7077]CH `have_compressed` is lost in `HashJoin::reuseJoinedData` @lgbo-ustc #7083
- VL Remove a limit for BHJ in stage fallback policy @PHILO-HE #7105
- [7031] Move iterator wrappers to gluten-core @zhztheplayer #7095
- [6589]CH Fix alias name causing caseSensitive error on MergeTree create @loneylee #7063
- [6809]CH Support function unix_seconds/unix_date/unix_micros/unix_millis @taiyang-li #7094
- [6887]VL Daily (2024_09_04) @GlutenPerfBot #7106
- VL Remove Spark tokenizer @rui-mo #6713
- [7068]CORE Fix issue updating leaf input metrics @ivoson #7067
- [7015]VL Remove udf native registration @marin-ma #7016
- [V] Remove complex type fallback for parquet @yma11 #6712
- [6887]VL Daily (2024_09_05) @GlutenPerfBot #7119
- [7118]VL Fix duckdb target issue when vcpkg is enabled @PHILO-HE #7117
- [5880]CORE Ignore fallback for ColumnarWriteFilesExec children @wForget #7113
- [6813]CH Support soundex function @taiyang-li #7093
- [6748]CORE Search stack trace to infer adaptive execution context @PHILO-HE #7121
- [6571]VL Add platform and arch subdirectory for base lib package @wForget #6942
- [6863]VL Pre-alloc and reuse compress buffer to avoid OOM in spill @marin-ma #6869
- [7130]CORE Skip command execution when collecting QE fallback summary @wForget #7132
- [6887]VL Daily (2024_09_06) @GlutenPerfBot #7136
- [7031]CORE Move JNI / exception utilities to gluten-core @zhztheplayer #7134
- VL CI: Run Q97 oom test but ignore the failure @zhztheplayer #7135
- VL New option to follow vanilla Spark's build side in shuffled hash join @zhztheplayer #7133
- VL Minor follow-ups for #6942 @zhztheplayer #7129
- CH Add package with spark 3.5 @loneylee #7140
- [7028]CH[Part-1] Use `PushingPipelineExecutor` to write merge tree @baibaichen #7029
- [4039]VL Support array insert function for spark 3.4+ @ivoson #7123
- [7032]CH Fix incorrect result using timestamp in-filter @lwz9103 #7122
- [7004]CORE Bump Spark version to 3.4.3 @Yohahaha #7115
- VL Fix function `input_file_name()` outputting an empty string in certain query plan patterns @zml1206 #7124
- [1632]CH Daily (20240906) @kyligence-git #7137
- [7144]VL[RAS] Spark input file function support @zml1206 #7146
- [6834]CORE feat: add other join types from the official Substrait @EpsilonPrime #6835
- [7148]CORE Remove meaningless plan change log on TransformPreOverrides rule @wecharyu #7150
- [7155]CH Fix bucket table create error by mergetree @loneylee #7156
- CH Refactor: Move SerializedPlanParser::global_context to QueryContext @baibaichen #7147
- Revert "[6930]VL Print memory statistics during task ending when leak is found" @zhztheplayer #7158
- [6887]VL Daily (2024_09_07) @GlutenPerfBot #7152
- VL CI: In GHA CI, set timeout for Q97 OOM job @zhztheplayer #7162
- [7023]CH Shade dependency jars @loudongfeng #7027
- VL Fix weekly build job failure @PHILO-HE #7163
- [7112]CH Pushdown aggregation's pre-projection ahead expand node @lgbo-ustc #7142
- CH Minor, update package.sh @lwz9103 #7175
- [6808]CH support function arrays_zip @taiyang-li #7048
- [6887]VL Daily (2024_09_10) @GlutenPerfBot #7172
- [7164]VL Disable background IO threads by default @zhztheplayer #7165
- VL[CI] Upgrade GHA upload/download artifacts @PHILO-HE #7182
- [6887]VL Daily (2024_09_11) @GlutenPerfBot #7190
- VL[MINOR] Allow build_gluten_cpp to read custom velox home @boneanxs #7184
- [7177]CH Fix read hdfs performance issue @loneylee #7187
- [7180]CH Fix UT `Eliminate NAAJ when BuildSide is HashedRelationWithAllNullKeys` for the CH backend when AQE is on @zzcclp #7181
- [7179]CH Fix infinite loop with parquet column index reader @lwz9103 #7185
- CH Shuffle writer connects to CH pipeline @liuneng1994 #6723
- [7100]CH support function timestamp_seconds/timestamp_millis/timestamp_micros @taiyang-li #7102
- [MINOR][BUILD] Extract gcc version from libgluten.so @wecharyu #7189
- CH Fix load cache missing columns @liuneng1994 #7192
- VL Make conf option `s.g.s.c.shuffledHashJoin.optimizeBuildSide` work correctly with option `s.g.s.c.forceShuffledHashJoin` @zhztheplayer #7186
- [6887]VL Daily (2024_09_12) @GlutenPerfBot #7199
- CORE minor: Duplicated inheritance on SparkPlan @zml1206 #7197
- VL Add tests for Velox SMJ's coverage @zhztheplayer #7195
- [7087]CH Support `WindowGroupLimitExec` @lgbo-ustc #7176
- [7202]CH Fix: local executor cannot dump pipeline stats @lgbo-ustc #7204
- [7145]CH[PART] Refactor for rel parsers @lgbo-ustc #7193
- VL Fix ioWaitTime metrics for scan @Yohahaha #7198
- [7205] VL Optimize row to column for scalar type @jinchengchenghh #7206
- [6816]CH support function zip_with with some minor refactors @taiyang-li #7211
- [7208]VL fix: Loading libvelox.so failed when using static glog @JinHelin404 #7209
- [6887]VL Daily (2024_09_13) @rui-mo #7219
- [7031]VL Minimized backend API @zhztheplayer #7218
- [7224]CH Update doc for compiling CH backend @lgbo-ustc #7225
- [7224]CH update doc for compiling ch backend @lgbo-ustc #7227
- VL Collapse trivial projects generated by rule PushDownInputFileExpression @zml1206 #7188
- VL Rename ShuffledHashJoinExecTransformer.scala to HashJoinExecTransformer.scala @PHILO-HE #7228
- CORE Minor code cleanup for package object of `org.apache.gluten` @zhztheplayer #7231
- VL RAS: Avoid adding R2C whose schema contains complex data types @zhztheplayer #7229
- [7222]CH Fail to compile ch backend @lgbo-ustc #7226
- [7116] CH support outer explode @shuai-xu #7207
- [7224]CH Update ClickHouse.md @lgbo-ustc #7230
- [6805]CH support function array_remove/array_repeat @taiyang-li #7210
- [6887]VL Daily (2024_09_14) @GlutenPerfBot #7237
- [7241]VL Correct loaded libname in SharedLibraryLoader @wForget #7245
- CH Support rocksdb disk metadata @liuneng1994 #7239
- [7243]VL Fix Q97 cross-task spilling hangs @zhztheplayer #7244
- [7213]CORE Make fallback reason for CheckOverflowInTableInsert clearer @wForget #7248
- CH Fix GlutenLiteralExpressionSuite and GlutenMathExpressionsSuite @taiyang-li #7235
- [6887]VL Daily (2024_09_18) @GlutenPerfBot #7259
- [7220]CH Fix expand bug in grouping sets query @KevinyhZou #7221
- [7203]CORE Make push down filter to scan an individual rule @zml1206 #7215
- [1632]CH Daily (20240918) @kyligence-git #7260
- [7264]CORE VL Reduce module dependencies of `gluten-data` @zhztheplayer #7265
- [7276]VL Make fallback reason for GetStructField clearer @wForget #7277
- [6887]VL Daily (2024_09_19) @JkSelf #7272
- [7264]VL Rename module `gluten-data` to `gluten-arrow` @zhztheplayer #7278
- VL Fix bug when setting Spark memory overhead automatically @leoluan2009 #7275
- [7262]CH Fix cache file command running normally with config disabled @loneylee #7263
- [7028]CH[Part-2] Refactor: Move MergeTree related UT to mergetree module @baibaichen #7279
- COREVL Minor code cleanups @zhztheplayer #7280
- VL Fix columns added to `outNames` twice when building Substrait plan @zml1206 #7274
- VL Gluten-it: In auto cluster mode, add option `--off-heap-ratio` for adjusting memory shares of off-heap and on-heap @zhztheplayer #7286
- [6768]CH Clear unused configs after the refactor of hash join table reordering @lgbo-ustc #7287
- VL CI: Q97 OOM test passed, stop ignoring its return code in CI @zhztheplayer #7294
- [1632]CHDaily 20240920) @kyligence-git #7299
- VL Customize VCPKG build features according to user's build options @PHILO-HE #7052
- VL Remove unused config VELOX_FORCE_COMPLEX_TYPE_SCAN_FALLBACK @felipepessoto #7303
- [INFRA] Label gluten-hudi as DATA_LAKE @dcoliversun #7298
- [6887]VL Daily 2024_09_23) @GlutenPerfBot #7309
- VL Enhance spill log readability @Yohahaha #7300
- [6975]CH Rewrite decimal arithmetic @loneylee #7196
- VL fix vcpkg package script #7052 @zhouyuan #7316
- VL Minor code cleanups @zhztheplayer #7312
- VL Minor follow-ups for #7052 @zhztheplayer #7315
- [7178]VL Fix field not found error when struct field name contains upper case @zml1206 #7304
- [6887]VL Daily 2024_09_24) @GlutenPerfBot #7321
- VL Add VeloxTransitionSuite @zhztheplayer #7324
- CORE Remove unused allPushDownFilters param @zml1206 #7317
- [7096] CH Fix exception when the same names appear in group by @shuai-xu #7101
- VL Fix vcpkg binary caching in docker image @PHILO-HE #7331
- [7327][INFRA] Publish Velox Backend Test Result Report in Github Action @dcoliversun #7328
- [MINOR][DOCS] Improve the configuration document @beliefer #7334
- VL Fix field name parsing in Subfield @rui-mo #7330
- [6975]CH Fix decimal cast overflow exception @loneylee #7335
- VL Follow-ups for #7304 @zhztheplayer #7340
- [7323]VL Always round negative decimals for integral types @surnaik #7337
- [7344]CH Fix the incorrect default database name and table name for the mergetree file format when using path-based tables @zzcclp #7346
- [7348]CH Move function calculateColumnAndSecondaryIndexSizesImpl outside @loneylee #7349
- [6887]VL Daily 2024_09_25) @GlutenPerfBot #7338
- [7028]CH[Part-3] Refactor: Move mergetree related codes to backends-clickhouse @baibaichen #7234
- [7313]VL Explicit Arrow transitions, part 1: add LoadArrowDataExec / OffloadArrowDataExec @zhztheplayer #7343
- VL Minor: Fix documentation of flushable partial aggregate @surnaik #7353
- [7351]CORE Code cleanup for Gluten session extensions @beliefer #7352
- VL Override nodename for IcebergScanTransformer @leoluan2009 #7345
- [7283]CORE Support DynamicPruningExpression conversion @wForget #7284
- VL Enable AtLeastNNonNulls function @zhli1142015 #7326
- [6887]VL Daily 2024_09_26) @JkSelf #7355
- [7356]CORE Make GlutenConfig.GLUTEN_CONFIG_PREFIX private @baibaichen #7357
- VL Use new krb5 download url when vcpkg is enabled @leoluan2009 #7347
- [7367]CH Revert #7101 @baibaichen #7368
- [6887]VL Daily 2024_09_27) @GlutenPerfBot #7370
- [7358]CH Optimize the strategy of the partition split according to the files count @zzcclp #7361
- [1632]CHDaily 20240928) @kyligence-git #7379
- [7313]VL Explicit Arrow transitions, part 2: new algorithm to find optimal transition @zhztheplayer #7372
- [7313]VL Explicit Arrow transitions, part 3: code cleanups @zhztheplayer #7383
- [6887]VL Daily 2024_09_29) @GlutenPerfBot #7381
- [7385]CH Add some config parameters to control the cache size for the mergetree parts @zzcclp #7386
- [7364]CORE Simplify the RuleInjector @beliefer #7365
- [7376][ICEBERG]Avoid retrieving the partition schema of Iceberg @lyy-pineapple #7377
- VL Pass phase in recursive invocation of spillTree @waitinfuture #7388
- [7028]CH[Part-4] Refactor `DeltaMergeTreeFileFormat` to read table configuration from deltalog's metadata @baibaichen #7170
- [6887]VL Daily 2024_10_01) @GlutenPerfBot #7398
- [7307]VL Update openssl version in velox setup for centos9 @pratham76 #7308
- [6887]VL Daily 2024_10_02) @GlutenPerfBot #7404
- [1632]CHDaily 20241003) @kyligence-git #7407
- [7394]CH Reduce the number of listFiles calls when executing queries on the parquet file format @zzcclp #7417
- [7313]VL Explicit Arrow transitions, part 4: explicit Arrow-to-Velox transition @zhztheplayer #7392
- CORE Minor: Fix warnings and rename event handler better @surnaik #7412
- [7427]CH Revert "fix (#7349)" @baibaichen #7428
- [6887]VL Daily 2024_10_03) @GlutenPerfBot #7406
- [6887]VL Daily 2024_10_04) @GlutenPerfBot #7408
- [7437]CH Revert "Auxiliary commit to revert individual files from #7170" @baibaichen #7438
- [7418]VL Add checks for allocation failures and initialize variables @majetideepak #7419
- [7400]CORE Scala code style clean up for Backend.scala @beliefer #7401
- CORE Infra: Do not dismiss stale reviews @zhztheplayer #7430
- CORE Infra: Forward GitHub discussions to Apache mailing list @zhztheplayer #7429
- [6856]CH Support arrays_overlap and fix array_join diff @KevinyhZou #6857
- [7313]VL Explicit Arrow transitions, part 5: extra code cleanups @zhztheplayer #7436
- [6887]VL Daily 2024_10_08) @GlutenPerfBot #7422
- VL Adapt setup-centos8.sh to latest velox helper functions @liujiayi771 #7442
- Minor fix for info.sh @PHILO-HE #7444
- [7028]CH[Part-5] Refactor: add NativeOutputWriter to unify CHDatasourceJniWrapper @baibaichen #7395
- [7325] CH Add a config to turn off reading JSON @shuai-xu #7333
- VL Fix dependencies setup @PHILO-HE #7443
- [7402]CORE Code cleanup for GlutenPlugin @beliefer #7403
- CORE Update AllocationListener usage after successfully releasing memory @wForget #7396
- VL Prepare shim API for breaking change in SPARK-48610 @zhztheplayer #7445
- [6887]VL Daily 2024_10_09) @GlutenPerfBot #7447
- CORE Fix GH security issues @zhouyuan #7448
- [1632]CHDaily 20241010) @kyligence-git #7454
- VL Remove VELOX_BUILD_PATH from include directories if build test is disabled @PHILO-HE #7449
- [7426]CH Fixed: json path contains spaces @lgbo-ustc #7435
- CORE Fix duplicated column names in DS test @FelixYBW #7457
- [7440]VL Enable unit tests on missing all struct fields @rui-mo #7456
- [6784][6828]VL Add tests for weekOfYear and cast string as date @zml1206 #6888
- [7459]VL Move 3.2 / 3.3 Velox native file writer code to `backend-velox` / `cpp/velox` @zhztheplayer #7461
- [7410]CORE Add test args to run spark-ut with Java 17 @CodenameGHOST007 #7411
- [7465]CH Fix compile error 'Unable to locate class corresponding to inner class entry for BuilderParent in owner com.google.protobuf.AbstractMessage' @zzcclp #7466
- [7240]CH Fix all failed uts in GlutenComplexTypeSuite @taiyang-li #7242
- [6887]VL Daily 2024_10_11) @GlutenPerfBot #7468
- VL Following #7461, add a minor fix @zhztheplayer #7471
- [7373][DOC]VL Add document for profiling gluten with velox @wForget #7374
- [7480]VL Clean up some code for protobuf @PHILO-HE #7473
- [7480]VL Build centos-8 docker image for GHA workflow @PHILO-HE #7481
- VL Follow-up for #7481 to fix docker build error @PHILO-HE #7491
- [7489][INFRA] fix: allow fetching artifacts from workflow `Velox backend Github Runner` in upstream repo @dcoliversun #7490
- [7291][DELTA] fix: push down input_file_name expression to transformer scan in delta @dcoliversun #7483
- [6887]VL Code cleanup for hasUnsupportedColumns function @zhli1142015 #7477
- [7495][BUILD] Fix build on macOS, which does not support version-script @zml1206 #7497
- [6887]VL Daily 2024_10_12) @GlutenPerfBot #7487
- [6887]VL Daily 2024_10_13) @GlutenPerfBot #7504
- [7493]VL Update Velox.md to clarify dependency deployment @PHILO-HE #7492
- VL Enable Spark legacy date formatter if spark.sql.legacy.timeParserPolicy is set to 'LEGACY' @NEUpanning #7375
- [7243]VL Fix hanging by cross-task spilling @zhztheplayer #7479
- [7484]CH Fix element_at diff @KevinyhZou #7485
- [7496][DOCS] Add "View the Surefire reports of Velox tests" to NewToGluten.md @dcoliversun #7501
- [7389] CH fix cast map to string diff with spark @shuai-xu #7393
- [7510]VL[CI] Change centos-8 docker image to accelerate GHA workflow @PHILO-HE #7511
- [7509]VL Memory management: Release all native memory managers after all Velox tasks were released @zhztheplayer #7478
- [7518]VL Clean up some code used for building protobuf @PHILO-HE #7522
- [7509]VL Register release hooks as well as factories for Runtime and MemoryManager @zhztheplayer #7516
- [7432]CH Fix exception when the result of get_json_object is an array @lgbo-ustc #7513
- [7110]VL[DELTA] support IncrementMetric in gluten @dcoliversun #7111
- [7526]VL Scala code style for VeloxCollect @beliefer #7527
- VL Lower default spill run size to reduce overhead memory usage @zhztheplayer #7463
- [7524]VL[UNIFFLE] Reset rss.row.based configuration of uniffle @wForget #7525
- [7311]CH Support grace aggregate algorithm in partial aggregating stages @lgbo-ustc #7322
- [6876] Support Spark 3.5.2 @zhouyuan #7138
- [6887]VL Daily 2024_10_15) @GlutenPerfBot #7530
- [7517]CH Support build gluten package with scala213 @lwz9103 #7520
- [1632]CHDaily 20241015) @kyligence-git #7529
- [7145]CH Decouple `SerializedPlanParser` from other parser modules @lgbo-ustc #7250
- [7539]VL Remove some unnecessary Velox code changes from modify_velox.patch @PHILO-HE #7540
- [6887]VL Daily 2024_10_16) @GlutenPerfBot #7549
- [7535]VL Containerized build within CentOS 7 image @zhztheplayer #7538
- [7514]VL Reorganize Dockerfiles and document how to build gluten in docker @PHILO-HE #7515
- [7420]VL Fix GCS configuration @majetideepak #7421
- [7550]CH Rewrite `get_json_object` in `singular_or_list` @lgbo-ustc #7551
- [7535]VL CentOS 7 containerized build: Fix for automake version error @zhztheplayer #7555
- [7542]CH Fix cache not refresh @loneylee #7547
- [7499]VL[CI] Enable ccache in GHA job @zhouyuan #7546
- [1632]CHDaily 20241016) @kyligence-git #7558
- [6887]VL Daily 2024_10_17) @GlutenPerfBot #7566
- [7499]VL CI: Remove GHA binary cache @zhztheplayer #7554
- [7522]CH Improve jsonpath support in `get_json_object` @lgbo-ustc #7556
- [7482]CH Remove redundant head object operation of s3 @loneylee #7565
- [7359]VL feat: Support columnar partial project for UDF @jinchengchenghh #7360
- [7563]CH Fix failure on overly large double numbers @lgbo-ustc #7570
- [1632]CHDaily 20241017) @kyligence-git #7567
- [7559]VL[UNIFFLE] Set rss.enabled to true in UniffleShuffleManager @wForget #7560
- [7450]VL Improve CollectRewriteRule for Velox @beliefer #7451
- [7359]VL Enable partial project in RAS @zhztheplayer #7574
- [7572]CORE Check if iterator has been closed @wForget #7573
- VL Remove unnecessary vanilla Spark compatibility code for VeloxCollectSet function @zhztheplayer #7590
- [1632]CHDaily 20241018) @kyligence-git #7588
- [7541]VL Improve HLLRewriteRule for Velox @beliefer #7543
- [6887]VL Daily 2024_10_18) @GlutenPerfBot #7587
- [6887]VL Daily 2024_10_19) @GlutenPerfBot #7607
- [6887]VL Daily 2024_10_21) @GlutenPerfBot #7617
- [7591]CH Fix: failure to normalize JSON text containing an empty object @lgbo-ustc #7595
- [7596]CH Fix bnlj empty join error @loneylee #7597
- [7600]VL Prepare test case for the removal of workaround code for empty schema batches @zhztheplayer #7601
- [7615]CORE Introduce `GlutenFormatFactory` @baibaichen #7616
- [7609]CORE Fix the bug that Gluten cannot change logging level @beliefer #7610
- [7455]CH Add `spark_modulo` for compatibility @lgbo-ustc #7619
- [7621]CH Fix repeat function reporting an error when times is a negative number @loneylee #7622
- [7577]VL Add pattern match for extension rules @zml1206 #7584
- [7585]VL Fix S3 and GCS configs @majetideepak #7586
- [7604]CORE Code refactors against ColumnarRuleApplier.Executor @beliefer #7606
- [7581]CORE Code cleanup for GlutenColumnarRule @beliefer #7582
- [6887]VL Daily 2024_10_22) @GlutenPerfBot #7626
- [7545]CH Fix regexp_replace group catching syntax diff @zhanglistar #7603
- [7623]CH Fix error when running the cache command after executors are added and removed @loneylee #7625
- [7600]VL Remove EmptySchemaWorkaround @zhztheplayer #7620
- [7336]CORE Bump Spark version to v3.5.3 @Yohahaha #7537
- CORE Move scala file to scala package and fix minor typo in comment @leoluan2009 #7635
- [7028]CH[Part-6] Introduce MergeTreeDelayedCommitProtocol @baibaichen #7506
- VL Fix missing VELOX_HOME in builddeps-veloxbe.sh @liujiayi771 #7629
- [5525][FOLLOWUP] Fix mvn versions:set does not work for shim submodules @JinHelin404 #7593
- [6887]VL Daily 2024_10_23) @GlutenPerfBot #7643
- [7600]VL Simplify offload rules in RAS @zhztheplayer #7646
- VL Move Scala file to Scala package @surnaik #7653
- VL Code cleanup for Arrow CSV UTs @zhztheplayer #7651
- [7458]VL Upgrade to GCC-11 for centos-7/8 and ubuntu-20.04 @PHILO-HE #7578
- [6887]VL Daily 2024_10_24) @GlutenPerfBot #7662
- CORECH Remove ValidatorApi.doSparkPlanValidate @zhztheplayer #7668
- VL Follow-up fix for gcc upgrade PR @PHILO-HE #7667
- [7475]VL Add a config to control whether to add a trim node when casting from varchar @Henry2SS #7476
- [BUILD][DOCS]VL Change BUILD_JEMALLOC to ENABLE_JEMALLOC_STATS @surnaik #7650
- VL Upgrade FB_OS_VERSION to v2024.07.01.00 @PHILO-HE #7671
- [7673]CH Fix substrait infinite loop @loneylee #7674
- [7665] Remove duplicated tpch/tpcds queries resources @marin-ma #7666
- [5103]VL Use jvm libhdfs replace c++ libhdfs3 @JkSelf #6172
- Revert "[5103]VL Use jvm libhdfs replace c++ libhdfs3" @marin-ma #7683
- [7659]CH Implement splittable bzip2 decompression @taiyang-li #7638
- [5103]VL Use jvm libhdfs replace c++ libhdfs3 @JkSelf #7684
- CORE Rework the implementation of spark.gluten.enabled @zhztheplayer #7672
- [6887]VL Daily 2024_10_26) @GlutenPerfBot #7688
- [1632]CHDaily 20241026) @kyligence-git #7689
- [7359]VL Optimize string in partial project @jinchengchenghh #7592
- [7681]CH[ARM] Fix compile issue for SparkFunctionFloor @loudongfeng #7682
- [7661]VL Fix validation of native IfThen expr @wForget #7669
- [6887]VL Daily 2024_10_28) @marin-ma #7694
- [7665] Remove tpch-queries-velox @marin-ma #7705
- VL Code cleanup for BasicPhysicalOperatorTransformer @zml1206 #7695
- [7143]VL CI: Add GHA job for running all UTs with RAS=ON @zhztheplayer #7702
- [6887]VL Daily 2024_10_29) @GlutenPerfBot #7708
- [7657]CH Fix to_unix_timestamp when input parameter is timestamp type @KevinyhZou #7660
- [6387]CH support percentile function @taiyang-li #6396
- [7685]VL[RAS] Add new cost model to avoid costly r2c @zml1206 #7686
- [7143]VL Fix several UTs for RAS @zhztheplayer #7701
- [7713]CH Fix page index reader failure with OR logical operator @baibaichen #7716
- VL CI: One OOM GHA job has passed, stop ignoring its result @zhztheplayer #7712
- [7709]CH Rule constructor simplifications @beliefer #7710
- [7717]CH[ARM] Fix compile issue for SparkFunctionRoundHalfUp @loudongfeng #7718
- [7143]VL RAS: Fix test case "test ignore row to columnar" when RAS=ON @zhztheplayer #7725
- [1632]CHDaily 20241030) @kyligence-git #7720
- [6887]VL Daily 2024_10_30) @marin-ma #7722
- [6887]VL Daily 2024_10_31) @GlutenPerfBot #7739
- VL RAS: Fix fallen-back plan nodes not being tagged with meaningful fallback reasons @zhztheplayer #7731
- VL Enhance write parquet with compression codec test @wecharyu #7737
- [7714]CH Fix issue caused by an incomplete line when there is only one line in the last bzip2 block @taiyang-li #7715
- VL Add metric to indicate aggregation pushdown @zhli1142015 #7729
- [7703]VL ColumnarBuildSideRelation transform support multiple key columns @yikf #7704
- [6887]VL Daily 2024_11_01) @GlutenPerfBot #7761
- [7670]CH Fix enable 'files.per.partition.threshold' bug @loneylee #7758
- [7143]VL RAS: Fix failed UTs in GlutenSQLQueryTestSuite @zhztheplayer #7754
- [7747]CH Fix murmur3hash on arm @lwz9103 #7757
- [7143]VL RAS: Revert #7731, disable the relevant test cases since RAS doesn't report fallback details due to #7763 @zhztheplayer #7764
- [7143]VL RAS: Catch exceptions thrown from rewrite rules @zhztheplayer #7767
- [7771]CH Fix crc32 failure in bzip2 @taiyang-li #7772
- [7765]CH Support CACHE META command for MergeTree table @loneylee #7774
- [1632]CHDaily 20241101) @kyligence-git #7762
- [7775]CORE Make sure the softaffinity hash executor list is in order @zzcclp #7776
- [6887]VL Daily 2024_11_02) @GlutenPerfBot #7784
- [7753]CORE Do not replace literals of expand's projects in `PullOutPreProject` @lgbo-ustc #7756
- [7446][BUILD] Build third party libs using jar from JAVA_HOME @Zand100 #7736
- [7174]VL Force fallback scan operator when spark.sql.parquet.mergeSchema enabled @Yohahaha #7634
- [6887]VL Daily 2024_11_03) @GlutenPerfBot #7786
- [7782] Fix profile Darwin-x86 os.arch error @zml1206 #7783
- [7780]CH Fix split diff @taiyang-li #7781
- [7792]CH Set default minio ak/sk to minioadmin @lwz9103 #7793
- [DOC] Update release plan for Velox backend @zhouyuan #7744
- [7727]CORE Unify the variable name of GlutenConfig with glutenConf @beliefer #7728
- [7700]CH Fix issue when partition values contain space @exmy #7719
- [7741]VL refine build package tool @zhouyuan #7742
- [7797]VL Fix lacking icu lib on centos-7 @PHILO-HE #7798
- VL Remove a duplicated Maven dependency, and some follow-ups for #7764 @zhztheplayer #7773
- [7143]VL RAS: Enable the RAS UT jobs in GHA CI @zhztheplayer #7770
- [6887]VL Daily 2024_11_05) @GlutenPerfBot #7808
- VL In `ColumnarBatchSerializerJniWrapper_serialize`, check if the byte array is constructed successfully @NEUpanning #7733
- [7814]CH Support triggering Gluten ClickHouse CI on ARM @lwz9103 #7815
- [7243]VL Suspend the Velox task while reading an input Java iterator to make the task spillable @zhztheplayer #7748
- [7654]CH Fix round on arm @lwz9103 #7794
- [1632]CHDaily 20241105) @kyligence-git #7809
- CH Ignore unstable UTs and add more messages on failure @baibaichen #7821
- [7812]CH Fix the query failed for the mergetree format when the 'spark.databricks.delta.stats.skipping' is off @zzcclp #7813
- [6887]VL Daily 2024_11_06) @GlutenPerfBot #7822
- VL Remove loading of shared libhdfs @Yohahaha #7818
- [7749]VL Trim ISOControl characters in string for casting to integral type @wForget #7806
- CH Rename Mergetree part file name to avoid duplicated file name @liuneng1994 #7769
- [7807] Transform relation bound attr using the name if the attr's exprId is not found @yikf #7819
- [1632]CHDaily 20241106) @kyligence-git #7824
- [7795]CH Add backend task id log @loneylee #7801
- [7647]CH Lazy expand for aggregation @lgbo-ustc #7649
- VL Remove one legacy Velox config used for Spark collect_list function @PHILO-HE #7826
- CORE Remove unused dependencies of gluten-substrait @zml1206 #7833
- [7079]VL Fix metrics for InputIteratorTransformer of broadcast exchange @ivoson #7167
- [7800]VL Add config for max reclaim wait time to avoid deadlock during memory arbitration @Yohahaha #7799
- [7829]CH Fix reading CSV files whose datetime fields differ from Spark @loneylee #7832
- [7796]CH Fix diff while casting bool to string @taiyang-li #7804
- [7778]CH Make aggregation output schema same as CH native @lgbo-ustc #7811
- Revert "[7800]VL Add config for max reclaim wait time to avoid deadlock during memory arbitration" @zhztheplayer #7836
- VL Sort shuffle writer uses vectorized c2r @marin-ma #6782
- [6887]VL Daily 2024_11_07) @GlutenPerfBot #7834
- [7675]VL Support parquet write with complex data types (e.g. MAP, ARRAY) @weixiuli #7676
- [7759]CH Fix pre project push down in aggregate @KevinyhZou #7779
- [7795]CH Remove duplicate log object @loneylee #7839
- [1632]CHDaily 20241107) @kyligence-git #7835
- VL Fix ccache installation in docker @PHILO-HE #7848
- CORE Minor: Rename LimitTransformer to LimitExecTransformer @zhztheplayer #7843
- VL Follow-up fix for PR #7848 to install ccache @PHILO-HE #7858
- [7458]VL Upgrade GCC to version 11 in gluten-te's ubuntu dockerfile @zhztheplayer #7859
- CORE Remove member TransformContext#inputAttributes as unused @zhztheplayer #7844
- VL Re-enable background IO threads by default @zhztheplayer #7845
- [7850]VL Native writer support CreateHiveTableAsSelectCommand @yikf #7851
- [7862]CH Fix pre-projection in aggregate not taking effect @KevinyhZou #7863
- CH[Doc] Add Gluten CH Debug docs. @lwz9103 #7846
- VL Clean up some legacy code and correct minimum GCC version @PHILO-HE #7865
- VL Do not use --version-script link option on Darwin @PHILO-HE #7820
- [7647]CH Enable lazy expand for `avg` and `sum(decimal)` @lgbo-ustc #7840
- [MINOR][INFRA] Exclude metastore_db from git @yikf #7871
- [7760] Fix udf implicit cast & update doc @marin-ma #7852
- [6887]VL Daily 2024_11_08) @GlutenPerfBot #7854
- [6887]VL Daily 2024_11_09) @GlutenPerfBot #7875
- [7028]CH[Part-7] Support one pipeline write for mergetree @baibaichen #7788
- VL Fix weekly scheduled GHA job @PHILO-HE #7888
- VL Enable array test for GlutenParquetIOSuite @zml1206 #7841
- [Doc] Show gluten icon when using IDEA @liuneng1994 #7894
- [1632]CHDaily 20241111) @kyligence-git #7884
- [6887]VL Daily 2024_11_10) @GlutenPerfBot #7881
- [6887]VL Daily 2024_11_11) @GlutenPerfBot #7883
- [7847]CORE Distinguish between native scan and vanilla spark scan in plan tree string @zml1206 #7877
- [7886]VL Fix broken Ubuntu 20.04 + VCPKG + GCS + ABFS build @zhztheplayer #7906
- [7890][UI] Optimize cleanup of Gluten SQL executions UI data @zml1206 #7891
- [7907]CH Fixed data race in `ExpresionParser::getUniqueName` @lgbo-ustc #7908
- VL CI: Fix out-of-date module name in labeler.yml @zhztheplayer #7915
- [7078]CORE The fallback check for Scan should not be skipped when DPP is present @wang-zhun #7080
- [7868]CH Nested column pruning for Project(Filter(Generate)) @taiyang-li #7869
- CORE Consolidate RewriteSparkPlanRulesManager, AddFallbackTagRule, TransformPreOverrides into a single rule @zhztheplayer #7918
- [7823] Revert "read data from orc file format - ignore reading except date32" @baibaichen #7917
- [6887]VL Daily 2024_11_12) @zhouyuan #7899
- [7028]CH[Part-8] Support one pipeline write for partition mergetree @baibaichen #7924
- [Minor] Fix a typo in Gluten config @PHILO-HE #7931
- VL Clean up unused variables in cpp source files @rui-mo #7929
- CH Fix SIGSEGV on jstring2string @liuneng1994 #7928
- [7647]CH Fixed a bug in finding attributes in replacement map @lgbo-ustc #7927
- CORE Revert Spark version from v353 to v352 @Yohahaha #7930
- VL[CI] Fix back upload golden files @zml1206 #7880
- [6887]VL Daily 2024_11_13) @zhouyuan #7926
- [7243]VL A follow-up fix for #7748 @zhztheplayer #7935
- [6666]VL Use custom SparkExprToSubfieldFilterParser @rui-mo #6754
- [7641]VL Add Gluten benchmark scripts @marin-ma #7642
- [7647]CH Remove duplicated columns in agg results @lgbo-ustc #7937
- [7856]CORE Ensure correct enabling of GlutenCostEvaluator @weixiuli #7857
- VL Fix wrong lib suffix for google_cloud_cpp_storage @PHILO-HE #7933
- VL Add test for scan operator with filter on decimal/timestamp/binary field @rui-mo #7945
- [7362]VL Add test for 'IN' and 'OR' filter in Scan @zml1206 #7363
- [7387]CH Allow parallel downloading in scan operator for hive text/json table when the whole compressed (non-bzip2) file is a single file split @taiyang-li #7598
- [6887]VL Daily 2024_11_14) @GlutenPerfBot #7942
- [7647]CH Drop literals in aggregation results @lgbo-ustc #7951
- CH Fix issues due to ClickHouse/ClickHouse#71539 @baibaichen #7952
- [6896] Add buffered read for hash/sort shuffle reader @marin-ma #7897
- [7837]VL Spark driver should not initialize cache if not in local mode @leoluan2009 #7853
- [7267]CORECH Support nested column pruning for `HiveTableScan` json/parquet/orc format @KevinyhZou #7268
- [7499]VL[CI] Print ccache statistics for tracking its efficacy @PHILO-HE #7957
- [7594] CH support cast const map to string @shuai-xu #7599
- [7947]CORE Add buildSide info for BroadcastNestedLoopJoinExecTransformer simpleStringWithNodeId @zml1206 #7948
- [6887]VL Daily 2024_11_15) @GlutenPerfBot #7954
- [6887]VL Daily 2024_11_16) @GlutenPerfBot #7958
- [6887]VL Daily 2024_11_17) @GlutenPerfBot #7961
- Add config to support viewfs in Gluten. @JkSelf #7892
- [7959]CH `AdvancedExpandStep` generates fewer rows than expected @lgbo-ustc #7960
- [7962]CH A friendly API to build aggregator params @lgbo-ustc #7963
- VL Clean up some legacy code related to USE_AVX512 @PHILO-HE #7956
- [6887]VL Daily 2024_11_18) @GlutenPerfBot #7965
- [7910]COREVL Flip dependency direction for gluten-iceberg @zhztheplayer #7967
- [7983]CH Fix NPE when spark.shuffle.compress is disabled @exmy #7984
- [7887]VL[DOC] Add usage doc about dynamic load jvm libhdfs and native libhdfs3 @JkSelf #7982
- [6853]CORE Move more general query planner APIs from gluten-substrait to gluten-core @zhztheplayer #7972
- [6887]VL Daily 2024_11_19) @GlutenPerfBot #7978
- [7969]VL Enable spill to multiple directories for micro benchmark @marin-ma #7970
- CORE Avoid formatted comments from being messed by non-spotless linters (especially IDE linters) @zhztheplayer #7989
- [7751]VL Merge two consecutive aggregates to one in complete mode @yikf #7752
- [7800]VL Add config for max reclaim wait time to avoid deadlock during memory arbitration @Yohahaha #7990
- [7986]CH Improve lazy expand for high cardinality aggregation @lgbo-ustc #7995
- CORE Minor: Use lower case for Maven profile names @zhztheplayer #8001
- CORE Query planner: A more explicit practice to register columnar batch types @zhztheplayer #8002
- [7979]CH Fix exception cause by one child of UnionExec outputs Array(Nothing) while the other outputs Array(String) @taiyang-li #7980
- [7971]CH Support using left side as the build table for the left anti/semi join @zzcclp #7981
- [6887]VL Daily 2024_11_20) @GlutenPerfBot #7997
- [7267]CORECH Move schema pruning optimization of HiveTableScan to an individual post-transform rule @zhztheplayer #8008
- [8005]VL Add MergeTwoPhasesHashBaseAggregate to injectRas list @yikf #8006
- VL Fall back unsupported ORC write for spark32 and spark33 @jackylee-ch #7996
- CORECH Remove API BackendSettingsApi#supportShuffleWithProject @zhztheplayer #8009
- [7999]VL Add compression codec extension to velox written parquet file @liujiayi771 #8000
- [7028]CH[Part-9] Collecting Delta stats for parquet @baibaichen #7993
- [6887]VL Daily 2024_11_21) @GlutenPerfBot #8012
- VL Link shared jemalloc lib to work with LD_PRELOAD @PHILO-HE #7369
- [6887]VL Daily 2024_11_22) @GlutenPerfBot #8019
- [6887]VL Daily 2024_11_24) @GlutenPerfBot #8028
- [7953]VL Fetch and dump all inputs for micro benchmark at the beginning of a middle stage @marin-ma #7998
- [7950]VL Keep Core module's build flag consistent with Velox @surnaik #8027
- VL RAS: Remove alternative constraint sets passing to RAS planner @zhztheplayer #8033
- [6920]CORE Move API `Backend#defaultBatchType` down to `BackendSettingsApi` in module gluten-substrait @zhztheplayer #8016
- [8010]CORE Don't generate native metrics if transformer doesn't generate relNode @zml1206 #8011
- VL Bump jemalloc version and update relevant documents @PHILO-HE #8035
- [MISC] Velox maintainers as triage member(collaborators) @zhouyuan #8037
- VL Clean up duplicate CMake code for setting CMAKE_CXX_FLAGS @surnaik #8034
- [7741]VL Fix deprecated actions/upload-artifact version issue when building bundle package @wangyum #8017
- VL vcpkg: Broken libelf mirror @zhztheplayer #8047
- [6887]VL Daily 2024_11_26) @GlutenPerfBot #8042
- [7896]CH Fix to_date diff for time parser policy config @KevinyhZou #7923
- CHDaily 20241118) @liuneng1994 #7968
- [8046]VL CI: fix velox cache/bundle package script @zhouyuan #8051
- [7631]VL Fall back lead/lag if input is foldable @zml1206 #8038
- [6920]CORE Redesign and move trait `GlutenPlan` to `gluten-core` @zhztheplayer #8036
- [3839]CH Extend nested column pruning in vanilla spark @taiyang-li #7992
- [8039]VL Native writer should respect table properties @yikf #8040
- VL Enable locate function test @rui-mo #4791
- [6920]VL Following #8036, append some code cleanups @zhztheplayer #8058
- [7977]VL Include cstdint header explicitly @yabinma #8030
- [8061]VL Fall back nth_value if input is foldable @zml1206 #8062
- [8046]VL CI: Fix ccache path @zhouyuan #8064
- [7905]CH Implement window's `topk` by aggregation @lgbo-ustc #7976
- [6887]VL Daily 2024_11_27) @GlutenPerfBot #8057
- [8073]CH Replace some deprecated methods about sort @lgbo-ustc #8079
- [7860]CORE In shuffle writer, replace MemoryMappedFile to avoid OOM @ccat3z #7861
- [6887]VL Daily 2024_11_28) @GlutenPerfBot #8067
- [6887]VL Daily 2024_11_29) @GlutenPerfBot #8086
- [8094]CH[Part-1] Support reading data from the iceberg with CH backend @zzcclp #8095
- [8074]CH Fix adjust output constant column @lwz9103 #8076
- [8096]CH Invalid header for disk tmp file @lgbo-ustc #8100
- [8021]CH Fix ORC read/write mismatch and parquet read failure when column with complex types contains null @taiyang-li #8023
- [8095]CH package with iceberg profile @lwz9103 #8106
- [8103][DOC] Fix TPC-H/DS queries link @merrily01 #8104
- [1632]CHDaily 20241129) @kyligence-git #8087
- CH Support separating debug symbols from the .so file @liuneng1994 #8083
- [1632]CHDaily 20241130) @kyligence-git #8112
- [6887]VL Daily 2024_11_30) @GlutenPerfBot #8111
- [8080]CH Support function transform_keys/transform_values @taiyang-li #8085
- Revert "[8080]CH Support function transform_keys/transform_values" @taiyang-li #8121
- [8119]CH Disable `max_bytes_ratio_before_external_group_by` @lgbo-ustc #8120
- [8060]CORE GlutenShuffleManager as a registry of shuffle managers @zhztheplayer #8084
- [7745]VL Incorporate SQL Union operator into Velox execution pipeline @zhztheplayer #7842
- [8090]CH Remove sparkLeast and sparkGreatest functions @KevinyhZou #8091
- VL Enable GlutenJsonExpressionsSuite @zhli1142015 #8099
- [7028]CH[Part-11] Support write parquet files with bucket @lwz9103 #8052
- [1632]CHDaily 20241203) @kyligence-git #8125
- [8046]VL Fix GHA checkout issue on centos-7 for weekly build job @PHILO-HE #8129
- [6887]VL Daily 2024_12_03) @GlutenPerfBot #8124
- [8130]CH Use the actual user instead of yarn user to read hdfs file @exmy #8131
- VL Enhance VeloxHashShuffleWriter partition buffer size estimation by incorporating complex type columns @kecookier #8089
- [7518][FOLLOWUP] Remove build_protobuf parameter from build-guide @wForget #8140
- [6887]VL Daily 2024_12_04) @GlutenPerfBot #8137
- [1632]CHDaily 20241204) @kyligence-git #8135
- CORE Add Gluten Project Improvement Proposals (GPIP) doc @yikf #8133
- VL Add back RAII style Velox driver suspension into RowVectorStream @zhztheplayer #8149
- VL Change C style casts to C++ style @rui-mo #8153
- [DOC] Fix typos in documentation @rui-mo #8155
- [7143]VL RAS: Remove experimental flags for RAS @zhztheplayer #8154
- [8148]CH Fix corr with NaN @loneylee #8150
- [1632]CHDaily 20241205) @kyligence-git #8152
- [6887]VL Daily 2024_12_05) @GlutenPerfBot #8147
- VL Minor fix for cpp code style (part 1) @rui-mo #8157
- CORE Simplify code of offload scan @zml1206 #8144
- [8159]CH Remove `SparkFunctionDecimalDivide` @KevinyhZou #8160
- [7900]VL Enable prefix sort config in spill @jinchengchenghh #7904
- CORE Bump celeborn to 0.5.2 @yikf #8054
- [6920]COREVL New APIs and refactors to allow different backends / components to be registered and used @zhztheplayer #8143
- [8142]CH Duplicated columns in group by @lgbo-ustc #8164
- [6887]VL Daily 2024_12_06) @GlutenPerfBot #8162
- [6887]VL Daily 2024_12_07) @GlutenPerfBot #8171
- CORE Add nativeFilters info for simpleString of scan @zml1206 #8169
- CORE[UNIFFLE] Bump uniffle 0.9.1 @wForget #8166
- [8115]CORE Refine the BuildSideRelation transform to support all scenarios @yikf #8116
- [6887]VL Daily 2024_12_08) @GlutenPerfBot #8174
- CORE[MIRROR] Fix performance issue when allScanPartitions are very large @WangGuangxin #8126
- CORE Query planner: Simplify validator `FallbackByNativeValidation` @zhztheplayer #8177
- VL Change loadQuantum default value to 8MB from 256MB @yikf #8186
- [6887]VL Daily 2024_12_09) @GlutenPerfBot #8178
- VL Fix upload arrow path of build bundle package gha @wForget #8193
- [1632]CHDaily 20241210) @kyligence-git #8191
- [8356]COREVLCH Make Iceberg code implement component API @zhztheplayer #8192
- Disable scheduled GHA jobs for forked repos @wForget #8189
- [6887]VL Daily 2024_12_10 @GlutenPerfBot #8190
- [8202][ICEBERG] Fix get iceberg index error @lyy-pineapple #8199
- CORE Following #8192, amend a quick fix for build info message @zhztheplayer #8205
- [7261]CORE Support offloading partial filters to native scan @zml1206 #8082
- [6887]VL Daily 2024_12_11 @GlutenPerfBot #8200
- [8043] Use spark.shuffle.spill.diskWriteBufferSize in sort-based shuffle @marin-ma #8203
- [8168] Add pre-projections for join condition @lgbo-ustc #8185
- [7755]CH Support translate with args of unequal length @shuai-xu #7768
- VL Minor fix for cpp code style (part 2) @rui-mo #8210
- VL Change the loadQuantum config if velox cache is enabled. @yikf #8197
- [7912]VL Flip dependency direction for gluten-delta @zhztheplayer #8218
- [1632]CH Daily 20241212 @kyligence-git #8213
- [6887]VL Daily 2024_12_12 @GlutenPerfBot #8211
- [8187]VL Support velox cache metrics @yikf #8188
- [8206]VL Support collect_set in window @WangGuangxin #8220
- VL Fix sort-based shuffle OOM in spill when compression was disabled @clay4megtr #7553
- [1632]CH Daily 20241213 @kyligence-git #8224
- [8229]VL Don't rewrite collect_list/collect_set in window @zml1206 #8230
- [6887]VL Daily 2024_12_13 @GlutenPerfBot #8223
- [8025]VL Respect config kSpillReadBufferSize and add spill compression codec @jinchengchenghh #8045
- [8216]CH Fix OOM when cartesian product with empty data @lwz9103 #8219
- [6887]VL Daily 2024_12_14 @GlutenPerfBot #8233
- [8208]CORE A new unified approach of source folder isolation for iceberg / hudi / delta with Maven @zhztheplayer #8198
- VL Fix crash when there are unreleased memory pools while terminating a Velox task @zhztheplayer #8243
- [6887]VL Daily 2024_12_15 @GlutenPerfBot #8235
- [MINOR] VL Enhance the gluten timer to support seconds, milliseconds, and microseconds @kecookier #8231
- [7914]CORE Flip dependency direction for gluten-celeborn @zhztheplayer #8241
- [7911]CORE Flip dependency direction for gluten-hudi @zhztheplayer #8240
- CORE Minor: OverTarget is required only with sufficient memory and doesn't spill due to zero used bytes post-borrow @kecookier #8247
- VL Support concat_ws function @PHILO-HE #8101
- CH Minor: Add delta profile to package.sh @lwz9103 #8250
- [7028]CH[Part-12] Add Local SortExec for Partition Write in one pipeline mode @baibaichen #8237
- [7913]CORE Flip dependency direction for gluten-uniffle @zhztheplayer #8242
- [8128]VL Retry borrowing when granted size is less than requested in multi-slot and shared mode @kecookier #8132
- [8215]VL Support cast timestamp to date @zml1206 #8212
- [8018]CORE Introduce ApplyResourceProfileExec to apply resource profile for query stage @zjuwangg #8195
- CH Hotfix to #8212 @baibaichen #8259
- [DOC] Update HowTo.md to fix outdated link and test script location @zjuwangg #8255
- [6887]VL Daily 2024_12_16 @GlutenPerfBot #8238
- VL Allow shared dependencies for lib GCS @PHILO-HE #8251
- [6887]VL Daily 2024_12_17 @GlutenPerfBot #8248
- CORE Minor code cleanups for TreeMemoryConsumer @zhztheplayer #8254
- Minor: Remove deprecated gluten-clickhouse-celeborn jar @lwz9103 #8263
- VL Fix `RetryOnOomMemoryTarget` spilling only a single consumer on retry @zhztheplayer #8262
- [6887]VL Daily 2024_12_18 @GlutenPerfBot #8260
- COREVLCH GHA: Update pull request paths triggering CI @zhztheplayer #8264
- [1632]CH Daily 20241218 @kyligence-git #8261
- [7903]VL move local velox patch to oap/velox @zhouyuan #8265
- [8268]CORE Remove preconditions in OverAcquire.repay @kecookier #8269
- [6887]VL Daily 2024_12_19 @GlutenPerfBot #8270
- [8257]CORE Make IDEA support IssueNavigationLink @merrily01 #8258
- CORE Use component file to discover components @zhztheplayer #8271
- [8108]VL Correct the logic of null on failure behavior for try cast @acvictor #8107
- [1632]CH Daily 20241219 @kyligence-git #8274
- Revert "Revert "[8080]CH Support function transform_keys/transform_values"" @taiyang-li #8277
- [7028]CH[Part-10] Collecting Delta stats for MergeTree @baibaichen #8029
- [6887]VL Daily 2024_12_20 @GlutenPerfBot #8286
- [8356]VL Delta support / Hudi support as Gluten components @zhztheplayer #8282
- [8266]VL[CI] Pre-install spark sources in docker image @PHILO-HE #8290
- [7641]VL Add perf analysis scripts for TPCH workload @marin-ma #8065
- [8266]VL Use pre-installed resources for Spark/Celeborn @PHILO-HE #8294
- [6887]VL Daily 2024_12_21 @GlutenPerfBot #8297
- [8279]CH Fix concat diff when a single argument of array type is input @taiyang-li #8280
- VL RAS: A couple of minor fixes for RAS @zhztheplayer #8292
- CH Refactor: Don't put `using namespace DB` in headers @baibaichen #8300
- [8244]CORE Softaffinity: Use consistent hash scheduling @yikf #8245
- [6887]VL Daily 2024_12_23 @GlutenPerfBot #8301
- [7641]VL Fix security issue for perf analysis scripts @marin-ma #8309
- VL Simplify code for PartialProjectRule @zml1206 #8273
- [8253]CH Fix cast failure in IN filter with tuple values @lwz9103 #8256
- [8050]VL Add viewfs support in scan validation @JkSelf #8049
- [1632]CH Daily 20241224 @kyligence-git #8312
- [6887]VL Daily 2024_12_25 @GlutenPerfBot #8334
- CH Disable gluten arm ci @lwz9103 #8337
- [INFRA][MINOR] Change the issueRegexp in vcs.xml to `(?:#|GLUTEN-)(\d+)` @yikf #8329
- VL RAS: A benchmark suite for query optimization performance @zhztheplayer #8339
- [Shims] Fix the code style issue of prepareWrite @rui-mo #8316
- [8341]CH Fix code style and respect max_read_buffer_size for bzip2 read buffer @taiyang-li #8342
- [8330]VL Improve conversion of viewfs paths to hdfs paths @wangyum #8331
- [8325]CH Fix mismatched results for `$` and `.` in regular expressions @lgbo-ustc #8345
- VL Remove compile option `--enable_ep_cache` @zhztheplayer #8350
- [8352][ICEBERG] Fix read error when a partition column was dropped @lyy-pineapple #8353
- CH [MINOR] Configure Log4j2 to print logs of `org.apache.iceberg` for tracing `ClickHouseIcebergSuite` abort issues @baibaichen #8361
- [7028]CH[Part-14] Refactor Case Sensitive Support for MergeTree @baibaichen #8346
- [8060]COREVL Various fixes for the experimental `GlutenShuffleManager` @zhztheplayer #8355
- [MINOR] Avoid duplicate comment symbol in setup scripts @Zouxxyy #8360
- [7534]CH Refactor and optimize sparkDecimalXXX functions @taiyang-li #8105
- [1632]CH Daily 20241228 @kyligence-git #8368
- [7028]CH[Part-15] [MINOR] Fix UTs @baibaichen #8364
- [8327]CORE Introduce the ConfigEntry to make the config definition more flexible @yikf #8328
- [6887]VL Daily 2024_12_30 @GlutenPerfBot #8371
- [8354]CH Fix cse issue in aggregate[Part2] @loneylee #8376
- VL Add some fixes following #8355 @zhztheplayer #8373
- [8375]CH split decimal binary arithmetic functions into files @lgbo-ustc #8378
- [8283]CH Eliminate CSE via `ExpressionParser` @lgbo-ustc #8284
- [7964]VL Support S3 Bucket Config @majetideepak #8123
- Revert "[8327]CORE Introduce the ConfigEntry to make the config definition more flexible (#8328)" @baibaichen #8382
- VL Various fixes for gluten-it @zhztheplayer #8396
- [7750]VL Move ColumnarBuildSideRelation's memory occupation to Spark off-heap @zjuwangg #8127
- [6887]VL Daily 2025_01_02 @zhouyuan #8388
- VL Allow shared dependencies for s3 and abfs libs @majetideepak #8402
- [6887]VL Daily 2025_01_03 @GlutenPerfBot #8405
- [8393]VL Fix the smj result mismatch issue @JkSelf #8394
- [8327]CORE[Part-1] Rename `GlutenConfig.getConf` to `GlutenConfig.get` @yikf #8395
- VL Support loading dependency libs for Oracle Linux @xinghuayu007 #8391
- [1632]CH Daily 20250103 @kyligence-git #8407
- VL Plumb ABFS config @majetideepak #8403
- [8398] Bump Celeborn to 0.4.3 and 0.5.2 @SteNicholas #8399
- [6887]VL Daily 2025_01_04 @GlutenPerfBot #8420
- [7602]CH Add spark cast array to string @zhanglistar #8392
- [8327]CORE[Part-2] Move `GlutenConfig.scala` to `org.apache.gluten.config` package dir @yikf #8426
- [6887]VL Daily 2025_01_05 @GlutenPerfBot #8424
- VL Make enabling orc scan dynamically configurable @LoseYSelf #8433
- [TEST] Fix gluten test util getExecutedPlan @j7nhai #8374
- VL Support casting timestamp type to varchar type @PHILO-HE #8338
- [7502]CH Fix orc write time zone diff @KevinyhZou #7523
- [8397]CH[Part-1]: Disable hdfs while compiling clickhouse backend on macOS @yxheartipp #8400
- [6887]VL Daily 2025_01_06 @GlutenPerfBot #8427
- [8304]CORE Add an optimization rule to collapse nested get_json_object functions @KevinyhZou #8305
- [8183]CORE Prune unused column in project operator @liujiayi771 #8295
- [8408]CH Fix compile failures on ARM @loudongfeng #8413
- [1632]CH Daily 20250107 @kyligence-git #8440
- [8307]VL Support Int64 Timestamp in parquet reader @zml1206 #8308
- [6887]VL Daily 2025_01_07 @GlutenPerfBot #8439
- [7641]VL Minor fixup for the perf analysis script @marin-ma #8430
- VL Gluten-it: New option `--collect-sql-metrics=execution-time` to collect slowest plan nodes into benchmark report @zhztheplayer #8445
- [8375]CH[MINOR] Fix USE_EMBEDDED_COMPILER @baibaichen #8441
- VL Remove an out-of-date warning message @zhztheplayer #8447
New Contributors
- @wenwj0 made their first contribution #6286
- @wecharyu made their first contribution #6438
- @liujp made their first contribution #6462
- @majetideepak made their first contribution #6522
- @jiangjiangtian made their first contribution #6746
- @ArnavBalyan made their first contribution #6707
- @Preetesh2110 made their first contribution #6326
- @EpsilonPrime made their first contribution #6833
- @zuston made their first contribution #6994
- @Z1Wu made their first contribution #7038
- @JinHelin404 made their first contribution #7209
- @beliefer made their first contribution #7334
- @pratham76 made their first contribution #7308
- @CodenameGHOST007 made their first contribution #7411
- @Henry2SS made their first contribution #7476
- @Zand100 made their first contribution #7736
- @yabinma made their first contribution #8030
- @merrily01 made their first contribution #8104
- @clay4megtr made their first contribution #7553
- @xinghuayu007 made their first contribution #8391
- @LoseYSelf made their first contribution #8433
- @yxheartipp made their first contribution #8400
Full Changelog: v1.2.1...v1.3.0-preview