[BUG] Failed case about test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test #9923

Closed
winningsix opened this issue Dec 1, 2023 · 4 comments · Fixed by #9970

@winningsix (Collaborator)

Describe the bug
The test test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] in src.main.python.date_time_test fails.

The failure occurred on 321 DBX. From the log, the overflow exception appears to behave differently there: the job fails with java.lang.ArithmeticException: Overflow instead of raising the expected 'Rounding necessary' error.
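
Below is a minimal sketch of the kind of query the test runs, using a plain local PySpark session rather than the plugin's test harness; the two decimal values are illustrative, not the DATAGEN_SEED=1701412018 data.

from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType, StructField, StructType

spark = SparkSession.builder.master("local[1]").getOrCreate()
schema = StructType([StructField("a", DecimalType(20, 7))])
df = spark.createDataFrame([
    (Decimal("1.2345678"),),              # fractional microseconds -> 'Rounding necessary'
    (Decimal("9999999999999.0000000"),),  # microseconds exceed a signed 64-bit long -> 'Overflow'
], schema)

# The test expects the CPU run to fail with 'Rounding necessary'; on 321 DBX the
# job failed with java.lang.ArithmeticException: Overflow instead.
df.selectExpr("timestamp_seconds(a)").collect()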

error
AssertionError: Expected error 'Rounding necessary' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o403016.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 8103.0 failed 1 times, most recent failure: Lost task 2.0 in stage 8103.0 (TID 25287) (10.2.128.16 executor driver): java.lang.ArithmeticException: Overflow
	at java.math.BigDecimal$LongOverflow.check(BigDecimal.java:3152)
	at java.math.BigDecimal.longValueExact(BigDecimal.java:3137)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:761)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:82)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:208)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
	at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:125)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:97)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3396)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3327)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3316)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3316)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3609)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3547)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3535)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1182)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1170)
	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2750)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$runSparkJobs$1(Collector.scala:297)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:293)
	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:377)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:128)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:135)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:122)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:110)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:92)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$computeResult$1(ResultCacheManager.scala:541)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.collectResult$1(ResultCacheManager.scala:529)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.computeResult(ResultCacheManager.scala:549)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:402)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:395)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:289)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectResult$1(SparkPlan.scala:506)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:503)
	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:4105)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$3(Dataset.scala:4373)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:819)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4371)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:233)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:417)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:178)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1038)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:128)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:367)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4371)
	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:4103)
	at sun.reflect.GeneratedMethodAccessor136.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ArithmeticException: Overflow
	at java.math.BigDecimal$LongOverflow.check(BigDecimal.java:3152)
	at java.math.BigDecimal.longValueExact(BigDecimal.java:3137)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:761)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:82)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:208)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
	at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:125)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:97)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more'
Stacktrace
data_gen = Decimal(20,7)
    @pytest.mark.parametrize('data_gen', [DecimalGen(7, 7), DecimalGen(20, 7)], ids=idfn)
    @allow_non_gpu(*non_utc_allow)
    def test_timestamp_seconds_rounding_necessary(data_gen):
>       assert_gpu_and_cpu_error(
            lambda spark : unary_op_df(spark, data_gen).selectExpr("timestamp_seconds(a)").collect(),
            conf={},
            error_message='Rounding necessary')
../../src/main/python/date_time_test.py:579: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:646: in assert_gpu_and_cpu_error
    assert_spark_exception(lambda: with_cpu_session(df_fun, conf), error_message)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
func = <function assert_gpu_and_cpu_error.<locals>.<lambda> at 0x7ff313f66700>
error_message = 'Rounding necessary'
    def assert_spark_exception(func, error_message):
        """
        Assert that a specific Java exception is thrown
        :param func: a function to be verified
        :param error_message: a string such as the one produce by java.lang.Exception.toString
        :return: Assertion failure if no exception matching error_message has occurred.
        """
        with pytest.raises(Exception) as excinfo:
            func()
        actual_error = excinfo.exconly()
>       assert error_message in actual_error, f"Expected error '{error_message}' did not appear in '{actual_error}'"
E       AssertionError: Expected error 'Rounding necessary' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o403016.collectToPython.
E       : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 8103.0 failed 1 times, most recent failure: Lost task 2.0 in stage 8103.0 (TID 25287) (10.2.128.16 executor driver): java.lang.ArithmeticException: Overflow
E       ...'
@winningsix added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Dec 1, 2023
@winningsix changed the title from "[BUG] test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test" to "[BUG] Failed case about test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test" on Dec 1, 2023
@winningsix (Collaborator, Author)

cc @thirtiseven

@revans2 (Collaborator) commented on Dec 5, 2023

Just hit this again in a nightly build.

DATAGEN_SEED=1701787361, INJECT_OOM

java.lang.ArithmeticException: Rounding necessary
...
at java.math.BigDecimal.commonNeedIncrement(BigDecimal.java:4179)

@mattahrens removed the ? - Needs Triage (Need team to review and classify) label on Dec 5, 2023
@thirtiseven self-assigned this on Dec 6, 2023
@thirtiseven (Collaborator) commented on Dec 6, 2023

timestamp_seconds checks for rounding necessary before it checks for overflow. The Decimal(20,7) case can trigger both rounding necessary and overflow, so the expected complaint is about rounding necessary.

It looks like the first element in the df generated with the failing seed is 1793879511158.1649100, which somehow bypasses the rounding necessary check and goes on to the overflow check next. I can't reproduce it locally, though. (Update: I can repro with length=5 in this case with this seed.)
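
For reference, here is a rough Python sketch (not the actual Spark/Java code path) of the check order described above, assuming timestamp_seconds converts seconds to microseconds and requires the result to fit exactly in a signed 64-bit long; the two messages mirror the ones java.math.BigDecimal raises in the logs above.

from decimal import Decimal

LONG_MAX = 2**63 - 1
MICROS_PER_SECOND = 1_000_000

def seconds_to_micros_exact(seconds: Decimal) -> int:
    micros = seconds * MICROS_PER_SECOND
    if micros != micros.to_integral_value():
        # java.math.BigDecimal raises ArithmeticException("Rounding necessary") here
        raise ArithmeticError("Rounding necessary")
    if not (-LONG_MAX - 1 <= int(micros) <= LONG_MAX):
        # ...and ArithmeticException("Overflow") here
        raise ArithmeticError("Overflow")
    return int(micros)

for s in [Decimal("1793879511158.1649100"),    # first element with the failing seed: exact and in range
          Decimal("1.2345678"),                # fractional micros -> "Rounding necessary"
          Decimal("9999999999999.0000000")]:   # exact micros out of long range -> "Overflow"
    try:
        print(s, "->", seconds_to_micros_exact(s))
    except ArithmeticError as e:
        print(s, "->", e)

Under this sketch the first element alone raises neither error, so which message the job ends up reporting depends on which generated row fails first.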

@thirtiseven (Collaborator)

Fixed in #9970
