Perform handle spill IO outside of locked section in SpillFramework #11880

Open
wants to merge 28 commits into branch-25.02
Conversation

@zpuller zpuller (Collaborator) commented Dec 16, 2024

Addresses #11830

Moves the IO outside of the critical section in the spillable buffer handle spill functions, so that threads interacting with the SpillFramework can manage the spill state of a handle consistently without being blocked on IO. For example, if thread t1 is in the middle of spilling and thread t2 wants to check whether this handle is currently spilling, t2 does not need to wait for the spill IO operation to complete in order to check whether the handle is spillable.
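
To illustrate the intent, here is a minimal sketch with stand-in types (not the actual SpillFramework code): `spillable` only inspects state under the handle lock, so it returns immediately even while another thread's spill IO is in flight.

```scala
class HandleSketch {
  // Stand-ins for the real buffers and spill state; names are illustrative only.
  private var dev: Option[Array[Byte]] = Some(new Array[Byte](1024))
  private var spilling: Boolean = false

  // t2 can call this at any time; it never waits on spill IO.
  def spillable: Boolean = synchronized { dev.isDefined && !spilling }

  def spill(): Unit = {
    // Short critical section: claim the spill and grab a reference.
    val toWrite: Option[Array[Byte]] = synchronized {
      if (dev.isDefined && !spilling) { spilling = true; dev } else None
    }
    toWrite.foreach { buf =>
      writeToHost(buf) // slow IO happens here, outside the lock
      // Short critical section again: publish the new state.
      synchronized {
        dev = None
        spilling = false
      }
    }
  }

  private def writeToHost(buf: Array[Byte]): Unit = { /* pretend this is slow IO */ }
}
```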

Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller changed the title from "Spill lock" to "Add an explicit spill lock to the Spill Framework" Dec 16, 2024
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@revans2 revans2 (Collaborator) left a comment

Could you please update the docs for all of the public APIs in SpillableHandle, and for SpillableHandle itself, to make it clear which APIs need to be thread safe and how they are supposed to be protected? For example, spill has no indication of how it should behave when multiple callers invoke it; we can infer that one wins and the others fail. I want this mostly so it is clear what the contract is for each of these APIs. spillable being a best-effort API intended only for quick filtering is fine, but that needs to be documented, so that if someone tries to use it in a way that requires it to be exact we know that usage is wrong and violates the contract. This will also let me reason about the code and who is calling it, so that I can better reason about the correctness.

Also, could you please explain how error handling/error recovery is intended to happen when spilling? For example, if an exception is thrown while we are in the middle of trying to spill something, what should happen? Are we supposed to just keep it in the spilling state?
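
For reference, a hypothetical sketch of the kind of contract documentation being asked for here (the trait name and wording are illustrative, not the actual SpillableHandle API):

```scala
trait SpillableHandleDocSketch {
  /**
   * Spill this handle's buffer to the next tier.
   *
   * Thread safety: may be called concurrently from multiple threads; at most one
   * caller performs the actual spill, and the others return 0 because the handle
   * is already spilling (or already spilled).
   *
   * @return the number of bytes spilled by this call, or 0 if another thread won the race
   */
  def spill(): Long

  /**
   * Best-effort, point-in-time check intended only for quick filtering of spill
   * candidates. The answer may already be stale when the caller acts on it, so it
   * must not be relied on for correctness.
   */
  def spillable: Boolean
}
```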

@zpuller zpuller (Collaborator, Author) commented Dec 17, 2024

> Could you please update the docs for all of the public APIs in SpillableHandle, and for SpillableHandle itself, to make it clear which APIs need to be thread safe and how they are supposed to be protected?

Absolutely yeah, I'm just planning to resolve the other outstanding issues in the PR and ensure I have a clear understanding and answer to the other questions, and then will document accordingly.

@zpuller zpuller marked this pull request as draft December 19, 2024 18:30
@zpuller zpuller (Collaborator, Author) commented Dec 19, 2024

I discussed this offline with @abellina and he gave me some ideas on how to rework this so it does not require a separate lock, and instead just moves the IO component out of the protected section of spill and swaps the buffers atomically. I pushed some changes, but I'm also converting this back to a draft as it's not fully ready for review yet. I still have to address some comments above, and I am planning to do some manual testing after pulling in the most recent 25.02 branch changes.
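
Roughly, the rework being described looks like the following sketch (stand-in types again; the real code swaps device/host buffers inside the handle):

```scala
object SwapSketch {
  private var dev: Option[Array[Byte]] = Some(new Array[Byte](1024))
  private var host: Option[Array[Byte]] = None

  // Stand-in for the actual device-to-host copy, which is the slow IO step.
  private def copyToHost(d: Array[Byte]): Array[Byte] = d.clone()

  def spillToHost(): Unit = {
    val toCopy = synchronized { dev }   // take a reference under the lock
    toCopy.foreach { d =>
      val staging = copyToHost(d)       // IO performed without holding the lock
      synchronized {
        host = Some(staging)            // publish the host copy and drop the
        dev = None                      // device reference in one locked step
      }
    }
  }
}
```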

@zpuller zpuller changed the title from "Add an explicit spill lock to the Spill Framework" to "Perform handle spill IO outside of locked section in SpillFramework" Dec 19, 2024
@jihoonson jihoonson (Collaborator) commented:

I'm triggering a build job to see if some test consistently fails for all PRs.

@jihoonson jihoonson (Collaborator) commented:

build

zpuller added 10 commits January 8, 2025 10:10
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller (Collaborator, Author) commented Jan 14, 2025

I've pushed several updates to the branch. Per my previous comment, the changes are now structured in such a way that the interface does not need to change.

I was able to test this and see evidence in traces showing that it allows multiple threads to spill concurrently.

I considered refactoring some duplicate code fragments shared between the handle types around how we do the buffer swaps etc., but I'm not sure it would actually improve the overall readability of the code or the PR, so I held off for now.

@zpuller zpuller marked this pull request as ready for review January 14, 2025 19:28
Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller requested review from revans2, jlowe and abellina and removed request for jlowe January 15, 2025 17:14
}
}

override def close(): Unit = {
// do we need to Cuda.deviceSynchronize here?
// what if we don't spill
Collaborator:

I don't see a need here. close is called from the user code, and that's who owns the handle, so the caller needs to be careful about calling synchronize before closing memory. At the same time, we use stream-aware allocators, so I don't see cases where we need to add extra synchronization.

Collaborator (author):

Oh sorry, I know we just discussed that offline; I forgot to delete this comment.

@@ -527,33 +601,64 @@ class SpillableColumnarBatchHandle private (
materialized
}


private var toSpill: Option[ColumnarBatch] = None
private var spilled: Option[ColumnarBatch] = None
Collaborator:

why do we need both toSpill and spilled?

Collaborator (author):

toSpill is used both to indicate which thread is currently spilling and to hold a reference to the underlying buffer during the spill. spilled is used to make sure the resource gets cleaned up properly after spilling. It's possible there's some clever way to consolidate them that I couldn't think of.

Collaborator (author):

But, for instance, if you are spilling and only have toSpill set, and on closing the handle you do toSpill = None, dev = None, then when the spill IO finishes and tries to swap the buffers you'll end up with all references set to None.
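
As a simplified rendering of the fields being discussed (my reading of the thread and the diff fragments, not the PR's actual code; the follow-up comments below point out remaining gaps in the close-during-spill path):

```scala
class TwoFieldSketch {
  // Stand-ins: `dev` is the live buffer, `toSpill` marks/holds the buffer being
  // spilled, and `spilled` records what the spill produced so it can be released.
  private var dev: Option[Array[Byte]] = Some(new Array[Byte](16))
  private var toSpill: Option[Array[Byte]] = None
  private var spilled: Option[Array[Byte]] = None

  def beginSpill(): Option[Array[Byte]] = synchronized {
    toSpill = dev          // claim the buffer; also signals "a spill is in flight"
    toSpill
  }

  def finishSpill(): Unit = synchronized {
    spilled = dev          // if close() already ran, dev is None by now
    dev = None
    toSpill = None
  }

  def close(): Unit = synchronized {
    // Mid-spill we only drop `dev`; `toSpill` stays so the spill thread can finish,
    // and whatever it records in `spilled` can be released afterwards.
    dev = None
  }
}
```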

Collaborator:

So what do you expect the final state of a closed handle to look like? I would expect them all to be set to None, after we have closed anything that could not be closed while the spill was happening.

To me it just feels simpler that if we see a close while a spill is happening, we mark the handle as closed but don't try to close anything. Then, when the spill is finished but still in the synchronized step, we call close again (or perhaps an underlying close method) to finish the job.

I would also really like some tests to verify that this is working because I keep seeing race conditions and I don't trust myself to have caught all of them.
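
A minimal sketch of the proposed pattern (hypothetical names; the real handle has more state): close() during a spill only marks the handle closed, and the spill thread finishes the close in its final synchronized step.

```scala
class ClosedFlagSketch {
  private var dev: Option[Array[Byte]] = Some(new Array[Byte](16))
  private var spilling = false
  private var closed = false

  def close(): Unit = synchronized {
    closed = true
    if (!spilling) doClose()  // nothing in flight: release right away
    // otherwise the spill thread finishes the close when it is done
  }

  def spill(): Unit = {
    val toWrite = synchronized {
      if (dev.isEmpty || spilling || closed) None
      else { spilling = true; dev }
    }
    toWrite.foreach { buf =>
      writeOut(buf)            // the IO, outside the lock
      synchronized {
        spilling = false
        dev = None
        if (closed) doClose()  // finish the close that arrived mid-spill
      }
    }
  }

  private def doClose(): Unit = { dev = None /* plus release any other resources */ }

  private def writeOut(buf: Array[Byte]): Unit = { /* pretend this is slow IO */ }
}
```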

Collaborator (author):

Yep, they should all be set to None. I've tried to rework it to put as much as possible into a separate close impl method that we can invoke from close or spill, as you described. The only thing it doesn't handle is local variables like staging, but I still set those to None and close them separately within the spill call.

Also added some Monte Carlo style tests that overlap async closing and spilling at different delays. In my local testing I could see that I'm hitting all possible branches of the race, so to speak, but I removed the print statements for the PR. I'm not sure whether it's sufficient to simply run the tests and see that no buffers leak, or whether we want to add some test helper logic to SpillFramework to actually assert that we triggered the race; for now I left that out as it may be overkill.
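
Roughly the shape such a race test can take (plain threads and random delays; a sketch, not the PR's actual test code, which would use the framework's own handles and leak tracking):

```scala
import java.util.concurrent.CountDownLatch
import scala.util.Random

object SpillCloseRaceSketch {
  // Hypothetical minimal interface for whatever handle is under test.
  trait RaceableHandle { def spill(): Unit; def close(): Unit }

  // One trial fires spill() and close() from two threads with small random delays;
  // many trials exercise the different interleavings of the race.
  def raceCloseAgainstSpill(makeHandle: () => RaceableHandle, trials: Int = 1000): Unit = {
    val rng = new Random(42)
    (0 until trials).foreach { _ =>
      val handle = makeHandle()
      val start = new CountDownLatch(1)
      def contender(action: () => Unit): Thread = new Thread(() => {
        start.await()
        Thread.sleep(0, rng.nextInt(100000)) // jitter of up to ~0.1 ms
        action()
      })
      val spiller = contender(() => handle.spill())
      val closer = contender(() => handle.close())
      spiller.start(); closer.start()
      start.countDown()
      spiller.join(); closer.join()
      // After both finish: assert nothing leaked (the real tests could use the
      // framework's leak tracking for this).
    }
  }
}
```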

Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller requested review from revans2 and abellina January 17, 2025 17:17
host = stagingHost
}
spilled = dev
dev = None
Collaborator:

I think we need to have code to handle the case where close was called in the middle of a spill.

It looks like, with the current code, we are going to leak stagingHost, and also dev, which is then stored into spilled.

Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller requested a review from revans2 January 22, 2025 16:11
@@ -160,6 +155,11 @@ trait StoreHandle extends AutoCloseable {
* removed on shutdown, or by handle.close, but 0-byte handles are not spillable.
*/
val approxSizeInBytes: Long

/**
* This is used to resolve races between closing a handle while spilling.
Collaborator:

Suggested change:
- * This is used to resolve races between closing a handle while spilling.
+ * This is used to resolve races between closing a handle and spilling.

}
sizeInBytes
} else {
0
Collaborator:

spacing is odd here.

} else {
host = staging
}
// set spilled to dev instead of toSpill so that if dev was already closed during spill,
Collaborator:

I am a little confused by this comment. toSpill is dev; it's just a reference to it. So when you increased the refcount at line 649, you increased it for both dev and toSpill. I don't believe it makes much of a difference whether we set spilled to dev or to toSpill, but I think we should just remove the comment.

Collaborator (author):

The logic here is an issue of timing: yes, we set toSpill to equal dev, but if close is called later we intentionally leave toSpill as is (in case we are mid-spill) while setting dev to None, so the two variables are no longer equal at that point.

Having said that, I think I may be able to simplify this now that we explicitly keep track of closed; let me check.

}
}

private def withChunkedPacker[T](body: ChunkedPacker => T): T = {
val tbl = synchronized {
if (dev.isEmpty) {
if (toSpill.isEmpty) {
Collaborator:

This is a bad design for this method, and it's my fault. Could we pass a batch to withChunkedPacker instead? That way we don't have this exception and don't rely on state.

So the signature would become:

private def withChunkedPacker[T](batchToPack: ColumnarBatch)(body: ChunkedPacker => T): T
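
As a generic sketch of that refactor (a loan pattern with an explicit argument; PackerSketch and the other names below are invented, not the real ChunkedPacker API):

```scala
// The resource to operate on is passed in explicitly instead of being read from
// the object's mutable state (dev/toSpill) inside the method.
final class PackerSketch(batch: String) extends AutoCloseable {
  def nextChunk(): String = s"chunk-of-$batch"
  override def close(): Unit = ()
}

object PackerLoanSketch {
  def withPacker[T](batchToPack: String)(body: PackerSketch => T): T = {
    val packer = new PackerSketch(batchToPack) // no state check, no "empty" exception path
    try body(packer) finally packer.close()
  }

  // Usage: the caller decides which batch gets packed.
  val firstChunk: String = withPacker("myBatch")(p => p.nextChunk())
}
```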

} else {
host = staging
}
// set spilled to dev instead of toSpill so that if dev was already closed during spill,
Collaborator:

same here, the comment is confusing to me.

}
}

override def close(): Unit = {
private def doClose(): Unit = synchronized {
releaseDeviceResource()
Collaborator:

So now releaseDeviceResource is inside of the handle lock. This method ends up calling the spill store and taking its lock, and I was trying to avoid that to prevent lock-inversion deadlocks. Do you have a reason to move this inside of the handle lock?
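
To make the concern concrete, a generic illustration of the lock-inversion hazard (not the actual SpillFramework locks):

```scala
object LockInversionSketch {
  private val handleLock = new Object
  private val storeLock = new Object

  // Thread A: handle lock first, then the store lock (what the questioned code does).
  def closeUnderHandleLock(): Unit = handleLock.synchronized {
    storeLock.synchronized { /* e.g. remove the handle from the store */ }
  }

  // Thread B: store lock first, then the handle lock (some other store operation).
  def storeOperationTouchingHandle(): Unit = storeLock.synchronized {
    handleLock.synchronized { /* e.g. query the handle */ }
  }

  // If A and B run concurrently, each can end up holding one lock while waiting for
  // the other: a classic lock-ordering deadlock. Keeping the store call outside the
  // handle lock removes one side of this cycle.
}
```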

Collaborator (author):

I think it shouldn't be needed, let me try to undo that.
