[CELEBORN-1669] Fix NullPointerException for PartitionFilesSorter#updateSortedShuffleFiles after cleaning up expired shuffle key

### What changes were proposed in this pull request?

Fix `NullPointerException` for `PartitionFilesSorter#updateSortedShuffleFiles` after cleaning up expired shuffle key.

### Why are the changes needed?

`PartitionFilesSorter` sorts shuffle files in the `worker-file-sorter-executor` thread and cleans up expired shuffle keys in the `worker-expired-shuffle-cleaner` thread. If `worker-expired-shuffle-cleaner` removes an expired shuffle key before `worker-file-sorter-executor` updates the sorted shuffle files for that key, the lookup in `sortedShuffleFiles` returns `null` and `updateSortedShuffleFiles` currently throws a `NullPointerException`:

```
2024-10-23 17:26:17,162 [INFO] [worker-expired-shuffle-cleaner] - org.apache.celeborn.service.deploy.worker.Worker -Logging.scala(51) -Cleaned up expired shuffle application_1724141892576_3843182_1-0
2024-10-23 17:26:17,392 [ERROR] [worker-file-sorter-executor-237572] - org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter -PartitionFilesSorter.java(752) -Sorting shuffle file for application_1724141892576_3843182_1-0-1875-0-0 /mnt/storage02/celeborn-worker/shuffle_data/application_1724141892576_3843182_1/0/1875-0-0 failed, detail:
java.lang.NullPointerException: null
    at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.updateSortedShuffleFiles(PartitionFilesSorter.java:455) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
    at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter$FileSorter.sort(PartitionFilesSorter.java:747) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
    at org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.lambda$new$1(PartitionFilesSorter.java:164) ~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_162]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_162]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_162]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_162]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_162]
```
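The interleaving in the log above can be reproduced in isolation. The following is a minimal sketch, not the Celeborn implementation: `SortedFilesTracker`, `cleanup`, `isTracked`, and the accessor methods are hypothetical names, and the use of `ConcurrentHashMap` for the map and its sets is an assumption. Only `initSortedShuffleFiles`, `updateSortedShuffleFiles`, and the three field names come from the patch. It shows a null-guarded update that tolerates a key removed by the cleaner.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the sorter's bookkeeping; not the Celeborn class.
class SortedFilesTracker {
  // Assumption: a concurrent map from shuffle key to the set of sorted file ids.
  private final Map<String, Set<String>> sortedShuffleFiles = new ConcurrentHashMap<>();
  private final AtomicInteger sortedFileCount = new AtomicInteger();
  private final AtomicLong sortedFilesSize = new AtomicLong();

  void initSortedShuffleFiles(String shuffleKey) {
    sortedShuffleFiles.putIfAbsent(shuffleKey, ConcurrentHashMap.<String>newKeySet());
  }

  // Plays the role of the expired-shuffle cleaner: drops the key entirely.
  void cleanup(String shuffleKey) {
    sortedShuffleFiles.remove(shuffleKey);
  }

  // Plays the role of the sorter thread: the key may already be gone,
  // so the set is looked up once and checked before use.
  void updateSortedShuffleFiles(String shuffleKey, String fileId, long fileLength) {
    Set<String> shuffleFiles = sortedShuffleFiles.get(shuffleKey);
    if (shuffleFiles != null) {
      shuffleFiles.add(fileId);
    }
    // As in the patch, the counters are updated whether or not the key still exists.
    sortedFileCount.incrementAndGet();
    sortedFilesSize.addAndGet(fileLength);
  }

  boolean isTracked(String shuffleKey, String fileId) {
    Set<String> shuffleFiles = sortedShuffleFiles.get(shuffleKey);
    return shuffleFiles != null && shuffleFiles.contains(fileId);
  }

  int sortedFileCount() {
    return sortedFileCount.get();
  }

  long sortedFilesSize() {
    return sortedFilesSize.get();
  }

  public static void main(String[] args) {
    SortedFilesTracker tracker = new SortedFilesTracker();
    tracker.initSortedShuffleFiles("app-0");
    tracker.cleanup("app-0");                                  // cleaner runs first
    tracker.updateSortedShuffleFiles("app-0", "1875-0-0", 42L); // sorter finishes later
    System.out.println("tracked after cleanup: " + tracker.isTracked("app-0", "1875-0-0"));
  }
}
```

Reading the set into a local variable matters here: calling `get` twice (once for the null check, once for `add`) would leave a window in which the cleaner could remove the key between the two calls.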

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2847 from SteNicholas/CELEBORN-1669.

Authored-by: SteNicholas <[email protected]>
Signed-off-by: mingji <[email protected]>
SteNicholas authored and FMX committed Oct 24, 2024
1 parent 216152d commit 4b150be
Showing 1 changed file with 4 additions and 1 deletion.
@@ -459,7 +459,10 @@ public void initSortedShuffleFiles(String shuffleKey) {
 
   @VisibleForTesting
   public void updateSortedShuffleFiles(String shuffleKey, String fileId, long fileLength) {
-    sortedShuffleFiles.get(shuffleKey).add(fileId);
+    Set<String> shuffleFiles = sortedShuffleFiles.get(shuffleKey);
+    if (shuffleFiles != null) {
+      shuffleFiles.add(fileId);
+    }
     sortedFileCount.incrementAndGet();
     sortedFilesSize.addAndGet(fileLength);
   }
