Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1530] support MPU for S3 #2830

Closed
wants to merge 15 commits into from
Closed

Conversation

zhaohehuhu
Copy link
Contributor

@zhaohehuhu zhaohehuhu commented Oct 21, 2024

What changes were proposed in this pull request?

as title

Why are the changes needed?

AWS S3 doesn't support append, so Celeborn had to copy the historical data from s3 to worker and write to s3 again, which heavily scales out the write. This PR implements a better solution via MPU to avoid copy-and-write.

Does this PR introduce any user-facing change?

How was this patch tested?

WechatIMG257

I conducted an experiment with a 1GB input dataset to compare the performance of Celeborn using only S3 storage versus using SSD storage. The results showed that Celeborn with SSD storage was approximately three times faster than with only S3 storage.

Screenshot 2024-11-16 at 13 02 10

The above screenshot is the second test with 5000 mapper and reducer that I did.

@FMX FMX changed the title support MPU for S3 [CELEBORN-1530] support MPU for S3 Oct 21, 2024
@FMX FMX self-requested a review October 21, 2024 08:53
@FMX
Copy link
Contributor

FMX commented Oct 23, 2024

Thanks for this PR. Are there any test results?

@zhaohehuhu
Copy link
Contributor Author

Thanks for this PR. Are there any test results?
Not yet. I'm ready to do a benchmark for it.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR but there are some points to polish.

</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency is duplicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

<name>aws-mpu-deps</name>
</property>
</activation>
<dependencies>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these dependencies can be moved to dependencies section because this module is loaded when aws-mpu profile is activated only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


<profiles>
<profile>
<id>aws-mpu</id>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profile name can be changed to aws.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


package org.apache.celeborn.server.common.service.mpu.bean;

public class AWSCredentials {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should not be in the common module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

worker/pom.xml Outdated
<property>
<name>aws-mpu-deps</name>
</property>
</activation>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This segment is not needed.

<activation>
        <property>
          <name>aws-mpu-deps</name>
        </property>
      </activation>

DynConstructors.builder()
.impl(
"org.apache.celeborn.S3MultipartUploadHandler",
awsCredentials.getClass(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass the arguments to S3MultipartUploadHandler should be enough for this scenerio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

task = new S3FlushTask(flushBuffer, diskFileInfo.getDfsPath(), notifier, true);
task =
new S3FlushTask(
flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber);
flushBuffer, notifier, true, s3MultipartUploadHandler, partNumber++);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -273,6 +310,7 @@ public void flush(boolean finalFlush, boolean fromEvict) throws IOException {
if (task != null) {
addTask(task);
flushBuffer = null;
partNumber++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

logger.warn("Abort s3 multipart upload for {}", diskFileInfo.getFilePath());
s3MultipartUploadHandler.complete();
}

if (notifier.hasException()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two if blocks can be merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import java.lang.{Long => JLong}
import java.util.{List => JList}

case class MultipartUploadRequestParam(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused class. Can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@zyclove
Copy link

zyclove commented Oct 30, 2024

@zhaohehuhu @FMX @WillemJiang
When can we expect this pr to be fully validated and merged into the main branch? It's very important.

@zhaohehuhu
Copy link
Contributor Author

@zhaohehuhu @FMX @WillemJiang When can we expect this pr to be fully validated and merged into the main branch? It's very important.

I still need more time to fully test it as S3 has some limitations related to MPU.

@FMX
Copy link
Contributor

FMX commented Nov 11, 2024

@zhaohehuhu @FMX @WillemJiang When can we expect this pr to be fully validated and merged into the main branch? It's very important.

Every PR should be production-ready before it's been merged.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. Merged into main(v0.6.0).

@FMX FMX closed this in a2d3972 Nov 22, 2024
@zhaohehuhu zhaohehuhu deleted the dev-1021 branch November 22, 2024 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants