Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1116] Read authentication configs from HADOOP_CONF_DIR #2082

Closed
wants to merge 7 commits into from

Conversation

FMX
Copy link
Contributor

@FMX FMX commented Nov 8, 2023

What changes were proposed in this pull request?

  1. Make Celeborn read configs from HADOOP_COND_DIR.
  2. Remove unnecessary Kerberos configs.

Why are the changes needed?

To support HDFS with Kerberos.

Does this PR introduce any user-facing change?

NO.

How was this patch tested?

GA and cluster.

@FMX FMX changed the title [B1116] Read authentication configs from hadoop_conf_dir [CELEBORN-1116] Read authentication configs from hadoop_conf_dir Nov 8, 2023
@@ -80,7 +80,7 @@ object CelebornHadoopUtils extends Logging {
// If we are accessing HDFS and it has Kerberos enabled, we have to login
// from a keytab file so that we can access HDFS beyond the kerberos ticket expiration.
UserGroupInformation.setConfiguration(hadoopConf)
if (conf.hdfsStorageKerberosEnabled) {
if ("kerberos" == hadoopConf.get("hadoop.security.authentication")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value is case insensitive

Copy link

codecov bot commented Nov 8, 2023

Codecov Report

Merging #2082 (2bfc978) into main (efa22a4) will decrease coverage by 0.21%.
Report is 8 commits behind head on main.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #2082      +/-   ##
==========================================
- Coverage   46.70%   46.48%   -0.21%     
==========================================
  Files         165      166       +1     
  Lines       10668    10695      +27     
  Branches      972      977       +5     
==========================================
- Hits         4981     4971      -10     
- Misses       5362     5398      +36     
- Partials      325      326       +1     
Files Coverage Δ
...cala/org/apache/celeborn/common/CelebornConf.scala 87.57% <ø> (-0.04%) ⬇️
...che/celeborn/common/util/CelebornHadoopUtils.scala 12.83% <0.00%> (+0.33%) ⬆️

... and 13 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

I missed the review in the previous PR.

keytab is enforced in the current implementation for Kerberos authentication, it's a common case for long-running services, but it's also possible to use TGT cache for Kerberos authentication (I know in some companies, TGT cache refresh is handled by the infra team in container/machine setting up phase, so that the application don't need to handle that and don't need to manage and periodically renew the keytab)

For example, spark-submit allows to use --principal and --keytab, but also allows to use TGT cache (e.g. perform kinit before spark-submit) to access the kerberized services.

We'd better document such limitations in the log message and docs clearly

@cfmcgrady cfmcgrady changed the title [CELEBORN-1116] Read authentication configs from hadoop_conf_dir [CELEBORN-1116] Read authentication configs from HADOOP_CONF_DIR Nov 8, 2023
@FMX
Copy link
Contributor Author

FMX commented Nov 8, 2023

Yes, as you have said spark supports using TGT cache. Celeborn does not support that yet. If using celeborn with kerberized HDFS, just setting the principle and keytab could be enough.

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

it's quite easy to support auth from tgt cache

val kerberosEnabled = "KERBEROS".equalsIgnoreCase(hadoopConf.get("hadoop.security.authentication"))
if (kerberosEnabled) {
  val principalOpt = ...
  val keytabOpt = ...
  (principalOpt, keytabOpt) match {
    case (Some(principal), Some(keytab)) =>
      logInfo("Kerberos is enabled, attempt to login with principal: $principal and keytab: $keytab")
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
    case _ =>
      logInfo("Kerberos is enabled without principal and keytab supplied, assuming keytab is managed externally")
      UserGroupInformation.getCurrentUser()
  }
}

@FMX
Copy link
Contributor Author

FMX commented Nov 8, 2023

I'll try it.

@FMX FMX closed this Nov 8, 2023
@FMX FMX reopened this Nov 8, 2023
@FMX
Copy link
Contributor Author

FMX commented Nov 8, 2023

Thanks @pan3793 , you are right. I've test this PR on a cluster and it works well.

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

@FMX thanks for the confirmation, have you tested the negative cases? e.g. access kerberized HDFS without keytab and TGT, does the error message intuitive?

@FMX
Copy link
Contributor Author

FMX commented Nov 8, 2023

I'll test the negative scenario in a moment.

@liujiayi771
Copy link
Contributor

liujiayi771 commented Nov 8, 2023

Not all users' Celeborn and HDFS are on the same cluster. If they are on different clusters, Celeborn may not have HADOOP_CONF_DIR. In this case, you can directly set the Hadoop configs.

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

@liujiayi771 what blocks you creating a folder(e.g. /opt/hadoop/conf) and put the Hadoop client configurations under that, then set export HADOOP_CONF_DIR=/opt/hadoop/conf in celeborn-env.sh?

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

In practice, we usually add HADOOP_CONF_DIR into classpath so that Hadoop client could pick it up from classpath

docs/deploy.md Outdated Show resolved Hide resolved
@liujiayi771
Copy link
Contributor

I didn't mention which approach is better, I simply suggested providing explanations for both methods.

@pan3793
Copy link
Member

pan3793 commented Nov 8, 2023

@liujiayi771 Now, the kerberos login behavior is exactly same with the Hadoop, and the feature of hadoop configuration override is mentioned at https://celeborn.apache.org/docs/latest/migration/#upgrading-from-02-to-03

Since 0.3.0, Celeborn supports overriding Hadoop configuration(core-site.xml, hdfs-site.xml, etc.) from Celeborn configuration with the additional prefix celeborn.hadoop.. On Spark client side, user should set Hadoop configuration like spark.celeborn.hadoop.foo=bar, note that spark.hadoop.foo=bar does not take effect; on Flink client and Celeborn Master/Worker side, user should set like celeborn.hadoop.foo=bar.

Also, we can emphasize that at https://celeborn.apache.org/docs/latest/configuration/ page too

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

FMX and others added 2 commits November 9, 2023 10:16
@FMX
Copy link
Contributor Author

FMX commented Nov 9, 2023

The negative case result looks OK to me. I believe that if users are using HDFS with Kerbero then they should know something about Kerberos.

23/11/09 10:14:52,256 WARN [celeborn-dispatcher-5] Client: Exception encountered while connecting to the server
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)

waitinfuture pushed a commit that referenced this pull request Nov 9, 2023
### What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_COND_DIR.
2. Remove unnecessary Kerberos configs.

### Why are the changes needed?
To support HDFS with Kerberos.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #2082 from FMX/B1116.

Lead-authored-by: mingji <[email protected]>
Co-authored-by: Fu Chen <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Co-authored-by: Ethan Feng <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants