[CELEBORN-1116] Read authentication configs from HADOOP_CONF_DIR
#2082
@@ -80,7 +80,7 @@ object CelebornHadoopUtils extends Logging {
 // If we are accessing HDFS and it has Kerberos enabled, we have to login
 // from a keytab file so that we can access HDFS beyond the kerberos ticket expiration.
 UserGroupInformation.setConfiguration(hadoopConf)
-if (conf.hdfsStorageKerberosEnabled) {
+if ("kerberos" == hadoopConf.get("hadoop.security.authentication")) {
The value is case-insensitive, so an exact comparison with "kerberos" can miss valid settings such as "KERBEROS".
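To illustrate the review comment, here is a minimal, dependency-free sketch of a case-insensitive check. The `KerberosCheck` object and the plain `Map` standing in for the Hadoop configuration are hypothetical names used only for illustration; this is not the code that was merged.

```scala
object KerberosCheck {
  // Key taken from the diff under review.
  val AuthKey = "hadoop.security.authentication"

  // `conf` is a plain Map standing in for a Hadoop Configuration (an assumption
  // made so the sketch runs without Hadoop on the classpath).
  def isKerberos(conf: Map[String, String]): Boolean =
    conf.get(AuthKey).exists(_.equalsIgnoreCase("kerberos"))
}
```

With this shape, a missing key is treated as "not Kerberos", and both `kerberos` and `KERBEROS` are accepted.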
Codecov Report
@@ Coverage Diff @@
## main #2082 +/- ##
==========================================
- Coverage 46.70% 46.48% -0.21%
==========================================
Files 165 166 +1
Lines 10668 10695 +27
Branches 972 977 +5
==========================================
- Hits 4981 4971 -10
- Misses 5362 5398 +36
- Partials 325 326 +1
... and 13 files with indirect coverage changes
I missed the review of the previous PR. The current implementation enforces a keytab for Kerberos authentication. That is the common case for long-running services, but it is also possible to authenticate with a TGT cache. (I know that in some companies the infra team handles TGT cache refresh during the container/machine setup phase, so the application doesn't need to handle it and doesn't need to manage and periodically renew the keytab.) We'd better document such limitations clearly in the log message and docs.
HADOOP_CONF_DIR
Yes, as you said, Spark supports using a TGT cache. Celeborn does not support that yet. If you are using Celeborn with kerberized HDFS, just setting the principal and keytab could be enough.
It's quite easy to support auth from the TGT cache.
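One way to picture the fallback discussed above is the following dependency-free sketch. It is not the change made in this PR; `KerberosLogin`, `LoginMode`, and `choose` are hypothetical names. With the real Hadoop API, the keytab branch would roughly correspond to `UserGroupInformation.loginUserFromKeytab(principal, keytab)`, and the other branch to letting `UserGroupInformation.getLoginUser` pick up an existing ticket cache.

```scala
object KerberosLogin {
  // Hypothetical model of the two login paths discussed in the thread.
  sealed trait LoginMode
  final case class FromKeytab(principal: String, keytab: String) extends LoginMode
  case object FromTicketCache extends LoginMode

  // Use the keytab only when both principal and keytab are set and non-empty;
  // otherwise fall back to whatever TGT is already in the ticket cache.
  def choose(principal: Option[String], keytab: Option[String]): LoginMode =
    (principal.filter(_.nonEmpty), keytab.filter(_.nonEmpty)) match {
      case (Some(p), Some(k)) => FromKeytab(p, k)
      case _                  => FromTicketCache
    }
}
```

Keeping the decision separate from the actual login call makes the negative cases (no keytab, no TGT) easier to test and to report with a clear error message.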
I'll try it.
Thanks @pan3793, you are right. I've tested this PR on a cluster and it works well.
@FMX thanks for the confirmation. Have you tested the negative cases? E.g. when accessing kerberized HDFS without a keytab or TGT, is the error message intuitive?
I'll test the negative scenario in a moment.
Not all users' Celeborn and HDFS are on the same cluster. If they are on different clusters, Celeborn may not have HADOOP_CONF_DIR. In this case, you can directly set the Hadoop configs.
@liujiayi771 what blocks you from creating a folder (e.g.
In practice, we usually add
I didn't say which approach is better; I simply suggested providing explanations for both methods.
@liujiayi771 Now the Kerberos login behavior is exactly the same as Hadoop's, and the Hadoop configuration override feature is described at https://celeborn.apache.org/docs/latest/migration/#upgrading-from-02-to-03
We can also emphasize that on the https://celeborn.apache.org/docs/latest/configuration/ page.
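As a rough illustration of how such an override could work for clusters without HADOOP_CONF_DIR, here is a small sketch. The `celeborn.hadoop.` prefix, the `HadoopOverrides` object, and the use of plain `Map`s are assumptions made for this example; the linked migration guide is the authority on the actual convention.

```scala
object HadoopOverrides {
  // Assumed pass-through prefix; verify against the Celeborn documentation.
  val Prefix = "celeborn.hadoop."

  // siteConf: settings that would come from the *-site.xml files, if any.
  // celebornConf: Celeborn settings, some of which carry the assumed prefix.
  // Prefixed keys are stripped of the prefix and take precedence over siteConf.
  def effective(
      siteConf: Map[String, String],
      celebornConf: Map[String, String]): Map[String, String] = {
    val overrides = celebornConf.collect {
      case (k, v) if k.startsWith(Prefix) => k.stripPrefix(Prefix) -> v
    }
    siteConf ++ overrides
  }
}
```

Under this sketch, a deployment with no site files would pass an empty `siteConf` and supply every Hadoop setting through prefixed keys.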
LGTM
Co-authored-by: Cheng Pan <[email protected]>
The negative case result looks OK to me. I believe that users who run HDFS with Kerberos should already know something about Kerberos.
### What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_CONF_DIR.
2. Remove unnecessary Kerberos configs.

### Why are the changes needed?
To support HDFS with Kerberos.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #2082 from FMX/B1116.

Lead-authored-by: mingji <[email protected]>
Co-authored-by: Fu Chen <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Co-authored-by: Ethan Feng <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_CONF_DIR.
2. Remove unnecessary Kerberos configs.
Why are the changes needed?
To support HDFS with Kerberos.
Does this PR introduce any user-facing change?
NO.
How was this patch tested?
GA and cluster.