Detect multiple jars on the classpath when init plugin [databricks] #9654
Conversation
Signed-off-by: Haoyang Li <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
LGTM
build
Co-authored-by: Gera Shegalov <[email protected]>
build
Co-authored-by: Gera Shegalov <[email protected]>
build
The integration tests fail because we add rapids4spark-version-info.properties to all intermediate jars and the final jars, including the rapids-4-spark-integration-tests jar, which we have not published since 21.12. We can deal with whether we should continue adding version info to the intermediate jars separately. For a fix:
I lean to the second option
We publish the integration tests jar internally to aid in nightly test runs. Another approach is to make the properties file in the dist artifact unique relative to intermediate jars. Arguably all of the property files should be unique, even across intermediates, since we sometimes decide to publish those intermediates, even if only internally.
This sounds good to me. As for the version file generation, it is easy to interpolate.
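For illustration, a minimal sketch of what such interpolation could look like: a Maven-filtered template whose fields mirror the version-info contents quoted later in this conversation. The template file name, the artifactId field, and the git property name are hypothetical, not the actual build files:

```properties
# Hypothetical rapids4spark-version-info.properties template, filtered by
# Maven so each module stamps its own identity into its copy of the file:
version=${project.version}
artifactId=${project.artifactId}
revision=${git.head.revision}
```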
Signed-off-by: Haoyang Li <[email protected]>
build
```scala
conf.allowMultipleJars match {
  case "ALWAYS" =>
    logWarning(msg)
  case "SAME_REVISION" =>
    require(revisionMap.size == 1, msg)
  case "NEVER" =>
    require(revisionMap.size == 1 && revisionMap.values.forall(_.size == 1), msg)
  case _ =>
    throw new IllegalArgumentException("Invalid value for " +
      s"${RapidsConf.ALLOW_MULTIPLE_JARS.key}: ${conf.allowMultipleJars}. " +
      "Valid values are ALWAYS, SAME_REVISION, NEVER.")
}
```
Let us follow the pattern of other configs with an enum. Take a look at PARQUET_READER_FOOTER_TYPE and make an auxiliary function translating strings <-> enum.
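A minimal sketch of that pattern, assuming a plain Scala Enumeration; the object and method names here are illustrative, not the actual change (which followed PARQUET_READER_FOOTER_TYPE):

```scala
object AllowMultipleJarsMode extends Enumeration {
  val ALWAYS, SAME_REVISION, NEVER = Value

  // auxiliary function translating a config string into the enum
  def fromString(s: String): Value =
    values.find(_.toString == s.toUpperCase(java.util.Locale.ROOT)).getOrElse(
      throw new IllegalArgumentException(
        s"Invalid value: $s. Valid values are ${values.mkString(", ")}"))
}
```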
done
"values are ALWAYS: allow all jars, SAME_REVISION: only allow jars with the same " + | ||
"revision, NEVER: do not allow multiple jars at all.") | ||
.stringConf | ||
.createWithDefault("SAME_REVISION") |
I still think NEVER should be the default, and SAME_REVISION should only be set in the tests where it is hard to fix right away with an issue filed to get it fixed later.
Ok, will file a follow-on issue on it.
What are the possible issues if there are multiple jars of the same revision on the classpath?
Here are some problems going undetected:
- it does not detect when the dist jar is present multiple times: on extraClassPath, in $SPARK_HOME/jars, and on the user classpath via packages and jars. We have seen bugs in such deployments: [BUG] RapidsShuffleManager and external packages for Spark Standalone #5796
- when you have sql-plugin and aggregator on the classpath at the same time, the user might be using an unintended flatbuffers dependency; the aggregator relocates flatbuffers to an internal package through the shade plugin to make sure only the aggregator's copy is used.
It is fine to have a separate issue to sort the right default value out.
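As a sketch of why such duplicates are detectable at all, assuming the properties file name used in this PR, every copy visible to the classloader can be enumerated; this is illustrative, not the plugin's actual code:

```scala
import scala.collection.JavaConverters._

val propName = "rapids4spark-version-info.properties"
// getResources returns every classpath entry that carries this resource,
// so a duplicated jar shows up as multiple URLs
val copies = Thread.currentThread().getContextClassLoader
  .getResources(propName).asScala.toList
copies.foreach(url => println(s"found: $url"))
if (copies.size > 1) {
  println(s"warning: ${copies.size} copies of $propName on the classpath")
}
```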
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
LGTM
```scala
}
val revisionMap: Map[String, Seq[URL]] = possibleRapidsJarURLs.map { url =>
  val versionInfo = scala.io.Source.fromURL(url).getLines().toSeq
  lazy val revision = versionInfo
```
nit: no point in lazy
Suggested change:

```diff
-lazy val revision = versionInfo
+val revision = versionInfo
```
```scala
  .filter(_.startsWith("revision="))
  .map(_.split("=").last)
```
If you define a regex

```scala
val revisionRegex = "revision=(.*)".r
```

you could do it in one shot:

```diff
-.filter(_.startsWith("revision="))
-.map(_.split("=").last)
+.collect {
+  case revisionRegex(revision) => revision
+}
```
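To see the suggested one-shot extraction in isolation (the sample input lines below are made up for illustration):

```scala
val revisionRegex = "revision=(.*)".r
val versionInfo = Seq(
  "version=23.12.0-SNAPSHOT",
  "revision=f5bd0c958c1afc86a3e7906684809fc62e21d8ab")
// collect filters and extracts in one pass: non-matching lines are dropped,
// matching lines yield the captured group
val revisions = versionInfo.collect { case revisionRegex(r) => r }
// revisions == Seq("f5bd0c958c1afc86a3e7906684809fc62e21d8ab")
```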
Signed-off-by: Haoyang Li <[email protected]>
build
At a minimum, retest the functionality manually. Bonus points: add unit tests.
NEVER fails for me with a single jar
```shell
SPARK_HOME=~/dist/spark-3.3.0-bin-hadoop3 \
~/dist/spark-3.3.0-bin-hadoop3/bin/spark-shell \
  --jars dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.allowMultipleJars=NEVER
```
```text
java.lang.IllegalArgumentException: requirement failed: Multiple rapids-4-spark jars found in the classpath:
revison: f5bd0c958c1afc86a3e7906684809fc62e21d8ab
jar URL: jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=gshegalov
revision=f5bd0c958c1afc86a3e7906684809fc62e21d8ab
branch=pr/thirtiseven/9654
date=2023-11-24T03:03:31Z
url=https://github.com/NVIDIA/spark-rapids
jar URL: jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=gshegalov
revision=f5bd0c958c1afc86a3e7906684809fc62e21d8ab
branch=pr/thirtiseven/9654
date=2023-11-24T03:03:31Z
url=https://github.com/NVIDIA/spark-rapids
```
Signed-off-by: Haoyang Li <[email protected]>
It's weird, I can't reproduce this with 330/341, with or without upmerge.
@thirtiseven it looks like a single-shim jar indeed does not reproduce the issue. My reproducer involves a two-shim jar built with ./build/buildall --profile=330,350 on your PR branch. The problem is related to the fact that we are currently sloppy with the inclusion of the properties files.
Let us consider only the files stored at the jar root, i.e. entries directly under '!/'.
Signed-off-by: Haoyang Li <[email protected]>
@gerashegalov Thank you! I can reproduce now.
```scala
val possibleRapidsJarURLs = classloader.getResources(propName).asScala.toSet.toSeq.filter {
  url => {
    val urlPath = url.toString
    // Filter out submodule jars (illustrative example:
    //   rapids-4-spark-aggregator_2.12-<version>.jar)
    // and files stored under subdirs of '!/' (illustrative example:
    //   rapids-4-spark_2.12-<version>.jar!/spark330/rapids4spark-version-info.properties),
    // keeping only the main dist artifact's root entry, e.g.
    //   rapids-4-spark_2.12-<version>.jar!/rapids4spark-version-info.properties
    !urlPath.contains("rapids-4-spark-") && urlPath.contains("!/" + propName)
  }
}
```
Not too easy to follow, but I get it now. In the comment, can you add an example of the submodule vs. main dist artifact you are looking for?
However, now it looks like we will NEVER find out when we put submodule jars on the classpath for tests by accident?
However, now it looks like we will NEVER find out when we put submodule jars on the classpath for tests by accident?

How about adding another config mode like RECURSIVE to detect all jars on the classpath and print a warning log? We would just need to skip this filter under that mode.
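A rough, self-contained sketch of the idea, building on the match shown earlier in this PR; the RECURSIVE name and warn-only behavior are only this proposal, not merged code:

```scala
// Hypothetical: extend the mode handling so RECURSIVE counts every jar
// found (submodules included) and warns instead of failing.
def checkAllowMultipleJars(
    mode: String, revisionMap: Map[String, Seq[String]], msg: String): Unit =
  mode match {
    case "RECURSIVE" =>
      if (revisionMap.values.map(_.size).sum > 1) println(s"WARN: $msg")
    case "NEVER" =>
      require(revisionMap.size == 1 && revisionMap.values.forall(_.size == 1), msg)
    case _ =>
      // ALWAYS / SAME_REVISION handled as in the diff above
  }
```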
What do you mean by RECURSIVE?
If RECURSIVE means going down a directory level inside a jar file, I think we already established that we should only look at the root entries ending with "!/" + propName.
If RECURSIVE means extending the check to all sub-module artifact ids prefixed with rapids-4-spark-, I realize that we do not have a good story now that the change of the version info file name to contain the artifact id is not part of the PR. Let us defer this to the followup issue you were going to file.
If RECURSIVE means extending the check to all sub-module artifact ids prefixed with rapids-4-spark-, I realize that we do not have a good story now that the change of the version info file name to contain the artifact id is not part of the PR. Let us defer this to the followup issue you were going to file.
Yes, that is what I mean. I will add it to the follow-up issue too.
Co-authored-by: Gera Shegalov <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
LGTM, thanks for working through the review issues.
I believe we have a very useful MVP already, and it can be improved in the follow-up PRs.
Thanks for your patient review and useful suggestions @gerashegalov
build
Follow-on issue: #9870
Closes #7309
This PR detects whether there are multiple plugin/cudf/jni jars on the classpath when initializing RapidsDriverPlugin and RapidsExecutorPlugin, and throws an exception if so. This PR also adds a config to allow this behavior.

Note that this PR does not accommodate the fact that we might get a duplication spread over user and system classloaders: one jar in a system classloader via extraClassPath or because it is in $SPARK_HOME/jars, and another one via spark.jars. I will try to check this or file a follow-on issue on it.

Note that the exception will only be thrown if the first plugin jar on the classpath is from after this PR, so the same issue may occur again in the short term.
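For illustration, based on the config key exercised earlier in this conversation, opting out of the check could look like this (the jar path is a placeholder):

```shell
spark-shell \
  --jars rapids-4-spark_2.12-<version>-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.allowMultipleJars=ALWAYS
```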
Tests