Attributions for copied content are missing #414
@ThinkOpenly, great catch. We need to give this some thought. This is where we likely need to separate the "DB" content from tooling, and IF we must keep them together, then we need to clearly articulate these various licenses in the repo's license files and associated README content. @kbroch-rivosinc, the RVI Staff will ultimately need to weigh in and help provide guidance here, as RISC-V does not have an OSPO. We do, however, have plenty of experienced folks inside the LF who can assist as well. So, let's start with whether we think this mixing of content is necessary for the long-term project. Thoughts here, @dhower-qc?
Do you mean separating DB content from ISA content?
How do we decide on appropriate attribution? Who decides what is acceptable? (Honest question.) Is my suggestion above insufficient (if vague)? Perhaps the covered content needs to be described more succinctly:
It looks like some (not all) extensions and profile classes already call out their licensing, e.g.
...and similarly the verbatim ISA content (in ...). Maybe similar attribution could be added to the instructions, CSRs, and the remaining extensions, and we could call it a day? By the way, I think we also need to add attribution for the Sail code in the instructions.
If the goal is "one-stop shopping", I'd argue that the mixed content is necessary. Extrapolating further, a desirable outcome could be that the DB contains the canonical information and the documentation (ISA manual) pulls what it needs from the DB.
I mean "DB" as in the tooling associated with Unified DB, not the content. The ISA material is a good example of what I'd call content. More generally, I'd argue that all YAML file information is "content", as opposed to the tools/code that operate on the YAML files, such as Antora and others. Does that make sense?
RVI, specifically Andrea and I as management, has a responsibility to protect our IP. Thus, questions around licensing and IP come to us. For example, I review all specification contributions as part of the process and ensure they are made by members. This work is part of the "Specification Policies" review that occurs before Freeze and Ratification. So, I'm "here to help". ;-)
While a one-stop-shopping goal is always preferable, the question always becomes "from whose perspective?" For example, we don't put our webpages in the Apache webserver repo, nor do Internet users care. FWIW, the Debug spec has set some precedent for mixed licensing of tools and content. But honestly, we haven't done much.
I do think there is great technical value in keeping text content and executable content together long term. We should be able to come up with a way to attribute each part separately. For example, we can have BSD-3 apply to all Ruby/Python/C++/IDL files. We could specify that content in YAML files has mixed licensing: Sail appears to be some form of BSD-2, IDL is BSD-3, prose can be CC BY 4.0, and metadata (e.g., encodings) is ?? (ask a lawyer). We should capture attribution of prose that is copied/derived from the ISA manual.
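As a minimal sketch of the file-type policy proposed above, assuming licenses are assigned purely by file extension: the table and the `license_for` helper are hypothetical, BSD-3-Clause-Clear stands in for "BSD-3" only because it is the repository's current `LICENSE`, and YAML files map to a `MIXED` marker meaning the license has to be resolved per field or snippet inside the file.

```python
from pathlib import PurePath
from typing import Optional

# Assumed: code files keep the repository's current license.
CODE_LICENSE = "BSD-3-Clause-Clear"
# Marker (not an SPDX identifier): the file mixes content under several licenses.
MIXED = "MIXED"

# Hypothetical policy table keyed by lower-cased file extension.
EXTENSION_POLICY = {
    ".rb": CODE_LICENSE,
    ".py": CODE_LICENSE,
    ".cpp": CODE_LICENSE,
    ".hpp": CODE_LICENSE,
    ".idl": CODE_LICENSE,
    ".yaml": MIXED,
    ".yml": MIXED,
}


def license_for(path: str) -> Optional[str]:
    """Return the SPDX identifier for a file, MIXED for YAML, or None if unknown."""
    return EXTENSION_POLICY.get(PurePath(path).suffix.lower())


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for p in ["tools/generate.rb", "inst/addi.yaml", "README.adoc"]:
        print(p, "->", license_for(p))
```

Files that come back as `None` (here, AsciiDoc docs) are exactly the gaps such a policy would force the project to decide on explicitly.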
Does separating tools and content offer much benefit? The tools may have a single license within the project scope, but the content, as we're seeing, has multiple origins with their own respective licenses. The need to identify how different subsets of the overall project content are licensed persists.
It's at least clear to me that "all" of the information particular to an instruction should be in one place: the name, description, syntax, operands/types, encoding, semantics, encompassing extension, implementation notes, etc. In that way, downstream uses (documentation, assemblers, disassemblers, simulators/emulators, hardware) can all go to one well-curated and validated source.
Any thoughts on my simple proposal, above:
Is what is currently done for the extensions, profile classes, and verbatim ISA content sufficient?
I discussed with @dhower-qc providing a POC of a technical solution that tries to faithfully denote the copyright/licensing of the content (docs and code) of this repo. (I had originally said I would mock it up in the doc-sig repo, but I think providing a PR here would be more useful.)
Goals of POC
Non-goals
Implementation
The approach can be summarized by this example file (NOTE: the dual license on this line):
I'm happy to put in a PR for this if others think it would be useful, or we can just discuss how other existing open source projects deal with multi-license issues.
@kbroch-rivosinc, I was starting to reach a similar conclusion. Can we articulate a "requirement" that anything "imported" include metadata about the license under which it was imported? It seems that we'd want to annotate that data somehow (footnote, twistie, etc.).
REUSE has the notion of snippets (https://reuse.software/spec-3.3/#comment-headers) to denote sections of a file under different copyright/licensing. This would work for the YAML files that include the "imported" content.
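To make the snippet idea concrete, here is a minimal hand-rolled sketch (not the `reuse` tool's own API) that reads REUSE-style `SPDX-SnippetBegin`/`SPDX-SnippetEnd` comments from a YAML file and reports which license covers a given line. The sample file, its field names, and the helpers `parse_snippets` and `license_at` are hypothetical, and only full-line `#` comments are recognized.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# A full-line YAML comment carrying an SPDX tag, e.g.
# "# SPDX-License-Identifier: CC-BY-4.0" or "# SPDX-SnippetBegin".
TAG_RE = re.compile(r"^#\s*SPDX-(?P<tag>[A-Za-z-]+)(?::\s*(?P<value>.*))?$")


@dataclass
class Snippet:
    start: int                      # 1-based line of "SPDX-SnippetBegin"
    end: int = 0                    # 1-based line of "SPDX-SnippetEnd"
    license: str | None = None
    copyright: list[str] = field(default_factory=list)


def parse_snippets(text: str) -> tuple[str | None, list[Snippet]]:
    """Return (file-level license, closed snippets) found in SPDX comments."""
    file_license: str | None = None
    snippets: list[Snippet] = []
    current: Snippet | None = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = TAG_RE.match(line.strip())
        if not m:
            continue
        tag, value = m.group("tag"), m.group("value")
        if tag == "SnippetBegin":
            current = Snippet(start=lineno)
        elif tag == "SnippetEnd" and current is not None:
            current.end = lineno
            snippets.append(current)
            current = None
        elif tag == "SnippetCopyrightText" and current is not None:
            current.copyright.append(value)
        elif tag == "License-Identifier":
            # Inside a snippet, the identifier applies to that snippet only.
            if current is not None:
                current.license = value
            else:
                file_license = value
    return file_license, snippets


def license_at(text: str, lineno: int) -> str | None:
    """License covering a 1-based line: the enclosing snippet's, else the file's."""
    file_license, snippets = parse_snippets(text)
    for s in snippets:
        if s.start <= lineno <= s.end:
            return s.license
    return file_license


# Made-up sample for illustration; the snippet copyright line is a placeholder.
SAMPLE = """\
# SPDX-FileCopyrightText: Example contributors
# SPDX-License-Identifier: BSD-3-Clause-Clear
name: addi
# SPDX-SnippetBegin
# SPDX-SnippetCopyrightText: <ISA manual copyright holders>
# SPDX-License-Identifier: CC-BY-4.0
description: |
  Adds the sign-extended immediate to rs1 and writes the result to rd.
# SPDX-SnippetEnd
"""
```

Here `license_at(SAMPLE, 8)` resolves the description line to `CC-BY-4.0`, while `name` falls back to the file-level license. Because the annotations live only in comments, a consumer has to re-scan the raw text like this; a YAML loader on its own discards them.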
Nice. Any idea what the "visible" (PDF or HTML) output looks like, @kbroch-rivosinc?
I don't think there's anything in the main branch at the moment, but I'm sure it could be added if needed. Here's an example of a currently generated instruction page: https://riscv-software-src.github.io/riscv-unified-db/manual/html/isa/20240411/insts/addi.html Just as the notice at the bottom mentions Antora being MPL-2.0, the template that generates the AsciiDoc for the addi instruction could mention Sail licensing in its Sail section and ISA manual licensing in any section taken from the manual.
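The template idea above could look roughly like this: a minimal sketch, assuming the page generator knows which sources (Sail code, ISA manual prose) appear on a given page. `ATTRIBUTIONS`, its wording, and `attribution_block` are hypothetical; the output is an AsciiDoc admonition block that a page template could append.

```python
# Hypothetical attribution wording per content source; the real text should be
# taken from each upstream project's license notice.
ATTRIBUTIONS = {
    "sail": "Sail code adapted from the sail-riscv model; see its license.",
    "isa_manual": "Text adapted from the RISC-V ISA manual, licensed under CC BY 4.0.",
}


def attribution_block(sources: list[str]) -> str:
    """Render an AsciiDoc NOTE listing attributions for known sources, in order."""
    # dict.fromkeys drops duplicate sources while keeping their first-seen order.
    items = [f"* {ATTRIBUTIONS[s]}" for s in dict.fromkeys(sources) if s in ATTRIBUTIONS]
    if not items:
        return ""
    return ".Attribution\n[NOTE]\n====\n" + "\n".join(items) + "\n====\n"


if __name__ == "__main__":
    # e.g. a page that shows both Sail code and ISA manual prose
    print(attribution_block(["sail", "isa_manual"]))
```

Pages with no copied content get an empty string, so the template can append the result unconditionally.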
I've pushed the POC to a draft PR for anyone interested. I think it accomplishes what I listed above, and I also added an example of putting a snippet comment section in one file. Again, this uses the reuse tool, which is actively developed and uses SPDX, which is backed by the LF: https://reuse.software/faq/#what-is-spdx I'm not saying the PR should be accepted, but if something like it were, other features would be:
Here's what I see if I run
If you run
Missing from the comment above is the implementation, which can of course be found in the PR. In a few lines:
It certainly solves the problem in the YAML source. Subjectively, it's slightly ugly, but not too bad. :-) Note that "Sail copyright holders" is a fairly lengthy text (currently 39 lines/entities), so it'll get uglier unless this text is just a short reference to the full text. It is not integrated as usable YAML data, so downstream users can't easily determine the license associated with the YAML values by simply reading the YAML as data, since the annotations are embedded exclusively in comments. To best "protect the IP", I think we want a solution that lets downstream users of the YAML easily associate a license with the covered content.
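One way to address that last point is to carry license annotations as ordinary YAML data, so they survive parsing. A minimal sketch, assuming a hypothetical schema with `$default_license` and `$licenses` keys (neither exists in the repository today), shown as the Python dict a YAML loader would return; the field names and the Sail license identifier are assumptions.

```python
from typing import List, Optional

# Hypothetical schema keys; the "$" prefix marks metadata rather than instruction data.
DEFAULT_KEY = "$default_license"
LICENSES_KEY = "$licenses"

# What a YAML loader might return for a hypothetical annotated instruction file.
DOC = {
    DEFAULT_KEY: "BSD-3-Clause-Clear",
    LICENSES_KEY: {
        "description": "CC-BY-4.0",  # prose copied from the ISA manual
        "sail": "BSD-2-Clause",      # assumed Sail license, to be confirmed
    },
    "name": "addi",
    "description": "<prose from the ISA manual>",
    "sail": "<Sail code>",
}


def license_of(doc: dict, key: str) -> Optional[str]:
    """License of one top-level field: its explicit entry, else the file default."""
    return doc.get(LICENSES_KEY, {}).get(key, doc.get(DEFAULT_KEY))


def unlicensed_fields(doc: dict) -> List[str]:
    """Data fields with no resolvable license; a CI check could reject these."""
    return [k for k in doc if not k.startswith("$") and license_of(doc, k) is None]
```

Since the annotations are plain data, any downstream consumer can call something like `license_of(doc, "description")` after loading, and a check built on `unlicensed_fields` would enforce that imported content always carries its license. A lengthy holder list such as the Sail copyright holders could be kept in one referenced file rather than repeated per field.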
Describe the bug
We copy a lot of content from the ISA manual here. The ISA manual is licensed under the "Creative Commons Attribution 4.0 International License". This repository, as far as I can see, is entirely licensed under BSD-3-Clause-Clear (see "LICENSE"). That license is similar, but not the same, and we seem to lack proper attribution for the copied content (see https://creativecommons.org/share-your-work/use-remix/).
Additional context
Something like:
in "LICENSE", perhaps?
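Creative Commons recommends that an attribution state the Title, Author, Source, and License (TASL) of the work, and CC BY 4.0 also requires indicating whether the material was modified. A minimal sketch of formatting such a notice for `LICENSE`; the `Attribution` class is hypothetical and all field values are placeholders, not proposed wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Title, Author, Source, License (TASL) plus a CC BY 4.0 change notice."""
    title: str
    author: str
    source: str          # URL of the original work
    license_name: str    # human-readable license name
    license_url: str
    modified: bool = True  # whether the copied material was changed

    def render(self) -> str:
        text = (f'"{self.title}" by {self.author} ({self.source}) is licensed under '
                f"{self.license_name} ({self.license_url}).")
        if self.modified:
            text += " Changes were made."
        return text


if __name__ == "__main__":
    # Placeholder values only.
    print(Attribution(
        title="<ISA manual title>",
        author="<ISA manual authors>",
        source="<ISA manual URL>",
        license_name="CC BY 4.0",
        license_url="https://creativecommons.org/licenses/by/4.0/",
    ).render())
```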