Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add machine-readable remediation to the hasDangerousWorkflowScriptInjection probe #3950

Closed
pnacht opened this issue Mar 18, 2024 · 3 comments · Fixed by #4218
Closed
Labels

Comments

@pnacht
Copy link
Contributor

pnacht commented Mar 18, 2024

Is your feature request related to a problem? Please describe.
The finding.Finding.Remediation.Patch field is meant to store machine-readable patches to fix the finding:

type Remediation struct {
// Patch for machines.
Patch *string `json:"patch,omitempty"`

Fixing script injection in workflows is pretty straightforward and can be solved procedurally, but the probe's findings don't include the patch.

Describe the solution you'd like
The findings generated by hasDangerousWorkflowScriptInjection should include a remediation patch that is machine-readable, and preferably human-recognizable. I'd propose the "unified diff" format used by git diff or diff -u.

The utopian solution would allow us to create a single "global" patch that maintainers could then apply to the project, entirely fixing all the dangerous script injections. However, my understanding of the probe architecture leads me to believe we can only fix individual findings, one at a time.

I have already created some first-draft code that can perform this operation. Running it, we get output such as:

{
  # ...
  "findings":
    [
      {
        # ...
        "remediation":
          {
            "patch": "--- a/.github/workflows/pull_request_open_polling.yml\n+++ b/.github/workflows/pull_request_open_polling.yml\n...",
            # ...
          },
      },
      {
        # ...
        "remediation":
          {
            "patch": "--- a/.github/workflows/pull_request_update_polling.yml\n+++ b/.github/workflows/pull_request_update_polling.yml\n...",
            # ...
          },
      },
    ],
}

Pretty-printing each of those patches, we get a git diff-like* "unified" output that can be used to fix the relevant finding:

--- a/.github/workflows/pull_request_open_polling.yml
+++ b/.github/workflows/pull_request_open_polling.yml
@@ -8,10 +8,13 @@
   pull_request:
     types: [opened, reopened]
 
+env:
+  PR_HEAD_REF: ${{ github.event.pull_request.head.ref }}
+
 jobs:
   build:
     runs-on: jenkins-polling
 
     steps:
       - name: Run PullRequestJenkinsPoller
-        run: ... ${{ github.event.pull_request.head.ref }} ...
+        run: ... $PR_HEAD_REF ...

I'll be happy to send this code as a PR once I've polished it up.

However, there are several issues with my current implementation that I wish to discuss before moving forward:

git diff-like output

The biggest problem is generating the git diff output. It'd be great if we could use a dependency to create this output given two versions of a file, but a reasonable dependency doesn't really exist, as far as I can tell:

  • go-git (already used by Scorecard) can generate slices describing all the changes between the files (https://github.com/go-git/go-git/blob/master/utils/diff/diff.go), but it can't create the output in the "unified" format we need for git apply. That functionality is basically just a wrapper for sergi/go-diff, which isn't currently used by Scorecard, so this would need to be added to Scorecard's go.mod.
  • hexops/gotextdiff and akedrou/textdiff are effectively identical packages that can create the necessary git diff output. However, these packages simply export an internal module used by Go tools (x/tools/internal/diff). However, given their "one-and-done" concept, they've been archived by their maintainers (and, unsurprisingly, have poor Scorecard scores: 2.9).

I've explored two alternatives:

  • Using hexops/gotextdiff works just fine to generate the "unified diff", as far as I can tell. But it really is a lousy dependency to add to Scorecard.
  • I've also written some code to generate the "unified diff" ourselves based on go-git's output. It works for the handful of tests I've run, but it'd probably have to be more thoroughly tested to make sure it covers edge cases, etc.

How to store expected unit test outputs

This code would naturally need to be tested. Each test case will be composed of a workflow that needs to be fixed and all the information we'd receive from the finding (location, unsafe variable, etc). However, the unit tests also need to know what the expected output of each test would be. We need to decide how we wish to store this information. As I see it, our options are:

  • we can store the expected "fixed workflow" (i.e. workflow-fixed.yml), and then generate the git diff-like output during the test and compare it to ours. Conceptually, this would be the best solution, but the only way I see to generate that output would be to... run git diff using exec.Cmd() or something, which has all the downsides of using exec.
    In this particular case, I don't think it'd be toooooo problematic, though? Any machine running tests on Scorecard likely has git installed on the $PATH (and we can simply skip the test if it's not found), and the "unified format" has been stable since its release in 1990, so there isn't much risk of different results on machines with different versions of git.
  • store the expected git diff file (generated by running git diff on the broken workflow and a fixed version – which is untracked) and compare our output directly to it. This makes the test code simple, but means we're storing a human-legible-but-unfriendly format. It's also somewhat brittle: if we modify the original workflow (i.e. adding comments), that will change the git diff output as well (modified line positions, maybe even changing the diff itself), so the .diff file would also need to be updated, which may not be immediately obvious.
  • store the .diff file and the fixed version of the workflow (i.e. workflow-fixed.yml). The fixed workflow wouldn't actually be used for the tests, but would serve as useful context, showing what we expect the output to be. But this would also be vulnerable to drift over time: we might change the broken workflow and the diff and forget to update the fixed workflow.
    • This solution could be enhanced by adding another step to make all that runs git diff (again, assuming it exists) on the broken and fixed versions of the workflows and updates the respective .diff files (this would need to be verified in the CI/CD tests)

Let me know which options you prefer for these two questions and I'll send a PR taking them into account.


* One perhaps-notable difference between this output and an actual git diff is that git diff adds the "scope" after the @@ anchors (i.e. @@ -1,2 +3,4 @@ func foo() {), while my code does not. However, this data isn't actually used by git apply or patch, so this is a meaningless difference.

Copy link

This issue has been marked stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the Stale label Jul 11, 2024
@pnacht
Copy link
Contributor Author

pnacht commented Jul 11, 2024

PR #4218 was sent last week to resolve this issue. Awaiting maintainer review after v5.0 is released.

@github-actions github-actions bot removed the Stale label Jul 12, 2024
Copy link

github-actions bot commented Nov 9, 2024

This issue has been marked stale because it has been open for 60 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants