PurlFetcher::Client::ReleaseTags.release call gets 502 Bad Gateway response #1327

Open
honeybadger bot opened this issue Jul 13, 2024 · 3 comments

honeybadger bot commented Jul 13, 2024

failure info

Error: RuntimeError: unexpected response: 502 {"status"=>"error", "message"=>"Bad Gateway"}

Backtrace:

[GEM_ROOT]/gems/purl_fetcher-client-1.5.3/lib/purl_fetcher/client.rb:88 :in `put`
[GEM_ROOT]/gems/purl_fetcher-client-1.5.3/lib/purl_fetcher/client/release_tags.rb:25 :in `release`
[GEM_ROOT]/gems/purl_fetcher-client-1.5.3/lib/purl_fetcher/client/release_tags.rb:11 :in `release`
[PROJECT_ROOT]/lib/robots/dor_repo/release/release_publish.rb:24 :in `perform_work`
[GEM_ROOT]/gems/lyber-core-7.5.0/lib/lyber_core/robot.rb:58 :in `block in perform` 

def perform_work
  logger.debug "release-publish working on #{druid}"
  index = targets_for(release: true)
  delete = targets_for(release: false)
  PurlFetcher::Client.configure(url: Settings.purl_fetcher.url, token: Settings.purl_fetcher.token)
  PurlFetcher::Client::ReleaseTags.release(druid:, index:, delete:)
end

https://github.com/sul-dlss/purl_fetcher-client/blob/c4b8058ef114b34a3869fac4ead2180fcfffc5d0/lib/purl_fetcher/client/release_tags.rb#L23-L27

View full backtrace and more info at honeybadger.io

troubleshooting so far

I was able to recreate the error manually in the common-accessioning robot console on prod by making the underlying calls from the robot step's perform_work method directly, without the surrounding workflow handling in the base lyber_core perform method. Example in Slack: https://stanfordlib.slack.com/archives/C09M7P91R/p1720822349423099?thread_ts=1720748791.183479&cid=C09M7P91R

$ ROBOT_ENVIRONMENT=production ./bin/console
[1] pry(main)> rp = Robots::DorRepo::Release::ReleasePublish.new
[2] pry(main)> druid = 'druid:mw847pg5827'
[3] pry(main)> rp.instance_variable_set(:@druid, druid)
[5] pry(main)> index = rp.targets_for(release: true)
[6] pry(main)> delete = rp.targets_for(release: false)
[7] pry(main)> PurlFetcher::Client.configure(url: Settings.purl_fetcher.url, token: Settings.purl_fetcher.token)
[8] pry(main)> PurlFetcher::Client::ReleaseTags.release(druid:, index:, delete:)
D, [2024-07-12T15:05:58.991887 #2774976] DEBUG -- : Starting an release request for: druid:mw847pg5827
RuntimeError: unexpected response: 502 {"status"=>"error", "message"=>"Bad Gateway"}
from common-accessioning/shared/bundle/ruby/3.3.0/gems/purl_fetcher-client-1.5.3/lib/purl_fetcher/client.rb:88:in `put'
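
When repeating this in the console, it can help to capture the status code instead of only letting the exception propagate. A minimal sketch, assuming the error message keeps the "unexpected response: <status> ..." shape shown above; status_from_error is a hypothetical helper, not part of purl_fetcher-client:

```ruby
# Hypothetical helper: pull the HTTP status out of the RuntimeError raised by
# the client. ASSUMPTION: the message starts with "unexpected response: NNN",
# as in the backtrace above; any other message yields nil.
def status_from_error(error)
  match = error.message.match(/\Aunexpected response: (\d{3})\b/)
  match && match[1].to_i
end

# Example console usage (not executed here):
#   begin
#     PurlFetcher::Client::ReleaseTags.release(druid:, index:, delete:)
#   rescue RuntimeError => e
#     puts "release failed with HTTP status #{status_from_error(e).inspect}"
#   end
```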

However, I wasn't able to reproduce this using what I think would be a similar raw curl request from the command line on the same VM; in that case, I got a 4xx error instead. That struck me as odd, since a 502 would indicate an error from something like a web server that sits in front of purl-fetcher (from the perspective of network calls coming to it), right? A 4xx, by contrast, indicates the request made it through. And I couldn't see why a Faraday request from Ruby land would behave so differently from a CLI curl. Example curl:

$ curl -X PUT -H "Accept: application/json" -H "Authorization: Bearer $(echo $SETTINGS__PURL_FETCHER__TOKEN)" 'https://purl-fetcher-prod.stanford.edu/v1/released/druid:mw847pg5827'
{"status":400,"error":"Bad Request"}
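
One difference worth ruling out: the curl call above sends no request body and no Content-Type header, while the client call passes index and delete. Below is a rough sketch of building a closer equivalent with Ruby's standard Net::HTTP. The { index:, delete: } JSON payload shape is an assumption inferred from the keyword arguments of ReleaseTags.release and has not been checked against the gem; build_release_request is a hypothetical helper.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a PUT request that carries a JSON body.
# ASSUMPTION: the { index:, delete: } payload is a guess at what the client
# sends; compare against purl_fetcher-client before relying on it.
def build_release_request(base_url:, druid:, token:, index:, delete:)
  uri = URI("#{base_url}/v1/released/#{druid}")
  request = Net::HTTP::Put.new(uri)
  request['Accept'] = 'application/json'
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{token}"
  request.body = JSON.generate(index: index, delete: delete)
  [uri, request]
end

# Sending it (not executed here):
#   uri, request = build_release_request(base_url: ..., druid: ..., token: ..., index: ..., delete: ...)
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
#     http.request(request)
#   end
#   puts response.code, response.body
```

If this request gets a 502 where the body-less curl got a 400, that would suggest the two requests are simply reaching different code paths rather than Faraday and curl behaving differently.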

other context

Slack thread where @andrewjbtw happened upon this: https://stanfordlib.slack.com/archives/C09M7P91R/p1720817745891829?thread_ts=1720748791.183479&cid=C09M7P91R

Facet for what's still in error: https://argo.stanford.edu/catalog?f%5Bwf_wps_ssim%5D%5B%5D=releaseWF%3Arelease-publish%3Aerror

druids that've run into the error as of 2024-07-12:

Andrew suspects ongoing user versions work and something something meta.json as a possible culprit. See this HB alert: https://app.honeybadger.io/projects/48916/faults/108553257 (Errno::ENOENT: No such file or directory @ rb_sysopen - /purl/document_cache/mw/847/pg/5827/meta.json)

I guess maybe my malformed curl request isn't making it to the point of even trying to read the meta.json. But once a valid request reaches purl-fetcher and it attempts to read that file, maybe purl-fetcher runs into a 502 from another service? In a quick grep to see what HTTP statuses purl-fetcher returns, I found this error-code-related behavior:

purl-fetcher % git grep status app  # results snipped to remove things that aren't status code returns
app/controllers/concerns/authenticated.rb:    return render json: { error: 'Not Authorized' }, status: :unauthorized unless token
app/controllers/v1/purls_controller.rb:      render json: true, status: :accepted
app/controllers/v1/purls_controller.rb:      return render json: { error: "already deleted" }, status: :conflict if @purl.deleted?
app/controllers/v1/released_controller.rb:      render json: true, status: :accepted
app/controllers/v1/resources_controller.rb:      render json: true, location: @purl, status: :created

status is also set from the error_code param of the ResourcesController#build_error method, so I checked which error codes are passed to build_error:

purl-fetcher % git grep build_error
app/controllers/v1/resources_controller.rb:      render build_error('500', e, 'Error matching uploading files to file parameters.')
app/controllers/v1/resources_controller.rb:      render build_error('400', e, 'Bad request')
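
The grep results above can be folded into a small triage helper. This is only a sketch: the application list is limited to the statuses visible in the snipped grep output, Rails can emit other statuses on its own, and treating 502/503/504 as gateway-class is a general HTTP convention rather than something confirmed for this deployment. likely_origin is a hypothetical name.

```ruby
# Statuses the grep output shows purl-fetcher rendering explicitly
# (:created, :accepted, '400', :unauthorized, :conflict, '500').
APP_RENDERED_STATUSES = [201, 202, 400, 401, 409, 500].freeze

# ASSUMPTION: gateway-class statuses usually come from a proxy or load
# balancer in front of the app, not from the app's own controllers.
GATEWAY_STATUSES = [502, 503, 504].freeze

# Rough guess at which layer produced a given status code.
def likely_origin(status)
  return :application if APP_RENDERED_STATUSES.include?(status)
  return :proxy_or_upstream if GATEWAY_STATUSES.include?(status)

  :unknown
end
```

By this rough classification, the 502 seen by the robot does not match anything purl-fetcher renders itself, which fits the guess that it comes from something in front of or behind the app.
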
jmartin-sul changed the title from "[common-accessioning/prod] RuntimeError: unexpected response: 502 {"status"=>"error", "message"=>"Bad Gateway"}" to "PurlFetcher::Client::ReleaseTags.release call gets 502 Bad Gateway response" on Jul 13, 2024
jmartin-sul moved this to New Issues (Needs Triage) in Infrastructure Portfolio Production Priorities on Jul 13, 2024

@jmartin-sul
Member

if we realize this is a purl-fetcher bug, we should probably transfer this GH issue to the purl-fetcher repository. but leaving here for now since it's where we discovered the problem.

@andrewjbtw

Is this close-able? I just noticed it on the prod priorities board and July 12 feels like a long time ago given all the versioning work. I suspect we addressed this.

@jmartin-sul
Member

> Is this close-able? I just noticed it on the prod priorities board and July 12 feels like a long time ago given all the versioning work. I suspect we addressed this.

i don't know, but it does feel plausible that versioning work since july has fixed this. no objection to closing. hopefully if this error pops up again, honeybadger will collate the new alert to the original one linked from this issue, since this issue is linked from that HB alert.
