Very High Memory Usage Using Syft #233

Open
dor-hayun opened this issue Jun 9, 2024 · 8 comments
Assignees
Labels
bug, performance

Comments

@dor-hayun

What happened:
High memory consumption when scanning a 5GB image

What you expected to happen:
controlled memory consumption

Steps to reproduce the issue:
Scan a very large image and check the memory consumption

Anything else we need to know?:
image

Environment:

  • Output of syft version: 1.5.0
  • OS (e.g: cat /etc/os-release or similar):
    seems like there are many dotnet packages in the images
dor-hayun added the bug label Jun 9, 2024
@abhiseksanyal

abhiseksanyal commented Jun 10, 2024

I am seeing the same issue: syft took almost 20 GB of RSS memory.

I tested syft 1.5.0 and 1.4.1 on Ubuntu 22.04, on an EC2 instance with 32 GB of RAM and 8 vCPUs, and both versions show this issue.

The test was run against a 5.5+ GB image on GCR that has Maven components on a RHEL 7.9 image.

image

syft ran for quite some time and then exited without any error

wagoodman moved this to Ready in OSS Jul 9, 2024
wagoodman moved this from Ready to In Progress in OSS Jul 10, 2024
wagoodman self-assigned this Jul 10, 2024
@wagoodman
Contributor

wagoodman commented Jul 10, 2024

An initial look shows that, depending on the image being scanned, the CSV reader used within the mimetype detector lib is what's eating much of the total allocated space

Screenshot 2024-07-10 at 9 36 42 AM

It seems like we're using an older version of mimetype that does not incorporate gabriel-vasile/mimetype#355. When I bump the dependency to incorporate this fix, though, I see the memory allocated within the mimetype.DetectReader() call drop from 1.1GB to 740MB, which is an improvement, but I was expecting much less consumption.

I'll see what else I can do here, but since much of the consumption is from the CSV and TSV detectors alone, I'm considering dropping those detectors entirely (which would require a fork in the short term).
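
For anyone who wants to compare allocation before and after a dependency bump without a full profiler, here is a minimal sketch in Go using only the standard library. It is not how the numbers above were produced. `allocatedDuring`, `detectStandIn`, and `sink` are hypothetical names, and `detectStandIn` is only a placeholder for whatever detection call is being measured (in this thread, `mimetype.DetectReader()`). Note that it reports cumulative heap allocation, which is a different figure from the peak RSS reported earlier in the thread.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime"
)

// allocatedDuring reports how many heap bytes were allocated while fn ran.
// It reads the cumulative TotalAlloc counter, so it measures total
// allocation (what an alloc_space profile shows), not peak resident memory.
// Allocations made by other goroutines during fn are counted as well.
func allocatedDuring(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap so the two readings are comparable
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

// sink keeps results reachable so the allocation cannot be optimized away.
var sink []byte

// detectStandIn is a hypothetical placeholder for the call under test.
// Here it simply buffers the whole reader, which is an assumption made
// for illustration and not a description of what any detector does.
func detectStandIn(r io.Reader) {
	data, _ := io.ReadAll(r)
	sink = data
}

func main() {
	input := bytes.Repeat([]byte("a,b,c\n"), 1<<16)
	n := allocatedDuring(func() {
		detectStandIn(bytes.NewReader(input))
	})
	fmt.Printf("allocated roughly %d KiB during the stand-in call\n", n/1024)
}
```

Swapping the body of `detectStandIn` for the real call and running the program against each dependency version gives a rough side-by-side figure.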

@wagoodman
Contributor

wagoodman commented Jul 10, 2024

I've got a prototype csv/tsv detector that is pretty bare-bones, but it drops the total memory allocation from 740MB to 330MB. I'll see if I can get that PR tested and in the upstream.
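
To illustrate what a bare-bones delimiter check could look like, here is a hypothetical Go sketch. It is not the prototype mentioned above and not the code in any mimetype release. The function name `looksDelimited` and its rules are assumptions: it scans a header buffer in place, counts delimiter bytes per line, and never builds records the way a full CSV reader does. It deliberately ignores quoting, so it would misjudge fields that contain the delimiter inside quotes.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksDelimited reports whether the first complete lines of header (up to
// maxLines of them) all contain the same non-zero number of delim bytes,
// and at least two such lines exist. It works on the input slice directly
// and does not copy or parse fields.
func looksDelimited(header []byte, delim byte, maxLines int) bool {
	lines := 0
	want := -1
	for len(header) > 0 && lines < maxLines {
		i := bytes.IndexByte(header, '\n')
		if i < 0 {
			// A trailing partial line may be a truncated read, so skip it.
			break
		}
		line := header[:i]
		header = header[i+1:]

		n := 0
		for _, b := range line {
			if b == delim {
				n++
			}
		}
		if n == 0 {
			return false // a line without any delimiter rules the format out
		}
		if want == -1 {
			want = n
		} else if n != want {
			return false // inconsistent column count
		}
		lines++
	}
	return lines >= 2
}

func main() {
	fmt.Println(looksDelimited([]byte("a,b,c\n1,2,3\n"), ',', 10))
	fmt.Println(looksDelimited([]byte("plain text\nwith no commas\n"), ',', 10))
	fmt.Println(looksDelimited([]byte("a\tb\n1\t2\n"), '\t', 10))
}
```

The trade-off in a check like this is accuracy for memory: it can give wrong answers on quoted or ragged files, which may or may not be acceptable for content-type detection.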

@wagoodman
Contributor

@abhiseksanyal the screenshot is showing 9GB being used -- are you describing two different invocations?

> syft ran for quite some time and then exited without any error

Did syft display an SBOM result? Or did it exit without an error and without an SBOM result?

@wagoodman
Contributor

The PR that attempts to reduce total memory allocation, anchore/mimetype#2, is stalled for the time being.

wagoodman moved this from In Progress to Stalled in OSS Jul 17, 2024
@dor-hayun
Author

@wagoodman thank you very much. Will it be part of the next release of Syft?

@dor-hayun
Author

Hi @wagoodman, any update here?

@abhiseksanyal

> @abhiseksanyal the screenshot is showing 9GB being used -- are you describing two different invocations?
>
> > syft ran for quite some time and then exited without any error
>
> did syft display an SBOM result? Or exited without error or SBOM result?

@wagoodman: I took the screenshot before it hit the peak.
When I tested in a different setup that had 32 GB of RAM, syft exited without an error and generated the SBOM.

willmurphyscode added the needs-discussion label Oct 5, 2024
willmurphyscode removed the needs-discussion label Dec 18, 2024
Projects
Status: Stalled
4 participants