Zstandard support #95

Open. Wants to merge 6 commits into base: 2.5.5-dev2

@maxsam4 maxsam4 commented Aug 1, 2020

Zstandard, or zstd for short, is a fast lossless compression algorithm that targets real-time compression scenarios, with compression ratios at zlib level or better.

This PR adds support for searching through zst files using the zstandard Python bindings.

It works and gets the job done for now. Additional things that can be done:

  • Merge localzstdsearch.py and localgzipsearch.py to reduce code duplication
  • Add test case(s)
  • [Stretch goal] Handle encodings other than utf-8
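
As a rough illustration of the first and third follow-ups, the sketch below shares one search routine between the gzip and zstd paths and replaces undecodable bytes instead of raising. The names `open_text` and `search_file` are hypothetical and are not taken from this PR. The `.zst` branch assumes the `zstandard` package is installed and relies on its `ZstdDecompressor().stream_reader()` call; picking the opener by file extension is also an assumption.

```python
import gzip
import io


def open_text(path, encoding="utf-8"):
    """Return a text stream for a .gz, .zst, or plain file (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding, errors="replace")
    if path.endswith(".zst"):
        # Imported lazily so the dependency is only needed for .zst files.
        import zstandard

        raw = open(path, "rb")
        reader = zstandard.ZstdDecompressor().stream_reader(raw)
        return io.TextIOWrapper(reader, encoding=encoding, errors="replace")
    return open(path, "r", encoding=encoding, errors="replace")


def search_file(path, targets):
    """Return (target, line_number, line) for every line containing a target."""
    found = []
    with open_text(path) as stream:
        for line_number, line in enumerate(stream, start=1):
            for target in targets:
                if target in line:
                    found.append((target, line_number, line.rstrip("\n")))
    return found
```

Because the data is streamed line by line, the archive is never fully decompressed into memory, which matters for large dumps.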

@khast3x
Owner

khast3x commented Aug 5, 2020

Hello,

Thank you for the PR. I don't use this type of compression, but it seems very promising.
As this is the first PR that adds a feature, I am not quite sure how I will integrate it, so I need to figure some things out. h8mail is designed to require only requests as a dependency, and I would like to keep it that way.

I think I will have the user manually install the zstandard lib with pip3 (will be documented in the wiki), and check if the lib is installed when using this option.
Before I get to it I have a few things I need to integrate first, but I have read your PR and am thinking it through.
Do you convert your local data breaches to zstandard before archiving them? What is your workflow?

Thanks again, much appreciated 👍
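
The optional-dependency check described in this comment could be sketched roughly as follows. The helper names `zstandard_available` and `require_zstandard` and the wording of the message are hypothetical; only the standard-library `importlib.util.find_spec` call is relied on, and nothing here is taken from h8mail itself.

```python
import importlib.util
import sys


def zstandard_available(module_name="zstandard"):
    """Return True if the optional module can be found (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


def require_zstandard():
    """Exit with an install hint when the optional dependency is missing."""
    if not zstandard_available():
        sys.exit("zstd search needs the zstandard package: pip3 install zstandard")
```

Calling `require_zstandard()` only when the zstd option is selected keeps requests as the sole hard dependency.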

@maxsam4
Author

maxsam4 commented Aug 5, 2020

> I think I will have the user manually install the zstandard lib with pip3 (will be documented in the wiki), and check if the lib is installed when using this option.

Sounds fair.

> Do you convert your local data breaches to zstandard before archiving them? What is your workflow?

Yep, I compress all my local data with zstd. zstd is basically a better alternative to gzip/zlib. It creates smaller archives that decompress much faster. In fact, searching a zstd-compressed file is faster for me than searching the decompressed file, because disk I/O is the bottleneck in both cases and the compressed file has less data to read.

Thanks for creating h8mail btw! It makes life much easier. I especially love the fact that it supports multi-threading when searching over local data.

@@ -3,9 +3,9 @@
</h1>

[![platforms](https://img.shields.io/badge/platforms-Windows%20%7C%20Linux%20%7C%20OSX-success.svg)](https://pypi.org/project/h8mail/) [![PyPI version](https://badge.fury.io/py/h8mail.svg)](https://badge.fury.io/py/h8mail)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/h8mail.svg)](https://pypi.org/project/h8mail/) [![Downloads](https://pepy.tech/badge/h8mail)](https://pepy.tech/project/h8mail) [![travis](https://img.shields.io/travis/khast3x/h8mail.svg)](https://travis-ci.org/khast3x/h8mail)

if user_args.single_file:
    local_found = local_search_single_zstd(res, targets)
else:
    local_found = local_zstd_search(res, targets)

@khast3x khast3x changed the base branch from master to 2.5.5-dev2 January 28, 2021 09:37

4 participants