Use common data between Perl/Ruby/Go versions (eg. JSON)? #13

Open
bohwaz opened this issue Oct 9, 2024 · 2 comments

Comments

@bohwaz

bohwaz commented Oct 9, 2024

Hi,

I just saw this new Go version thanks to your reply. I was interested in reusing some of the data from Sisimai in my PHP code, and I'm wondering why you didn't go the route of having a common set of data that could be reused between the various language versions of Sisimai.

For example you could have a set of JSON files that list the various strings and pairs for matching reasons, like I extracted here: https://gist.github.com/bohwaz/9c5b8354089a15033ea1a97a267cabfb#file-reasons-json
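
To make the idea concrete, here is a minimal sketch of how a library could consume such a file. The JSON layout (an `index` list of fixed substrings and a `pairs` list of two-part patterns per reason), the sample strings, and the function names `load_reasons` and `match_reason` are all assumptions for illustration; they are not the actual layout of the gist or of Sisimai's code.

```python
import json

# Hypothetical shared data: the layout and the sample strings are assumptions.
SHARED_JSON = """
{
  "userunknown": {
    "index": ["user unknown", "no such user"],
    "pairs": [["mailbox ", " does not exist"]]
  },
  "mailboxfull": {
    "index": ["mailbox full", "quota exceeded"],
    "pairs": []
  }
}
"""


def load_reasons(text):
    """Parse the shared JSON document into a dict keyed by reason name."""
    return json.loads(text)


def match_reason(message, reasons):
    """Return the first reason whose fixed strings occur in the message."""
    lowered = message.lower()
    for reason, patterns in reasons.items():
        # Plain substring match on any single fixed string.
        if any(s in lowered for s in patterns.get("index", [])):
            return reason
        # A pair matches when the second string occurs after the first.
        for first, second in patterns.get("pairs", []):
            start = lowered.find(first)
            if start != -1 and lowered.find(second, start + len(first)) != -1:
                return reason
    return None


reasons = load_reasons(SHARED_JSON)
print(match_reason("550 5.1.1 Mailbox foo@example.com does not exist", reasons))
```

Since the data is only fixed strings, the same file could in principle be read by a PHP, Perl, Ruby or Go implementation with nothing more than a JSON parser.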

The same probably applies to other parts, like matching Rhost errors, which are mostly just large arrays. Some parts, like Lhost, might be too complex to make "generic". But having the same common data for each library would avoid having to replicate string changes from one library to the other, reducing duplicated effort.

For example you would have a "sisimai-data" repo that would be pulled by various libraries to have up to date data for matching reasons, Rhost, etc.

Maybe there is an obvious reason you didn't go this route that I can't see right now?

Anyway, thank you for your work, very interesting and useful :)

@azumakuniyuki
Member

@bohwaz Apologies for the delayed response.

I think that unifying fixed strings, starting with error message patterns, is a good idea. After initially releasing the Perl version of Sisimai, I created a Ruby version on a whim because I wanted to run it on AWS Lambda. At that time, I separated set-of-emails as a repository for test emails common to both.

Since error messages were implemented using a large number of regular expressions at the time, I was concerned that using them in a common external file would cause excessive I/O at runtime and slow things down. Therefore, I decided that hardcoding them in the repository was the most reasonable approach. I thought that since error message patterns are rarely updated, I could just copy them if needed.

Now that all error message patterns have been changed to fixed strings, I may reconsider this if it doesn't cause any performance issues, including I/O.
However, my current thinking is that I strongly prefer to keep all files necessary for installation, testing (make test), and execution in a single repository.

By the way, the process of copying and pasting changes to a separate repository, while seemingly unproductive, acts as a self-contained code review and can surprisingly lead to finding improvements in the code.

Thank you for your ideas and feedback!

@bohwaz
Author

bohwaz commented Oct 26, 2024

Thank you, all perfectly understandable points.

Maybe a solution would be to have a central repo of strings, and each library could generate a native source file from this repo, eg. a hash table, which would be versioned in git. This way there would be no performance cost, as the strings would be in the code, but you wouldn't have to manually keep the strings in sync between the different libraries.

This would also work for your requirement to keep all files in the same repo, as you would have a copy of the strings in each repo.
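
A rough sketch of such a generator, assuming the central data is a JSON object mapping each reason to a list of fixed strings. It renders a Go map literal as text; the package name `reasondata`, the variable name `Index`, and the function names are made up for illustration, and the same approach would apply to a Perl or Ruby hash.

```python
import json


def go_string(value):
    # json.dumps quoting is used here as an approximation of a Go
    # interpreted string literal; ensure_ascii=False keeps non-ASCII
    # text as raw UTF-8 instead of \u escapes.
    return json.dumps(value, ensure_ascii=False)


def render_go(reasons, package="reasondata"):
    """Render {reason: [strings]} as Go source text (hypothetical names)."""
    lines = [
        "// Code generated from shared JSON data. DO NOT EDIT.",
        "",
        f"package {package}",
        "",
        "var Index = map[string][]string{",
    ]
    # Sort the keys so regenerating the file produces stable git diffs.
    for reason in sorted(reasons):
        items = ", ".join(go_string(s) for s in reasons[reason])
        lines.append(f"\t{go_string(reason)}: {{{items}}},")
    lines.append("}")
    return "\n".join(lines) + "\n"


shared = json.loads(
    '{"userunknown": ["user unknown", "no such user"],'
    ' "mailboxfull": ["mailbox full"]}'
)
print(render_go(shared))
```

The generated file would be committed in each library's repository, so installation, `make test` and execution would not need the central repo at all; it would only be needed when regenerating.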
