Complaint generator #41

Open
baltpeter opened this issue Nov 30, 2023 · 9 comments

@baltpeter
Member

How do we go from a HAR file and the output of TrackHAR run on that HAR to a complaint?

@baltpeter baltpeter self-assigned this Nov 30, 2023
@baltpeter
Member Author

One of the first issues I'm running into is the question of what output formats we want to produce, and, relatedly, which library/software we want to use to produce those.

Obviously, plain text is not enough as an output format. The DPAs typically only support a very limited set of formats for attachments. I think it's pretty safe to say that we will need to at least be able to produce PDFs.

Do we want to allow users to edit the generated complaint? That obviously wouldn't be possible with PDFs. We could also output ODT (or DOCX, I guess…), which the user could easily edit and convert to PDF themselves. But I would guess that wanting to edit the complaint is rather unusual, so making every user convert the ODT manually would be unnecessarily annoying. So, we would need to produce both ODT and PDF. And obviously, the PDF should be identical to what the user gets when they save the ODT as PDF themselves.

But that means we would have to run LibreOffice. I'm pretty sure that there is even an official CLI for this purpose, but a) for various reasons I would much rather be able to run this client-side, and b) I'm concerned about the security implications of running LibreOffice on at least partially user-influenced files on our servers. That seems like a bad idea.

And another thing to consider: I would like to have a separate HAR "renderer" library that can generate a nice human-readable document from HAR files. That should also be able to output at least PDF files. But I think it would also be nice to have an HTML output option so we could offer an online HAR viewer. For this use case, having ODT as the "input format" from which the other formats are generated really doesn't make sense.

@baltpeter
Member Author

baltpeter commented Nov 30, 2023

Oh, and another potentially annoying constraint: We need quite advanced reference features that probably aren't supported by many solutions. We definitely need hierarchically numbered section headings that we can reference elsewhere in the text. And I would also definitely like to have margin/paragraph numbers and be able to reference them elsewhere as well.

@baltpeter
Member Author

Options I'm considering:

Typst

  • There is a TS library for running it client-side that can produce PDFs.
  • It compiles really fast.
  • While I haven't tried it yet, the language is supposed to be really powerful yet easy to read and write. I would imagine that we could implement our desired references with little effort.
  • However, it can (at the moment) only output PDFs. HTML output is planned at some later date but not yet implemented. I don't expect there to ever be a (native) ODT/DOCX output (why would there be?).

Carbone

  • That's what we're using to build the sample letter ODTs and DOCXs in website (https://github.com/datenanfragen/website/blob/aee6acea516ef8e5db5509db42ca5c303944da4b/scripts/build-sample-letters/build.js).
  • It's focused on simple replacements of variables in a template document (meant for report generation), not on creating full documents with many new paragraphs, images, etc. added dynamically. It might be possible to bend it to our use case, but I imagine it would be quite annoying, especially with the references. We would probably have to implement our own reference engine and then just flatten the references into plain text to insert.
  • While it claims to support PDF output, it does so by using the LibreOffice CLI to convert ODT/DOCX to PDF.
  • I'm not sure whether it's possible to run Carbone in the browser.
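To make the "own reference engine, flattened into plain text" idea a bit more concrete, here is a minimal TypeScript sketch. It assumes a hypothetical document tree of sections and paragraphs, a made-up `{ref:label}` placeholder syntax and made-up output strings ("section 2.1", "para. 3"); none of this is Carbone's API or an actual implementation. A first pass assigns hierarchical section numbers and running margin numbers, and a second pass replaces each placeholder with the resolved number, so forward references work as well.

```typescript
// Hypothetical minimal reference engine (illustrative names, not Carbone's API):
// numbers sections hierarchically ("1", "2", "2.1"), gives every paragraph a
// running margin number, and flattens `{ref:label}` placeholders into plain text.

type Block =
  | { kind: 'section'; label?: string; title: string; children: Block[] }
  | { kind: 'paragraph'; label?: string; text: string };

type Numbered =
  | { kind: 'section'; number: string; title: string; children: Numbered[] }
  | { kind: 'paragraph'; marginNumber: number; text: string };

// Assumed placeholder syntax for references inside paragraph text.
const refPattern = /\{ref:([a-z0-9-]+)\}/g;

function numberDocument(blocks: Block[]): Numbered[] {
  const labels = new Map<string, string>();
  let marginCounter = 0;

  // Pass 1: assign section and margin numbers in document order,
  // remembering what each label resolves to.
  const assign = (items: Block[], prefix: number[]): Numbered[] => {
    let sectionIndex = 0;
    return items.map((item): Numbered => {
      if (item.kind === 'section') {
        sectionIndex++;
        const path = [...prefix, sectionIndex];
        const sectionNumber = path.join('.');
        if (item.label) labels.set(item.label, `section ${sectionNumber}`);
        return {
          kind: 'section',
          number: sectionNumber,
          title: item.title,
          children: assign(item.children, path),
        };
      }
      marginCounter++;
      if (item.label) labels.set(item.label, `para. ${marginCounter}`);
      return { kind: 'paragraph', marginNumber: marginCounter, text: item.text };
    });
  };

  // Pass 2: replace placeholders now that every label is known
  // (this is what makes forward references possible).
  const resolve = (items: Numbered[]): Numbered[] =>
    items.map((item): Numbered => {
      if (item.kind === 'section') return { ...item, children: resolve(item.children) };
      const text = item.text.replace(refPattern, (_match: string, label: string) => {
        const target = labels.get(label);
        if (target === undefined) throw new Error(`Unknown reference: ${label}`);
        return target;
      });
      return { ...item, text };
    });

  return resolve(assign(blocks, []));
}
```

The downside of this design is exactly the one mentioned above: once flattened, the references are plain text and no longer live cross-references in the generated ODT/DOCX.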

Pandoc

  • It does appear possible to compile Pandoc for the browser, to JS even: https://news.ycombinator.com/item?id=19071537
  • However, Pandoc itself cannot produce PDFs. It relies on external programs for that and I don't think any of the options are suitable for our use case.
  • I'm also not sure whether it is possible to express margin numbers in a manner that Pandoc understands and is able to reliably convert between formats. I suspect we would also need to flatten them beforehand.

PSPDFKit

Misc NPM libraries

@baltpeter
Member Author

Based on that, I don't really think that there's any solution that ticks all of our boxes. I think Typst looks like the best option. I guess we'll have to do without an ODT export (at least for now). And I guess having a more focused HAR-to-PDF library might even be nicer than an online HAR viewer that also happens to export to PDF, especially since it's not exactly hard to find other HAR viewers. Rendering a HAR in the browser and rendering it for a PDF are actually quite different use cases, which should probably display things differently.

@baltpeter
Member Author

baltpeter commented Dec 5, 2023

I expected paragraph numbering to be really easy to implement in Typst (just set up a counter and a #show rule on par), but alas it isn't. That causes an infinite recursion and is a known bug (https://old.reddit.com/r/typst/comments/16ltmtx/documentwide_enumeration/k1gzzzr/?context=3, typst/typst#229, typst/typst#519).

That basically only leaves us with two options: Continue without paragraph numbering and hope that this is fixed before we actually need the exported PDFs, or wrap every paragraph in a custom function. (It would also be possible to use a #set rule on enum instead, like in typst/typst#2506, but the difference in syntax isn't much of an improvement considering we're generating the Typst code, and it would also mean that we couldn't use actual numbered lists anywhere in the document.) Quite unfortunate. :/
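Since the Typst code is generated anyway, the "wrap every paragraph in a custom function" workaround mostly means having the generator emit one function call per paragraph. A minimal TypeScript sketch, assuming a hypothetical Typst function `numbered-par` that the template would have to define (for example by stepping a counter and placing its value in the margin) and paragraph text that has already been escaped for Typst markup. Headings are emitted as regular Typst `=` headings, which the template could number with `#set heading(numbering: "1.1")`.

```typescript
// Hypothetical generator for the "wrap every paragraph" workaround. It assumes
// the Typst template defines a function `numbered-par` (name made up here) and
// that all text has already been escaped for Typst markup.

type Item =
  | { kind: 'heading'; level: number; text: string }
  | { kind: 'paragraph'; text: string };

function toTypst(items: Item[], parFunction = 'numbered-par'): string {
  return items
    .map((item) =>
      item.kind === 'heading'
        ? // Regular Typst headings: "=" for level 1, "==" for level 2, etc.
          `${'='.repeat(item.level)} ${item.text}`
        : // Each paragraph becomes a function call with a content block argument.
          `#${parFunction}[${item.text}]`,
    )
    .join('\n\n');
}
```

Because the wrapping happens in the generator, the extra syntax never has to be written by hand, which is why it isn't much worse than the `enum` alternative.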

@baltpeter
Member Author

Progress is being made. I'm starting with the technical reports. Here's an example of what we can generate already: PDF

Code in tweaselORG/ReportHAR#1.

@baltpeter
Member Author

baltpeter commented Dec 11, 2023

We also need to deal with escaping. I had already discussed that a bit in #42 (comment).

Here, things are a bit more complicated since a) I would really like to enable autoescaping and b) we have both plain-text values and code blocks that need to be escaped.

Nunjucks does have an autoescape feature, but it is designed only for HTML templates and is thus of no help to us (which is why I had initially disabled it). From what I can tell from the code, the escaping is hardcoded and there is no way to replace it with a custom escaping function. But manually escaping each value is cumbersome and error-prone. So I decided to patch Nunjucks using patch-package.

I implemented that in tweaselORG/ReportHAR@3fad5cb with a slightly more robust version than what I had initially for our HAR renderer.
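For illustration, here is a minimal TypeScript sketch of the two kinds of escaping mentioned above: one function for plain-text values inserted into Typst markup and one for code blocks. The function names and the exact set of special characters are assumptions (the list is not guaranteed to be exhaustive), and this is not the code from tweaselORG/ReportHAR@3fad5cb. Plain text gets a backslash before every character that can have special meaning in markup; code blocks are passed to Typst's `raw` function as a string literal, so backticks in the content cannot terminate the block.

```typescript
// Sketch of Typst escaping helpers (names and character list are assumptions).

// Characters that can have special meaning in Typst markup: backslash, code (#),
// math ($), raw (`), emphasis (* _), labels/references (< > @), content brackets,
// heading/list markers (= - +), comments (/), non-breaking space (~) and smart
// quotes. Not guaranteed to be exhaustive.
const markupSpecials = /[\\#$`*_<>@\[\]=\-+\/~"']/g;

// Plain-text value -> Typst markup: prefix each special character with a
// backslash. Note that blank lines in the value would still start a new paragraph.
function escapeTypstMarkup(value: string): string {
  return value.replace(markupSpecials, (ch) => `\\${ch}`);
}

// Any string -> Typst string literal. The backslash must be escaped first.
function toTypstString(value: string): string {
  const escaped = value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t');
  return `"${escaped}"`;
}

// Code block -> call to Typst's `raw` function with the code as a string literal,
// so backticks in the code cannot end the block early.
function typstCodeBlock(code: string, lang?: string): string {
  const langArg = lang ? `, lang: ${toTypstString(lang)}` : '';
  return `#raw(${toTypstString(code)}, block: true${langArg})`;
}
```

In a setup like the one described above, the patched autoescape would presumably call something like `escapeTypstMarkup` wherever Nunjucks would normally HTML-escape, while code blocks need their own escaping since the rules inside a string literal are different.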

@baltpeter
Member Author

We now also have controller notices: PDF

(oops, looks like I forgot to post this yesterday? o.o)

@baltpeter
Member Author

After a lot of work, we now have: complaints!

PDF
