Implement HAR to PDF library #42

Closed
baltpeter opened this issue Nov 30, 2023 · 13 comments

@baltpeter
Member

As explained in #41 (comment)

@baltpeter baltpeter self-assigned this Nov 30, 2023
@baltpeter
Member Author

Typst.ts has two main approaches for how it can be used: by importing an "all-in-one" bundle that does a whole bunch of plumbing for you, or by manually doing the necessary work yourself.

With the AIO bundle, compiling a string to PDF is easy:

import { $typst } from '@myriaddreamin/typst.ts/dist/esm/contrib/snippet.mjs';

export const compile = async () => {
    const source = `
= Introduction
#lorem(60)
`;

    return await $typst.pdf({ mainContent: source });
};

However, the maintainer recommends going the manual route. I struggled a bit to get that working, so I wanted to document what I found out in case anyone else runs into the same problem. While the code above worked just fine for me, I always got an entirely blank PDF when I tried to recreate it manually. I ended up duplicating snippet.mts locally and removing code until I finally found the problem. The one difference between my code and snippet.mts was this:

https://github.com/Myriad-Dreamin/typst.ts/blob/9644b7088696a0d2fb14b5ee664f855270c78fd5/packages/typst.ts/src/contrib/snippet.mts#L534

This ensures (quite opaquely) that ccOptions.beforeBuild is not undefined. ccOptions here is what gets passed to cc.init(). I wasn't doing that: I didn't provide an argument at all, since it isn't required. I didn't investigate this more thoroughly, but from a quick glance, not having beforeBuild seems to cause fonts not to be loaded (https://github.com/Myriad-Dreamin/typst.ts/blob/9644b7088696a0d2fb14b5ee664f855270c78fd5/packages/typst.ts/src/compiler.mts#L162-L186).

Anyway, here is a working minimal example of how to render a PDF with typst.ts using the manual approach:

import { createTypstCompiler } from '@myriaddreamin/typst.ts/dist/esm/compiler.mjs';

export const compile = async () => {
    const mainFilePath = '/main.typ';
    const source = `
= Introduction
#lorem(60)
`;

    const cc = createTypstCompiler();
    await cc.init({ beforeBuild: [] });

    cc.addSource(mainFilePath, source);

    return await cc.compile({ mainFilePath, format: 'pdf' });
};

@baltpeter
Member Author

In my legal research (#38), I found one example where the Austrian DPA "printed" a HAR in one of their decisions: https://noyb.eu/sites/default/files/2023-03/Bescheid%20redacted.pdf#page=24

I'll use that as a rough example of how to structure the output.

Other than that, I'll reference Firefox's and Chrome's dev tools.

@baltpeter
Member Author

Unfortunately, I'm running into a fairly major issue with Typst quite early on. I'm currently working on defining a template for how requests should be rendered.

The problem is that we are regularly dealing with very long non-word strings and Typst doesn't handle those well. By default, if it can't find a nice place to break, it will just let them overflow:

image

Now, you can enable hyphenation:

#set text(hyphenate: true)

https://graph.facebook.com/v14.0/app?access_token=138566025676|e251d7dad1e2b26389ad8a43557aa256&fields=supports_implicit_sdk_logging,gdpv4_nux_content,gdpv4_nux_enabled,android_dialog_configs,android_sdk_error_categories,app_events_session_timeout,app_events_feature_bitmask,auto_event_mapping_android,seamless_login,smart_login_bookmark_icon_url,smart_login_menu_icon_url,restrictive_data_filter_params,aam_rules,suggested_events_setting&format=json&sdk=android

image

This is better, but the hyphens are very unfortunate, since they may well change the meaning (cf. typst/typst#674). I may be missing something, but I haven't found a way to enable hyphenation while hiding the actual hyphens.

Worse yet, hyphenation doesn't appear to work in raw/code blocks:

#set text(hyphenate: true)

```
https://graph.facebook.com/v14.0/app?access_token=138566025676|e251d7dad1e2b26389ad8a43557aa256&fields=supports_implicit_sdk_logging,gdpv4_nux_content,gdpv4_nux_enabled,android_dialog_configs,android_sdk_error_categories,app_events_session_timeout,app_events_feature_bitmask,auto_event_mapping_android,seamless_login,smart_login_bookmark_icon_url,smart_login_menu_icon_url,restrictive_data_filter_params,aam_rules,suggested_events_setting&format=json&sdk=android
```

image

#set raw(hyphenate: true) produces an error.

typst/typst#1271 (comment) suggests appending a zero-width space after each character, which is easy enough to do:

#show raw: t => for c in t.text [#c.replace(c, c + sym.zws)]

```
https://graph.facebook.com/v14.0/app?access_token=138566025676|e251d7dad1e2b26389ad8a43557aa256&fields=supports_implicit_sdk_logging,gdpv4_nux_content,gdpv4_nux_enabled,android_dialog_configs,android_sdk_error_categories,app_events_session_timeout,app_events_feature_bitmask,auto_event_mapping_android,seamless_login,smart_login_bookmark_icon_url,smart_login_menu_icon_url,restrictive_data_filter_params,aam_rules,suggested_events_setting&format=json&sdk=android
```

image

This works and produces a passable result. And this would be fine if we were producing PDFs for print only. However, if you now copy this text, you also copy all those zero-width spaces, which is just wrong.

And interestingly, these zero-width spaces are rendered with a very much non-zero width when copied into LibreOffice or gedit:

image
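
For reference, the same transformation can also be sketched on the JavaScript side, before the string is handed to Typst. This is only an illustration, not what the show rule above does internally: allowBreaksAnywhere and stripZeroWidthSpaces are hypothetical helper names. Note that the stripping helper would also remove any zero-width spaces that were part of the original data.

```typescript
// Zero-width space (U+200B), the character that Typst exposes as `sym.zws`.
const ZWS = '\u200b';

// Hypothetical helper: append a zero-width space after every character so
// that a line breaker is allowed to break anywhere in the string.
const allowBreaksAnywhere = (input: string): string =>
    Array.from(input, (char) => char + ZWS).join('');

// Hypothetical inverse for text that was copied back out of the PDF.
// Caveat: this also drops zero-width spaces that were in the original data.
const stripZeroWidthSpaces = (input: string): string => input.split(ZWS).join('');
```

Running stripZeroWidthSpaces(allowBreaksAnywhere(url)) gives back the original URL as long as it contained no zero-width spaces to begin with.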

@baltpeter
Member Author

The same problem applies when using a soft hyphen instead of a zws, except now we get hyphens instead of hard breaks (plus an unnecessary hyphen at the end that I don't quite understand, but anyway):

#show raw: t => for c in t.text [#c.replace(c, c + sym.hyph.soft)]

```
https://graph.facebook.com/v14.0/app?access_token=138566025676|e251d7dad1e2b26389ad8a43557aa256&fields=supports_implicit_sdk_logging,gdpv4_nux_content,gdpv4_nux_enabled,android_dialog_configs,android_sdk_error_categories,app_events_session_timeout,app_events_feature_bitmask,auto_event_mapping_android,seamless_login,smart_login_bookmark_icon_url,smart_login_menu_icon_url,restrictive_data_filter_params,aam_rules,suggested_events_setting&format=json&sdk=android
```

image

@baltpeter
Member Author

Okay, having had a little time to ponder this over the weekend and considering that there aren't really any other good alternatives to Typst here (#41 (comment)), I guess I am just going to accept this for the moment in the interest of moving along. For now, I'll implement the workaround and we'll have to hope that Typst fixes that sooner rather than later.

@baltpeter
Member Author

Here's my current draft for what the export will look like (mashed together from various requests): https://typst.app/project/rOUJm_yR0ubv0bgmgM4rsn

@baltpeter
Member Author

I stumbled across annoying differences in how different HAR implementations encode POST data: tweaselORG/TrackHAR#58 (comment)

Based on that investigation, I think we can use the following heuristic: We display params if it is set and non-empty, and text otherwise. In all the experiments, if params was set, either text was empty or params was a parsed version of text (and thus more "useful").
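
A minimal sketch of that heuristic, assuming a postData object shaped like the one in the HAR format (an optional params array and an optional text string). The type names and the postDataToDisplay function are made up for illustration and are not taken from the actual implementation:

```typescript
// Simplified, hypothetical types covering only the HAR postData fields
// that matter for the heuristic.
type HarParam = { name: string; value?: string };
type HarPostData = { mimeType: string; params?: HarParam[]; text?: string };

type PostDataDisplay =
    | { kind: 'params'; params: HarParam[] }
    | { kind: 'text'; text: string };

// Prefer `params` if it is set and non-empty, fall back to `text` otherwise.
const postDataToDisplay = (postData: HarPostData): PostDataDisplay =>
    postData.params && postData.params.length > 0
        ? { kind: 'params', params: postData.params }
        : { kind: 'text', text: postData.text ?? '' };
```

An entry with both fields set is displayed via params, since in the experiments params was then a parsed version of text.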

@baltpeter
Member Author

I've got a first prototype of this working. Here's the output for a traffic capture from the Airbnb app on Android: PDF

Two more things I need to implement before I think we can consider this good enough for now:

  • Escaping (quite important! :D)
  • Translations

@baltpeter
Member Author

Ok, for escaping, as far as I can tell (typst/typst#266), the best way would be to throw everything that could potentially contain Typst syntax into raw blocks. (The issue mentions that you can call .text on them but I think we want to render all instances as monospace anyway, so we shouldn't need that.)

As per the documentation: "Within raw blocks, everything (except for the language tag, if applicable) is rendered as is, in particular, there are no escape sequences." So, this is perfectly fine, for example:

`#set heading(numbering: "")`

image

There is only one issue: What if there are backticks in our string?

Since there are no escape sequences in raw blocks at all, `\`` isn't going to work. But, as per the example in the docs, you can do this:

abc``` `backticks` ```def

The leading and trailing space will be trimmed, yielding:

image

This works for up to two consecutive backticks in the string. But what if there are three or more? That breaks again, and I didn't find any documentation or issue advising on what to do in that case.

However, I found that we can use our zws hack again: There is only a problem for consecutive backticks. So, if we prepend a zero-width space to every backtick, we're fine again:

``` test`​``test ```

image
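
On the generating side, the escaping described above could be sketched roughly as follows. toTypstRaw is a hypothetical name, not the function used in the actual code, and the sketch assumes single-line input (a multi-line string inside triple backticks would be treated as a block by Typst and would need separate handling):

```typescript
// Zero-width space (U+200B).
const ZWS = '\u200b';

// Hypothetical helper: wrap an arbitrary single-line string in a Typst raw
// block. Every backtick gets a zero-width space prepended, so the payload
// never contains two consecutive backticks and cannot close the block early.
// The space after the opening and before the closing fence keeps the payload
// from being read as a language tag or merging with the fence.
const toTypstRaw = (input: string): string =>
    '``` ' + input.split('`').join(ZWS + '`') + ' ```';
```

As with the line-breaking workaround, the inserted zero-width spaces end up in the text that is copied out of the PDF.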

@baltpeter
Member Author

Yup, escaping seems to work fine now. I've created an evil.har that tries its darndest to break things…

…but doesn't succeed at doing that anymore: PDF

@baltpeter
Member Author

Having had a quick look at the available options for translation libraries, these all seem waaay too complicated for this project. We don't even need the (few) features preact-i18n offers! Guess I'll just roll my own. :D
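
A hand-rolled approach can be as small as a typed lookup table. The following is only a sketch of the idea, not the code that ended up in the project; the languages, keys and strings are hypothetical placeholders:

```typescript
// Hypothetical translation table. Keys and strings are placeholders.
const translations = {
    en: { request: 'Request', response: 'Response' },
    de: { request: 'Anfrage', response: 'Antwort' },
} as const;

type Language = keyof typeof translations;
type TranslationKey = keyof (typeof translations)['en'];

// Look up a string. Unknown languages or keys are rejected at compile time.
const t = (language: Language, key: TranslationKey): string =>
    translations[language][key];
```

For example, t('de', 'request') looks up the German string for the request key.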

@baltpeter
Member Author

The translations I'm coming up with seem utterly ridiculous. But I did compare mine to Firefox's and they are just as ridiculous (well, Firefox just straight up doesn't have translations for many dev tools strings, but the ones that are translated, anyway). shrug

@baltpeter
Member Author

Alright, I'll consider this done now.

In the interest of making progress with the actual project, and since I could well imagine that we'll iterate on the output some more in the future, I won't release this as a standalone library yet but will just keep the code in the complaint generator for now.

I've opened #43 to remind us to release a library in the future, though.
