Sorting keys are language dependent #48

Open
qnga opened this issue Feb 4, 2020 · 15 comments

Comments

@qnga
Member

qnga commented Feb 4, 2020

Currently RWPM supports sortAs in subjects, titles and contributors independently of their localized names. But a sorting key is in fact language-dependent and should be supported as such.

I think title, for example, should be usable as follows:

"title": {
  "en": "Around the World in Eighty Days",
  "fr": {
    "name": "Le Tour du monde en quatre-vingts jours",
    "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

Or

"title": {
  "name": "Le Tour du monde en quatre-vingts jours",
  "sortAs": "Tour du monde en quatre-vingts jours, Le"
}

Or

"title": {
  "name": "Around the World in Eighty Days"
}

In the Kotlin app, everything is ready for that. We have a LocalizedString object that contains Translation objects, each of which may carry a sorting key alongside the canonical string.
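As a rough illustration of that structure (sketched in Python rather than Kotlin; only the names LocalizedString and Translation come from the comment above, and every field and method name here is hypothetical), each translation could carry its own optional sorting key:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Translation:
    """One localized form of a string, with an optional sorting key."""
    string: str
    sort_as: Optional[str] = None  # hypothetical field name


@dataclass
class LocalizedString:
    """Maps a language tag (e.g. "en", "fr") to a Translation."""
    translations: Dict[str, Translation] = field(default_factory=dict)

    def sort_key(self, language: str) -> Optional[str]:
        """Return the sorting key for `language`, or the display string if none is set."""
        translation = self.translations.get(language)
        if translation is None:
            return None
        return translation.sort_as or translation.string


title = LocalizedString({
    "en": Translation("Around the World in Eighty Days"),
    "fr": Translation(
        "Le Tour du monde en quatre-vingts jours",
        sort_as="Tour du monde en quatre-vingts jours, Le",
    ),
})
```

Here `title.sort_key("fr")` yields the French sorting key, while `title.sort_key("en")` falls back to the English display string because no key was given for it.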

@HadrienGardeur
Member

I understand your point, but that's really not what sorting keys are used for.

Sorting keys are mostly used by reading apps to handle the following actions in a bookshelf:

  • order publications by title or author
  • filter publications for a specific author/subject/series

For this specific use case, having multiple sorting keys is more confusing than helpful.
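A minimal sketch of that bookshelf use case, assuming each publication has a single plain-string title and at most one sortAs under a metadata object (the sample shelf and the helper name are made up for illustration):

```python
def bookshelf_key(publication: dict) -> str:
    """Sort by `sortAs` when present, otherwise by the display title."""
    metadata = publication["metadata"]
    return (metadata.get("sortAs") or metadata["title"]).casefold()


# Illustrative data only; real manifests may use a language map for `title`.
shelf = [
    {"metadata": {"title": "The Time Machine", "sortAs": "Time Machine, The"}},
    {"metadata": {"title": "Le Tour du monde en quatre-vingts jours",
                  "sortAs": "Tour du monde en quatre-vingts jours, Le"}},
    {"metadata": {"title": "Around the World in Eighty Days"}},
]

ordered_titles = [p["metadata"]["title"] for p in sorted(shelf, key=bookshelf_key)]
```

With one key per publication the ordering is unambiguous. With several keys per publication, the app would first have to decide which language to sort in.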

When we look at EPUB files or OPDS feeds, it's already a miracle when they include both a sorting key and multiple translations. I don't think we can ever expect to get sort keys for each translation (I'm not even sure if that's doable with EPUB 3.x).

It's also worth pointing out that behind the scenes, we're actually working with JSON-LD and not plain JSON.

The following example would not be valid JSON-LD, since language maps in JSON-LD 1.1 only support literals, not objects:

"title": {
  "en": "Around the World in Eighty Days",
  "fr": {
    "name": "Le Tour du monde en quatre-vingts jours",
    "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

This limitation also means that we currently can't really support @direction (see #33).
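To make the constraint concrete, here is a small check written under the assumption that a JSON-LD 1.1 language map value may be a string, null, or an array of those, but never an object (the function name is hypothetical, and this is not a full JSON-LD validator):

```python
def is_valid_language_map(value) -> bool:
    """Return True if every entry of `value` is a literal, not an object."""
    if not isinstance(value, dict):
        return False
    for entry in value.values():
        # A single literal is treated like a one-element array.
        items = entry if isinstance(entry, list) else [entry]
        if not all(item is None or isinstance(item, str) for item in items):
            return False
    return True


proposed_title = {
    "en": "Around the World in Eighty Days",
    "fr": {
        "name": "Le Tour du monde en quatre-vingts jours",
        "sortAs": "Tour du monde en quatre-vingts jours",
    },
}
plain_title = {
    "en": "Around the World in Eighty Days",
    "fr": "Le Tour du monde en quatre-vingts jours",
}
```

The nested object under "fr" is what makes `proposed_title` fail the check, and an object carrying a direction would fail for the same reason.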

@qnga
Member Author

qnga commented Feb 4, 2020

Indeed, there seem to be few use cases. Maybe a bilingual edition?
A more JSON-LD compliant alternative would be to make sortAs a language map, as title is now. The shortcut syntax

"sortAs": "Tour du monde en quatre-vingts jours, Le"

could still be used.
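One way a parser could accept both forms is to normalize the shortcut string into a one-entry language map. This is only a sketch: the function name is hypothetical, and filing the bare string under the BCP 47 "und" (undetermined) tag is an assumption, not something the manifest specification states.

```python
from typing import Dict, Union

UNDETERMINED = "und"  # assumption: key used when no language is given


def normalize_sort_as(value: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Accept either the shortcut string or a language map and return a map."""
    if isinstance(value, str):
        return {UNDETERMINED: value}
    return dict(value)  # copy so callers cannot mutate the parsed manifest


shortcut = normalize_sort_as("Tour du monde en quatre-vingts jours, Le")
full = normalize_sort_as({"fr": "Tour du monde en quatre-vingts jours, Le"})
```

Existing manifests that use a plain string keep working, which is what makes the change backwards-compatible.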

@qnga
Member Author

qnga commented Feb 4, 2020

We could also align with the W3C Publication Manifest, which uses an array of LocalizableString objects to support text direction.
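For comparison, a sketch of how the two shapes relate, assuming the W3C form is an array whose items are either plain strings or objects with value, language and direction members (the function name and the "und" fallback are hypothetical):

```python
from typing import Dict, List, Union

LocalizableString = Union[str, Dict[str, str]]


def w3c_to_language_map(items: List[LocalizableString]) -> Dict[str, str]:
    """Flatten an array of localizable strings into a language map.

    The `direction` member has no place in a language map and is dropped.
    """
    result: Dict[str, str] = {}
    for item in items:
        if isinstance(item, str):
            result.setdefault("und", item)
        else:
            result.setdefault(item.get("language", "und"), item["value"])
    return result


w3c_title = [
    {"value": "Around the World in Eighty Days", "language": "en", "direction": "ltr"},
    {"value": "Le Tour du monde en quatre-vingts jours", "language": "fr"},
]
as_map = w3c_to_language_map(w3c_title)
```

The conversion is lossy in one direction only: the array form can carry direction per string, while the language map cannot.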

@HadrienGardeur
Member

I'm a bit wary of revisiting this right now:

  • we can cover the most common use cases (multiple Japanese scripts, bilingual editions)
  • language maps provide us with a much nicer JSON syntax than the current W3C approach
  • I don't know how an app would actually implement multiple sorting keys

@chocolatkey
Member

chocolatkey commented Feb 5, 2020

This comes at a perfect time. I'm currently implementing such a system:
[image]
It would be nice to fit that data in a webpub. Since I hadn't gotten to the point of generating them yet, I hadn't even considered the fact that sortAs is only a string in the schema. Is there a way I could fit all this data in? Internally, the data looks like this:

[
    {
        "name": "The Combat Baker and Automaton Waitress",
        "sortAs": "Combat Baker and Automaton Waitress, The",
        "language": "en"
    },
    {
        "name": "戦うパン屋と機械じかけの看板娘",
        "sortAs": "タタカウパンヤトオートマタンウェイトレス",
        "language": "ja"
    }
]

> I don't know how an app would actually implement multiple sorting keys

In my use case, the sorting key for the language of the user's publisher or client is used.
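A minimal sketch of that selection, assuming the records have the shape shown above and the app has an ordered list of preferred languages (the function name and the fall-back-to-first-record rule are assumptions, not part of the system described here):

```python
from typing import Dict, List

records = [
    {
        "name": "The Combat Baker and Automaton Waitress",
        "sortAs": "Combat Baker and Automaton Waitress, The",
        "language": "en",
    },
    {
        "name": "戦うパン屋と機械じかけの看板娘",
        "sortAs": "タタカウパンヤトオートマタンウェイトレス",
        "language": "ja",
    },
]


def pick_sort_key(records: List[Dict[str, str]], preferred: List[str]) -> str:
    """Return the sorting key for the first preferred language that is available."""
    by_language = {record["language"]: record for record in records}
    for language in preferred:
        record = by_language.get(language)
        if record is not None:
            return record.get("sortAs") or record["name"]
    # Assumption: with no match, fall back to the first record.
    first = records[0]
    return first.get("sortAs") or first["name"]


japanese_key = pick_sort_key(records, ["ja", "en"])
fallback_key = pick_sort_key(records, ["fr"])
```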

@qnga
Member Author

qnga commented Feb 5, 2020

Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

@HadrienGardeur
Member

HadrienGardeur commented Feb 5, 2020

Thanks @chocolatkey for chiming in and proving me wrong regarding use cases 😉

Could you provide some additional context for this use case? It looks to me like you're trying to do the following:

  • this is a title where both the English and Japanese names of the publication are available
  • for the Japanese title, the "real" title has a mix of Kanji, Hiragana and Katakana
  • the sorting key for the Japanese title renders the real title in Katakana only

If we move sortAs away from being a literal and add the same language map approach that we use for title and name, this could be represented as:

{
  "title": {
    "en": "The Combat Baker and Automaton Waitress",
    "ja": "戦うパン屋と機械じかけの看板娘"
  },
  "sortAs": {
    "en": "Combat Baker and Automaton Waitress, The",
    "ja": "タタカウパンヤトオートマタンウェイトレス"
  }
}
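The mapping from the per-language records posted earlier to this representation is mechanical. A sketch, with a hypothetical function name and assuming each record has name and language members plus an optional sortAs:

```python
from typing import Dict, List


def to_language_maps(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Turn a list of per-language records into `title` and `sortAs` language maps."""
    title: Dict[str, str] = {}
    sort_as: Dict[str, str] = {}
    for record in records:
        language = record["language"]
        title[language] = record["name"]
        if "sortAs" in record:
            sort_as[language] = record["sortAs"]
    metadata = {"title": title}
    if sort_as:  # omit `sortAs` entirely when no record provides one
        metadata["sortAs"] = sort_as
    return metadata


metadata = to_language_maps([
    {"name": "The Combat Baker and Automaton Waitress",
     "sortAs": "Combat Baker and Automaton Waitress, The",
     "language": "en"},
    {"name": "戦うパン屋と機械じかけの看板娘",
     "sortAs": "タタカウパンヤトオートマタンウェイトレス",
     "language": "ja"},
])
```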

@chocolatkey
Member

chocolatkey commented Feb 5, 2020

> Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

In the system I am creating, the publishers (of translated Japanese doujin content) will be able to privately share a review copy of the publications with the original authors, and potentially have a mini-library for them. The original authors are usually not well-versed in English, so the original title needs to be present so that the original and localized titles can be displayed side by side. The katakana is included for filtering and sorting purposes when the titles are displayed in Japanese, both in that private frontend and in the admin backend.

> If we move sortAs away from being a literal and add the same language map approach that we use for title and name [...]

I think this would be a good idea, because it's backwards-compatible with the existing schema. My cases tend to be edge cases (which is probably good for probing the limits of the standard), but most people will just need "sortAs": "Single String, The". The way @HadrienGardeur represented my data in the example snippet is perfect.

@HadrienGardeur
Member

HadrienGardeur commented Feb 8, 2020

OK then, let's vote on this issue using 👍 and 👎 on this message. I'll also bring it up in our weekly call.

Who's in favor of turning sortAs into a language map?

@qnga
Member Author

qnga commented Apr 15, 2020

I drafted a proposal: https://github.com/qnga/webpub-manifest/blob/proposal/sortAs/proposals/001-multilingual-sortAs.md

@chocolatkey Would you add something more precise about your use case?

Here is an internal PR for suggestions and comments: qnga#1

@chocolatkey
Member

@qnga what additional information would you like about my use case besides what I said previously? I can't think of much I didn't say

@qnga
Member Author

qnga commented Apr 16, 2020

I was suggesting that you could explain it directly in the proposal. But it might not be necessary.

@chocolatkey
Member

@qnga aha now it's clear. Would you like me to fork your fork and submit a PR or comment in your internal PR?

@llemeurfr
Contributor

Note that the Go implementation of a Publication already has a "MultiLanguage" struct, which is currently applied to the title and subtitle properties and could easily be applied to sortAs as well. The change would therefore not hurt the Go code.

@qnga
Member Author

qnga commented Apr 16, 2020

The easiest way is to add suggestion snippets as comments on the PR qnga#1.

Thanks Laurent for the feedback about the Go implementation.
