Using unicode emoji in blog posts deletes remaining text input? #30

Open
ksangwin opened this issue Sep 17, 2024 · 1 comment
Labels
database change Requires a change to the database and therefore a significant update process

Comments


ksangwin commented Sep 17, 2024

Hey, I just edited a blog post by adding a short message to the start. After the first sentence, I input the unicode emoji "🫠". It seemed to display correctly in the text field, but after submitting, I found that all text that came after that unicode emoji was deleted (i.e., 99% of my blog post).

I suppose I'm not surprised that blog posts don't support emojis (although it would be a nice quality of life improvement if they did), but I'm a little miffed that I've lost 2 long blog posts this way. I kinda wish Grawlix wouldn't just drop the entire remainder of my post? I would have expected the emoji to display as an unprintable character or as &#129760; while still leaving the rest of the post intact.

At least, I'm assuming it's the emoji that's been the issue. When I view the post and go to edit it again, I see the last character in the Post content is the whitespace before the emoji.

Possibly unrelated, but I've noticed some of my longer blog titles have been getting truncated as well. I'm assuming I'm hitting a character limit? Or maybe I'm typing some special character that it doesn't like? Either way, that hasn't been a great user experience when I go to make an announcement and find half my headline is gone.

Would it be worthwhile to add some kind of preview feature for new blog posts? At least that way, I could see ahead of time if my inputs aren't going to be accepted and I'll have a chance to fix them before it goes live.

Collaborator

eishiya commented Sep 18, 2024

The problem seems to be that the database is set to use utf8mb3 encoding (it uses "utf8", which is an alias for "utf8mb3"). That encoding stores a maximum of 3 bytes per character, but emoji and other characters outside the Basic Multilingual Plane require 4 bytes. We would need to change the database to use "utf8mb4" encoding instead if we want to support storing emoji in the database. A change to the database means a new firstrun script and a new upgrade script, so it's something we're putting off: it would be a significant breaking change.
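
To make the 3-byte vs 4-byte point concrete, here is a minimal Python sketch (Grawlix itself is PHP, so this is only an illustration, and the helper names `utf8_len` and `fits_utf8mb3` are made up for this example). It shows how many UTF-8 bytes a few sample characters need, and checks whether a string stays inside the Basic Multilingual Plane, which is what a 3-byte encoding can hold.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character is at or below U+FFFF (the BMP),
    i.e. needs at most 3 bytes in UTF-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Sample characters: Latin "a", "e" with acute accent, a CJK ideograph,
# and the melting-face emoji from this issue.
samples = ["a", "\u00e9", "\u6f22", "\U0001FAE0"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

print(fits_utf8mb3("plain text"))          # no character above U+FFFF
print(fits_utf8mb3("text \U0001FAE0"))     # contains a 4-byte character
```

The emoji is the only sample that needs 4 bytes, which matches the point at which the post gets cut off.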

An alternative fix that doesn't require database changes would be to have Grawlix replace 4-byte characters with their corresponding HTML entities, e.g. &#129760; for 🫠 as you suggested. This would be pretty ugly when you want to edit the post (unless we resolve HTML entities back to characters, which is a whole other can of worms), but better than breaking things. On the other hand, it adds a lot of string processing that shouldn't really be necessary, and Grawlix already applies HTML input sanitisation...
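
As a rough sketch of what that replacement could look like (in Python for illustration only; this is not Grawlix code, and `escape_supplementary` is a hypothetical name), every character above U+FFFF is swapped for its decimal HTML entity and everything else is left alone. The standard library's `html.unescape` shows the reverse direction, i.e. resolving entities back to characters for editing.

```python
import html


def escape_supplementary(text: str) -> str:
    """Replace each character above U+FFFF (4 bytes in UTF-8) with a
    decimal HTML entity; all other characters pass through unchanged."""
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


original = "Hello \U0001FAE0 world"
escaped = escape_supplementary(original)
print(escaped)  # Hello &#129760; world

# Going back the other way, e.g. before showing the post in an edit form.
# Note this also resolves entities the author typed on purpose, such as &amp;.
print(html.unescape(escaped) == original)  # True
```

The comment on `html.unescape` hints at the can of worms: a blanket unescape can't tell entities Grawlix generated apart from ones the author wrote deliberately.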

utf8mb3 is deprecated, so a switch to utf8mb4 will be required eventually anyway. I think a change to the database is the best route to take.

In the interim, you can use emoji as HTML entities yourself, BUT they'll be displayed as actual emoji when you edit the blog post, and unless you replace them with HTML entities again every time you edit, they'll get eaten 🙃

Regarding post titles: There is a 64-character limit, as that's the length of the field in the database. This would also require changing the database, so it's something we'd have to save for when we're ready to do a breaking change. Please consider opening an issue for that so we don't forget!
In the interim, it would be nice to at least limit the text field in the admin panel to those same 64 characters (not as easy as you might think, again due to encoding issues - the database accepts a maximum of 64 bytes, which isn't the same as 64 characters).
And a related tidbit, in case it's of interest: The blog post text is stored as type TEXT, which means it allows a maximum length of 65535 bytes (which again, may mean fewer characters than that).
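
To illustrate why a byte limit is harder to enforce than a character limit, here is a small Python sketch (again not Grawlix code; `truncate_utf8` is a hypothetical helper, and the 64-byte figure is simply the one quoted above). It trims a string so its UTF-8 encoding fits a byte budget without cutting a multi-byte character in half.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text so its UTF-8 encoding is at most max_bytes long,
    never splitting a multi-byte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may leave a partial character at the very end;
    # errors="ignore" drops that incomplete tail when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


title = "\u00e9" * 40  # 40 characters, but 80 bytes in UTF-8
short = truncate_utf8(title, 64)
print(len(short), len(short.encode("utf-8")))  # 32 64
```

Here a 40-character title is already over a 64-byte budget, which is why a plain character-count limit on the text field wouldn't line up with what the database accepts.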

@eishiya eishiya added the database change Requires a change to the database and therefore a significant update process label Sep 18, 2024