Discussion:
[dev] freetype2/fc pain
AR Garbe
2018-09-23 04:10:37 UTC
Hi there,

I have been revising dmenu/dwm/libsl in terms of simplicity due to a
migration to OpenBSD recently.

I can't get my head around how much the elegance and clarity of
dwm/dmenu/libsl code has suffered from the introduction of freetype2
and fc usage.

Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.

I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.

What do you guys think about this idea?

I barely use multihead setups and I don't give a f*ck about
anti-aliasing. This whole freetype2 move seems utterly wrong. I didn't
see it as critical before, but now I more and more conclude it has to
go.

Best regards,
Anselm
AR Garbe
2018-09-23 06:53:13 UTC
Post by AR Garbe
I can't get my head around how much the elegance and clarity of
dwm/dmenu/libsl code has suffered from the introduction of freetype2
and fc usage.
[..]
Post by AR Garbe
I barely use multihead setups and I don't give a f*ck about
anti-aliasing. This whole freetype2 move seems utterly wrong. I didn't
see it as critical before, but now I more and more conclude it has to
go.
I did investigate the options and made up my mind. Here is my verdict:

The idea behind libsl has to be improved in code and I will work on
this. The drw.h API is not strictly enough defined and both dwm and
dmenu access certain aspects of drw.h that they shouldn't, which makes
it currently impossible to cleanly implement either simple plain X11
support or let the Xft/fc abomination survive in one possible
direction or to introduce a different implementation like cairo-based.

I will reassess if the xlib dependent part in dwm can be separated
further as well, to allow a more agnostic WM core.

I know that I did raise the multihead question a couple of times in
the past, and mostly the picture I gathered was 50/50 -- one half uses
Xinerama setups, the other doesn't. Thus my old idea of arranging the
code in a different way might be an answer, which would allow building
dwm single-headed (without Screen) and multi-headed (with Screen
derived from Xinerama).

I think this whole effort will lead to 6.2 rather than some fork. But
I want it to be easier to build a clean dwm without the cruft in setups
where most of the cruft is (fortunately still) absent.

BR,
Anselm
Quentin Rameau
2018-09-23 11:58:46 UTC
Hello Anselm,
Post by AR Garbe
The idea behind libsl has to be improved in code and I will work on
this. The drw.h API is not strictly enough defined and both dwm and
dmenu access certain aspects of drw.h that they shouldn't, which makes
it currently impossible to cleanly implement either simple plain X11
support or let the Xft/fc abomination survive in one possible
direction or to introduce a different implementation like cairo-based.
Maybe drw is not needed at all and could be put back into main code.
Post by AR Garbe
I will reassess if the xlib dependent part in dwm can be separated
further as well, to allow a more agnostic WM core.
Hum, this was tried for st, and imo it didn't bring much benefit, only
a half-separation of code into files, but not totally separated.
Post by AR Garbe
I know that I did raise the multihead question a couple of times in
the past, and mostly the picture I gathered was 50/50 -- one half uses
Xinerama setups, the other doesn't. Thus my old idea of arranging the
code in a different way might be an answer, which would allow building
dwm single-headed (without Screen) and multi-headed (with Screen
derived from Xinerama).
I use Xinerama, but I can live with it being stripped off into the
patches section.
Would it really bring much more simplicity?
Post by AR Garbe
I think this whole effort will lead to 6.2 rather than some fork. But
I want it to be easier to build a clean dwm without the cruft in setups
where most of the cruft is (fortunately still) absent.
Yes, no need for a fork, I think we all agree on the font madness and
will find common choices
AR Garbe
2018-09-23 16:09:44 UTC
Post by Quentin Rameau
Post by AR Garbe
The idea behind libsl has to be improved in code and I will work on
this. The drw.h API is not strictly enough defined and both dwm and
dmenu access certain aspects of drw.h that they shouldn't, which makes
it currently impossible to cleanly implement either simple plain X11
support or let the Xft/fc abomination survive in one possible
direction or to introduce a different implementation like cairo-based.
Maybe drw is not needed at all and could be put back into main code.
This is another option, though there are commonalities between dwm and
dmenu in terms of font drawing, which is why it once was put into a
common place.

It is a good question how much stricter abstraction would harm or
benefit. I'm in the process to determine that.
Post by Quentin Rameau
Post by AR Garbe
I will reassess if the xlib dependent part in dwm can be separated
further as well, to allow a more agnostic WM core.
Hum, this was tried for st, and imo it didn't bring much benefit, only
a half-separation of code into files, but not totally separated.
Well I gave up on this already. The issue is that any potential
abstraction of X would not be a good interface for other non-X
environments like considering a suckless wayland compositor or
implementing the dwm on top of something else. It is too much
overengineering for no benefit.
Post by Quentin Rameau
Post by AR Garbe
I know that I did raise the multihead question a couple of times in
the past, and mostly the picture I gathered was 50/50 -- one half uses
Xinerama setups, the other doesn't. Thus my old idea of arranging the
code in a different way might be an answer, which would allow building
dwm single-headed (without Screen) and multi-headed (with Screen
derived from Xinerama).
I use Xinerama, but I can live with it being stripped off into the
patches section.
Would it really bring much more simplicity?
This rearrangement of the code actually looks promising -- I'm not
talking about putting Xinerama support away as separate patch, but
rather putting it into a separate file and having a simple dwm per
screen (with some extra sugar to allow sending windows across the
screens like before if Xinerama is enabled).

Best regards,
Anselm
Roberto E. Vargas Caballero
2018-09-23 08:59:10 UTC
Hi,
Post by AR Garbe
I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.
I feel the same about st. I also think that adding freetype was a really
bad idea. Maybe we should begin to think about going one step back.
Post by AR Garbe
I barely use multihead setups and I don't give a f*ck about
Well, multihead is a very common setup today, but as always it is only
a question of getting used to not using it.
Post by AR Garbe
anti-aliasing. This whole freetype2 move seems utterly wrong. I didn't
see it as critical before, but now I more and more conclude it has to
go.
Indeed.

Roberto.
AR Garbe
2018-09-23 16:14:11 UTC
Hi Roberto,

On Sun, 23 Sep 2018 at 01:59, Roberto E. Vargas Caballero
Post by Roberto E. Vargas Caballero
Post by AR Garbe
I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.
I feel the same about st. I also think that adding freetype was a really
bad idea. Maybe we should begin to think about going one step back.
I totally agree. I'd be in favour of an st just using plain X fonts.
This emoji unicode porn and anti-aliasing TTF support doesn't make
sense to me.

The pattern here is thinking really longterm. The XFont* calls have
been pretty damn stable for two decades, whereas this freetype shit
breaks every year. On each new install I need to fiddle with font
paths, font caches, weird config files etc. It has to stop.

Best regards,
Anselm
Eric Pruitt
2018-09-23 18:56:27 UTC
Post by AR Garbe
I totally agree. I'd be in favour of an st just using plain X fonts.
This emoji unicode porn and anti-aliasing TTF support doesn't make
sense to me.
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.

Eric
AR Garbe
2018-09-24 06:19:46 UTC
Post by Eric Pruitt
Post by AR Garbe
I totally agree. I'd be in favour of an st just using plain X fonts.
This emoji unicode porn and anti-aliasing TTF support doesn't make
sense to me.
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.

I weigh code clarity and consistency higher than being able to render
all kinds of glyphs through fallback font declarations.
If a user needs to display cyrillic, there are plain old xfonts with
excellent support.

Best regards,
Anselm
Eric Pruitt
2018-09-24 06:31:29 UTC
Post by AR Garbe
Post by Eric Pruitt
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
Post by AR Garbe
I weigh code clarity and consistency higher than being able to render
all kinds of glyphs through fallback font declarations.
If a user needs to display cyrillic, there are plain old xfonts with
excellent support.
If supporting languages spoken by billions of people isn't reason enough
for you, then there's not much point in me arguing because I can't come
up with a better rationale.

Eric
Eric Pruitt
2018-09-24 06:34:38 UTC
Post by Eric Pruitt
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
I forgot to add that supporting Japanese, Chinese and Korean is THE
reason I wrote "Add Xft and follback-fonts support to graphics lib" /
14343e69cc596b847f71f1e825d3019ab1a29aa8 in dwm. I was tired of seeing
missing glyph rectangles all the time, and the Pango patches I
originally used no longer applied to HEAD.

Eric
Silvan Jegen
2018-09-24 07:15:55 UTC
Post by Eric Pruitt
Post by AR Garbe
Post by Eric Pruitt
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
Just chiming in to say that I am using st with Japanese/Chinese fonts every
day as well.

I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.


Cheers,

Silvan
Hiltjo Posthuma
2018-09-24 07:35:11 UTC
Post by Silvan Jegen
Post by Eric Pruitt
Post by AR Garbe
Post by Eric Pruitt
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
Just chiming in to say that I am using st with Japanese/Chinese fonts every
day as well.
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
Cheers,
Silvan
I agree it's useful. (Complex) fall-back font support has been on my mind also.
One idea: instead of supporting fallback fonts, we could write some
font merge script (pre-runtime).

There are also some fonts which support many glyphs like:
http://unifoundry.com/unifont/index.html
--
Kind regards,
Hiltjo
s***@gmail.com
2018-09-24 12:28:21 UTC
Post by Silvan Jegen
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
I remember the time I did not know st existed: I used mlterm (which deals with
something even worse than rendering glyphs: asian/indian/thai/arabic/etc input methods).

If I recall properly, its config file allows mapping
style (bold|non-bold)/Unicode ranges to a specific font (fontconfig naming). I
did that to display Asian glyphs in lynx and ncurses links.
Correctly rendering and laying out fonts from an international point of view is a
huge task, and only _one_ software component does that: harfbuzz, a massive C++
pile of ...

We could add a table to the st config header to do so, but with X11 font naming.
Terminals require only bold and/or non-bold versions of a font, if I am correct.
I don't recall a terminfo application that could use more styles (italic with
vim for syntax coloring and style highlighting?).

If I recall correctly again, the X server core font system maps fontconfig
fonts into core X11 font naming anyway.

For a x11 terminal, that should be fairly enough, but we have to keep in mind
that for a suckless wayland terminal (if wayland reaches its goals and becomes
really stable in time), the rendering (probably pixmap and freetype) and input
management (probably libinput) will have to be dealt with by the terminal code
itself. I have native x11 on my system only because I play video games and
xwayland with vulkan/GL hardware acceleration is far from optimal last time I
checked.
--
Sylvain
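
The style/range table described above could look roughly like the following. This is only a sketch in Python for brevity, not st code: the rule layout, the pick_font helper and the font names are all hypothetical placeholders, and falling back to the regular face when no bold rule exists is an assumption rather than something mlterm or st is known to do.

```python
from typing import NamedTuple, Optional


class FontRule(NamedTuple):
    first: int  # first code point covered (inclusive)
    last: int   # last code point covered (inclusive)
    bold: bool  # True for the bold variant, False for regular
    xlfd: str   # X11 core font name; the values below are placeholders


# Hypothetical table, standing in for an array in a config header.
FONT_RULES = [
    FontRule(0x0020, 0x007E, False, "-example-latin-medium-r-*-iso10646-1"),
    FontRule(0x0020, 0x007E, True, "-example-latin-bold-r-*-iso10646-1"),
    FontRule(0x0400, 0x04FF, False, "-example-cyrillic-medium-r-*-iso10646-1"),
    FontRule(0x4E00, 0x9FFF, False, "-example-cjk-medium-r-*-iso10646-1"),
]


def pick_font(codepoint: int, bold: bool) -> Optional[str]:
    """Return the font for a code point and style, or None if uncovered."""
    regular_match = None
    for rule in FONT_RULES:
        if rule.first <= codepoint <= rule.last:
            if rule.bold == bold:
                return rule.xlfd
            if not rule.bold and regular_match is None:
                regular_match = rule.xlfd
    # Assumption: use the regular face when no rule matches the requested style.
    return regular_match
```

With this table, pick_font(ord("A"), True) selects the bold Latin entry, while a bold Cyrillic character falls back to the regular Cyrillic entry because no bold rule covers that range.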
Petr Šabata
2018-09-24 12:34:34 UTC
Post by s***@gmail.com
Post by Silvan Jegen
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
I remember the time I did not know st existed: I used mlterm (which deals with
something even worse than rendering glyphs: asian/indian/thai/arabic/etc input methods).
Out of curiosity, what special things does mlterm do there?
I can input "exotic" characters in st just fine (via XIM).

P
Roberto E. Vargas Caballero
2018-09-24 15:14:55 UTC
Post by Hiltjo Posthuma
Post by Silvan Jegen
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
Of course not, I am not talking about that. I am only saying that freetype was a really
bad idea and we should try to look for something different.
Post by Hiltjo Posthuma
I agree it's useful. (Complex) fall-back font support has been on my mind also.
One idea: instead of supporting fallback fonts, we could write some
font merge script (pre-runtime).
Yes, this is also a good idea. Maybe we can implement some table
to relate glyph and font.

Regards,
Eon S. Jeon
2018-09-24 16:26:52 UTC
Post by Roberto E. Vargas Caballero
Post by Silvan Jegen
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
Of course not, I am not talking about that. I am only saying that freetype was a really
bad idea and we should try to look for something different.
I think fontconfig is to blame here. It’s what st is using to find fallbacks. It’s just that we don’t want such highly sophisticated fallback mechanism. We all love simple infallible code, so that we can always blame users. ;)

(Note that we still need fontconfig for loading and configuring fonts. It still makes our lives easier.)

Also, I believe freetype is essential. st has one visual component and that’s text. st should render it good. At least that’s what people will consider suck-less.
Roberto E. Vargas Caballero
2018-09-25 08:10:18 UTC
Post by Eon S. Jeon
I think fontconfig is to blame here. It’s what st is using to find fallbacks. It’s just that we don’t want such highly sophisticated fallback mechanism. We all love simple infallible code, so that we can always blame users. ;)
(Note that we still need fontconfig for loading and configuring fonts. It still makes our lives easier.)
Also, I believe freetype is essential. st has one visual component and that’s text. st should render it good. At least that’s what people will consider suck-less.
We can go back to simple X fonts, which is much simpler. Suckless is about simplicity of the code.
Eon S. Jeon
2018-09-25 11:09:30 UTC
Post by Roberto E. Vargas Caballero
I think fontconfig is to blame here. It’s what st is using to find fallbacks. It’s just that we don’t want such highly sophisticated fallback mechanism. We all love simple infallible code, so that we can always blame users. ;)
(Note that we still need fontconfig for loading and configuring fonts. It still makes our lives easier.)
Also, I believe freetype is essential. st has one visual component and that’s text. st should render it good. At least that’s what people will consider suck-less.
We can go back to simple X fonts, which is much simpler. Suckless is about simplicity of the code.
Simplicity is one thing, but it becomes pointless if the software is useless. Useless software sucks and it will only shame this community. We must make useful software simple, not the other way around.

Even if X core fonts simply work, that’s not how the rest of the world works. I can’t even imagine myself running xfontsel anymore.

Also, if st is to support wayland, FT/FC are inevitable. We will end up there anyway, so we can just work on it now.

Again, we need freetype. We had better look into how we can make freetype/fontconfig related code simple, instead of trying to revert back to the early 90s.
David Demelier
2018-10-19 12:45:58 UTC
Post by AR Garbe
Post by Eric Pruitt
Post by AR Garbe
I totally agree. I'd be in favour of an st just using plain X fonts.
This emoji unicode porn and anti-aliasing TTF support doesn't make
sense to me.
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Hello,

Unfortunately, I remember Terminus not rendering some kinds of
Unicode characters that weren't even emojis. I think they were some drawing characters.

Many tools like to use those characters (even just tree).

Is freetype/fontconfig the only way to implement fallback characters? I
remember when dwm switched to Xft a while back, I was surprised at
first, but if that choice was made I think there were a lot of good
reasons, weren't there?

Also, I'm not sure if many bitmap fonts render correctly on 4k screens.
I just love how Fira Mono renders on mine. But I should test some.
--
David
s***@gmail.com
2018-10-19 13:52:33 UTC
Unfortunately, I remember Terminus not rendering some kinds of Unicode
characters that weren't even emojis. I think they were some drawing characters.
If those "characters" are actually using "combined" Unicode rendering, namely
using more than 1 Unicode code point and font glyph (rough estimate), it's
over: you would need a Unicode text shaper (which would use freetype as a
backend) to combine/transform "properly" the font glyphs into the 1 "char"
("extended grapheme cluster" in Unicode terminology) you want.

If those "characters" are actually mapped to only 1 font glyph, well, they were
probably filtered out for some reasons (could be missing from the font files)
or it's a "limitation".
--
Sylvain
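
To tell the two cases above apart for a given piece of text, one can count code points after NFC normalisation. The sketch below uses only Python's standard unicodedata module; the helper names are made up for illustration, and it is not a grapheme cluster segmenter, just a quick check of whether a "character" collapses to a single code point.

```python
import unicodedata


def codepoints(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_single_codepoint(text: str) -> bool:
    """True if NFC normalisation leaves exactly one code point."""
    return len(unicodedata.normalize("NFC", text)) == 1


box = "\u251c"        # BOX DRAWINGS LIGHT VERTICAL AND RIGHT, as drawn by tree
combined = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT, composes to U+00E9
cluster = "n\u0308"   # 'n' + COMBINING DIAERESIS, has no precomposed form

for sample in (box, combined, cluster):
    print(codepoints(sample), is_single_codepoint(sample))
```

The box-drawing character and the composable sequence each map to one code point, so a plain glyph lookup can handle them if the font has the glyph. The last sequence stays at two code points, which is the case that needs some form of combining logic.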
AR Garbe
2018-10-19 19:48:35 UTC
Hi David,
Post by David Demelier
Post by AR Garbe
Post by Eric Pruitt
Post by AR Garbe
I totally agree. I'd be in favour of an st just using plain X fonts.
This emoji unicode porn and anti-aliasing TTF support doesn't make
sense to me.
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Unfortunately, I remember Terminus not rendering some kinds of
Unicode characters that weren't even emojis. I think they were some drawing characters.
Many tools like to use those characters (even just tree).
Are you referring to curses border characters? They shouldn't be a
problem at all with plain old xfonts.
Post by David Demelier
Also, I'm not sure if many bitmap fonts render correctly on 4k screens.
I just love how Fira Mono renders on mine. But I should test some.
To me xfonts (like terminus) look way more crisp on such monitors
(with say size 24px or something similar depending on resolution) than
those heavily AA TTF freetype fonts.

But a matter of taste I presume.

BR,
Anselm
Hiltjo Posthuma
2018-09-23 11:49:01 UTC
Post by AR Garbe
Hi there,
I have been revising dmenu/dwm/libsl in terms of simplicity due to a
migration to OpenBSD recently.
I can't get my head around how much the elegance and clarity of
dwm/dmenu/libsl code has suffered from the introduction of freetype2
and fc usage.
Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.
I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.
What do you guys think about this idea?
I barely use multihead setups and I don't give a f*ck about
anti-aliasing. This whole freetype2 move seems utterly wrong. I didn't
see it as critical before, but now I more and more conclude it has to
go.
Best regards,
Anselm
Hi Anselm,

I agree with all the points about Freetype2. It has been bothering me for some
time too.

I think we should remove the drw.{c,h} abstractions also. The abstractions go
against the principle of having mostly one file to hack on and increasing
readability.

At the time it seemed like a good idea (also to me) to have libdrw to support
Wayland for example. It sounded nice in theory.

I think we should keep Xinerama though, but it wouldn't be an issue if it is a
wiki patch either.

For st there was a X11/terminal code split to support Wayland, automated
testing of terminal emulator code. Now there are abstractions but it is not
useful. Maybe it should be reverted also?

If/when we revert the code we should take good care to not introduce
regressions and review and test carefully. It has happened many times that big code
changes and code reverts caused regressions.
--
Kind regards,
Hiltjo
Roberto E. Vargas Caballero
2018-09-23 12:02:34 UTC
Post by Hiltjo Posthuma
For st there was a X11/terminal code split to support Wayland, automated
testing of terminal emulator code. Now there are abstractions but it is not
useful. Maybe it should be reverted also?
The abstractions in st are really small and I think in this case it was more
about splitting the big file than introducing abstractions. The functions
that are called from st.c are the same before or after the split.
s***@gmail.com
2018-09-23 13:35:50 UTC
Post by Hiltjo Posthuma
For st there was a X11/terminal code split to support Wayland, automated
testing of terminal emulator code. Now there are abstractions but it is not
useful. Maybe it should be reverted also?
Hi,

st has a clean wayland fork? BTW, suckless wayland compositor, still too early
to talk about it?
--
Sylvain
AR Garbe
2018-09-23 16:22:00 UTC
Post by s***@gmail.com
st has a clean wayland fork? BTW, suckless wayland compositor, still too early
to talk about it?
I think a suckless wayland compositor - if it is something to be
worked on, should become a separate project.

Best regards,
Anselm
s***@gmail.com
2018-09-23 17:20:27 UTC
Post by s***@gmail.com
st has a clean wayland fork? BTW, suckless wayland compositor, still too early
to talk about it?
And has st a wayland backend or fork?
--
Sylvain
Silvan Jegen
2018-09-23 20:07:45 UTC
Post by s***@gmail.com
Post by s***@gmail.com
st has a clean wayland fork? BTW, suckless wayland compositor, still too early
to talk about it?
And has st a wayland backend or fork?
There is no "official" suckless project but the author of the velox[0]
compositor also maintains a Wayland st fork[1]. The st fork uses an
older version of a Wayland protocol but there already exists a pull
request to remedy that.

The author of the fork also maintains a simple Wayland drawing library[2]
(which is also used in the Wayland st fork) and a Wayland compositor[3]
library.


Cheers,

Silvan


[0] https://github.com/michaelforney/velox
[1] https://github.com/michaelforney/st
[2] https://github.com/michaelforney/wld
[3] https://github.com/michaelforney/swc
Roberto E. Vargas Caballero
2018-09-24 15:16:09 UTC
Post by Silvan Jegen
The author of the fork also maintains a simple Wayland drawing library[2]
(which is also used in the Wayland st fork) and a Wayland compositor[3]
library.
The author is the current maintainer of sbase.
Silvan Jegen
2018-09-24 18:49:46 UTC
Post by Roberto E. Vargas Caballero
Post by Silvan Jegen
The author of the fork also maintains a simple Wayland drawing library[2]
(which is also used in the Wayland st fork) and a Wayland compositor[3]
library.
The author is the current maintainer of sbase.
Yes, that is true. Still waiting on his feedback for the sbase testing
approach though :P

Michael has also written a "tiny" backend for netsurf[0] that works with
Wayland (in the oasis branch). Very interesting stuff!


Cheers,

Silvan

[0] https://github.com/michaelforney/netsurf
Roberto E. Vargas Caballero
2018-09-24 15:15:32 UTC
Post by s***@gmail.com
And has st a wayland backend or fork?
Yes, Michael Forney wrote it.
Manu Raster
2018-09-24 11:27:08 UTC
Post by Hiltjo Posthuma
Post by Silvan Jegen
Post by Eric Pruitt
Post by AR Garbe
Post by Eric Pruitt
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
Just chiming in to say that I am using st with Japanese/Chinese fonts every
day as well.
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
Cheers,
Silvan
I agree it's useful. (Complex) fall-back font support has been on my mind also.
One idea: instead of supporting fallback fonts, we could write some
font merge script (pre-runtime).
Very good! That's where the problem should be addressed. Solving font
problems pre-runtime at font-file level saves many lines of code.

Normally, in non-asian setups only a fraction of the glyphs beyond
ascii are used at a time and those few can easily be merged in
pre-runtime if not already present e.g. some emojis.

It also keeps font files small and avoids loading many megabytes of
unneeded glyphs into memory.

Regards,

Manu
Eon S. Jeon
2018-09-24 15:22:14 UTC
Post by Manu Raster
Post by Hiltjo Posthuma
Post by Silvan Jegen
Post by Eric Pruitt
Post by AR Garbe
Post by Eric Pruitt
It's not just about Emoji or anti-aliasing. If you work with languages
that use non-Latin characters, support for fallback fonts is a must.
Well, are you using st with glyphs that require fallback fonts?
I wonder if at suckless we should aim for the general purpose.
Yes, st's fallback font support is the main reason I began to use it. I
use st and dwm with Japanese and Chinese text almost every single day.
Just chiming in to say that I am using st with Japanese/Chinese fonts every
day as well.
I don't think we should throw out support for a feature that more than a billion
people on the planet rely on. That doesn't mean that we can't rethink how we
go about supporting that feature though.
Cheers,
Silvan
I agree it's useful. (Complex) fall-back font support has been on my mind also.
One idea: instead of supporting fallback fonts, we could write some
font merge script (pre-runtime).
Very good! That's where the problem should be addressed. Solving font
problems pre-runtime at font-file level saves many lines of code.
Normally, in non-asian setups only a fraction of the glyphs beyond
ascii are used at a time and those few can easily be merged in
pre-runtime if not already present e.g. some emojis.
Hello, Manu.

Sorry, but merging font is not a good option.

Each font contains settings like height, padding and hinting parameters, which are optimized for its glyphs by designers. So, merging fonts (or importing glyphs from other fonts) likely to cause character misalignment and hinting problems, especially when merging fonts of different languages.

Fontforge might do the trick, but it takes skills to roll that monster, and hinting is still difficult to customize AFAIK.


Cheers,
Eon
s***@gmail.com
2018-09-24 15:34:06 UTC
Hi,
It seems my previous message did not go through.

I was showing how mlterm reaches this goal:
in a config file (for st, a table in the config header), a mapping
from style (bold/non-bold) and Unicode range to a font name.
--
Sylvain
Hadrien Lacour
2018-09-24 15:44:40 UTC
Couldn't this be done like rxvt-unicode (or the current st fontarray patch)? You
specify a list of fonts, and the program iterates over it until it finds one that
provides the required character. With a very basic cache, it's pretty simple and
doesn't cause problems.

Of course, the range:font mapping is more granular, but I find it a little bit
more complex to configure than this type of fallback.
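
For comparison, the list-plus-cache fallback could be sketched as below. This is not taken from rxvt-unicode or the fontarray patch; FallbackChain and has_glyph are hypothetical names, and has_glyph stands in for whatever glyph-presence query the real font backend provides. The coverage sets are invented test data.

```python
from typing import Callable, Dict, List, Optional


class FallbackChain:
    """Ordered font list with a per-code-point cache of the lookup result."""

    def __init__(self, fonts: List[str], has_glyph: Callable[[str, int], bool]):
        self.fonts = fonts
        self.has_glyph = has_glyph  # hypothetical backend query
        self.cache: Dict[int, Optional[str]] = {}

    def font_for(self, codepoint: int) -> Optional[str]:
        if codepoint in self.cache:
            return self.cache[codepoint]
        found = None
        for font in self.fonts:  # first font that has the glyph wins
            if self.has_glyph(font, codepoint):
                found = font
                break
        self.cache[codepoint] = found  # misses are cached as well
        return found


# Invented coverage data: the CJK font also carries Latin glyphs.
COVERAGE = {
    "latin-font": set(range(0x20, 0x7F)),
    "cjk-font": set(range(0x20, 0x7F)) | set(range(0x4E00, 0xA000)),
}
chain = FallbackChain(["latin-font", "cjk-font"],
                      lambda font, cp: cp in COVERAGE[font])
```

Because the first match wins, the order of the list decides which font draws a glyph that several fonts provide; here "A" comes from latin-font only because it is listed first.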
s***@gmail.com
2018-09-24 16:08:17 UTC
Post by Hadrien Lacour
Of course, the range:font mapping is more granular, but I find it a little bit
more complex to configure than this type of fallback.
It does as some asian fonts do contain some latin glyphs. You have to specify
the unicode range, or to be sure the x11 font names have orthogonal encoding fields.
To prepare such font list, you may have to split font files per
encoding/unicode range, something like that.
--
Sylvain
Hadrien Lacour
2018-09-24 16:28:16 UTC
Post by s***@gmail.com
Post by Hadrien Lacour
Of course, the range:font mapping is more granular, but I find it a little bit
more complex to configure than this type of fallback.
It does as some asian fonts do contain some latin glyphs. You have to specify
the unicode range, or to be sure the x11 font names have orthogonal encoding fields.
To prepare such font list, you may have to split font files per
encoding/unicode range, something like that.
--
Sylvain
Then you put your Latin font first, since those usually don't contain CJK
characters. The only case where it could cause problems is if you don't want
fallback but really a merger of your fonts: e.g. using Terminus but wanting a
different font for Cyrillic.

But yeah, I'm in favor of the range scheme, since it solves more problems and
may make the implementation cleaner. If there are alphabet aliases or comments
concerning common ranges in the config file, it should be pretty easy to use.
Ori Bernstein
2018-09-24 17:00:18 UTC
Post by Hadrien Lacour
Then you put your latin font first, since those usually don't contain CJK
characters. The only case where it could give problems is if you don't want
fallback but really a merger of your fonts: e.g. using Terminus but wanting a
different font for cyrillic.
But yeah, I'm in favor of the range scheme, since it solves more problems and
may make the implementation cleaner. If there are alphabet aliases or comments
concerning common ranges in the config file, it should be pretty easy to use.
As usual, plan 9 has a good solution:

cpu% cat /lib/font/bit/dejavusans/unicode.16.font
20 16
<snip>
0x03a9 0x03a9 dejavusans.16.03a9
0x0101 0x0201 dejavusans.16.0101
0x0020 0x007e dejavusans.16.0020
0x0000 0x0000 dejavusans.16.0000
0xf400 0xf500 ../dejavu/dejavu.16.f400
0x2e18 0x2f18 ../dejavu/dejavu.16.2e18
0x2b00 0x2c00 ../dejavu/dejavu.16.2b00
0x28a2 0x29a2 ../dejavu/dejavu.16.28a2
0x27a1 0x28a1 ../dejavu/dejavu.16.27a1
0x0000 0x0100 ../dejavu/dejavu.16.0000
0x3000 0x30fe ../shinonome/k16.3000
0x4e00 0x4ffe ../shinonome/k16.4e00
0x5005 0x51fe ../shinonome/k16.5005
0x5200 0x53fa ../shinonome/k16.5200

Which has this meaning:

cpu% man 6 font
<snip>
...The format of the file is a header followed by any number
of subfont range specifications. The header contains two
numbers: the height and the ascent, both in pixels. The
height is the inter-line spacing and the ascent is the
distance from the top of the line to the baseline. These
numbers are chosen to display consistently all the subfonts
of the font. A subfont range specification contains two or
three numbers and a file name. The numbers are the
inclusive range of characters covered by the subfont...
</snip>

There's little about this that is exclusive to Plan 9
fonts, or even to bitmap fonts, although if used with
truetype fonts, there may (or may not) need to be some
extra data stored.
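
To make the range table concrete, here is a rough sketch of the lookup such a file implies. It is not Plan 9's code: struct subfont_range, demo_font and subfont_for are hypothetical names, and "first matching line wins" for overlapping ranges is an assumption of this sketch. The three entries are copied from the listing above:

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the font file: inclusive code point range plus subfont file. */
struct subfont_range {
	uint32_t lo, hi;
	const char *file;
};

/* A few entries taken from the unicode.16.font listing, in file order. */
const struct subfont_range demo_font[] = {
	{ 0x0020, 0x007e, "dejavusans.16.0020" },
	{ 0x0000, 0x0100, "../dejavu/dejavu.16.0000" },
	{ 0x3000, 0x30fe, "../shinonome/k16.3000" },
};

/* Return the subfont file for cp, or NULL if no range covers it.
 * Assumption: ranges are tried in file order and the first match wins. */
const char *
subfont_for(const struct subfont_range *tab, size_t n, uint32_t cp)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cp >= tab[i].lo && cp <= tab[i].hi)
			return tab[i].file;
	return NULL;
}
```

With this table, 'A' resolves to dejavusans.16.0020 while U+00E9 falls through to ../dejavu/dejavu.16.0000, so the order of the lines doubles as the preference ranking.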

A tool to generate this can be done with fontconfig, to get
the existing rendering, or can be implemented with a naive
ordered ranking of fonts, picking the first font that covers
a glyph.

For a slightly more sophisticated option, an ordered ranking
that attempts to minimize the number of "holes" in unicode
ranges could be written, ranking the scripts first in order
of script coverage, using the script table here:

https://www.unicode.org/Public/11.0.0/ucd/Scripts.txt

And then breaking ties by order of preference. That gives
you fewer jarring transitions, where one font happens to
implement a small number of glyphs but others cover the
range fully.
--
Ori Bernstein
Eric Pruitt
2018-09-25 03:01:31 UTC
Post by Eon S. Jeon
Hello, Manu.
Sorry, but merging font is not a good option.
Each font contains settings like height, padding and hinting
parameters, which are optimized for its glyphs by designers. So,
merging fonts (or importing glyphs from other fonts) likely to cause
character misalignment and hinting problems, especially when merging
fonts of different languages.
Agreed. I actually tried doing this a few years ago. I had managed to
automate the process to some degree using Fontforge, but I ran into all
kinds of rendering issues that I didn't have the patience to debug.

Eric
Cág
2018-09-24 13:41:32 UTC
Post by AR Garbe
Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.
What do you guys think about this idea?
A couple of ideas:
1. Having Xft and Xinerama support in the patches section
2. Create ifdefs for Xft as they are now for Xinerama (don't bite me)

--
caóc
AR Garbe
2018-09-24 17:10:08 UTC
Hi there,
Post by Cág
Post by AR Garbe
Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.
What do you guys think about this idea?
1. Having Xft and Xinerama support in the patches section
I wouldn't suggest going that far, rather separating the related
handling a bit further.
Post by Cág
2. Create ifdefs for Xft as they are now for Xinerama (don't bite me)
Well, this is something that needs to be determined. I more and more
conclude that the libsl idea wasn't so bad and that it should be
solved by introducing a suckless text drawing library as a true static
linkage dependency for dwm, dmenu and st. This library could become
something like harfbuzz for the terminal case, but way simpler, and
take onboard the ideas discussed in this thread, with different
strategies for walking through a list to find the proper glyph of a
certain range. But it should also have a proper default that just
relies on the XFont API, like it used to be in d*. No fuss, if your
system is limited to Xlib only.

This is kind of my conclusion now. The interesting aspect is coming up
with a nice .h file that would allow to clean up dwm/dmenu/st from
font handling and text drawing.
It is a bit of effort.

Best regards,
Anselm
Manu Raster
2018-09-24 18:12:11 UTC
Post by Eon S. Jeon
Post by Manu Raster
Post by Hiltjo Posthuma
I agree its useful. (Complex) fall-back font support has been on my mind also.
An idea could be of instead of supporting fallback fonts we could write some
font merge script (pre-runtime).
Very good! That's where the problem should be addressed. Solving font
problems pre-runtime at font-file level saves many lines of code.
Normally, in non-asian setups only a fraction of the glyphs beyond
ascii are used at a time and those few can easily be merged in
pre-runtime if not already present e.g. some emojis.
Hello, Manu.
Greetings Eon,
Post by Eon S. Jeon
Sorry, but merging font is not a good option.
For only a couple of characters or a set of new emojis, imo it
is. Normally the goal is just to fill occasional gaps in running
text. Whenever a tofu is rendered, the tofu's codepoint is appended to a
file, which becomes the not too long and complex list of missing glyphs. To be
acquired by the user, of course.
Post by Eon S. Jeon
Each font contains settings like height, padding and hinting
parameters, which are optimized for its glyphs by designers. So,
merging fonts (or importing glyphs from other fonts) likely to cause
character misalignment and hinting problems, especially when merging
fonts of different languages.
This is true for fonts for typesetting books but system fonts used to be
simple bitmap definitions of fixed measures. If we confine the user to
languages in phonetic writing systems in a latin-based script such as
english we are well off. Mixing radically different languages and
writing systems in one program is never needed but introduces a lot of
complexity (fallback, tables, caching, xml) and multiple and full blown
font files.
Post by Eon S. Jeon
Fontforge might do the trick, but it takes skills to roll that
monster, and hinting is still difficult to customize AFAIK.
Not that devil. Such a tool needs to be developed in-house. ;) If needed
at all. A file with glyphs to plunder, the list of tofus and a text
editor is already sufficient. The format to be used is bdf, a text
file. (Bitmap Distribution Format. X11 then uses the binary and
compressed pcf-version of it).

M.
Eon S. Jeon
2018-09-25 14:17:51 UTC
Post by Manu Raster
Post by Eon S. Jeon
Post by Manu Raster
Post by Hiltjo Posthuma
I agree its useful. (Complex) fall-back font support has been on my mind also.
An idea could be of instead of supporting fallback fonts we could write some
font merge script (pre-runtime).
Very good! That's where the problem should be addressed. Solving font
problems pre-runtime at font-file level saves many lines of code.
Normally, in non-asian setups only a fraction of the glyphs beyond
ascii are used at a time and those few can easily be merged in
pre-runtime if not already present e.g. some emojis.
Hello, Manu.
Greetings Eon,
Post by Eon S. Jeon
Sorry, but merging font is not a good option.
For only a couple of characters or a set of new emojis, imo it
is. Normally the goal is just to fill occasional gaps in running
text. Whenever a tofu is rendered, the tofu's codepoint is appended to a
file, which becomes the not too long and complex list of missing glyphs. To be
acquired by the user, of course.
Then every tofu means users must restart st to see the actual character, losing all state in the process. It might be simple, but it is not sensible.
Post by Manu Raster
Post by Eon S. Jeon
Each font contains settings like height, padding and hinting
parameters, which are optimized for its glyphs by designers. So,
merging fonts (or importing glyphs from other fonts) likely to cause
character misalignment and hinting problems, especially when merging
fonts of different languages.
This is true for fonts for typesetting books but system fonts used to be
simple bitmap definitions of fixed measures. If we confine the user to
languages in phonetic writing systems in a latin-based script such as
english we are well off. Mixing radically different languages and
writing systems in one program is never needed but introduces a lot of
complexity (fallback, tables, caching, xml) and multiple and full blown
font files.
I use all of CJK and am not the only one who uses non-Latin languages in the terminal. In the past, I too assumed I would use only English in the terminal, but I cannot control what other people send to me. It is really stupid if I have to spin up xterm occasionally just to read those things.
s***@gmail.com
2018-09-25 15:52:19 UTC
Post by Eon S. Jeon
I use all of CJK and am not the only one who uses non-Latin languages in
the terminal. In the past, I too assumed I would use only English in the terminal,
but I cannot control what other people send to me. It is really stupid if I
have to spin up xterm occasionally just to read those things.
st can display basic CJK glyphs. Just need the fonts.
I use google noto fonts, but you must remove the one related to color emojis or
st will crash if it attempts to display one.
In lynx or ncurses links, I have basic CJK glyphs displayed (actually way more,
thai...).
--------
Now I digress:
Complex font rendering is not supported, since it would need the c++
abomination which is harfbuzz. I did a C implementation of its C interface
(optional in the EFL and freetype, but mandatory for GTK+) for basic rendering
and believe me, c++ is something which came from Hell (I am currently in llvm,
and... I did not believe some ppl could do worse than harfbuzz with c++, but
this language allows unlimited perversion of the mind). The one and only
software component which is "generalizing" the rendering insanities of many
written, unicode supported, languages is harfbuzz.
A line is to be drawn here.
--
Sylvain
s***@gmail.com
2018-09-25 21:25:12 UTC
Hi again,

I did refresh my knowledge on unicode/font stuff, and yes, st will be screwed:

A unicode string has 4 normalization forms (NFC, NFD, NFKC, NFKD). But only
one (NFD) seems to be future proof regarding what features will be supported
by font files (opentype(microsoft tm)/open font format).

Ofc, this is the one normalization form which hard depends on harfbuzz
shaping in freetype. For instance the glyph 'é' won't be 1 glyph (a
"pre-combined" glyph) in the font file anymore, but will be the combined
rendering of the 'e' + 'combining accent' glyphs, which only harfbuzz
understands and not freetype alone. Font designers are pushed to avoid making
"pre-combined" glyphs: new pre-combined glyphs are not accepted into unicode
anymore (actually, it has been the case for quite some time).
And that's the simple case of combined glyphs...

Additionally, xml SMIL/SVG vector rendering was introduced in the otf/ttf font
format with animated color emojis: A future "clean" pure xml font format is
lurking on the horizon (open type 2?).

The unicode canonical normalization also affects input: the application won't
receive 1 unicode code point for a "pre-combined" symbol 'é' anymore, but 2
unicode code points, 'e' + 'combining accent'.
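
To illustrate the difference on the wire, a small sketch with hypothetical names (utf8_count, e_nfc, e_nfd) and assuming valid UTF-8 input. The pre-combined 'é' is U+00E9 (bytes C3 A9), the decomposed one is U+0065 followed by U+0301 (bytes 65 CC 81):

```c
#include <stddef.h>

/* 'é' as one pre-combined code point, U+00E9. */
const char e_nfc[] = "\xC3\xA9";
/* 'é' decomposed: 'e' (U+0065) followed by combining acute accent (U+0301). */
const char e_nfd[] = "e\xCC\x81";

/* Count code points in a NUL-terminated UTF-8 string by counting every
 * byte that is not a continuation byte (10xxxxxx). Assumes valid UTF-8. */
size_t
utf8_count(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

utf8_count(e_nfc) is 1 and utf8_count(e_nfd) is 2, although both are meant to occupy a single terminal cell.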

st is surrounded.

The suckless future proof solution: it is over, st goes 7bits ascii only with
its own bitmap fonts... non english-only terminal users will just trash it.

... or a suckless future proof unicode/font stack will have to be coded:
- unicode normalizer (NFD) (like ICU)
- a full xml SMIL/SVG vector renderer (like librsvg/expat for the svg part)
- a ttf/otf -> xml svg translator (in freetype).

... or st becomes like surf: an app which is a thin suckless wrapper around a
huge pile of ... You know what: st would be better off being a thin wrapper
around libvte then, because it would be even thinner.

:(
--
Sylvain
AR Garbe
2018-09-25 22:16:26 UTC
Hi Sylvain,
Post by s***@gmail.com
The suckless future proof solution: it is over, st goes 7bits ascii only with
its own bitmap fonts... non english-only terminal users will just trash it.
Sounds like a better plan for longevity.
Post by s***@gmail.com
- unicode normalizer (NFD) (like ICU)
- a full xml SMIL/SVG vector renderer (like librsvg/expat for the svg part)
- a ttf/otf -> xml svg translator (in freetype).
Sounds like a dumb idea to me ;)
Post by s***@gmail.com
... or st becomes like surf: an app which is a thin suckless wrapper around a
huge pile of ... You know what: st would be better off being a thin wrapper
around libvte then, because it would be even thinner.
We need a simple terminal that sucks less. No extra dependencies.

Thanks for sharing your thoughts. To me it becomes more obvious now
what the next steps are.

Best regards,
Anselm
AR Garbe
2018-09-25 23:28:22 UTC
Hi Laslo,
On Tue, 25 Sep 2018 21:25:12 +0000
struct sfl { ... };
sfl_init(struct sfl *s, char **files, size_t nfiles);
sfl_draw(...);
sfl_free(struct sfl *s);
This is something I was considering, however it looks like the water
of the baby's bathtub is poisoned with freetype2/fc bacteria. I don't
wanna introduce abstractions that might be premature, hence I suggest
to fully revert back to XFont* API until something like this is proven
to work properly. It can only improve the performance of st as well
;)

Best regards,
Anselm
AR Garbe
2018-09-25 23:35:57 UTC
On Tue, 25 Sep 2018 16:28:22 -0700
Post by AR Garbe
This is something I was considering, however it looks like the water
of the baby's bathtub is poisoned with freetype2/fc bacteria. I don't
wanna introduce abstractions that might be premature, hence I suggest
to fully revert back to XFont* API until something like this is proven
to work properly. It can only improve the performance of st as well
;)
what a fitting analogy! :D
I am also in favor of going back to XFont. It's the best compromise, but
we should not forget the drawbacks.
Hehe, well there are drawbacks, but at least we are well aware of the
limitations and we won't see people complaining about crashes on emoji
glyphs ever again ;)

-Anselm
s***@gmail.com
2018-09-26 02:08:03 UTC
Post by AR Garbe
This is something I was considering, however it looks like the water
of the baby's bathtub is poisoned with freetype2/fc bacteria. I don't
wanna introduce abstractions that might be premature, hence I suggest
to fully revert back to XFont* API until something like this is proven
to work properly. It can only improve the performance of st as well
;)
Actually, the real pollution comes from harfbuzz (freetype2 and fc are specks in
comparison). But all that is for the rendering part only.

I don't remember how terminal input is done, but it seems that NFD normalized
unicode does not work: you would get 2 unicode code points for a 'é' input, the
'e' + 'combining accent', namely terminal input would have to be transactional
at the char level, or the terminal would render 'e' then the 'combining accent' and
not 'é' as it should be. Or maybe unicode does contain "transactional code
points" already.

:(
--
Sylvain
s***@gmail.com
2018-09-26 02:21:31 UTC
On Tue, 25 Sep 2018 21:25:12 +0000
Post by s***@gmail.com
- a full xml SMIL/SVG vector renderer (like librsvg/expat for the svg part)
No, forget about SVG fonts. Nobody sane would think about implementing
this while keeping simplicity and security in mind.
But they are already going this way: the color and animated emojis are xml
SMIL/SVG documents which are embedded into otf/ttf files.

It is just a matter of time before the vector parts of otf/ttf files are
xml-ized with font specific augmented information.
--
Sylvain
s***@gmail.com
2018-09-26 12:37:19 UTC
The vast majority of fonts uses the "native" OTF/TTF format anyway and
will in the future, because anything else would be a waste of resources
both on the font-developer-side and the rendering-part.
This is where I am not that confident looking at how things are going.

I guess the right thing would be to start a vector renderer (a lean cairo
without freetype), which understands the otf/ttf vector tables (not the too old
ones)/NFD unicode in a step by step process, starting with the "easy" combined
glyph rendering step (it would cover many common glyphs already). This renderer
would not include the tons of quirk fixes freetype does include, but fonts are
not hardware, they can be fixed quickly and redistributed (no need to be
microsoft bug compatible)

If microsoft decides to do the same thing with otf that they did for
.doc->.docx (xml-ization), only the front-end would have to be adapted, but the
vector "hard" work would have been mostly done.

That does not solve the issue of unicode NFD input though. Terminal line
discipline is basically disabled since the line editing is done on application
side (with readline or libedit). I guess xkb can emit a transactional series of
unicode code points, then the xserver and wayland (xkb with libinput) should be
fine. But how readline or libedit can be told that a "char" is actually several
unicode code points? This is true while echo-ing back that "char" to the
rendering part on the line too. Really, I hope unicode does include some
"transactional mark unicode points".

Would have been fun to deal with that, but I am currently in that c++ brain
diarrhea which is llvm.

Oh! I forgot, a little thing on the side, librsvg is being rewritten in... rust
(The rust bootstrap SDK is a joke, have a look at it) xD
--
Sylvain
s***@gmail.com
2018-09-27 19:40:06 UTC
Hi again,

I did dive a bit deeper into the latest unicode, and it's even worse than I
thought.

To deal with real unicode input/output and to split it into "extended grapheme
clusters" (a unicode "char"), you need a finite state machine (I guess that's
what Laslo was referring to). And it's the same for the "line returns"
handling.

Additionally, unicode NFC normalization (the one chosen for the web) is kind of
useless: since new pre-combined glyphs have not been accepted for a long time,
you end up implementing NFD stuff anyway (that move was obviously malicious).

So, the real culprits are actually written languages: they suck. Namely, you
cannot write suckless code for tons of written languages, and on top of that,
since the handling of simple written languages is generalized together with
some of the most complex written languages, handling those simple written
languages properly will use the same complex/generalized definitions and
mechanisms.

On the rendering side, those complex mechanisms allow font designers to spare a
good chunk of work: the one required for pre-combined glyphs. Expect fewer and
fewer pre-combined glyphs in fonts, each with a unique unicode point mapping to
it, and that even for simple written languages. And expect lighter font files.

It means there is no good real middle ground (a good middle ground in the web
would be, basic xhtml without javascript).

And st in all that?
Do like linux line discipline drivers? Namely do handle utf8 encoded
unicode code points (no extended grapheme cluster) only, and actually do work on ascii?

For suckless, as a consistent whole, it means:
- It becomes an ascii only framework (Anselm seems to like this), and will be
kind of useless for any text interacting application going beyond ascii
(i.e. no more mutt with non ascii email, no more lynx with non ascii only web
page...). A zero-i18n framework. In the case of wayland st: own
ascii bitmap fonts and own font renderer.

- suckless gets its own unicode handling code (libicu/freetype+harfbuzz
look-alike implementation).
--
Sylvain
s***@gmail.com
2018-09-28 02:05:20 UTC
...
The function bound() just operates on relatively small LUTs and is
pretty efficient. If we implement a font drawing library in some way,
we will have to think about how we do this special handling right.
Extended grapheme clusters fortunately really stand for themselves and
can be a good "atom" to base font rendering on.
Agreed: the "atom" would be this "extended grapheme cluster", and from this
point of view, a terminal would be a grid of "space" and "extended grapheme".
...
Javascript has its purposes if applied lightly and always as an
afterthought (i.e. the page works 100% without Javascript).
Unfortunately, I am still working out some issues before suing the french
administration for that...
This is not a bash or anything but really just due to the fact that all
this processing on higher layers is a question of efficiency,
especially when e.g. the UNIX system tools are used with plain ASCII
data 99% of the time, not requiring all the UTF-8 processing.
For pure system tools ofc. But then I would need an i18n terminal for mutt,
lynx, etc.
I would not favor such a solution, but this is just my opinion.
Idem, for the previous reasons.
...
I've not yet dared to touch NFD or generally normalization and string
comparison, but for simple stream-based operations and to get a grasp
of a stream and where the bounds for extended grapheme clusters are
you, by definition of bound(), only need to know the current and
previous code point to know when a "drawn character" is finished.
Still even there we would need bounds, as Unicode sets no limit for the
size of an extended grapheme cluster. But this is a "problem" of the
implementing application itself and not of the library, which I strive
to have no memory allocations at all.
Well, there is something about stream safe unicode application. Basically, it
is a buffer of 128 bytes (32 unicode points) with a continuation mark if an
"extended grapheme cluster" is not finished at the end of the buffer. It seems
related only to stream normalization on the fly, though.

I did not go that deep into the "extended grapheme cluster" boundaries
computation, it seems that everything we need is there, but it raises many
more questions, for instance:
- how resilient is this finite state machine to garbage data?
- can we locate "extended grapheme cluster" boundaries on non normalized unicode?
- can we normalize an "extended grapheme cluster" on the fly?
- etc...

regards,
--
Sylvain
s***@gmail.com
2018-09-28 13:38:03 UTC
On Fri, 28 Sep 2018 02:05:20 +0000
...
Post by s***@gmail.com
Well, there is something about stream safe unicode application.
Basically, it is a buffer of 128 bytes (32 unicode points) with a
continuation mark if an "extended grapheme cluster" is not finished at
the end of the buffer. It seems related only to stream normalization
on the fly, though.
At this point we need to just question this insanity. As I like to
jokingly say, even some African tribe with a very delicate language
would not have grapheme clusters longer than say 10 code points or so.
Everything even above 10-20 elements screams unicode exploit (remember
those accent-trees that used to flood online chats?) and it would
definitely be enough to just have a fixed size buffer (varied in
config.h) for grapheme clusters.
That's what the spec says: an "extended grapheme cluster" (EGC) should not go
beyond 10 unicode points "in theory". This stream-safe thingy seems to apply to
non normalized unicode streams, with its 32 unicode points and continuation
mark.

With that "continuation mark", an EGC can go to "infinity and
beyond"... and the application is in charge of the size of the "infinity and
beyond" (aka, _you better deal with microsoft, apple, google and mozilla
"infinity and beyond"_).

I am in favor of a hard limit of 32 unicode points, with a nice 128 bytes
shifting buffer (AVX/MMX register size if I recall properly). The "continuation
mark" would switch the state machine in "discarding" mode, and certainly not in
"infinity and beyond" memory allocation. The parser would need to switch to a
discarding state till the "infinity and beyond" EGC terminator bound or some
corruption.
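
A rough sketch of that hard limit, under the assumption that boundary detection happens elsewhere and only tells us when a new cluster starts. struct egc, EGC_MAX, egc_start and egc_append are hypothetical names, not an existing API:

```c
#include <stddef.h>
#include <stdint.h>

#define EGC_MAX 32  /* 32 code points * 4 bytes = 128 bytes, no allocation */

struct egc {
	uint32_t cp[EGC_MAX];
	size_t len;
	int discarding;  /* set once the cluster overflowed the buffer */
};

/* Begin a new cluster with cp. Call this when a boundary was detected
 * before cp; it also leaves discarding mode. */
void
egc_start(struct egc *g, uint32_t cp)
{
	g->cp[0] = cp;
	g->len = 1;
	g->discarding = 0;
}

/* Append cp to the current cluster. Code points beyond EGC_MAX are
 * dropped and the cluster is flagged instead of growing the buffer. */
void
egc_append(struct egc *g, uint32_t cp)
{
	if (g->len < EGC_MAX)
		g->cp[g->len++] = cp;
	else
		g->discarding = 1;
}
```

An "accent tree" of a hundred combining marks then costs the same 128 bytes as a plain 'e', and the renderer can decide how to show a cluster whose discarding flag is set.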
Depends on the level. A safe UTF-8 decoder catches garbage on its
level, and will replace it with a Unicode "invalid" code point (forgot
the name).
On higher levels, everything is within the bounds of the Unicode spec
and the "invalid" code point is just another code point.
I wonder how this is handled in lynx, ncurses, vim, readline, libedit, etc...
Wild guess: their "atom" is only 1 unicode point. Probably some work will have
to be done here... (and their maintainers won't be happy...)
...
Post by s***@gmail.com
- can we normalize on the fly an "extended grapheme cluster"?
Yes, but don't worry about that too much as we don't need normalization
as much as you probably think.
Agreed, as far as I can think of, with my limited knowledge on unicode, it
would be kind of required only for the EGC renderer in order to help the
"rendering correctness".
Additionally, ill EGCs with tons of combining code points (less than 32 though)
will likely be "compressed" by this normalization.
...
regards,
--
Sylvain
s***@gmail.com
2018-09-29 12:59:15 UTC
On Fri, 28 Sep 2018 13:38:03 +0000
No, not even that. We only need normalization really if we want to do
"perceptual" string comparisons, which is generally questionable for
UNIX tools.
mmmh... for the reason I stated before, the fonts files will probably be more
and more NFD normalization only (lighter font files, and significantly less
work to do for font designers). Font files will miss more and more pre-combined
(legacy) glyphs: full decomposition in base glyphs will be more and more
required.

I have not gone into the details of the EGC boundaries algorithm, but I'm really
curious how the unicode consortium algorithm can know that a unicode point
is an EGC terminator without looking at the next unicode point.
--
Sylvain
s***@gmail.com
2018-09-29 20:59:04 UTC
On Sat, 29 Sep 2018 12:59:15 +0000
Dear Sylvain,
Post by s***@gmail.com
mmmh... for the reason I stated before, the fonts files will probably
be more and more NFD normalization only (lighter font files, and
significantly less work to do for font designers). Font files will
miss more and more pre-combined (legacy) glyphs: full decomposition
in base glyphs will be more and more required.
no, that's unlikely, as they cannot impose the data format that is
still prominently used for all data exchange. The only thing that might
happen is that font libraries will need to do some normalization, but
maybe we are discussing nonsense here and the TTF format has some kind
of way to refer to other glyphs and combine them or something.
That's what I thought at first, but much of the "lambda user" software uses
super-sucking renderers (harfbuzz/graphite/apple one/uniscribe/other) doing
glyph combining. Therefore, for fonts with missing pre-combined glyphs those
renderers will NFD normalize them (with huge tables for CJK) and use the
combining glyphs which are more likely to be in the font file.
Post by s***@gmail.com
I have not gone into the details of the EGC boundaries algorithm, but
I'm really curious how the unicode consortium algorithm can know
that a unicode point is an EGC terminator without looking at the next
unicode point.
It does in fact. The algorithm works by determining if _between_ two code
points there is an EGC terminator.
Ok, then the consequences are spectacular: anything interactive has to work
with context tracking, and this, unicode point per unicode point.

You cannot know if an EGC is complete till you have the next unicode point
following the last unicode point of this very EGC.
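
A heavily simplified sketch of such a "between two code points" test. It is NOT the full unicode segmentation algorithm: it only knows that CR LF stays together and that a combining diacritical mark (U+0300..U+036F) or ZWJ (U+200D) attaches to what precedes it. The names is_extend, boundary and count_clusters are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Gross stand-in for the real character property tables: only the
 * Combining Diacritical Marks block and the zero width joiner. */
int
is_extend(uint32_t cp)
{
	return (cp >= 0x0300 && cp <= 0x036F) || cp == 0x200D;
}

/* Return 1 if a cluster boundary lies between prev and cur, 0 otherwise. */
int
boundary(uint32_t prev, uint32_t cur)
{
	if (prev == 0x0D && cur == 0x0A)  /* CR LF is never split */
		return 0;
	return !is_extend(cur);           /* marks attach to what precedes them */
}

/* Count clusters in an array of n code points. */
size_t
count_clusters(const uint32_t *cp, size_t n)
{
	size_t i, clusters = n ? 1 : 0;

	for (i = 1; i < n; i++)
		if (boundary(cp[i - 1], cp[i]))
			clusters++;
	return clusters;
}
```

For the sequence 'e', U+0301, 'a' the cluster 'e' + accent is only known to be complete once 'a' arrives, which is exactly the redraw problem for a terminal cell.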

For instance a terminal cell would have to be redrawn for each unicode point
transmitted by the terminal application (invalid or not), because it cannot
presume it's the last one but must display something. The good thing about
that: it allows unicode point per unicode point input (a bit like if an 8 bits
char was transmitted/input bit per bit and drawn for each bit received).

Where some maintainers may cringe: something that was very easily in sync with
non-"real"-unicode text, the grid segmentation of unicode text of terminal text
editors must exactly match the one from any properly i18n-ed terminal. I foresee
that some quite significant "funny" things are going to happen here.

Now, regarding suckless, from my super not legitimate, wannabe and humble
opinion:

If I presume suckless being limited to system tools, and I don't think being
that wrong, then I 100% agree on the 100% ascii. st should go full ascii, but
since there are plans to support wayland (which is consistent for a graphical
terminal emulator), it means suckless own ascii only font format (100% text
plz) and a custom renderer. Reverting to XFont will clean up the base code for a
future really suckless wayland backend.
But for consistency, some apps should be put in a non-suckless (or i18n)
category, a bit like gnu/non-gnu (i.e. surf). In this category, the terminal
ppl may find some full i18n, terminal based and hybrid (similar to mplayer,
with i18n subs) and user oriented apps, because full i18n support cannot be
suckless since most written languages suck really hard.

regards,
--
Sylvain
David Demelier
2018-10-19 14:03:41 UTC
Post by AR Garbe
Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.
I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.
Are you saying that you would like to remove the support for multiple
screens?

If yes, please don't minimize the number of multihead setups, both at
work and at home I have two monitors. Or as Quentin said at least in a
separate patch.

However, I don't care if dwm runs on two screens without being able to
have a window in the “middle” split in both views.

Regards,
--
David
AR Garbe
2018-10-19 19:51:23 UTC
Post by David Demelier
Post by AR Garbe
Back in the days I also concluded that the introduction of Xinerama
and multihead support was a bad idea after all.
I'm really at a point to consider forking dwm and dmenu to simply rely
on X11 as it used to be, perhaps with going the extra mile to remove
Xinerama support as well and to rely on single headed setups.
Are you saying that you would like to remove the support for multiple
screens?
No. However I've come up with a smaller dwm.c that excludes most of
the Screen abstraction and can be compiled without XINERAMA
support enabled, and a xinerama.c that introduces the Screen primitives
and contains the sendmon() et al. functions.

It's still not final yet, but I'm considering this as the right move.
Post by David Demelier
If yes, please don't minimize the number of multihead setups, both at
work and at home I have two monitors. Or as Quentin said at least in a
separate patch.
No worries, xinerama will be supported, but I wanted to have a leaner
core dwm without xinerama.

Best regards,
Anselm
