[sbase] Proposal of suckless compression
Ralph Eastwood
2014-09-23 11:30:21 UTC
Hi,

Some time ago, there was some discussion about sbase's tar with
compression. I was wondering if this compression tool would
necessarily have to be a standard gzip/bzip2/xz implementation.

As Gzip, Bzip2 and XZ rely on rather complicated code bases, I propose
that a different algorithm (probably based on ROLZ) be used
instead, with a focus on a suckless implementation as opposed to
speed/compression ratio. Having said that, most ROLZ implementations
do tend to have greater speed and higher compression ratio than Gzip.

I know I could easily create a suckless implementation, but would it
be of interest to be included in sbase or is it beyond the scope of
the collection of utilities?

Cheers,
Ralph
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
FRIGN
2014-09-23 11:39:45 UTC
On Tue, 23 Sep 2014 12:30:21 +0100
Ralph Eastwood <***@gmail.com> wrote:

Hey Ralph,
Post by Ralph Eastwood
Some time ago, there was some discussion about sbase's tar with
compression. I was wondering if this compression tool would
necessarily have to be a standard gzip/bzip2/xz implementation.
As Gzip, Bzip2 and XZ rely on rather complicated code bases, I propose
that a different algorithm (probably based on ROLZ) be used
instead, with a focus on a suckless implementation as opposed to
speed/compression ratio. Having said that, most ROLZ implementations
do tend to have greater speed and higher compression ratio than Gzip.
maybe you have followed the discussion about the imagefile-format.
We currently compress to bz2 by default, but are always happy to see
how other algorithms fare with it.
A ROLZ-implementation could be pretty great!
Post by Ralph Eastwood
I know I could easily create a suckless implementation, but would it
be of interest to be included in sbase or is it beyond the scope of
the collection of utilities?
If it's so simple, why don't you set it up and we test it out?
I'd love to have a simple, suckless compression-algorithm and
see no problem in adding one to the suckless-universe.

Cheers

FRIGN
--
FRIGN <***@frign.de>
Wolfgang Corcoran-Mathe
2014-09-24 00:18:04 UTC
Post by FRIGN
I'd love to have a simple, suckless compression-algorithm and
see no problem in adding one to the suckless-universe.
Is there something wrong with sflate?
--
Wolfgang Corcoran-Mathe
Dimitris Papastamos
2014-09-23 11:44:45 UTC
Post by Ralph Eastwood
Some time ago, there was some discussion about sbase's tar with
compression. I was wondering if this compression tool would
necessarily have to be a standard gzip/bzip2/xz implementation.
As Gzip, Bzip2 and XZ rely on rather complicated code bases, I propose
that a different algorithm (probably based on ROLZ) be used
instead, with a focus on a suckless implementation as opposed to
speed/compression ratio. Having said that, most ROLZ implementations
do tend to have greater speed and higher compression ratio than Gzip.
I know I could easily create a suckless implementation, but would it
be of interest to be included in sbase or is it beyond the scope of
the collection of utilities?
I'd be very interested in including this tool in sbase.
Dmitrij D. Czarkoff
2014-09-24 00:29:17 UTC
Post by Ralph Eastwood
Some time ago, there was some discussion about sbase's tar with
compression. I was wondering if this compression tool would
necessarily have to be a standard gzip/bzip2/xz implementation.
IMO generating compressed tarballs with a rare compression scheme sucks.
Aren't tarballs supposed to be the lingua franca of data storage?
--
Dmitrij D. Czarkoff
Ralph Eastwood
2014-09-24 04:42:45 UTC
On 24 September 2014 01:18, Wolfgang Corcoran-Mathe
Post by Wolfgang Corcoran-Mathe
Is there something wrong with sflate?
I had missed sflate, thanks! Although, having now looked, I'm
envisioning something smaller than sflate.
Post by Dmitrij D. Czarkoff
IMO generating compressed tarballs with a rare compression scheme sucks.
Aren't tarballs supposed to be the lingua franca of data storage?
I don't think the compression format is defined by POSIX; as far as I
can see XZ is really recent but has gained traction in some
distributions. In terms of actual usefulness, this compression scheme
would be a nice addition for suckless-based tools to use, e.g. a package
manager. Use outside of suckless is harder to gauge; there are a
lot of compression methods out there.
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
Dimitris Papastamos
2014-09-24 08:47:56 UTC
Post by Ralph Eastwood
I don't think the compression format is defined by POSIX; as far as I
can see XZ is really recent but has gained traction in some
distributions. In terms of actual usefulness, this compression scheme
would be a nice addition for suckless-based tools to use, e.g. a package
manager. Use outside of suckless is harder to gauge; there are a
lot of compression methods out there.
That was my intention too. We have a package manager[0] for morpheus[1]
that currently uses libarchive. I'd like to see how well it works with
this new tool/lib. Since sbase is envisioned to be part of the base system
I'd not worry too much about the compression format. After all, this
is highly experimental. There's a possibility that this might turn
out to suck. We'll see.

libarchive seems relatively sensible to use and has a focus on static
linking and code size reduction, but it is still VERY large (nearing
100 kSLOC or so?)

[0] http://git.2f30.org/pkgtools
[1] http://morpheus.2f30.org/
Dimitris Papastamos
2014-09-24 08:50:00 UTC
On a side note, I think you should write this as a few routines
in their own .c and place them under util/. It would be useful to
be able to grab the routines and embed them in another project.
Just create a .h to expose any functions/data structures required
and write the tool on top of that.
Hiltjo Posthuma
2014-09-24 11:02:10 UTC
Post by Ralph Eastwood
Some time ago, there was some discussion about sbase's tar with
compression. I was wondering if this compression tool would
necessarily have to be a standard gzip/bzip2/xz implementation.
For sbase I think it should be, because gzip and bzip2 are the norm.
Not everything that is the norm is sane or even nice of course, but for
sbase I'd want a minimal stable set of unix tools that work well.
Post by Ralph Eastwood
As Gzip, Bzip2 and XZ rely on rather complicated code bases, I propose
that a different algorithm (probably based on ROLZ) be used
instead, with a focus on a suckless implementation as opposed to
speed/compression ratio. Having said that, most ROLZ implementations
do tend to have greater speed and higher compression ratio than Gzip.
I know I could easily create a suckless implementation, but would it
be of interest to be included in sbase or is it beyond the scope of
the collection of utilities?
It is beyond the scope imho.

FWIW I think this should not be in sbase. A tar and gzip
implementation though would be nice to have. The tricky part for tar
might be to have its behaviour be mostly compatible with existing
implementations[0].

Maybe it's an idea to polish the "flate"[1] code instead?

The SLOC of libarchive is high because it supports a lot of formats
(the ones that are not needed can be disabled). From what I've seen
the API code is quite clean. pkgtools from morpheus works quite well
with it too![2]

[0] http://www.linuxquestions.org/questions/slackware-14/why-pkgtools-still-using-tar-1-13-a-721813/
[1] http://git.suckless.org/flate/
[2] http://git.2f30.org/pkgtools/
Ralph Eastwood
2014-09-24 11:55:58 UTC
Post by Hiltjo Posthuma
For sbase I think it should be, because gzip and bzip2 are the norm.
Not everything that is the norm is sane or even nice of course, but for
sbase I'd want a minimal stable set of unix tools that work well.
Although the norm changes - if 'compress' wasn't patent encumbered, I guess
there would be wide support for it still.
Post by Hiltjo Posthuma
FWIW I think this should not be in sbase. A tar and gzip
implementation though would be nice to have. The tricky part for tar
might be to have its behaviour be mostly compatible with existing
implementations[0].
I think I would stick with tar anyway, and there is already a tar[0]
implementation in sbase.


I guess even if the preference is for it not to be included in sbase, it
can find a separate home in the suckless world.

I'm just choosing my entropy coder at the moment - so far the simplest
implementations for this boil down to a bitwise arithmetic coder
(the patents on this have expired now), a (bytewise) range coder and
rANS [1]. Huffman, although faster than any of these, is 150-200 lines
of code at a glance at flate, and I don't think that's the most
optimised version.

In terms of code complexity, the bitwise arithmetic coder is simplest,
but also the slowest of the bunch. rANS is faster than the other two
(and has more optimising potential) but has the downside that the
encoder must consume symbols in the reverse of the order in which the
decoder produces them, which is easily worked around by
buffering/encoding in blocks. Range coding is the best compromise, as
the streams can be encoded forward, which gives the nice property of
writing almost as soon as possible (there is some buffering in edge
cases, but only by a few bytes). I think I can implement range coding
in < 100 lines.
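
To give a feel for the size involved, here is a minimal sketch of an
adaptive binary range coder with bytewise renormalisation, in the style
used by LZMA. It is not the coder proposed in this thread; the names
(rc_compress, rc_decompress, struct rc_enc), the 11-bit probabilities,
the adaptation shift of 5 and the order-0 bit-tree model are all
assumptions made for illustration. The encoder writes forward and holds
back only the bytes that could still receive a carry.

```c
#include <stddef.h>
#include <stdint.h>
#define PROB_BITS 11            /* probabilities are 11-bit fixed point */
#define MOVE_BITS 5             /* adaptation rate of the bit model */
#define RC_TOP    (1u << 24)    /* renormalise once range drops below this */

struct rc_enc {
	uint64_t low;           /* 32-bit low end plus room for a carry */
	uint32_t range;
	uint8_t cache;          /* byte held back until carries are settled */
	size_t pending;         /* cache byte plus 0xff bytes following it */
	uint8_t *out;
	size_t pos, cap;
};

/* Move the probability that the next bit is 0 towards what was seen. */
static void
rc_adapt(uint16_t *p, int bit)
{
	*p += bit ? -(*p >> MOVE_BITS) : ((1 << PROB_BITS) - *p) >> MOVE_BITS;
}

/* Emit the top byte of low, propagating a carry into held-back bytes. */
static void
rc_shiftlow(struct rc_enc *rc)
{
	if ((uint32_t)rc->low < 0xff000000u || (rc->low >> 32) != 0) {
		uint8_t carry = (uint8_t)(rc->low >> 32), b = rc->cache;
		for (; rc->pending != 0; rc->pending--, rc->pos++, b = 0xff)
			if (rc->pos < rc->cap)
				rc->out[rc->pos] = (uint8_t)(b + carry);
		rc->cache = (uint8_t)(rc->low >> 24);
	}
	rc->pending++;
	rc->low = (rc->low & 0x00ffffffu) << 8;
}

/* Narrow the interval to the half that belongs to bit, then renormalise. */
static void
rc_encbit(struct rc_enc *rc, uint16_t *p, int bit)
{
	uint32_t bound = (rc->range >> PROB_BITS) * *p;
	if (bit)
		rc->low += bound;
	rc->range = bit ? rc->range - bound : bound;
	rc_adapt(p, bit);
	for (; rc->range < RC_TOP; rc->range <<= 8)
		rc_shiftlow(rc);
}

/* Code n bytes bit by bit; returns bytes needed (> cap means overflow). */
size_t
rc_compress(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
	struct rc_enc rc = { 0, 0xffffffffu, 0, 1, out, 0, cap };
	uint16_t probs[256];
	unsigned ctx;
	size_t i;
	int k;

	for (i = 0; i < 256; i++)
		probs[i] = 1 << (PROB_BITS - 1);
	for (i = 0; i < n; i++)
		for (ctx = 1, k = 7; k >= 0; k--) {
			rc_encbit(&rc, &probs[ctx], (in[i] >> k) & 1);
			ctx = (ctx << 1) | ((in[i] >> k) & 1u);
		}
	for (k = 0; k < 5; k++)         /* flush the remaining bytes of low */
		rc_shiftlow(&rc);
	return rc.pos;
}

/* Inverse of rc_compress; returns 0, or -1 if the input was too short. */
int
rc_decompress(const uint8_t *in, size_t len, uint8_t *out, size_t n)
{
	uint32_t range = 0xffffffffu, code = 0, bound, ctx;
	uint16_t probs[256];
	size_t i, pos = 1;      /* in[0] is always 0 and carries no data */
	int k, bit;

	for (i = 0; i < 256; i++)
		probs[i] = 1 << (PROB_BITS - 1);
	for (k = 0; k < 4; k++, pos++)
		code = (code << 8) | (pos < len ? in[pos] : 0u);
	for (i = 0; i < n; i++) {
		for (ctx = 1, k = 0; k < 8; k++) {
			bound = (range >> PROB_BITS) * probs[ctx];
			if ((bit = code >= bound))
				code -= bound;
			range = bit ? range - bound : bound;
			rc_adapt(&probs[ctx], bit);
			for (; range < RC_TOP; range <<= 8, pos++)
				code = (code << 8) | (pos < len ? in[pos] : 0u);
			ctx = (ctx << 1) | (unsigned)bit;
		}
		out[i] = (uint8_t)ctx;
	}
	return pos <= len ? 0 : -1;
}
```

The decoder has to be told the original length n, since this sketch has
no in-band end-of-stream symbol; a real format would need one (or a
length field) in addition to the modelling stage in front of the coder.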

The stream format can be quite simple: MAGIC|===DATA===|EOS|MD5SUM



[0] http://git.suckless.org/sbase/plain/tar.c
[1] https://github.com/rygorous/ryg_rans
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
Dmitrij D. Czarkoff
2014-09-24 12:14:55 UTC
Post by Ralph Eastwood
Although the norm changes - if 'compress' wasn't patent encumbered, I guess
there would be wide support for it still.
And there is. Check the "-Z" option in the manual of your tar.
--
Dmitrij D. Czarkoff
Ralph Eastwood
2014-09-24 12:35:03 UTC
Post by Dmitrij D. Czarkoff
And there is. Check the "-Z" option in the manual of your tar.
GNU tar has the option, but also searches for the 'compress' binary,
which isn't always installed by default.
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
Ralph Eastwood
2014-09-24 12:51:01 UTC
See [0] about implementations; OpenBSD even had 'gzip' aliased to
compress. It appears that the lz77/deflate gzip is a GNUism.

Nothing bad about that - but I think although the current norm
dominates, it is only the current norm and people can shift.

[0] http://en.wikipedia.org/wiki/Gzip
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
Dmitrij D. Czarkoff
2014-09-24 15:41:29 UTC
Post by Ralph Eastwood
OpenBSD even had 'gzip' aliased to compress.
Had? From gzip(1) manual:

| HISTORY
| gzip compatibility was added to compress(1) in OpenBSD 3.4. The
| `g' in this version of gzip stands for ``gratis''.

$ ls -1i /usr/bin/{compress,gzip}
1455743 /usr/bin/compress
1455743 /usr/bin/gzip
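
The identical inode numbers show that both names are hard links to one
binary. The same check can be sketched in C by comparing the device and
inode reported by stat(2); the helper name same_file is made up for this
illustration.

```c
#include <sys/stat.h>

/*
 * Returns 1 if both paths name the same file (same device and inode),
 * 0 if they name different files, and -1 if either cannot be stat()ed.
 */
int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On the system shown above, same_file("/usr/bin/compress",
"/usr/bin/gzip") would be expected to return 1.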
Post by Ralph Eastwood
It appears that the lz77/deflate gzip is a GNUism.
All three implementations (by GNU, NetBSD and OpenBSD) use deflate.
Naturally this is a GNUism - the GNU version is the canonical one.
Post by Ralph Eastwood
Nothing bad about that - but I think although the current norm
dominates, it is only the current norm and people can shift.
People can shift, but archives can't. Most tarballs that are already
available as .tgz or .tar.gz will remain in that format forever, and
having to use GNU tar in order to use them is unacceptable.

Sure, one can do just "cat tarball.tgz | gunzip | tar x", which is even
more UNIXy than "tar xzf tarball.tgz", but the latter form is what
literally all UNIX users have done for at least a decade.
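
The pipeline form also suggests how a tar without built-in decompression
could still accept "z": run the external tool as a filter and read the
archive from a pipe. Below is a minimal sketch under that assumption;
the helper name open_filter is hypothetical and this is not taken from
sbase or any existing tar.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run prog with its stdin connected to infd and return a descriptor
 * from which its stdout can be read, or -1 on error.  The child's pid
 * is stored in *pid so that the caller can waitpid() for it.
 */
int
open_filter(const char *prog, int infd, pid_t *pid)
{
	int p[2];

	if (pipe(p) < 0)
		return -1;
	switch (*pid = fork()) {
	case -1:
		close(p[0]);
		close(p[1]);
		return -1;
	case 0:
		/* child: stdin <- infd, stdout -> pipe, then become prog */
		dup2(infd, STDIN_FILENO);
		dup2(p[1], STDOUT_FILENO);
		if (infd != STDIN_FILENO)
			close(infd);
		close(p[0]);
		close(p[1]);
		execlp(prog, prog, (char *)NULL);
		_exit(127);     /* exec failed, e.g. prog is not installed */
	}
	close(p[1]);            /* parent keeps only the read end */
	return p[0];
}
```

A caller would open the tarball, pass its descriptor together with
"gunzip" to open_filter, and then parse tar headers from the returned
descriptor exactly as it would from a plain file.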

P.S.: The GNU and FreeBSD implementations of tar have a "-J" option for
xz-compressed tarballs. NetBSD and OpenBSD don't have it. I am not
sure whether this is enough to say that ".txz" tarballs are not a norm yet.
--
Dmitrij D. Czarkoff
Ralph Eastwood
2014-09-24 15:57:25 UTC
Post by Dmitrij D. Czarkoff
People can shift, but archives can't. Most tarballs that are already
available as .tgz or .tar.gz will remain in that format forever, and
having to use GNU tar in order to use them is unacceptable.
Yeah, that does make sense. I've not really looked into it, but does
flate have edge cases it does not handle as yet?
Post by Dmitrij D. Czarkoff
P.S.: The GNU and FreeBSD implementations of tar have a "-J" option for
xz-compressed tarballs. NetBSD and OpenBSD don't have it. I am not
sure whether this is enough to say that ".txz" tarballs are not a norm yet.

The introduction of bzip2 and xz always surprised me. Perhaps the
authors of those formats were the only ones that approached GNU to
have them included.
--
Tai Chi Minh Ralph Eastwood
***@gmail.com
Dmitrij D. Czarkoff
2014-09-24 17:21:54 UTC
Post by Ralph Eastwood
The introduction of bzip2 and xz always surprised me. Perhaps the
authors of those formats were the only ones that approached GNU to
have them included.
Actually GNU tar supports several compression tools:

* gzip
* xz
* bzip2
* lzip
* lzma
* lzop
* compress

Bzip2 and xz-utils are merely the most popular. Bzip2 was the patent-free
compression tool with the best ratio for quite some time, and xz-utils
most likely owes its popularity to that of 7-zip on Windows. (AFAIK it is
also the tool providing the best compression ratio on UNIX now.)
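
Most of the formats listed above start with a fixed signature, so a
front end can choose the right tool by looking at the first few bytes.
The following is a sketch only: the function name detect_compressor is
made up here, and legacy .lzma files are left out because they have no
reliable signature.

```c
#include <stddef.h>
#include <string.h>

/* Returns the name of the tool whose signature matches buf, or NULL. */
const char *
detect_compressor(const unsigned char *buf, size_t n)
{
	static const struct {
		const char *tool;
		size_t len;
		unsigned char sig[9];
	} tab[] = {
		{ "gzip",     2, { 0x1f, 0x8b } },
		{ "compress", 2, { 0x1f, 0x9d } },
		{ "bzip2",    3, { 'B', 'Z', 'h' } },
		{ "lzip",     4, { 'L', 'Z', 'I', 'P' } },
		{ "xz",       6, { 0xfd, '7', 'z', 'X', 'Z', 0x00 } },
		{ "lzop",     9, { 0x89, 'L', 'Z', 'O', 0x00,
		                   0x0d, 0x0a, 0x1a, 0x0a } },
	};
	size_t i;

	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++)
		if (n >= tab[i].len &&
		    memcmp(buf, tab[i].sig, tab[i].len) == 0)
			return tab[i].tool;
	return NULL;
}
```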
--
Dmitrij D. Czarkoff