Hi there,

in the past we've repeatedly discussed the option of using a different
compression algorithm (e.g. lz4), but every time the discussion died off
because of fear of possible patent issues (see [1], [2] and many other
threads). Have we decided it's not worth the risk, making patches in
this area futile?

The reason I'm asking is the multivariate statistics patch - while
optimizing the planning overhead, I realized that a considerable amount
of time is spent decompressing the statistics (serialized as bytea), and
an algorithm with better decompression performance (lz4 comes to mind)
would help a lot. The statistics may be tens or hundreds of kB, and in
the planner every millisecond counts.
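
To make the decompression cost concrete, here is a minimal sketch of how
one might time it on a blob of roughly that size. Everything in it is an
assumption for illustration: the payload is synthetic (not real
serialized statistics), and zlib and lzma from the Python standard
library are stand-ins, since neither pglz nor lz4 ships with Python. The
helper names make_payload and time_decompress are hypothetical.

```python
import lzma
import random
import time
import zlib


def make_payload(size=200 * 1024, seed=0):
    """Build a compressible blob standing in for serialized statistics."""
    rng = random.Random(seed)
    # A small vocabulary of 8-byte "values" repeated at random, so the
    # data compresses well but is not trivially constant.
    words = [bytes(rng.randrange(256) for _ in range(8)) for _ in range(64)]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words)
    return bytes(out[:size])


def time_decompress(decompress, blob, rounds=20):
    """Return the best wall-clock time (in seconds) over several rounds."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


payload = make_payload()

# Stand-in codecs only; swap in the algorithms actually being compared.
codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    blob = compress(payload)
    assert decompress(blob) == payload  # sanity check: lossless round trip
    results[name] = (len(blob), time_decompress(decompress, blob))

for name, (size, seconds) in results.items():
    print(f"{name}: {size} bytes compressed, "
          f"{seconds * 1000:.3f} ms to decompress")
```

The numbers depend entirely on the machine and the data, so this only
shows the shape of the measurement, not what pglz or lz4 would do.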

Would a differentiated approach work? That is, either adding an initdb
option allowing the user to choose an alternative compression algorithm
(and thus letting them weigh the possible patent issues), or using
different algorithms for different pieces of data (e.g. keeping pglz for
user data and using lz4 for statistics).

The first option is quite trivial to implement - I already have an
experimental patch implementing it (attached, but a bit dirty). The
second option is probably more difficult (we'd have to teach the tuple
toaster about multiple compression algorithms and pass that information
along somehow). Also, I'm not sure it'd make the patent concerns go
away ...
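
One way the "pass that information somehow" part could work is to tag
each compressed datum with a method identifier, so the reader can
dispatch to the right decompressor. The following is only a sketch of
that idea, not how the toaster works and not what the attached patch
does: the one-byte header, the METHOD_* constants and the function names
are all hypothetical, and zlib/lzma again stand in for pglz/lz4.

```python
import lzma
import zlib

# Hypothetical method IDs; a real on-disk format would define its own.
METHOD_ZLIB = 0  # stand-in for the default algorithm (pglz in this thread)
METHOD_LZMA = 1  # stand-in for the alternative algorithm (lz4 in this thread)

# Map each method ID to its (compress, decompress) pair.
CODECS = {
    METHOD_ZLIB: (zlib.compress, zlib.decompress),
    METHOD_LZMA: (lzma.compress, lzma.decompress),
}


def compress_datum(data: bytes, method: int) -> bytes:
    """Compress data and prepend a one-byte tag naming the method used."""
    compress, _ = CODECS[method]
    return bytes([method]) + compress(data)


def decompress_datum(blob: bytes) -> bytes:
    """Read the tag byte and dispatch to the matching decompressor."""
    if not blob:
        raise ValueError("empty datum")
    method = blob[0]
    if method not in CODECS:
        raise ValueError(f"unknown compression method {method}")
    _, decompress = CODECS[method]
    return decompress(blob[1:])


# Different pieces of data can use different methods side by side.
user_data = compress_datum(b"user row " * 1000, METHOD_ZLIB)
statistics = compress_datum(b"histogram bucket " * 1000, METHOD_LZMA)

assert decompress_datum(user_data) == b"user row " * 1000
assert decompress_datum(statistics) == b"histogram bucket " * 1000
```

The point of the tag is that the writer picks the algorithm per datum
while the reader needs no outside context, which is roughly what mixing
algorithms within one cluster would require.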

I'm a bit confused though, because I've noticed various other FOSS
projects adopting lz4 over the past few years, and I have yet to find a
project voicing the same concerns about patents. So either they're
reckless or we're excessively paranoid.

Also, lz4 is not the only compression algorithm available - I've done a
bunch of tests with lz4, lz4hc, lzo and snappy, and lzo actually
performed better than lz4 (not claiming that's a universal truth). But I
suppose the patent concerns are not specific to lz4 but apply to
compression in general.

[1] http://www.postgresql.org/message-id/50EA7976.5060809@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/20130614230142.GC19641@awork2.anarazel.de

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services