Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Zedstore - compressed in-core columnar storage
Msg-id 9e1077bc-c5c1-425b-07ad-424d3522363f@enterprisedb.com
In response to Re: Zedstore - compressed in-core columnar storage  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
Responses Re: Zedstore - compressed in-core columnar storage  (Jacob Champion <pchampion@vmware.com>)
Re: Zedstore - compressed in-core columnar storage  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
Hi,

Thanks for the updated patch. It's quite a massive amount of code - I
don't think we've had many 2MB patches in the past, so this is by no
means a full review.

1) the psql_1.out is missing a bit of expected output (due to 098fb0079)

2) I'm getting crashes in intarray contrib, due to hitting this error in
lwlock.c (backtrace attached):

    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

I haven't investigated this too much, but it's a regular build with
asserts and TAP tests, so it should be simple to reproduce using "make
check-world", I guess.


3) I did a very simple benchmark, loading a TPC-H data (for 75GB),
followed by pg_dump, and the duration (in seconds) looks like this:

           master    zedstore/pglz    zedstore/lz4
  -------------------------------------------------
   copy      1855            68092            2131
   dump       751              905             811

And the size of the lineitem table (as shown by \d+) is:

  master: 64GB
  zedstore/pglz: 51GB
  zedstore/lz4: 20GB

It's mostly expected that lz4 beats pglz in both performance and
compression ratio, but this seems a bit too extreme, I guess. Per past
benchmarks (e.g. [1] and [2]) the difference in compression/decompression
time should be maybe 1-2x or so, not 35x like here.

[1]
https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

[2]
https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de
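To spell out the arithmetic, just recomputing the ratios from the two
tables above:

```python
# COPY times (seconds) and lineitem sizes (GB) from the runs above
copy = {"master": 1855, "zedstore/pglz": 68092, "zedstore/lz4": 2131}
size = {"master": 64, "zedstore/pglz": 51, "zedstore/lz4": 20}

# pglz COPY is roughly 32x slower than lz4 COPY
print(round(copy["zedstore/pglz"] / copy["zedstore/lz4"]))

# lz4 shrinks lineitem ~3.2x vs heap, pglz only ~1.25x
print(round(size["master"] / size["zedstore/lz4"], 1))
print(round(size["master"] / size["zedstore/pglz"], 2))
```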

Furthermore, the pglz compression is not consuming the most CPU, at
least that's what perf says:

    24.82%  postgres          [.] encode_chunk_varlen
    20.49%  postgres          [.] decode_chunk
    13.01%  postgres          [.] merge_attstream_guts.isra.0
    12.68%  libc-2.32.so      [.] __memmove_avx_unaligned_erms
     8.72%  postgres          [.] encode_chunk_fixed
     6.16%  postgres          [.] pglz_compress
     4.36%  postgres          [.] decode_attstream_cont
     2.27%  postgres          [.] 0x00000000000baff0
     1.84%  postgres          [.] AllocSetAlloc
     0.79%  postgres          [.] append_attstream
     0.70%  postgres          [.] palloc

So I wonder if this is a sign of a deeper issue - maybe the lower
compression ratio (for pglz) triggers some sort of feedback loop in
zedstore, or something like that? Not sure, but this seems strange.


4) I looked at some of the code, like merge_attstream etc. and I wonder
if this might be related to some of the FIXME comments. For example this
bit in merge_attstream seems interesting:

    * FIXME: we don't actually pay attention to the compression anymore.
    * We never repack.
    * FIXME: this is backwords, the normal fast path is
    *        if (firsttid1 > lasttid2)

But I suppose that should affect both pglz and lz4, and I'm not sure how
up to date those comments actually are.

BTW the comments in general need updating and tidying up, to make
reviews easier. For example the merge_attstream comment references
attstream1 and attstream2, but those are not the current parameters of
the function.


5) IMHO there should be a #define specifying the maximum number of items
per chunk (60). Currently there are literal constants used in various
places, sometimes 60, sometimes 59, etc., which makes the code harder to
understand. FWIW 60 seems a bit low, but maybe it's OK.


6) I do think ZSAttStream should track which compression is used by the
stream, for two main reasons. Firstly, there's another patch to support
"custom compression" methods, which (also) allows multiple compression
methods per column. It'd be a bit strange to support that for varlena
columns in heap tables, and not here, I guess. Secondly, I think one of
the interesting columnstore features down the road will be execution on
compressed data, which requires compression methods designed for that
purpose, and those are often datatype-specific (delta encoding, ...).

I don't think we need to go as far as supporting "custom" compression
methods here, but I think we should allow different built-in compression
methods for different attstreams.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

