Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id CAHyXU0yvpbTCqYS-ebKO6+u-5KWKYfHoh9jwaq08aS7Va-_Rjg@mail.gmail.com
Whole thread Raw
In response to Re: Zedstore - compressed in-core columnar storage  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Mon, Nov 16, 2020 at 10:07 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
> On 11/16/20 1:59 PM, Merlin Moncure wrote:
> > On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>            master    zedstore/pglz    zedstore/lz4
> >>   -------------------------------------------------
> >>    copy      1855            68092            2131
> >>    dump       751              905             811
> >>
> >> And the size of the lineitem table (as shown by \d+) is:
> >>
> >>   master: 64GB
> >>   zedstore/pglz: 51GB
> >>   zedstore/lz4: 20GB
> >>
> >> It's mostly expected lz4 beats pglz in performance and compression
> >> ratio, but this seems a bit too extreme I guess. Per past benchmarks
> >> (e.g. [1] and [2]) the difference in compression/decompression time
> >> should be maybe 1-2x or something like that, not 35x like here.
> >
> > I can't speak to the ratio, but in basic backup/restore scenarios pglz
> > is absolutely killing me; Performance is just awful; we are cpubound
> > in backups throughout the department.  Installations defaulting to
> > plgz will make this feature show very poorly.
> >
>
> Maybe. I'm not disputing that pglz is considerably slower than lz4, but
> judging by previous benchmarks I'd expect the compression to be slower
> maybe by a factor of ~2x. So the 30x difference is suspicious. Similarly
> for the compression ratio - lz4 is great, but it seems strange it's 1/2
> the size of pglz. Which is why I'm speculating that something else is
> going on.
>
> As for the "plgz will make this feature show very poorly" I think that
> depends. I think we may end up with pglz doing pretty well (compared to
> heap), but lz4 will probably outperform that. OTOH for various use cases
> it may be more efficient to use something else with worse compression
> ratio, but allowing execution on compressed data, etc.

hm, you might be right.  Doing some number crunching, I'm getting
about 23mb/sec compression on a 600gb backup image on a pretty typical
aws server.  That's obviously not great, but your numbers are much
worse than that, so maybe something else might be going on.

> I think we may end up with pglz doing pretty well (compared to heap)

I *don't* think so, or at least I'm skeptical as long as insertion
times are part of the overall performance measurement.  Naturally,
with column stores, insertion times are often very peripheral to the
overall performance picture but for cases that aren't I suspect the
results are not going to be pleasant, and advise planning accordingly.

Aside, I am very interested in this work. I may be able to support
testing in an enterprise environment; lmk if interested -- thank you

merlin



pgsql-hackers by date:

Previous
From: Dave Page
Date:
Subject: Re: Heads-up: macOS Big Sur upgrade breaks EDB PostgreSQL installations
Next
From: Tomas Vondra
Date:
Subject: Re: planner support functions: handle GROUP BY estimates ?