Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Alexandra Wang
Subject Re: Zedstore - compressed in-core columnar storage
Msg-id CACiyaSr3EEMR=wjdhf9XZiBuOgB0bqdwjPyS5Yh63d-fpACBPQ@mail.gmail.com
In response to Re: Zedstore - compressed in-core columnar storage  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers


On Sun, Aug 18, 2019 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

> I was missing a way to check for compression ratio;

Here are the ways to check compression ratio for zedstore:

Table level:
select sum(uncompressedsz::numeric) / sum(totalsz) as compratio
from pg_zs_btree_pages(<tablename>);

Per-column level:
select attno, count(*), sum(uncompressedsz::numeric) / sum(totalsz) as compratio
from pg_zs_btree_pages(<tablename>)
group by attno
order by attno;
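
To make the arithmetic behind the two queries concrete, here is a rough Python sketch over made-up rows. The tuples only mimic the attno, uncompressedsz and totalsz columns used above; the sizes are invented for illustration and do not come from a real zedstore table. It shows that the ratio is the sum of uncompressed bytes divided by the sum of stored bytes, not an average of per-page ratios.

```python
from collections import defaultdict

# Hypothetical rows shaped like (attno, uncompressedsz, totalsz).
# These numbers are invented purely to illustrate the arithmetic.
pages = [
    (1, 8000, 2000),
    (1, 4000, 2000),
    (2, 6000, 6000),
    (2, 3000, 1500),
]


def table_ratio(rows):
    """Table-level ratio: sum(uncompressedsz) / sum(totalsz)."""
    return sum(r[1] for r in rows) / sum(r[2] for r in rows)


def column_ratios(rows):
    """Per-column ratio, grouped by attno and ordered by attno."""
    totals = defaultdict(lambda: [0, 0])
    for attno, uncompressed, total in rows:
        totals[attno][0] += uncompressed
        totals[attno][1] += total
    return {attno: u / t for attno, (u, t) in sorted(totals.items())}


if __name__ == "__main__":
    print(round(table_ratio(pages), 3))
    print(column_ratios(pages))
```

In this toy data column 2 compresses much worse than column 1, which is the kind of difference the per-column query is meant to expose.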
 
> it looks like zedstore with lz4 gets ~4.6x for our largest customer's
> largest table.  zfs using compress=gzip-1 gives 6x compression across
> all their partitioned tables, and I'm surprised it beats zedstore.
 
What kind of tables did you use? Could you share the schema of the
table? Did you load the data with 'INSERT INTO ... SELECT' or with COPY?
Currently COPY gives better compression ratios than single INSERTs
because it generates fewer pages for metadata. The per-column query
above will show which columns have lower compression ratios.

We plan to add other compression techniques, such as RLE and delta
encoding, which, used along with LZ4, should give better compression
ratios for a column store.
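
As a rough illustration of why these two encodings suit columnar data, here is a minimal Python sketch of generic run-length and delta encoding. It is not zedstore's implementation (which had not been written at the time of this message); the function names are hypothetical and the code only demonstrates the general techniques.

```python
def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [tuple(r) for r in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def delta_encode(values):
    """Store the first value, then the difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum of the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


if __name__ == "__main__":
    # A low-cardinality column collapses into a few runs.
    print(rle_encode(["a", "a", "a", "b", "b", "a"]))
    # A slowly increasing column turns into small numbers.
    print(delta_encode([1000, 1001, 1003, 1006]))
```

Values within a single column tend to repeat or change slowly, so encodings like these often leave shorter or more regular data for a general-purpose compressor such as LZ4 to work on.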
