Re: [Proposal] Page Compression for OLTP - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [Proposal] Page Compression for OLTP
Date
Msg-id alpine.DEB.2.22.394.2005210914440.2856263@pseudo
In response to [Proposal] Page Compression for OLTP  (chenhj <chjischj@163.com>)
Responses Re: [Proposal] Page Compression for OLTP
List pgsql-hackers
Hello,

My 0.02€, some of which may just show some misunderstanding on my part:

  - You have clearly given a lot of thought to the what and the how,
    which makes your message an interesting read.

  - Could this be proposed as some kind of extension, provided that enough
    hooks are available? ISTM that foreign tables and/or alternative
    storage engines (aka ACCESS METHOD) provide convenient APIs which could
    fit the need for these? Or are they not appropriate? You seem to
    suggest that they are not.

    If not, what could be done to improve the API to allow what you are
    seeking to do? Maybe you need a somewhat lower-level programmable API
    which does not exist yet, or at least is not exported yet, but could be
    specified and implemented with limited effort? Basically you would like
    to read/write pg pages to somewhere, and then there is the syncing
    issue to consider. Maybe such a "page storage" API could also benefit
    some specialized hardware, e.g. persistent memory stores, so there
    would be more reason to define it anyway? I think it might be valuable
    to give it some thought.

  - Could you maybe elaborate on how your plan differs from [4] and [5]?

  - Have you considered keeping page headers and compressing tuple data
    only?

  - I'm not sure there is a point in going below the underlying file
    system block size, which is quite often 4 KiB. Or maybe there is?
    Is there a benefit in aiming at 1/4 even if most pages overflow?

  - ISTM that your approach entails 3 "files". Could it be done with 2?
    I'd suggest that the possible overflow pointers (coa) could be part of
    the headers so that when reading the 3.1 page, then the header would
    tell where to find the overflow 3.2, without requiring an additional
    independent structure with very small data in it, most of it zeros.
    Possibly this cannot work, because it would require some available
    space in standard headers when the page is not compressible, and
    there is not enough. Maybe creating a little room for that in
    existing headers (4 bytes could be enough?) would be a good compromise.
    Hmmm. Maybe the approach I suggest would only work for 1/2 compression,
    but not for other target ratios, but I think it could be made to work
    if the pointer can designate several blocks in the overflow table.

  - If one page is split in 3 parts, could it create problems on syncing,
    if only 1/3 or 2/3 of the page gets written? Maybe that is manageable
    with WAL, as it would note that the page was not synced, and that is
    enough for replay.

  - I'm unclear how you would manage the 2 representations of a page in
    memory. I'm afraid that an 8 KiB page compressed to 4 KiB would
    basically take 12 KiB, i.e. reduce the available memory for caching
    purposes. Hmmm. The current status is that a written page probably
    takes 16 KiB, once in shared buffers and once in the system caches,
    so it would be an improvement anyway.

  - Maybe the compressed and overflow tables could become bloated somehow,
    which would require a vacuuming implementation and add to the overall
    complexity?

  - External tools should be available to allow page inspection, eg for
    debugging purposes.
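
To make the "2 files instead of 3" suggestion above a bit more concrete,
here is a rough Python sketch. It is only an illustration under my own
assumptions, not the layout from the proposal: the 6-byte slot header, the
names (TwoFileStore, HEADER, NO_OVERFLOW), the half-page slot size and the
use of zlib are all hypothetical. Each page gets one fixed-size slot in the
main file; the slot header carries the compressed length and a pointer to
the first of possibly several contiguous overflow blocks.

```python
import struct
import zlib

PAGE_SIZE = 8192             # default PostgreSQL block size
SLOT_SIZE = PAGE_SIZE // 2   # assumed target ratio of 1/2
# Hypothetical slot header: compressed length (uint16), first overflow block (uint32).
HEADER = struct.Struct("<HI")
NO_OVERFLOW = 0xFFFFFFFF
ROOM = SLOT_SIZE - HEADER.size   # compressed bytes that fit in the main slot


class TwoFileStore:
    """In-memory stand-in for a main file plus a single overflow file."""

    def __init__(self):
        self.main = bytearray()       # one SLOT_SIZE slot per page number
        self.overflow = bytearray()   # append-only blocks of SLOT_SIZE bytes

    def write_page(self, pageno, page):
        assert len(page) == PAGE_SIZE
        data = zlib.compress(page)
        head, tail = data[:ROOM], data[ROOM:]
        ptr = NO_OVERFLOW
        if tail:
            # The pointer names the first block; the block count follows
            # from the stored length, so one pointer can cover several blocks.
            # Note: rewriting a page leaks its old overflow blocks, which is
            # exactly the bloat concern raised in the list above.
            ptr = len(self.overflow) // SLOT_SIZE
            nblocks = -(-len(tail) // SLOT_SIZE)   # ceiling division
            self.overflow += tail.ljust(nblocks * SLOT_SIZE, b"\0")
        slot = (HEADER.pack(len(data), ptr) + head).ljust(SLOT_SIZE, b"\0")
        end = (pageno + 1) * SLOT_SIZE
        if len(self.main) < end:
            self.main.extend(bytes(end - len(self.main)))
        self.main[pageno * SLOT_SIZE:end] = slot

    def read_page(self, pageno):
        slot = self.main[pageno * SLOT_SIZE:(pageno + 1) * SLOT_SIZE]
        length, ptr = HEADER.unpack_from(slot)
        data = bytes(slot[HEADER.size:HEADER.size + min(length, ROOM)])
        if ptr != NO_OVERFLOW:
            start = ptr * SLOT_SIZE
            data += bytes(self.overflow[start:start + length - ROOM])
        return zlib.decompress(data)
```

A well-compressible page is read back with one access to the main file and
the overflow file stays untouched; only pages that miss the target ratio
pay for a second read. Syncing, WAL interaction and space reclamation are
deliberately left out of the sketch.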

-- 
Fabien.
