Re: Compressed pluggable storage experiments - Mailing list pgsql-hackers

From Natarajan R
Subject Re: Compressed pluggable storage experiments
Date
Msg-id CAPqxBt6to3CH-gqkKCLDmuq+8y_1uXgKZEGyLPrmDkrUWaUfaA@mail.gmail.com
In response to Re: Compressed pluggable storage experiments  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi all,

This is a continuation of the above thread.

>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for the storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within an extension. So unless
>> > inserts are made in bulk it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest.  Either you should ignore this problem, or
>> you should fix it.

I am working in an environment similar to the extension above (pg_cryogen,
which experiments with the pluggable storage APIs), but I don't have much
knowledge of Postgres's logical replication.
Please suggest some approaches to support logical replication for a table
whose custom access method writes generic WAL records.
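
For context, a table AM restricted to generic WAL wraps each page change
roughly like this (a minimal sketch using the generic_xlog.h API; rel and
buf stand for the target relation and a pinned, exclusively locked buffer):

    #include "postgres.h"
    #include "access/generic_xlog.h"

    static void
    am_log_page_change(Relation rel, Buffer buf)
    {
        GenericXLogState *state = GenericXLogStart(rel);

        /* Returns a working copy of the page; deltas are computed later. */
        Page        page = GenericXLogRegisterBuffer(state, buf, 0);

        /* ... apply the actual page modifications to "page" here ... */

        /* Diffs the copy against the original and emits one WAL record. */
        GenericXLogFinish(state);
    }

The catch is that logical decoding has no decode path for generic records,
so changes logged this way never reach a logical decoding output plugin -
which is exactly the problem I am asking about.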

On Wed, 17 Aug 2022 at 19:04, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
>> On 2019-Oct-10, Ildar Musin wrote:
>>
>> > 1. Unlike the FDW API, the pluggable storage API has no routines like
>> > "begin modify table" and "end modify table", and there is no shared
>> > state between insert/update/delete calls.
>>
>> Hmm.  I think adding a begin/end to modifytable is a reasonable thing to
>> do (it'd be a no-op for heap and zheap I guess).
>
>I'm fairly strongly against that. Adding two additional "virtual"
>function calls for something that's rarely going to be used seems like
>adding too much overhead to me.
>

That seems a bit strange to me. Sure - if there's an alternative way to
achieve the desired behavior (a clear way to finalize writes etc.), then
cool, let's do that. But forcing people to use inconvenient workarounds
seems like a bad thing to me - having a convenient and clear API is
quite valuable, IMHO.

Let's see if this actually has a measurable overhead first.
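
To illustrate what's being discussed - a purely hypothetical sketch (these
callbacks do not exist in the committed TableAmRoutine, and the names are
made up):

    #include "postgres.h"
    #include "utils/rel.h"

    /* Hypothetical sketch, NOT committed API. */
    typedef struct ModifyBatchState ModifyBatchState;   /* AM-private */

    /* Would run once before the first tuple_insert/update/delete of a
     * ModifyTable node, so the AM can allocate shared batch state. */
    typedef ModifyBatchState *(*begin_modify_function) (Relation rel);

    /* Would run once after the last modification, so the AM can flush
     * buffered tuples, compress the finished block, and write WAL. */
    typedef void (*end_modify_function) (Relation rel,
                                         ModifyBatchState *batch);

For heap these could simply be left NULL and skipped, which would keep the
per-statement overhead down to a NULL check rather than two calls per node.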

>
>> > 2. It looks like I cannot implement custom storage options. E.g. for
>> > compressed storage it makes sense to implement different compression
>> > methods (lz4, zstd etc.) and corresponding options (like compression
>> > level). But as I can see, storage options (like fillfactor etc.) are
>> > hardcoded and are not extensible. A possible solution is to use GUCs,
>> > which would work but is not extremely convenient.
>>
>> Yeah, the reloptions module is undergoing some changes.  I expect that
>> there will be a way to extend reloptions from an extension, at the end
>> of that set of patches.
>
>Cool.
>

Yep.
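
In the meantime the GUC workaround mentioned above looks roughly like this
(a sketch - the variable name and level range are illustrative, not
pg_cryogen's actual knobs):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int  compression_level = 1;

    void
    _PG_init(void)
    {
        /* A global knob standing in for a per-table reloption. */
        DefineCustomIntVariable("pg_cryogen.compression_level",
                                "Compression level used for new blocks.",
                                NULL,
                                &compression_level,
                                1,          /* boot value */
                                1, 19,      /* min, max */
                                PGC_USERSET,
                                0,
                                NULL, NULL, NULL);
    }

The inconvenience is precisely that this is per-session/per-cluster rather
than per-table: you can't say "this table is lz4, that one is zstd" the way
a real reloption would allow.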

>
>> > 3. A somewhat surprising limitation is that in order to use bitmap
>> > scans the maximum number of tuples per page must not exceed 291, due
>> > to the MAX_TUPLES_PER_PAGE macro in tidbitmap.c, which is calculated
>> > based on the 8kB page size. In the case of a 1MB page this restriction
>> > feels really limiting.
>>
>> I suppose this is a hardcoded limit that needs to be fixed by patching
>> core as we make table AM more pervasive.
>
>That's not unproblematic - a dynamic limit would make a number of
>computations more expensive, and we already spend plenty of CPU cycles
>building the tid bitmap. And we'd waste plenty of memory just having all
>that space for the worst case.  ISTM that we "just" need to replace the
>TID bitmap with some tree-like structure.
>

I think zedstore has roughly the same problem, and Heikki mentioned some
possible solutions for dealing with it in his pgconfeu talk (and it was
discussed in the zedstore thread, I think).
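
For reference, the 291 is just heap's densest possible page, which is what
tidbitmap.c sizes its per-page structures for (the existing macros, worked
out for the default 8kB BLCKSZ):

    /* tidbitmap.c */
    #define MAX_TUPLES_PER_PAGE  MaxHeapTuplesPerPage

    /* htup_details.h */
    #define MaxHeapTuplesPerPage \
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
                (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))

    /* = (8192 - 24) / (MAXALIGN(23) + 4) = 8168 / 28 = 291 */

The same arithmetic for a 1MB page gives (1048576 - 24) / 28, i.e. around
37000 slots, which is why the hardcoded limit bites non-heap AMs so quickly.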

>
>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for the storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within an extension. So unless
>> > inserts are made in bulk it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest.  Either you should ignore this problem, or
>> you should fix it.
>
>I think if it becomes a problem you should ask for an rmgr ID to use for
>your extension, which we encode and then allow setting the relevant
>rmgr callbacks for that rmgr ID at startup.  But you should obviously
>first develop the WAL logging etc., and make sure it's beneficial over
>generic WAL logging for your case.
>

AFAIK compressed/columnar engines generally implement two types of
storage - a write-optimized store (WOS) and a read-optimized store (ROS),
where the WOS is mostly just an uncompressed append-only buffer and the
ROS is compressed etc. ISTM the WOS would benefit from more elaborate WAL
logging, but the ROS should be mostly fine with generic WAL logging.

But yeah, we should test and measure how beneficial that actually is.
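
Worth noting: PostgreSQL 15 adds exactly the mechanism Andres describes -
an extension loaded via shared_preload_libraries can claim a custom rmgr ID
and register its callbacks with RegisterCustomRmgr() from xlog_internal.h.
A minimal sketch follows (the ID and function names are made up), and I
believe the rm_decode callback is also the piece that would let such
records feed logical decoding, tying back to the question at the top of
this mail:

    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "access/xlog_internal.h"

    PG_MODULE_MAGIC;

    /* Custom rmgr IDs must come from the reserved custom range. */
    #define RM_EXPERIMENT_ID    RM_MIN_CUSTOM_ID

    static void
    experiment_redo(XLogReaderState *record)
    {
        /* Reapply the page change described by "record" during recovery. */
    }

    static void
    experiment_desc(StringInfo buf, XLogReaderState *record)
    {
        /* Append a human-readable description for pg_waldump. */
    }

    static const char *
    experiment_identify(uint8 info)
    {
        return "EXPERIMENT";
    }

    static const RmgrData experiment_rmgr = {
        .rm_name = "experiment_am",
        .rm_redo = experiment_redo,
        .rm_desc = experiment_desc,
        .rm_identify = experiment_identify,
        /* .rm_decode would hook these records into logical decoding. */
    };

    void
    _PG_init(void)
    {
        RegisterCustomRmgr(RM_EXPERIMENT_ID, &experiment_rmgr);
    }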


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



