Re: Compressed pluggable storage experiments - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Compressed pluggable storage experiments
Msg-id 20191018102505.b67rcudveao7fwyd@alap3.anarazel.de
In response to Re: Compressed pluggable storage experiments  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Compressed pluggable storage experiments
List pgsql-hackers
Hi,

On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> On 2019-Oct-10, Ildar Musin wrote:
> 
> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> > "begin modify table" and "end modify table" and there is no shared
> > state between insert/update/delete calls.
> 
> Hmm.  I think adding a begin/end to modifytable is a reasonable thing to
> do (it'd be a no-op for heap and zheap I guess).

I'm fairly strongly against that. Adding two additional "virtual"
function calls for something that's rarely going to be used seems like
too much overhead to me.


> > 2. It looks like I cannot implement custom storage options. E.g. for
> > compressed storage it makes sense to implement different compression
> > methods (lz4, zstd etc.) and corresponding options (like compression
> > level). But as I can see storage options (like fillfactor etc) are
> > hardcoded and are not extensible. Possible solution is to use GUCs
> > which would work but is not extremely convenient.
> 
> Yeah, the reloptions module is undergoing some changes.  I expect that
> there will be a way to extend reloptions from an extension, at the end
> of that set of patches.

Cool.


> > 3. A bit surprising limitation that in order to use bitmap scan the
> > maximum number of tuples per page must not exceed 291 due to
> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
> > 8kb page size. In case of 1mb page this restriction feels really
> > limiting.
> 
> I suppose this is a hardcoded limit that needs to be fixed by patching
> core as we make table AM more pervasive.

That's not unproblematic - a dynamic limit would make a number of
computations more expensive, and we already spend plenty of CPU cycles
building the TID bitmap. And we'd waste plenty of memory just having all
that space for the worst case.  ISTM that we "just" need to replace the
TID bitmap with some tree-like structure.
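
For reference, a rough sketch of where the 291 comes from and what a
fixed worst-case bitmap would cost per page with 1MB pages. The constants
are assumptions matching a default build (24 byte page header, 23 byte
heap tuple header aligned up to 24, 4 byte line pointer, 64 bit bitmap
words). The helper names are made up for illustration and are not the
actual tidbitmap.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a default build; not read from PostgreSQL headers. */
#define PAGE_HEADER_SIZE   24   /* SizeOfPageHeaderData */
#define TUPLE_HEADER_SIZE  24   /* MAXALIGN(SizeofHeapTupleHeader): 23 rounded up */
#define LINE_POINTER_SIZE   4   /* sizeof(ItemIdData) */
#define BITS_PER_WORD      64   /* assumed width of one bitmap word */

/* Upper bound on heap tuples that fit on a page of page_size bytes. */
static size_t
max_tuples_per_page(size_t page_size)
{
    assert(page_size > PAGE_HEADER_SIZE);
    return (page_size - PAGE_HEADER_SIZE) /
           (TUPLE_HEADER_SIZE + LINE_POINTER_SIZE);
}

/* Bytes of bitmap needed per page to keep one bit per possible tuple. */
static size_t
bitmap_bytes_per_page(size_t page_size)
{
    size_t words = (max_tuples_per_page(page_size) - 1) / BITS_PER_WORD + 1;

    return words * (BITS_PER_WORD / 8);
}
```

With these assumptions an 8kB page yields 291 tuples and a 40 byte
bitmap per page entry, while a 1MB page yields 37448 tuples and a 4688
byte bitmap per page entry, whether or not the page actually holds that
many tuples.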


> > 4. In order to use WAL-logging each page must start with a standard 24
> > byte PageHeaderData even if it is needless for storage itself. Not a
> > big deal though. Another (actually documented) WAL-related limitation
> > is that only generic WAL can be used within extension. So unless
> > inserts are made in bulks it's going to require a lot of disk space to
> > accommodate logs and wide bandwidth for replication.
> 
> Not sure what to suggest.  Either you should ignore this problem, or
> you should fix it.

I think if it becomes a problem you should ask for an rmgr ID to use for
your extension, which we encode, and we then allow setting the relevant
rmgr callbacks for that rmgr ID at startup.  But you should obviously
first develop the WAL logging etc., and make sure it's beneficial over
generic WAL logging for your case.
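
To make the idea concrete, a minimal sketch of what "set the callbacks
for a reserved rmgr ID at startup" could look like. Everything here is
hypothetical: the struct, the function names and the split of the ID
range at 128 are assumptions for illustration, not an existing
PostgreSQL API. The only fact relied on is that the rmgr ID in a WAL
record is a single byte.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RMGR_ID             255  /* rmgr ID is one byte in the record */
#define FIRST_EXTENSION_RMGR_ID 128  /* hypothetical start of extension range */

/* Hypothetical callback set an extension would provide for its rmgr ID. */
typedef struct RmgrCallbacks
{
    const char *name;
    void      (*redo) (const void *rec, size_t len);
} RmgrCallbacks;

static RmgrCallbacks rmgr_table[MAX_RMGR_ID + 1];

/* Called once at startup; returns 0 on success, -1 if the ID is unusable. */
static int
register_extension_rmgr(int id, const RmgrCallbacks *cb)
{
    if (id < FIRST_EXTENSION_RMGR_ID || id > MAX_RMGR_ID)
        return -1;              /* outside the range reserved for extensions */
    if (cb == NULL || cb->redo == NULL || rmgr_table[id].redo != NULL)
        return -1;              /* missing callback or ID already taken */
    rmgr_table[id] = *cb;
    return 0;
}

/* Replay path: look up the record's rmgr ID and call its redo callback. */
static int
dispatch_redo(int id, const void *rec, size_t len)
{
    if (id < 0 || id > MAX_RMGR_ID || rmgr_table[id].redo == NULL)
        return -1;              /* nobody registered for this ID */
    rmgr_table[id].redo(rec, len);
    return 0;
}

/* Example extension callback: only counts the bytes it was asked to replay. */
static size_t demo_replayed_bytes = 0;

static void
demo_redo(const void *rec, size_t len)
{
    assert(rec != NULL);
    demo_replayed_bytes += len;
}
```

An extension would call register_extension_rmgr() once while it is being
loaded at startup, and replay would go through dispatch_redo() for every
record carrying that ID.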

Greetings,

Andres Freund


