Re: Compressed pluggable storage experiments - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Compressed pluggable storage experiments |
Date | |
Msg-id | 20191019122323.syfhef6uilbfgkpg@development |
In response to | Re: Compressed pluggable storage experiments (Andres Freund <andres@anarazel.de>) |
Responses | Re: Compressed pluggable storage experiments |
List | pgsql-hackers |
On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
>> On 2019-Oct-10, Ildar Musin wrote:
>>
>> > 1. Unlike FDW API, in pluggable storage API there are no routines like
>> > "begin modify table" and "end modify table" and there is no shared
>> > state between insert/update/delete calls.
>>
>> Hmm. I think adding a begin/end to modifytable is a reasonable thing to
>> do (it'd be a no-op for heap and zheap I guess).
>
>I'm fairly strongly against that. Adding two additional "virtual"
>function calls for something that's rarely going to be used, seems like
>adding too much overhead to me.
>

That seems a bit strange to me. Sure - if there's an alternative way to
achieve the desired behavior (a clear way to finalize writes etc.), then
cool, let's do that. But forcing people to use inconvenient workarounds
seems like a bad thing to me - having a convenient and clear API is
quite valuable, IMHO.

Let's see if this actually has a measurable overhead first.

>
>> > 2. It looks like I cannot implement custom storage options. E.g. for
>> > compressed storage it makes sense to implement different compression
>> > methods (lz4, zstd etc.) and corresponding options (like compression
>> > level). But as I can see storage options (like fillfactor etc) are
>> > hardcoded and are not extensible. Possible solution is to use GUCs
>> > which would work but is not extremely convenient.
>>
>> Yeah, the reloptions module is undergoing some changes. I expect that
>> there will be a way to extend reloptions from an extension, at the end
>> of that set of patches.
>
>Cool.
>

Yep.

>
>> > 3. A bit surprising limitation that in order to use bitmap scan the
>> > maximum number of tuples per page must not exceed 291 due to
>> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
>> > 8kb page size. In case of 1mb page this restriction feels really
>> > limiting.
>>
>> I suppose this is a hardcoded limit that needs to be fixed by patching
>> core as we make table AM more pervasive.
>
>That's not unproblematic - a dynamic limit would make a number of
>computations more expensive, and we already spend plenty CPU cycles
>building the tid bitmap. And we'd waste plenty of memory just having all
>that space for the worst case. ISTM that we "just" need to replace the
>TID bitmap with some tree like structure.
>

I think the zedstore has roughly the same problem, and Heikki mentioned
some possible solutions to dealing with it in his pgconfeu talk (and it
was discussed in the zedstore thread, I think).

>
>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within extension. So unless
>> > inserts are made in bulks it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest. Either you should ignore this problem, or
>> you should fix it.
>
>I think if it becomes a problem you should ask for an rmgr ID to use for
>your extension, which we encode and then allow to set the relevant
>rmgr callbacks for that rmgr id at startup. But you should obviously
>first develop the WAL logging etc, and make sure it's beneficial over
>generic wal logging for your case.
>

AFAIK compressed/columnar engines generally implement two types of
storage - write-optimized store (WOS) and read-optimized store (ROS),
where the WOS is mostly just an uncompressed append-only buffer, and ROS
is compressed etc. ISTM the WOS would benefit from a more elaborate WAL
logging, but ROS should be mostly fine with the generic WAL logging.

But yeah, we should test and measure how beneficial that actually is.
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services