Re: 9.3: load path to mitigate load penalty for checksums - Mailing list pgsql-hackers

From Noah Misch
Subject Re: 9.3: load path to mitigate load penalty for checksums
Date
Msg-id 20120607021620.GA22160@tornado.leadboat.com
Whole thread Raw
In response to 9.3: load path to mitigate load penalty for checksums  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: 9.3: load path to mitigate load penalty for checksums
List pgsql-hackers
On Mon, Jun 04, 2012 at 06:26:04PM -0700, Jeff Davis wrote:
> I propose a special user-initiated loading mode at the table
> granularity. During this time, readers must ignore PD_ALL_VISIBLE,
> HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers
> may set all of those bits before the writing transaction commits,
> obviating the need to rewrite (and WAL) the data again later. Ideally,
> there would be no work for VACUUM to do after the data load (unless a
> transaction aborted).

Note that, currently, only VACUUM sets PD_ALL_VISIBLE and visibility map bits.
Would you make something else like heap_multi_insert() be able to do so?

> Goals:
> =================================
> 
>   * Table granularity (doesn't affect other tables at all)
>   * Allows concurrent loaders
>   * Allows loading into existing tables with existing data
>   * Online (allow reads to proceed, even if degraded)

+1

> Obviously, readers and writers would need a mechanism to honor those
> flags, but I haven't dug into the details yet (additional routines in
> tqual.c?).

Avoiding measurable overhead in tuple visibility checks when the feature is
inactive may well prove to be a key implementation challenge.

> FINALIZE LOAD would first move from state 2 to state 3 by acquiring a
> ShareUpdateExclusiveLock on the table setting optimistichints = false.
> 
> Then, it would move from state 3 to state 0 by first waiting for all
> transactions that currently hold a lock on the table, to ensure they see
> the optimistichints=false flag.

This is certainly necessary, but ...

> Then, it would remember the current xid
> as max_loader_xid, and wait until the global xmin is greater than
> max_loader_xid. This should ensure that all snapshots regard all loading
> transactions as complete.

... this might not be.  Each backend could decide, based on its own xmin,
whether to ignore PD_ALL_VISIBLE in a given table.  In other words, your
ignorehints flag could be an xmin set to InvalidTransactionId during stages 1
and 2 and to the earliest safe xmin during stages 0 and 3.

>  * INITIATE and FINALIZE probably need to use PreventTransactionChain()
> and multiple transactions, to avoid holding the ShareUpdateExclusiveLock
> for too long. Also, we want to keep people from using it in the same
> transaction as the loading xact, because they might not realize that
> they would get a concurrency of 1 that way (because of the
> ShareUpdateExclusiveLock).

Yes.  You need to commit the transaction modifying pg_class so other backends
can observe the change, at which point you can gather the list to wait on.

Consider splitting the INITIATE UI into two interfaces, one that transitions
from state 0 to state 1 and another that expects state 1 and blocks until we
reach state 2.  You then have no need for PreventTransactionChain(), and the
interfaces could even be normal functions.  It's less clear how reasonably you
could do this for the FINALIZE step, given its implicit VACUUM.  It could be
achieved by having the user do the VACUUM and making the new interface merely
throw an error if a VACUUM is still needed.  The trivial usage pattern might
look like this:

SELECT pg_initiate_load('bigtbl');
SELECT pg_wait_load('bigtbl'); -- not a great name
COPY bigtbl FROM STDIN;
SELECT pg_stop_load('bigtbl');
VACUUM bigtbl;
SELECT pg_finalize_load('bigtbl');

It's definitely less elegant, alas.  Perhaps offer the interface you've
proposed and have it do the above under the hood.  That way, users with
complex needs have the flexibility of the lower-level interfaces while those
who can tolerate PreventTransactionChain() have simplicity.

Thanks,
nm


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Time for pgindent run?
Next
From: Jim Nasby
Date:
Subject: Re: How could we make it simple to access the log as a table?