Re: 9.3: load path to mitigate load penalty for checksums - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: 9.3: load path to mitigate load penalty for checksums
Date
Msg-id 1339395372.17552.19.camel@jdavis
Whole thread Raw
In response to Re: 9.3: load path to mitigate load penalty for checksums  (Noah Misch <noah@leadboat.com>)
Responses Re: 9.3: load path to mitigate load penalty for checksums
List pgsql-hackers
On Wed, 2012-06-06 at 22:16 -0400, Noah Misch wrote:
> Note that, currently, only VACUUM sets PD_ALL_VISIBLE and visibility map bits.
> Would you make something else like heap_multi_insert() be able to do so?

That was the plan (roughly). I was thinking about doing it at the time a
new page was allocated.

> Avoiding measurable overhead in tuple visibility checks when the feature is
> inactive may well prove to be a key implementation challenge.

Perhaps a rudimentary CLOG cache, or some other way to mitigate CLOG
access could make it bearable.

Although I would like it to be an online operation, I'm not quite as
concerned about reads. I'd like to mitigate any major penalty, but if
reads are expensive during a load, than so be it.

> > Then, it would remember the current xid
> > as max_loader_xid, and wait until the global xmin is greater than
> > max_loader_xid. This should ensure that all snapshots regard all loading
> > transactions as complete.
> 
> ... this might not be.  Each backend could decide, based on its own xmin,
> whether to ignore PD_ALL_VISIBLE in a given table.  In other words, your
> ignorehints flag could be an xmin set to InvalidTransactionId during stages 1
> and 2 and to the earliest safe xmin during stages 0 and 3.

That's a good idea. It might make it easier to implement, and removing a
step from finalization is certainly a big plus.

> >  * INITIATE and FINALIZE probably need to use PreventTransactionChain()
> > and multiple transactions, to avoid holding the ShareUpdateExclusiveLock
> > for too long. Also, we want to keep people from using it in the same
> > transaction as the loading xact, because they might not realize that
> > they would get a concurrency of 1 that way (because of the
> > ShareUpdateExclusiveLock).
> 
> Yes.  You need to commit the transaction modifying pg_class so other backends
> can observe the change, at which point you can gather the list to wait on.
> 
> Consider splitting the INITIATE UI into two interfaces, one that transitions
> from state 0 to state 1 and another that expects state 1 and blocks until we
> reach state 2.  You then have no need for PreventTransactionChain(), and the
> interfaces could even be normal functions.  It's less clear how reasonably you
> could do this for the FINALIZE step, given its implicit VACUUM.  It could be
> achieved by having the user do the VACUUM and making the new interface merely
> throw an error if a VACUUM is still needed.  The trivial usage pattern might
> look like this:
> 
> SELECT pg_initiate_load('bigtbl');
> SELECT pg_wait_load('bigtbl'); -- not a great name
> COPY bigtbl FROM STDIN;
> SELECT pg_stop_load('bigtbl');
> VACUUM bigtbl;
> SELECT pg_finalize_load('bigtbl');
> 
> It's definitely less elegant, alas.  Perhaps offer the interface you've
> proposed and have it do the above under the hood.  That way, users with
> complex needs have the flexibility of the lower-level interfaces while those
> who can tolerate PreventTransactionChain() have simplicity.

I think that's a reasonable suggestion. I am going back and forth a
little on this one. It's got the benefit that you can see the internal
states more clearly, and it's easier to tell what's going on, and it's
better if we want to do more sophisticated testing.

The main drawback here is that it's exposing more to the user. I
imagined that we might want to push other kinds of optimizations into
the load path, and that might upset the interface you've described
above. Then again, we'll probably need the normal, load, and transition
states regardless, so maybe it's an empty concern.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: log_newpage header comment
Next
From: Amit Kapila
Date:
Subject: Re: Resource Owner reassign Locks