Re: MVCC catalog access - Mailing list pgsql-hackers

From Andres Freund
Subject Re: MVCC catalog access
Date
Msg-id 20130702135223.GB19837@alap2.anarazel.de
Whole thread Raw
In response to Re: MVCC catalog access  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: MVCC catalog access
List pgsql-hackers
On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
> On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Some things that might be worth changing when committing:
> > * Could you add a Assert(!RelationHasSysCache(relid)) to
> >   RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
> >   missed by the next person adding a syscache and that seems like it
> >   could have ugly and hard to detect consequences.
> 
> There's a cross-check in InitCatalogCache() for that very issue.

Great.

> > * maybe use bsearch(3) instead of open coding the binary search? We
> >   already use it in the backend.
> 
> I found comments elsewhere indicating that bsearch() was slower than
> open-coding it, so I copied the logic used for ScanKeywordLookup().

Hm. Ok.

> > * possibly paranoid, but I'd add a Assert to heap_beginscan_catalog or
> >   GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
> >   consistency mechanisms in GetCatalogSnapshot() only work for those, so
> >   there doesn't seem to be a valid usecase for non-catalog relations.
> 
> It'll actually work find as things stand; it'll just take a new
> snapshot every time.

Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)

> - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
> default snapshot.  That seems like a crappy idea.  I propose that we
> either set that pointer to NULL and let the server core dump if the
> snapshot doesn't get set or (maybe better) add a new special snapshot
> called SnapshotError which just errors out if you try to use it for
> anything, and initialize to that.

I vote for SnapshotError.

> - I'm not quite sure what to do about get_actual_variable_range().
> Taking a new MVCC snapshot there seems like it might be pricey on some
> workloads.  However, I wonder if we could use SnapshotDirty.
> Presumably, even uncommitted tuples affect the amount of
> index-scanning we have to do, so that approach seems to have some
> theoretical justification.  But I'm worried there could be unfortunate
> consequences to looking at uncommitted data, either now or in the
> future.  SnapshotSelf seems less likely to have that problem, but
> feels wrong somehow.

I don't like using SnapshotDirty either. Can't we just use the currently
active snapshot? Unless I miss something this always will be called
while we have one since when we plan we've done an explicit
PushActiveSnapshot() and if we need to replan stuff during execution
PortalRunSelect() will have pushed one.

> - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
> argument to heap_get_latest_tid().  I don't know what these functions
> are supposed to be good for, but taking a new snapshot for every
> function call seems to guarantee that someone will find a way to use
> these functions as a poster child for how to brutalize PGXACT, so I
> don't particularly want to do that.

Heikki mentioned that at some point they were added for the odbc
driver. I am not particularly inclined to worry about taking too many
snapshots here.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Nikhil Sontakke
Date:
Subject: Custom gucs visibility
Next
From: Robert Haas
Date:
Subject: Re: New regression test time