Re: MVCC catalog access - Mailing list pgsql-hackers

From Robert Haas
Subject Re: MVCC catalog access
Date
Msg-id CA+Tgmoaug2vg569gYhJd7Q7bb8NEL=SB51V=bGak2p5gp+mR0A@mail.gmail.com
Whole thread Raw
In response to Re: MVCC catalog access  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: MVCC catalog access
List pgsql-hackers
On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > * possibly paranoid, but I'd add a Assert to heap_beginscan_catalog or
>> >   GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
>> >   consistency mechanisms in GetCatalogSnapshot() only work for those, so
>> >   there doesn't seem to be a valid usecase for non-catalog relations.
>>
>> It'll actually work find as things stand; it'll just take a new
>> snapshot every time.
>
> Ok. Doesn't really change my opinion that it's a crappy idea to use it
> otherwise ;)

I agree, but I don't see an easy way to write the assertion you want
using only the OID.

>> - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
>> default snapshot.  That seems like a crappy idea.  I propose that we
>> either set that pointer to NULL and let the server core dump if the
>> snapshot doesn't get set or (maybe better) add a new special snapshot
>> called SnapshotError which just errors out if you try to use it for
>> anything, and initialize to that.
>
> I vote for SnapshotError.

OK.

>> - I'm not quite sure what to do about get_actual_variable_range().
>> Taking a new MVCC snapshot there seems like it might be pricey on some
>> workloads.  However, I wonder if we could use SnapshotDirty.
>> Presumably, even uncommitted tuples affect the amount of
>> index-scanning we have to do, so that approach seems to have some
>> theoretical justification.  But I'm worried there could be unfortunate
>> consequences to looking at uncommitted data, either now or in the
>> future.  SnapshotSelf seems less likely to have that problem, but
>> feels wrong somehow.
>
> I don't like using SnapshotDirty either. Can't we just use the currently
> active snapshot? Unless I miss something this always will be called
> while we have one since when we plan we've done an explicit
> PushActiveSnapshot() and if we need to replan stuff during execution
> PortalRunSelect() will have pushed one.

We could certainly do that, but I'd be a bit reluctant to do so
without input from Tom.  I imagine there might be cases where it could
cause a regression.

>> - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
>> argument to heap_get_latest_tid().  I don't know what these functions
>> are supposed to be good for, but taking a new snapshot for every
>> function call seems to guarantee that someone will find a way to use
>> these functions as a poster child for how to brutalize PGXACT, so I
>> don't particularly want to do that.
>
> Heikki mentioned that at some point they were added for the odbc
> driver. I am not particularly inclined to worry about taking too many
> snapshots here.

Well, if it uses it with any regularity I think it's a legitimate
concern.   We have plenty of customers who use ODBC and some of them
allow frightening numbers of concurrent server connections.  Now you
may say that's a bad idea, but so is 1000 backends doing
txid_current() in a loop.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Large object + FREEZE?
Next
From: Andrew Dunstan
Date:
Subject: Re: New regression test time