Re: order of nested loop - Mailing list pgsql-general

From Jim C. Nasby
Subject Re: order of nested loop
Date
Msg-id 20030617235337.X66185@flake.decibel.org
Whole thread Raw
In response to Re: order of nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: order of nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Tue, Jun 17, 2003 at 07:29:16PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > On a related note; since temporary tables are only visible to a single
> > connection, do they have full MVCC info in them, or can it be bypassed?
>
> You can't really bypass it.  You still need the transaction number so
> you can tell if a tuple was created by a committed, aborted, or the
> current transaction, and you still need the command number for the same
> reasons as usual.

You do need that info, but I think a local-only temp. table means you
don't have to use XID (and maybe CID, I don't really know how that's
used) to store the info. This is because only a single transaction can
reference the table. As part of committing an update, you can either set
a simple flag in the old tuples, or you might even be able to just
delete them (though nested transactions would probably break that). You
would have to keep a list of what tuples are old and new, but that's
probably more efficient than 16/20 bytes per tuple.

OTOH, I've also been wondering if MVCC info could be stored more
efficiently across the board. One idea is to keep a list of valid MVCC
states, and assign int (or maybe even smaller) ID's to every entry in
that list. Each tuple would only have that entry number then. This would
probably not be good for tables that have a lot of single-row
transactions (though there's another advantage that might offset the
overhead), but for tables that only see fairly large transactions it
could be a tremendous gain.

The other advantage to normalizing the MVCC info is it should then be
reasonable to store it in indexes as well as base tuples. This could
dramatically speed up index work, since you don't have to read the base
tuple to see if it's still valid. It would also allow things like index
covering, and count(*) to do only an index scan.

Another alternative to an MVCC ID would be to store the MVCC info
outside the mainline storage with a reference back to the base tuple,
and use some form of RLE. IE:

xmin, cmin, xmax, cmax, xvac, min_tid, max_tid

where min_tid and max_tid define the range of tuples that MVCC info
applies to. I'm sure there's plenty of other ways MVCC info could be
stored without using 16/20 bytes per tuple.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-general by date:

Previous
From: culley harrelson
Date:
Subject: Re: plpython? (Was: Re: Damn triggers and NEW)
Next
From: Tom Lane
Date:
Subject: Re: order of nested loop