Re: queries on xmin - Mailing list pgsql-general

From Greg Stark
Subject Re: queries on xmin
Date
Msg-id 4136ffa0906110410i471a238fp68748af73711d9d1@mail.gmail.com
Whole thread Raw
In response to queries on xmin  (Matt Amos <zerebubuth@gmail.com>)
Responses Re: queries on xmin
List pgsql-general
On Thu, Jun 11, 2009 at 11:25 AM, Matt Amos<zerebubuth@gmail.com> wrote:
>
> what problems are we going to create for ourselves if we create a
> btree index on xmin casted to int4? would it be as efficient to use a
> hash index, create a temporary table of txids that we're querying with
> a hash index and do an explicit join? have i missed the point
> entirely?

Wow, I'm quite shocked that we don't already detect attempts to create
an index on xmin or xmax. There's no way that'll work properly since
those fields can change spontaneously when, for example vacuum runs or
for xmax when things like UPDATE or SELET FOR SHARE or SELECT FOR
UPDATE are used.

Incidentally the reason there's no btree opclass is because xids don't
monotonically increase. They wrap around. So periodically you would
lose updates or see them repeatedly whenever the xid wrapped around
and the old transactions appear to be in the future.

If you never run updates and are never interested in tuples that are
old enough to be frozen then perhaps you could mostly get away with
it. But I really don't think it's a good idea.

Much better would be to store a user-space column with somethin like
txid or a timestamp and use that directly. That way you have control
over the behaviour of the column.

Another option to consider would be including a boolean column
"dumped" defaulted to false. Then you could have a partial index on
the primary key or date "WHERE NOT dumped". Then when you dump you can
"SELECT FOR UPDATE * WHERE NOT dumped" and then when you're done
"UPDATE SET dumped = 't' ". Alternately you could use "UPDATE SET
dumped='t' WHERE NOT dumped RETURNING *" which is basically
equivalent.

That would create twice as much traffic in the table which would make
vacuums much more important. But it would mean you could quickly acces
undumped records using the index and know that your process doesn't
depend on a following a strict clock schedule.


--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf

pgsql-general by date:

Previous
From: hubert depesz lubaczewski
Date:
Subject: Re: Different handling of PL/pgSQL for-loop variables in 8.3.7 vs. 8.2.7 ???
Next
From: Marko Kreen
Date:
Subject: Re: queries on xmin