Re: Going for "all green" buildfarm results - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Going for "all green" buildfarm results
Date
Msg-id 25567.1154274284@sss.pgh.pa.us
In response to Re: Going for "all green" buildfarm results  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Going for "all green" buildfarm results  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Stefan Kaltenbrunner wrote:
>> FYI: lionfish just managed to hit that problem again:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06

> The test alter_table, which is in the same parallel group as limit (the
> failing test), contains these lines:
> ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
> ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;

I bet Alvaro's spotted the problem.  ALTER INDEX RENAME doesn't seem to
take any lock on the index's parent table, only on the index itself.
That means that a query on "onek" could be trying to read the pg_class
entries for onek's indexes concurrently with someone trying to commit
a pg_class update to rename an index.  If the query manages to visit
the new and old versions of the row in that order, and the commit
happens between, *neither* of the versions would look valid.  MVCC
doesn't save us because this is all SnapshotNow.
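
To make the ordering argument above concrete, here is a toy model of the scan. It is only a sketch: the names (RowVersion, is_visible, run_scan) and the transaction ids are invented for illustration, and the visibility rule is reduced to "creator committed, replacer not committed", which ignores everything else the real tuple visibility checks consider. The scan visits the new row version first, the rename commits, and then the scan visits the old version.

```python
from dataclasses import dataclass
from typing import Optional

IN_PROGRESS = "in progress"
COMMITTED = "committed"


@dataclass
class RowVersion:
    relname: str          # name stored in this version of the pg_class row
    xmin: int             # transaction that created this version
    xmax: Optional[int]   # transaction that replaced it, if any


def is_visible(row, status):
    """Simplified rule: creator committed and replacer (if any) not committed."""
    if status[row.xmin] != COMMITTED:
        return False
    return row.xmax is None or status[row.xmax] != COMMITTED


def run_scan(use_frozen_snapshot):
    # Hypothetical xids: 1 created the index long ago, 2 is the in-flight rename.
    status = {1: COMMITTED, 2: IN_PROGRESS}
    old = RowVersion("onek_unique1", xmin=1, xmax=2)
    new = RowVersion("tmp_onek_unique1", xmin=2, xmax=None)

    # SnapshotNow-like: look at the live commit state on every check.
    # MVCC-like: look at a copy of the state taken when the scan started.
    view = dict(status) if use_frozen_snapshot else status

    seen = []
    for position, row in enumerate([new, old]):  # new version is visited first
        if is_visible(row, view):
            seen.append(row.relname)
        if position == 0:
            status[2] = COMMITTED  # the rename commits between the two visits
    return seen
```

In this model run_scan(False) finds no valid version of the row at all, while run_scan(True), which judges both versions against the state as of scan start, still finds the old name.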

Not sure what to do about this.  Trying to lock the parent table could
easily be a cure-worse-than-the-disease, because it would create
deadlock risks (we've already locked the index before we could look up
and lock the parent).  Thoughts?
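
The deadlock worry can be sketched the same way. This assumes, purely for illustration, that every lock is exclusive and that a query locks the table before its indexes; find_deadlock and the session names are hypothetical and not taken from the backend. The function follows "waits for the holder of" edges until it either runs out of edges or revisits a session.

```python
def find_deadlock(holds, waits):
    """Return a list of sessions forming a wait cycle, or None.

    holds: session -> set of objects it has locked (exclusive locks assumed)
    waits: session -> the single object it is blocked on
    """
    owner = {obj: s for s, objs in holds.items() for obj in objs}
    for start in waits:
        path = []
        session = start
        # Follow the chain: session waits on an object, which some owner holds.
        while session in waits and waits[session] in owner:
            if session in path:
                return path[path.index(session):]
            path.append(session)
            session = owner[waits[session]]
    return None


# Rename locks the index first, then wants the parent table;
# the query locked the table first, then wants the index.
inverted = find_deadlock(
    holds={"rename": {"onek_unique1"}, "query": {"onek"}},
    waits={"rename": "onek", "query": "onek_unique1"},
)

# If both sides took the table first, the query would simply wait.
same_order = find_deadlock(
    holds={"rename": {"onek"}},
    waits={"query": "onek"},
)
```

Here inverted is the cycle between the two sessions and same_order is None. The catch is that the rename cannot take the table first, because it only learns which table is the parent after it has already locked the index.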

The path of least resistance might just be to not run these tests in
parallel.  The chance of this issue causing problems in the real world
seems small.
        regards, tom lane

