Re: Going for "all green" buildfarm results - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: Going for "all green" buildfarm results
Date
Msg-id 44CC67CD.6090604@kaltenbrunner.cc
In response to Re: Going for "all green" buildfarm results  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>> Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>>> FWIW: lionfish had a weird make check error 3 weeks ago which I
>>>> (unsuccessfully) tried to reproduce multiple times after that:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14
>>> Weird.
>>>
>>>   SELECT ''::text AS eleven, unique1, unique2, stringu1 
>>>                 FROM onek WHERE unique1 < 50 
>>>                 ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
>>> ! ERROR:  could not open relation with OID 27035
>>>
>>> AFAICS, the only way to get that error in HEAD is if ScanPgRelation
>>> can't find a pg_class row with the mentioned OID.  Presumably 27035
>>> belongs to "onek" or one of its indexes.  The very next command also
>>> refers to "onek", and doesn't fail, so what we seem to have here is
>>> a transient lookup failure.  We've found a btree bug like that once
>>> before ... wonder if there's still one left?
>> FYI: lionfish just managed to hit that problem again:
>>
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06
> 
> The error message this time is
> 
> ! ERROR:  could not open relation with OID 27006

Yeah, and before it was:
! ERROR:  could not open relation with OID 27035

which looks quite related :-)

> 
> It's worth mentioning that the portals_p2 test, which happens in the
> parallel group previous to where this test is run, also accesses the
> onek table successfully.  It may be interesting to see exactly what
> relation is 27006.

Sorry, but I don't have access to the cluster in question any more
(lionfish is quite resource-starved, and I only enabled keeping failed
builds on -HEAD after the last incident ...)
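
For what it's worth, had the cluster been kept, a lookup along these
lines should show which relation the OID belongs to. This is only a
sketch: it assumes a session connected to the regression database of
the failed run, and 27006 is simply the OID from the error above.

```sql
-- Look up the relation behind the OID from the error message.
-- relkind is 'r' for an ordinary table and 'i' for an index.
SELECT oid, relname, relkind
  FROM pg_class
 WHERE oid = 27006;
```

An empty result would mean pg_class has no visible row for that OID.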

> 
> The test alter_table, which is on the same parallel group as limit (the
> failing test), contains these lines:
> 
> ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
> ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;

Hmm, interesting - lionfish is a slow box (250 MHz MIPS) and particularly
low on memory (48MB + 140MB swap), so it is quite likely that the parallel
regression tests are driving it into swap - maybe some sort of subtle
timing issue?


Stefan

