Thread: cvs head initdb hangs on unixware

cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
Hi all,

cvs head configured without --enable-debug hangs in initdb during make
check.

warthog doesn't exhibit it because it's configured with debug.

When it hangs, the postmaster takes 100% CPU without making any progress;
initdb waits for it while creating the template database.

According to truss, the last useful thing the postmaster does is write 8K
of zeroes to disk.

If anyone needs access to a UnixWare machine, let me know.

regards,

-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges                +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)


Re: cvs head initdb hangs on unixware

From
Zdenek Kotala
Date:
Could you generate a core and send a stacktrace?

kill -ABRT <pid> should do that.
Zdenek
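The same thing can be done from C with the POSIX kill() call. This is a minimal sketch; the helper name abort_process is hypothetical and not part of PostgreSQL. Whether a core file actually appears depends on the core-size limit (ulimit -c) and the OS configuration, not on the signal alone.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFSIGNALED(), WTERMSIG() for the usage below */
#include <unistd.h>     /* fork(), pause() for the usage below */

/*
 * Hypothetical helper, C equivalent of "kill -ABRT <pid>".
 * SIGABRT's default action terminates the target and requests a core dump;
 * the core is only written if resource limits and system settings allow it.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
abort_process(pid_t pid)
{
    assert(pid > 0);            /* pid <= 0 would signal a process group */
    return kill(pid, SIGABRT);
}
```

In a quick check, one can fork() a child that just calls pause(), call abort_process() on it, and confirm with waitpid() that WTERMSIG(status) is SIGABRT.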



Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Tue, 2 Dec 2008, Zdenek Kotala wrote:

> Date: Tue, 02 Dec 2008 17:22:25 +0100
> From: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> Could you generate a core and send a stacktrace?
>
> kill -ABRT <pid> should do that.
>
>     Zdenek
Hmm. No point doing it: it's not built with debug enabled, so I'm afraid a
stack trace won't show us anything useful.


Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Tue, 2 Dec 2008, Zdenek Kotala wrote:

> Date: Tue, 02 Dec 2008 17:22:25 +0100
> From: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> Could you generate a core and send a stacktrace?
>
> kill -ABRT <pid> should do that.
>
>     Zdenek
Zdenek,

On second thought, I tried it and got this:
Stack trace for p1, program postmaster
*[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
 [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1)  [0x81e68d9]
 [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385]
 [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
 [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
 [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
 [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297]
 [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
 [8] boot_yyparse( presumed: 0xffffffff, 0x3, 0x8047ab8) [0x80c822b]
 [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4)  [0x80ca233]
 [10] AuxiliaryProcessMain(0x4, 0x8047ab4)      [0x80cab3b]
 [11] main(0x4, 0x8047ab4, 0x8047ac8)  [0x8177dce]
 [12] _start()  [0x807ff96]
 

Seems interesting!

We've already had problems with the UnixWare optimizer; I hope this one is
fixable!

regards

Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
ohp@pyrenet.fr wrote:
> Stack trace for p1, program postmaster
> *[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
>  [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1)  [0x81e68d9]
>  [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 
> 0x8047416, 0xb4) [0x81e6385]
>  [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
>  [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
>  [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
>  [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) 
> [0x8097297]
>  [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
>  [8] boot_yyparse( presumed: 0xffffffff, 0x3, 0x8047ab8) [0x80c822b]
>  [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4)  [0x80ca233]
>  [10] AuxiliaryProcessMain(0x4, 0x8047ab4)      [0x80cab3b]
>  [11] main(0x4, 0x8047ab4, 0x8047ac8)   [0x8177dce]
>  [12] _start()  [0x807ff96]
> 
> seems interesting!
> 
> We've had problems already with unixware optimizer, hope this one is 
> fixable!

Looking at fsm_rebuild_page, I wonder if the compiler is treating "int" 
as an unsigned integer? That would cause an infinite loop.
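To make the suspected failure mode concrete: a top-down pass written as "for (n = start; n >= 0; n--)" only terminates when n is signed. The sketch below is illustrative only; the function names and the safety cap are hypothetical, added so the demo itself cannot hang, and it is not the actual fsm_rebuild_page code.

```c
#include <assert.h>

/* Signed countdown from start to 0: terminates after start + 1 iterations. */
static long
countdown_signed(int start, long cap)
{
    long iterations = 0;

    assert(cap > 0);
    for (int n = start; n >= 0; n--)
    {
        if (++iterations > cap)
            return -1;          /* would have looped forever */
    }
    return iterations;
}

/*
 * Same loop with an unsigned counter. "n >= 0" is always true for an
 * unsigned value: after n reaches 0, n-- wraps around to UINT_MAX, so the
 * only way out is the safety cap.
 */
static long
countdown_unsigned(unsigned int start, long cap)
{
    long iterations = 0;

    assert(cap > 0);
    for (unsigned int n = start; n >= 0; n--)
    {
        if (++iterations > cap)
            return -1;          /* would have looped forever */
    }
    return iterations;
}
```

Most compilers can warn about the always-true comparison (for example, GCC's -Wtype-limits), which is one cheap way to rule this hypothesis in or out.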

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Tue, 2 Dec 2008, Heikki Linnakangas wrote:

> Date: Tue, 02 Dec 2008 20:47:19 +0200
> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
> To: ohp@pyrenet.fr
> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> Looking at fsm_rebuild_page, I wonder if the compiler is treating "int" as an 
> unsigned integer? That would cause an infinite loop.
>
>
No, a simple printf of nodeno shows it counting from 4096 all the way down
to 0, then starting back at 4096...

I wonder if the leftchild/rightchild definitions have something to do with
it...
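For readers following along: each FSM page keeps a binary tree packed into an array (fp_nodes), with the root at index 0. The sketch below uses the conventional heap layout and reuses the names mentioned in this thread (leftchild, rightchild, parentof, rightneighbor); the definitions themselves are an assumption written for illustration, so check fsmpage.c for the real ones.

```c
#include <assert.h>

/* Array-embedded binary tree: node 0 is the root; children of x are 2x+1 and 2x+2. */
#define leftchild(x)   (2 * (x) + 1)
#define rightchild(x)  (2 * (x) + 2)
#define parentof(x)    (((x) - 1) / 2)

/*
 * Step one node to the right on the same level, wrapping around to the
 * leftmost node of that level. The leftmost node of each level has an index
 * of the form 2^k - 1, so after x++ we have spilled onto the next level
 * exactly when x + 1 is a power of two; parentof() then brings us back to
 * the leftmost node of the original level.
 */
static int
rightneighbor(int x)
{
    assert(x >= 0);
    x++;
    if (((x + 1) & x) == 0)
        x = parentof(x);
    return x;
}
```

With this layout, rightneighbor(2) wraps to 1 and rightneighbor(6) wraps to 3, the leftmost nodes of levels 1 and 2.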


Re: cvs head initdb hangs on unixware

From
Andrew Dunstan
Date:

ohp@pyrenet.fr wrote:
>>
>> Looking at fsm_rebuild_page, I wonder if the compiler is treating 
>> "int" as an unsigned integer? That would cause an infinite loop.
>>
>>
> No, a simple printf of nodeno shows it  starting at 4096 all the way 
> down to 0, starting back at 4096...
>
> I wonder if leftchild/rightchild definitions has something to do with 
> it...

With probably no relevance at all, I notice that this routine is 
declared extern, although it is only referenced in its own file 
apparently. Don't we have a tool that checks that?

cheers

andrew


Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
ohp@pyrenet.fr wrote:
> On Tue, 2 Dec 2008, Heikki Linnakangas wrote:
> 
>> Date: Tue, 02 Dec 2008 20:47:19 +0200
>> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
>> To: ohp@pyrenet.fr
>> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>>     pgsql-hackers list <pgsql-hackers@postgresql.org>
>> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
>>
>> Looking at fsm_rebuild_page, I wonder if the compiler is treating 
>> "int" as an unsigned integer? That would cause an infinite loop.
>>
> No, a simple printf of nodeno shows it  starting at 4096 all the way 
> down to 0, starting back at 4096...

Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster, 
there shouldn't be any broken FSM pages that need rebuilding.

I'd like to see what the FSM page in question looks like. Could you try 
to run initdb with "-d -n" options? I bet you'll get an infinite number 
of lines like:

DEBUG: fixing corrupt FSM block 1, relation 123/456/789

Could you zip up the FSM file of that relation (a file called e.g.
"789_fsm") and send it over? Or the whole data directory; it shouldn't
be that big.



Re: cvs head initdb hangs on unixware

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> 
> 
> ohp@pyrenet.fr wrote:
> >>
> >> Looking at fsm_rebuild_page, I wonder if the compiler is treating 
> >> "int" as an unsigned integer? That would cause an infinite loop.
> >>
> >>
> > No, a simple printf of nodeno shows it  starting at 4096 all the way 
> > down to 0, starting back at 4096...
> >
> > I wonder if leftchild/rightchild definitions has something to do with 
> > it...
> 
> With probably no relevance at all, I notice that this routine is 
> declared extern, although it is only referenced in its own file 
> apparently. Don't we have a tool that checks that?

Sure, src/tools/find_static.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Wed, 3 Dec 2008, Heikki Linnakangas wrote:

> Date: Wed, 03 Dec 2008 20:29:01 +0200
> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
> To: ohp@pyrenet.fr
> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> ohp@pyrenet.fr wrote:
>> On Tue, 2 Dec 2008, Heikki Linnakangas wrote:
>> 
>>> Date: Tue, 02 Dec 2008 20:47:19 +0200
>>> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
>>> To: ohp@pyrenet.fr
>>> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>>>     pgsql-hackers list <pgsql-hackers@postgresql.org>
>>> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
>>> 
>>> Looking at fsm_rebuild_page, I wonder if the compiler is treating "int" as 
>>> an unsigned integer? That would cause an infinite loop.
>>> 
>> No, a simple printf of nodeno shows it  starting at 4096 all the way down 
>> to 0, starting back at 4096...
>
> Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster, 
> there shouldn't be any broken FSM pages that need rebuilding.
You're right!
>
> I'd like to see what the FSM page in question looks like. Could you try to 
> run initdb with "-d -n" options? I bet you'll get an infinite number of lines 
> like:
>
> DEBUG: fixing corrupt FSM block 1, relation 123/456/789
>
right again!
DEBUG:  fixing corrupt FSM block 2, relation 1663/1/1255

> Could you zip up the FSM file of that relation  (a file called e.g 
> "789_fsm"), and send it over? Or the whole data directory, it shouldn't be 
> that big.
>
You'll find both attached.
BTW, this is an optimizer problem, not anything wrong with the code, but 
I'd hate to have a -g compiled postmaster in prod :)
>

best regards,

Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
ohp@pyrenet.fr wrote:
> On Wed, 3 Dec 2008, Heikki Linnakangas wrote:
>> Could you zip up the FSM file of that relation  (a file called e.g 
>> "789_fsm"), and send it over? Or the whole data directory, it 
>> shouldn't be that big.
>>
> you get both.

Thanks. Hmm, the FSM pages are full of zeros, as I would expect for a
just-created relation. fsm_search_avail should've returned quickly at
the top of the function in that case. Can you put an extra printf or
something at the top of the function to print all the arguments? And
the value of fsmpage->fp_nodes[0].
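Why an all-zeros page should fall out immediately: every non-leaf node stores the maximum of its children, so the root (fp_nodes[0]) is the largest free-space value on the whole page. The miniature page below is hypothetical (4 leaves instead of thousands, invented names); it only sketches the max-tree invariant and the root check, assuming that is what the early exit at the top of fsm_search_avail relies on.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LEAVES   4                         /* hypothetical tiny page */
#define DEMO_NONLEAF  (DEMO_LEAVES - 1)
#define DEMO_NODES    (DEMO_NONLEAF + DEMO_LEAVES)

/* Hypothetical miniature FSM page: complete binary max-tree in an array. */
typedef struct
{
    uint8_t nodes[DEMO_NODES];                  /* leaves are the last DEMO_LEAVES slots */
} demo_fsm_page;

/* Recompute each non-leaf node as the max of its two children, bottom up. */
static void
demo_rebuild(demo_fsm_page *page)
{
    for (int n = DEMO_NONLEAF - 1; n >= 0; n--)
    {
        uint8_t l = page->nodes[2 * n + 1];
        uint8_t r = page->nodes[2 * n + 2];

        page->nodes[n] = (l > r) ? l : r;
    }
}

/*
 * Early exit: the root is the max over all leaves, so if it is below
 * minvalue no slot on the page can satisfy the request.
 */
static int
demo_page_has_room(const demo_fsm_page *page, uint8_t minvalue)
{
    return page->nodes[0] >= minvalue;
}
```

On a freshly zeroed page the root is 0, so any request with minvalue above 0 returns at once without touching the rest of the tree.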

> BTW, this is an optimizer problem, not anything wrong with the code, but 
> I'd hate to have a -g compiled postmaster in prod :)

Yes, so it seems, although I wouldn't be surprised if it turned out to be
a bug in the new FSM code either.



Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Thu, 4 Dec 2008, Heikki Linnakangas wrote:

> Date: Thu, 04 Dec 2008 13:19:15 +0200
> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
> To: ohp@pyrenet.fr
> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> ohp@pyrenet.fr wrote:
>> On Wed, 3 Dec 2008, Heikki Linnakangas wrote:
>>> Could you zip up the FSM file of that relation  (a file called e.g 
>>> "789_fsm"), and send it over? Or the whole data directory, it shouldn't be 
>>> that big.
>>> 
>> you get both.
>
> Thanks. Hmm, the FSM pages are full of zeros, as I would expect for a 
> just-created relation. fsm_search_avail should've returned quickly at the top 
> of the function in that case. Can you put a extra printf or something at the 
> top of the function, to print all the arguments? And the value of 
> fsmpage->fp_nodes[0].
>
>> BTW, this is an optimizer problem, not anything wrong with the code, but 
>> I'd hate to have a -g compiled postmaster in prod :)
>
> Yes, so it seems, although I wouldn't be surprised if it turns out to be a 
> bug in the new FSM code either..
As you can see in the attached initdb.log, fsm_search_avail seems to be
called repeatedly, and the args are sort of looping...


>
>


Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> As you can see in attached initdb.log, it seems fsm_search_avail is called 
> repeatedly and args are sort of looping...

That's expected, since the system is inserting a lot of tuples
successively.  What it looks like to me is that the failing call is the
first one where the initial test *doesn't* result in falling out
immediately.  So the probability is that there's something wrong with
the code that descends the tree.

Note that the all-zeroes pages in your dump are uninformative because
none of the real FSM data has been written to disk yet.  We can see
from this trace that the code is dealing with not-all-zero pages.
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> ohp@pyrenet.fr writes:
>> As you can see in attached initdb.log, it seems fsm_search_avail is called
>> repeatedly and args are sort of looping...
> 
> That's expected, since the system is inserting a lot of tuples
> successively. 

Right. I suspect it wasn't in the infinite loop yet. Try running it for
*much* longer (it'll probably take much longer than usual because it's
printing all the debug stuff), until it gets stuck looping over the same
pages in the same relation.



Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
Dear all,
On Mon, 8 Dec 2008, Heikki Linnakangas wrote:

> Date: Mon, 08 Dec 2008 09:17:52 +0200
> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: ohp@pyrenet.fr, Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> Tom Lane wrote:
>> ohp@pyrenet.fr writes:
>>> As you can see in attached initdb.log, it seems fsm_search_avail is called
>>> repeatedly and args are sort of looping...
>> 
>> That's expected, since the system is inserting a lot of tuples
>> successively. 
>
> Right. I suspect it wasn't in the infinite loop yet. Try to run it for *much* 
> longer (it'll probably take much longer than usual because it's printing all 
> the debug stuff), until it gets stuck looping over the same pages in same 
> relation.
>
The infinite loop occurs in fsm_search_avail when it is called for the
32nd time.

It loops between the restart: label and the goto restart.

The long (95M) initdb.log can be found at
ftp://ftp.pyrenet.fr/private/initdb.log
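To see how a restart: / goto restart cycle can spin forever, here is a simplified model of the search: ascend from a starting leaf until a node claims enough space, descend toward a leaf preferring the left child, and, if neither child has enough space (an inconsistent page), rebuild and start over. Everything here is hypothetical: the names, the 8-leaf page, and the assert bounding the restarts are illustration only, and the real fsm_search_avail handles more cases. With correct code a rebuild makes the retry succeed; if miscompiled leftok/rightok tests kept sending the descent into the "neither child" branch, the cycle would never end, which is consistent with the endless "fixing corrupt FSM block" messages reported earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define NLEAVES   8                       /* hypothetical; real pages hold thousands */
#define NNONLEAF  (NLEAVES - 1)
#define NNODES    (NNONLEAF + NLEAVES)

#define leftchild(x)  (2 * (x) + 1)
#define parentof(x)   (((x) - 1) / 2)

typedef struct
{
    uint8_t nodes[NNODES];                /* max-tree; leaves are the last NLEAVES slots */
} page_t;

/* Move right on the same level, wrapping to that level's leftmost node. */
static int
rightneighbor(int x)
{
    x++;
    if (((x + 1) & x) == 0)
        x = parentof(x);
    return x;
}

/* Recompute every non-leaf node as the max of its children. */
static void
rebuild(page_t *p)
{
    for (int n = NNONLEAF - 1; n >= 0; n--)
    {
        uint8_t l = p->nodes[leftchild(n)];
        uint8_t r = p->nodes[leftchild(n) + 1];

        p->nodes[n] = (l > r) ? l : r;
    }
}

/* Return a leaf slot with at least minvalue, searching from slot start, or -1. */
static int
search_avail(page_t *p, int start, uint8_t minvalue)
{
    int nodeno;
    int restarts = 0;

restart:
    if (p->nodes[0] < minvalue)           /* root is the max of all leaves */
        return -1;

    /* Ascend: move right (wrapping on the level), then climb up. */
    nodeno = NNONLEAF + start;
    while (nodeno > 0)
    {
        if (p->nodes[nodeno] >= minvalue)
            break;
        nodeno = parentof(rightneighbor(nodeno));
    }

    /* Descend to a leaf, preferring the left child. */
    while (nodeno < NNONLEAF)
    {
        int left = leftchild(nodeno);
        int right = left + 1;

        if (p->nodes[left] >= minvalue)
            nodeno = left;
        else if (p->nodes[right] >= minvalue)
            nodeno = right;
        else
        {
            /* Inconsistent page: repair and retry (this sketch allows one retry). */
            restarts++;
            assert(restarts < 2);
            rebuild(p);
            goto restart;
        }
    }
    return nodeno - NNONLEAF;
}
```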
>

regards,



Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> the infinite loop occurs in fsm_search_avail when called for the 32nd 
> time.

... which is the first time that the initial test doesn't make it fall
out immediately.

Would you add a couple more printouts, along the lines of

    nodeno = target;
    while (nodeno > 0)
    {
+        fprintf(stderr, "ascend at node %d value %d\n",
+            nodeno, fsmpage->fp_nodes[nodeno]);

        if (fsmpage->fp_nodes[nodeno] >= minvalue)
            break;

        /*
         * Move to the right, wrapping around on same level if necessary,
         * then climb up.
         */
        nodeno = parentof(rightneighbor(nodeno));
    }

    /*
     * We're now at a node with enough free space, somewhere in the middle of
     * the tree. Descend to the bottom, following a path with enough free
     * space, preferring to move left if there's a choice.
     */
    while (nodeno < NonLeafNodesPerPage)
    {
        int leftnodeno = leftchild(nodeno);
        int rightnodeno = leftnodeno + 1;
        bool leftok = (leftnodeno < NodesPerPage) &&
            (fsmpage->fp_nodes[leftnodeno] >= minvalue);
        bool rightok = (rightnodeno < NodesPerPage) &&
            (fsmpage->fp_nodes[rightnodeno] >= minvalue);

+        fprintf(stderr, "descend at node %d value %d, leftnode %d value %d, rightnode %d value %d\n",
+            nodeno, fsmpage->fp_nodes[nodeno],
+            leftnodeno, fsmpage->fp_nodes[leftnodeno],
+            rightnodeno, fsmpage->fp_nodes[rightnodeno]);

        if (leftok)
            nodeno = leftnodeno;
        else if (rightok)
            nodeno = rightnodeno;
        else

(I'm assuming we can print possibly-off-the-end array elements without dumping
core; which is bogus in general but I expect we can get away with it
for this purpose.)

Also, we don't really need 94MB of log to convince us it's an
infinite loop ;-)
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
Hi Tom,
On Mon, 8 Dec 2008, Tom Lane wrote:

> Date: Mon, 08 Dec 2008 13:15:28 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
>     Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware 
> 
> ohp@pyrenet.fr writes:
>> the infinite loop occurs in fsm_search_avail when called for the 32nd
>> time.
>
> ... which is the first time that the initial test doesn't make it fall
> out immediately.
>
> Would you add a couple more printouts, along the line of
>
>
>     nodeno = target;
>     while (nodeno > 0)
>     {
> +        fprintf(stderr, "ascend at node %d value %d\n",
> +            nodeno, fsmpage->fp_nodes[nodeno]);
>
>         if (fsmpage->fp_nodes[nodeno] >= minvalue)
>             break;
>
>         /*
>          * Move to the right, wrapping around on same level if necessary,
>          * then climb up.
>          */
>         nodeno = parentof(rightneighbor(nodeno));
>     }
>
>     /*
>      * We're now at a node with enough free space, somewhere in the middle of
>      * the tree. Descend to the bottom, following a path with enough free
>      * space, preferring to move left if there's a choice.
>      */
>     while (nodeno < NonLeafNodesPerPage)
>     {
>         int leftnodeno = leftchild(nodeno);
>         int rightnodeno = leftnodeno + 1;
>         bool leftok = (leftnodeno < NodesPerPage) &&
>             (fsmpage->fp_nodes[leftnodeno] >= minvalue);
>         bool rightok = (rightnodeno < NodesPerPage) &&
>             (fsmpage->fp_nodes[rightnodeno] >= minvalue);
>
> +        fprintf(stderr, "descend at node %d value %d, leftnode %d value %d, rightnode %d value %d\n",
> +            nodeno, fsmpage->fp_nodes[nodeno],
> +            leftnodeno, fsmpage->fp_nodes[leftnodeno],
> +            rightnodeno, fsmpage->fp_nodes[rightnodeno]);
>
>         if (leftok)
>             nodeno = leftnodeno;
>         else if (rightok)
>             nodeno = rightnodeno;
>         else
>
> (I'm assuming we can print possibly-off-the-end array elements without dumping
> core; which is bogus in general but I expect we can get away with it
> for this purpose.)
>
> Also, we don't really need 94MB of log to convince us it's an
> infinite loop ;-)
oops, sorry
>
>             regards, tom lane
>
I first misread your mail and added only the first fprintf; while I was
uploading a 400M initdb.log, I went back and added the second one.

Guess what! With the "descend at node..." fprintf in place, everything
goes well. The optimizer definitely does something weird around the
definition/assignment of leftok/rightok...



Re: cvs head initdb hangs on unixware

From
Zdenek Kotala
Date:
ohp@pyrenet.fr wrote:

>>
> I first misread your mail, and added only the first fprintf , while I 
> was uploading a 400M initdb.log, I went back to add the second one.
> 
> Guess what! with the fprintf .. descending node... in place, everything 
> goes well. The optimizer definitly does something weird along the 
> definition/assignement of leftok/rightok..
> 

Could you generate the assembler code for the fsm_search_avail function,
with and without optimization? Without the extra printf, of course :-).
It should show the difference.
    Zdenek




Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> Guess what! with the fprintf .. descending node... in place, everything 
> goes well. The optimizer definitly does something weird along the 
> definition/assignement of leftok/rightok..

Hmm, so the problem is in that second loop.  The trick is to pick some
reasonably non-ugly code change that makes the problem go away.

The first thing I'd try is to get rid of the overly cute optimization

    int rightnodeno = leftnodeno + 1;

and make it just read

    int rightnodeno = rightchild(nodeno);

If that doesn't work, we might try refactoring the code enough to get
rid of the goto, but that looks a little bit tedious.
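Both suggestions can be sketched together on a toy page: compute the right child with rightchild() instead of leftnodeno + 1, look at it only when the left child is not good enough, and replace the goto with an explicit retry loop. This is a hypothetical illustration (invented names, a complete 4-leaf tree, and the search simplified to start from the root rather than ascending from a starting slot), not the patch that was eventually proposed. Because the toy tree is complete, it omits the NodesPerPage bounds checks the real code needs for a partially filled last level. It also bounds the retries to one, a deliberate departure from the unbounded goto: after a correct rebuild a second failure should be impossible.

```c
#include <assert.h>
#include <stdint.h>

#define NLEAVES   4                       /* hypothetical tiny page */
#define NNONLEAF  (NLEAVES - 1)
#define NNODES    (NNONLEAF + NLEAVES)

#define leftchild(x)   (2 * (x) + 1)
#define rightchild(x)  (2 * (x) + 2)

/*
 * Descend from nodeno to a leaf with at least minvalue, preferring the left
 * child. Returns the leaf's node number, or -1 if the page is inconsistent.
 */
static int
descend(const uint8_t *nodes, int nodeno, uint8_t minvalue)
{
    while (nodeno < NNONLEAF)
    {
        int leftnodeno = leftchild(nodeno);
        int rightnodeno;

        if (nodes[leftnodeno] >= minvalue)
        {
            nodeno = leftnodeno;
            continue;
        }

        /* Only examine the right child when the left one is not enough. */
        rightnodeno = rightchild(nodeno);
        if (nodes[rightnodeno] >= minvalue)
            nodeno = rightnodeno;
        else
            return -1;
    }
    return nodeno;
}

/* Recompute every non-leaf node as the max of its children. */
static void
rebuild(uint8_t *nodes)
{
    for (int n = NNONLEAF - 1; n >= 0; n--)
    {
        uint8_t l = nodes[leftchild(n)];
        uint8_t r = nodes[rightchild(n)];

        nodes[n] = (l > r) ? l : r;
    }
}

/* Goto-free driver: try once, repair the page, try once more. */
static int
search_from_root(uint8_t *nodes, uint8_t minvalue)
{
    for (int attempt = 0; attempt < 2; attempt++)
    {
        int leaf;

        if (nodes[0] < minvalue)
            return -1;
        leaf = descend(nodes, 0, minvalue);
        if (leaf >= 0)
        {
            assert(nodes[leaf] >= minvalue);
            return leaf - NNONLEAF;       /* slot number on the page */
        }
        rebuild(nodes);
    }
    return -1;
}
```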
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Tue, 9 Dec 2008, Tom Lane wrote:

> Date: Tue, 09 Dec 2008 09:23:06 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
>     Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware 
> 
> ohp@pyrenet.fr writes:
>> Guess what! with the fprintf .. descending node... in place, everything
>> goes well. The optimizer definitly does something weird along the
>> definition/assignement of leftok/rightok..
>
> Hmm, so the problem is in that second loop.  The trick is to pick some
> reasonably non-ugly code change that makes the problem go away.
>
> The first thing I'd try is to get rid of the overly cute optimization
>
>     int rightnodeno = leftnodeno + 1;
>
> and make it just read
>
>     int rightnodeno = rightchild(nodeno);
>
> If that doesn't work, we might try refactoring the code enough to get
> rid of the goto, but that looks a little bit tedious.
>
>             regards, tom lane
I tried that, and also moving the leftok/rightok declarations outside the
loop and refactoring the assignment code for leftok and rightok. Nothing
worked!

Regards,


Re: cvs head initdb hangs on unixware

From
Kenneth Marshall
Date:
Would it be reasonable to turn off optimization for this file?

Ken

On Tue, Dec 09, 2008 at 05:47:47PM +0100, ohp@pyrenet.fr wrote:
> On Tue, 9 Dec 2008, Tom Lane wrote:
>
>> Date: Tue, 09 Dec 2008 09:23:06 -0500
>> From: Tom Lane <tgl@sss.pgh.pa.us>
>> To: ohp@pyrenet.fr
>> Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
>>     Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>>     pgsql-hackers list <pgsql-hackers@postgresql.org>
>> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
>>
>> ohp@pyrenet.fr writes:
>>> Guess what! with the fprintf .. descending node... in place, everything
>>> goes well. The optimizer definitly does something weird along the
>>> definition/assignement of leftok/rightok..
>>
>> Hmm, so the problem is in that second loop.  The trick is to pick some
>> reasonably non-ugly code change that makes the problem go away.
>>
>> The first thing I'd try is to get rid of the overly cute optimization
>>
>>     int rightnodeno = leftnodeno + 1;
>>
>> and make it just read
>>
>>     int rightnodeno = rightchild(nodeno);
>>
>> If that doesn't work, we might try refactoring the code enough to get
>> rid of the goto, but that looks a little bit tedious.
>>
>>             regards, tom lane
>>
>   I tried that and moving leftok,rightok declaration outside the loop, and 
> refactor the assignement code of leftok, rightok . nothing worked!
>
> Regards,
>


Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> On Tue, 9 Dec 2008, Tom Lane wrote:
>> Hmm, so the problem is in that second loop.  The trick is to pick some
>> reasonably non-ugly code change that makes the problem go away.

>    I tried that, and also tried moving the leftok/rightok declarations outside the loop
> and refactoring the assignment code for leftok and rightok. Nothing worked!

I was afraid of that.  We'd need to look at the assembly code to be sure
(can you provide it?), but what I bet is happening is that the compiler
is looking at the leftnodeno/rightnodeno computations and thinking it can
optimize those by a strength-reduction method, failing to notice that
the loop isn't a simple scan on nodeno.

Now in that regard the logic isn't very much different from a binary
search, which we have lots of and those have always worked.  So I'm
back to the theory that the goto inside the inner loop is probably
contributing to the confusion somehow.
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> FWIW, I have attached the 2 generated .s. Someone with knowledge of asm
> may want to have a look..

Hmm.  It looks to me like the compiler is getting confused by the
interaction between nodeno, leftnodeno, and rightnodeno.  Try this
patch to see if it gets around it.  (This is a tad better anyway
since it avoids examining the right child if not needed.)

            regards, tom lane

Index: fsmpage.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/freespace/fsmpage.c,v
retrieving revision 1.2
diff -c -r1.2 fsmpage.c
*** fsmpage.c    7 Oct 2008 21:10:11 -0000    1.2
--- fsmpage.c    9 Dec 2008 18:18:53 -0000
***************
*** 243,259 ****
       */
      while (nodeno < NonLeafNodesPerPage)
      {
!         int leftnodeno = leftchild(nodeno);
!         int rightnodeno = leftnodeno + 1;
!         bool leftok = (leftnodeno < NodesPerPage) &&
!             (fsmpage->fp_nodes[leftnodeno] >= minvalue);
!         bool rightok = (rightnodeno < NodesPerPage) &&
!             (fsmpage->fp_nodes[rightnodeno] >= minvalue);
!
!         if (leftok)
!             nodeno = leftnodeno;
!         else if (rightok)
!             nodeno = rightnodeno;
          else
          {
              /*
--- 243,262 ----
       */
      while (nodeno < NonLeafNodesPerPage)
      {
!         int childnodeno = leftchild(nodeno);
!
!         if (childnodeno < NodesPerPage &&
!             fsmpage->fp_nodes[childnodeno] >= minvalue)
!         {
!             nodeno = childnodeno;
!             continue;
!         }
!         childnodeno++;            /* point to right child */
!         if (childnodeno < NodesPerPage &&
!             fsmpage->fp_nodes[childnodeno] >= minvalue)
!         {
!             nodeno = childnodeno;
!         }
          else
          {
              /*

Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
Dear Tom,
On Tue, 9 Dec 2008, Tom Lane wrote:

> Date: Tue, 09 Dec 2008 13:24:21 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware 
> 
> ohp@pyrenet.fr writes:
>> FWIW, I have attached the 2 generated .s. Someone with knowledge of asm
>> may want to have a look..
>
> Hmm.  It looks to me like the compiler is getting confused by the
> interaction between nodeno, leftnodeno, and rightnodeno.  Try this
> patch to see if it gets around it.  (This is a tad better anyway
> since it avoids examining the right child if not needed.)
>
>             regards, tom lane
>
>
Brilliant!
You made my day, can't wait for this patch to be committed.  Thanks!!!

PS:   I wish I had 10% of your knowledge/genius!


Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
ohp@pyrenet.fr wrote:
> On Tue, 9 Dec 2008, Tom Lane wrote:
>> Hmm.  It looks to me like the compiler is getting confused by the
>> interaction between nodeno, leftnodeno, and rightnodeno.  Try this
>> patch to see if it gets around it.  (This is a tad better anyway
>> since it avoids examining the right child if not needed.)
>>
> Brilliant!
> You made my day, can't wait for this patch to be committed.

I find it pretty scary to work around compiler bugs like this. Who knows 
what other code it miscompiles. Can you reduce fsm_search_avail into a 
small stand-alone test program, and file a bug report with the compiler 
vendor?

BTW, why does this work on warthog buildfarm member? Different compiler 
version?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: cvs head initdb hangs on unixware

From
Peter Eisentraut
Date:
Heikki Linnakangas wrote:
> I find it pretty scary to work around compiler bugs like this. Who knows 
> what other code it miscompiles. Can you reduce fsm_search_avail into a 
> small stand-alone test program, and file a bug report with the compiler 
> vendor?
> 
> BTW, why does this work on warthog buildfarm member? Different compiler 
> version?

The archives are full of compiler bugs specifically in the SCO compilers 
appearing and disappearing in various versions.  We usually don't try to 
work around it; instead we make a note to avoid certain compiler 
versions.  Filing upstream bugs usually also works.


Re: cvs head initdb hangs on unixware

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> ohp@pyrenet.fr wrote:
> > On Tue, 9 Dec 2008, Tom Lane wrote:
> >> Hmm.  It looks to me like the compiler is getting confused by the
> >> interaction between nodeno, leftnodeno, and rightnodeno.  Try this
> >> patch to see if it gets around it.  (This is a tad better anyway
> >> since it avoids examining the right child if not needed.)
> >>
> > Brilliant!
> > You made my day, can't wait for this patch to be committed.
> 
> I find it pretty scary to work around compiler bugs like this. Who knows 
> what other code it miscompiles. Can you reduce fsm_search_avail into a 
> small stand-alone test program, and file a bug report with the compiler 
> vendor?
> 
> BTW, why does this work on warthog buildfarm member? Different compiler 
> version?

I assume this is the SCO compiler; I gave up on the SCO compiler in the
1990s, and I suggest we do the same.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: cvs head initdb hangs on unixware

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Heikki Linnakangas wrote:
> > I find it pretty scary to work around compiler bugs like this. Who knows 
> > what other code it miscompiles. Can you reduce fsm_search_avail into a 
> > small stand-alone test program, and file a bug report with the compiler 
> > vendor?
> > 
> > BTW, why does this work on warthog buildfarm member? Different compiler 
> > version?
> 
> The archives are full of compiler bugs specifically in the SCO compilers 
> appearing and disappearing in various versions.  We usually don't try to 
>   work around it; instead we make a note to avoid certain compiler 
> versions.  Filing upstream bugs usually also works.

The SCO compiler is so bad and so prone to breakage that I question
whether it is even worth filing upstream bug reports.



Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
On Wed, 10 Dec 2008, Heikki Linnakangas wrote:

> Date: Wed, 10 Dec 2008 13:00:31 +0200
> From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
> To: ohp@pyrenet.fr
> Cc: Tom Lane <tgl@sss.pgh.pa.us>, Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> 
> ohp@pyrenet.fr wrote:
>> On Tue, 9 Dec 2008, Tom Lane wrote:
>>> Hmm.  It looks to me like the compiler is getting confused by the
>>> interaction between nodeno, leftnodeno, and rightnodeno.  Try this
>>> patch to see if it gets around it.  (This is a tad better anyway
>>> since it avoids examining the right child if not needed.)
>>> 
>> Brilliant!
>> You made my day, can't wait for this patch to be committed.
>
> I find it pretty scary to work around compiler bugs like this. Who knows what 
> other code it miscompiles. Can you reduce fsm_search_avail into a small 
> stand-alone test program, and file a bug report with the compiler vendor?
FWIW, the compiler doesn't miscompile anything in PostgreSQL; as a heavy 
user/hoster, I'd know!

Let's not start a flame here; the SCO compiler is as good or as bad as 
any other.

Have you never seen a problem with gcc, HP-UX, Darwin or M$?
>
> BTW, why does this work on warthog buildfarm member? Different compiler 
> version?
>
It's configured with --enable-debug.
Maybe run_build.pl should run twice, once with --enable-debug and once 
without.
>


Re: cvs head initdb hangs on unixware

From
Heikki Linnakangas
Date:
ohp@pyrenet.fr wrote:
> On Wed, 10 Dec 2008, Heikki Linnakangas wrote:
>> I find it pretty scary to work around compiler bugs like this. Who 
>> knows what other code it miscompiles. Can you reduce fsm_search_avail 
>> into a small stand-alone test program, and file a bug report with the 
>> compiler vendor?
> FWIW, the compiler doesn't miscompile anything in PostgreSQL; as a 
> heavy user/hoster, I'd know!
> 
> Let's not start a flame here; the SCO compiler is as good or as bad as 
> any other.
> 
> Have you never seen a problem with gcc, HP-UX, Darwin or M$?

Sure, that's not what I was saying. My point is, when there's a bug in 
one version of a compiler, we shouldn't try to adapt PostgreSQL to that 
bug. Instead, we should narrow down the bug, get it fixed in the 
compiler, and tell users to use the most recent version of the compiler 
where the bug has been fixed.



Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
ohp@pyrenet.fr writes:
> On Wed, 10 Dec 2008, Heikki Linnakangas wrote:
>> BTW, why does this work on warthog buildfarm member? Different compiler 
>> version?
>> 
> It's configured with --enable-debug.
> Maybe run_build.pl should run twice, once with --enable-debug and once 
> without.

No, the standard way to deal with such issues is to set up two buildfarm
members.  This would be a 100% waste of cycles for gcc-based members
anyway, since gcc generates the same code with or without -g.  However,
for compilers where it makes a difference, it might well be worth having
an additional member to test the optimized build.
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> ohp@pyrenet.fr writes:
>> On Wed, 10 Dec 2008, Heikki Linnakangas wrote:
>>> BTW, why does this work on warthog buildfarm member? Different compiler 
>>> version?
>>>
>> It's configured with --enable-debug.
>> Maybe run_build.pl should run twice, once with --enable-debug and once 
>> without.
> 
> No, the standard way to deal with such issues is to set up two buildfarm
> members.  This would be a 100% waste of cycles for gcc-based members
> anyway, since gcc generates the same code with or without -g.  However,
> for compilers where it makes a difference, it might well be worth having
> an additional member to test the optimized build.

I think the current infrastructure is not good for this. For example, I would like to 
compile postgres on one machine with three different compilers, in 32- or 64-bit 
mode. Should I have 6 animals? I think a better idea is to have one animal and 
several test sets: the animal defines the HW+OS version, and the test set specifies 
the PG version, configure switches, compiler and so on.
These are my two cents.
    Zdenek


Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> ohp@pyrenet.fr wrote:
>> Have you never seen a problem with gcc, HP-UX, Darwin or M$?

> Sure, that's not what I was saying. My point is, when there's a bug in 
> one version of a compiler, we shouldn't try to adapt PostgreSQL to that 
> bug. Instead, we should narrow down the bug, get it fixed in the 
> compiler, and tell users to use the most recent version of the compiler 
> where the bug has been fixed.

We should certainly file a bug report against the compiler.  However,
ISTM a workaround is a good idea too if it's not too ugly (which this
one isn't).  If a bug exists in one compiler there might be similar
bugs in other compilers.
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane napsal(a):
>> No, the standard way to deal with such issues is to set up two buildfarm
>> members.

> I think the current infrastructure is not good for this. For example, I would like to 
> compile postgres on one machine with three different compilers, in 32- or 64-bit 
> mode. Should I have 6 animals?

Yes.

> I think a better idea is to have one animal and 
> several test sets.

That simply complicates everything --- the reporting infrastructure,
identifying which case failed, etc --- without actually improving
anything.
        regards, tom lane


Re: cvs head initdb hangs on unixware

From
Martijn van Oosterhout
Date:
On Wed, Dec 10, 2008 at 06:27:05PM +0100, Zdenek Kotala wrote:
> I think the current infrastructure is not good for this. For example, I would
> like to compile postgres on one machine with three different compilers, in
> 32- or 64-bit mode. Should I have 6 animals? I think a better idea is to have
> one animal and several test sets: the animal defines the HW+OS version, and the
> test set specifies the PG version, configure switches, compiler and so on.

Well, you could name them animal-1, animal-2, animal-3, etc... Once the
list reaches 100 entries we can think about alternatives...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: cvs head initdb hangs on unixware

From
Peter Eisentraut
Date:
On Wednesday 10 December 2008 19:36:38 Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > Tom Lane napsal(a):
> >> No, the standard way to deal with such issues is to set up two buildfarm
> >> members.
> >
> > I think the current infrastructure is not good for this. For example, I would
> > like to compile postgres on one machine with three different compilers, in
> > 32- or 64-bit mode. Should I have 6 animals?
>
> Yes.

I have to say, I have concerns similar to Zdenek's.  Setting up a load of 
different animals for every altered configuration makes it difficult to tell 
which configurations are actually related.

I have been thinking about test coverage recently and analyzing bugs and so on.  
To get more confidence beyond a random (not even truly random) subset of 
platforms and options we should really be building with a lot more 
combinations of

- compilers
- compiler options
- configure options
- run time options
(- more tests of other code areas, but that is a different problem)

Note, for example, that downstream binary packages are almost never built with 
default or near-default compiler options, and of course production 
installations are hopefully never run with the default run-time 
configuration.  Essentially, we are not really testing what the users are 
running.

To cover reality better, I can easily imagine that a single platform (say, 
CPU, OS, bitness, and compiler) should do at least fifty different test runs 
in different combinations.  There, we'd also have resource problems, but some 
people have machines that can do that (and want to do that).  How can we 
accommodate that today?

A coincidental trouble with this is that I find the animal names to be 
increasingly difficult to process and remember.  They are basically just line 
noise to me at this point.  Other non-biologists might feel the same.  And we 
might eventually run out of reasonable names.

> That simply complicates everything --- the reporting infrastructure,
> identifying which case failed, etc --- without actually improving
> anything.

I don't think it has to be that complicated.  We could probably augment the 
naming scheme like "animal/foo" or "animal/12" or something like that.


Re: cvs head initdb hangs on unixware

From
Andrew Dunstan
Date:

Zdenek Kotala wrote:
> Tom Lane napsal(a):
>> ohp@pyrenet.fr writes:
>>> On Wed, 10 Dec 2008, Heikki Linnakangas wrote:
>>>> BTW, why does this work on warthog buildfarm member? Different 
>>>> compiler version?
>>>>
>>> It's configured with --enable-debug.
>>> Maybe run_build.pl should run twice, once with --enable-debug and once 
>>> without.
>>
>> No, the standard way to deal with such issues is to set up two buildfarm
>> members.  This would be a 100% waste of cycles for gcc-based members
>> anyway, since gcc generates the same code with or without -g.  However,
>> for compilers where it makes a difference, it might well be worth having
>> an additional member to test the optimized build.
>
> I think the current infrastructure is not good for this. For example, I 
> would like to compile postgres on one machine with three different 
> compilers, in 32- or 64-bit mode. Should I have 6 animals? I think a better 
> idea is to have one animal and several test sets: the animal defines the 
> HW+OS version, and the test set specifies the PG version, configure switches, 
> compiler and so on.
>
>   

Well, you're asking for a significant redesign for which I at least 
don't have time. What is so hard about having six animals on one 
machine? A number of people have such setups, including me.

cheers

andrew


Re: cvs head initdb hangs on unixware

From
Aidan Van Dyk
Date:
* Zdenek Kotala <Zdenek.Kotala@Sun.COM> [081210 12:29]:

>> No, the standard way to deal with such issues is to set up two buildfarm
>> members.  This would be a 100% waste of cycles for gcc-based members
>> anyway, since gcc generates the same code with or without -g.  However,
>> for compilers where it makes a difference, it might well be worth having
>> an additional member to test the optimized build.

> I think the current infrastructure is not good for this. For example, I would 
> like to compile postgres on one machine with three different compilers, in 
> 32- or 64-bit mode. Should I have 6 animals? I think a better idea is to have 
> one animal and several test sets: the animal defines the HW+OS version, and the 
> test set specifies the PG version, configure switches, compiler and so on.

Sure, and in my neck of the woods there are cows, calves, heifers,
bulls, steers, but they are all cattle...  And when talking about cows,
Jerseys and Guernseys have high MF, lower production, Ayrshires have
high production, lower MF, and Holsteins are in between.

Should I call them "cow with high MF" and "cow with high production", or
just say Jersey or Ayrshire?

Wherever you (the generic you, not the specific you) draw the line, what
you call it is still arbitrary...  But where that line is drawn is
currently defined in the buildfarm code...

Not that it can't be changed, but I think there are much better things to
worry about ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: cvs head initdb hangs on unixware

From
ohp@pyrenet.fr
Date:
Tom,
On Wed, 10 Dec 2008, Tom Lane wrote:

> Date: Wed, 10 Dec 2008 12:17:18 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
>     Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] cvs head initdb hangs on unixware 
> 
> ohp@pyrenet.fr writes:
>> On Wed, 10 Dec 2008, Heikki Linnakangas wrote:
>>> BTW, why does this work on warthog buildfarm member? Different compiler
>>> version?
>>>
>> It's configured with --enable-debug.
>> Maybe run_build.pl should run twice, once with --enable-debug and once
>> without.
>
> No, the standard way to deal with such issues is to set up two buildfarm
> members.  This would be a 100% waste of cycles for gcc-based members
> anyway, since gcc generates the same code with or without -g.  However,
> for compilers where it makes a difference, it might well be worth having
> an additional member to test the optimized build.
>
>             regards, tom lane
I understand your concern. Maybe an option such as --flip-debug, which gcc 
owners would not use, could let both builds be tested in one run.

In the meantime, while preparing my home unixware server to become 
another animal, I came across a new optimizer bug in ecpg.

To avoid polluting this closed thread, I'll start a new one.
