Thread: Unixware 714 pthreads

Unixware 714 pthreads

From: ohp@pyrenet.fr
Hi everyone,

I need help debugging a problem I have on Unixware 714 with beta3.
PostgreSQL's make check hangs in the plpgsql test when compiled with
--enable-thread-safety.

It hangs on select block_me();

This SELECT should be cancelled by the preceding set statement_timeout=1000;
instead, the backend is 100% CPU-bound and only kill -9 can kill it.

It works OK when compiled without --enable-thread-safety.

I've tried almost everything I could think of, but since I don't know much
about threads or the PG source code, I'm asking whether someone can help me
find a way to debug this. It worked up until beta2, but I'm not sure
block_me() was there then.

I really need someone to tell me where to begin.

TIA

-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)


Re: Unixware 714 pthreads

From: Bruce Momjian
Date: Tue, 26 Oct 2004 17:59:17 -0400 (EDT)
The only help I can be is that on Unixware (only) the backend is
compiled with threading enabled.  This might be exposing some thread
bugs.


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
Date: Wed, 27 Oct 2004 13:01:45 +0200 (MET DST)
Dear Bruce,

Thanks for your reply; I was getting desperate at not receiving one!

As I said, I'm quite sure there is a bug in the pthread library. Before
reporting it to SCO, I have to prove it, and PostgreSQL is the way to prove it!

What I need is to know where to start (I'd like to put elog calls where
statement_timeout is processed, to see what really happens and why it
doesn't cancel the query).

Could someone tell me where to look? If anyone is interested in
debugging this issue with me, I can set up an account on a test Unixware
machine.

TIA

On Tue, 26 Oct 2004, Bruce Momjian wrote:

> Date: Tue, 26 Oct 2004 17:59:17 -0400 (EDT)
> From: Bruce Momjian <pgman@candle.pha.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unixware 714 pthreads


Re: Unixware 714 pthreads

From: Bruce Momjian
Date: Wed, 27 Oct 2004 14:53:26 -0400 (EDT)
ohp@pyrenet.fr wrote:
> What I need is to know where to start from (I'd like to put elogs where
> statement_timeout is processed to see what really happens and why it
> doesn't cancel the query).
> 
> Could someone tell me where to look for? If anyone is interessed in
> debugging this issue with me, I can set up  an account on a test unixware
> machine.

My guess is that there is some problem with delivering alarm signals,
because that is how the timeout code works.
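For readers unfamiliar with the mechanism, here is a minimal single-threaded sketch of an alarm-driven timeout, assuming only the general shape described in this thread: a SIGALRM handler installed once with sigaction() that merely sets a flag, a one-shot ITIMER_REAL timer, and CPU-bound work that polls the flag. The names install_alarm_handler(), run_with_timeout() and cancel_requested are hypothetical; this is not PostgreSQL's proc.c code, which differs in detail.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Set by the SIGALRM handler, polled by the "query" loop. */
static volatile sig_atomic_t cancel_requested = 0;

/* Hypothetical handler: only sets a flag, which is async-signal-safe. */
static void alarm_handler(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/* Install the handler once, without SA_RESETHAND, so it stays installed. */
int install_alarm_handler(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    act.sa_handler = alarm_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(SIGALRM, &act, NULL);
}

static double monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Arm a one-shot timer, then spin like a CPU-bound query that never ends on
 * its own.  Returns 1 if SIGALRM stopped the loop, 0 if it did not arrive
 * within 5x the timeout, -1 if the timer could not be armed. */
int run_with_timeout(long timeout_ms)
{
    struct itimerval timer;
    double give_up_at;

    assert(timeout_ms > 0);

    cancel_requested = 0;
    memset(&timer, 0, sizeof timer);       /* it_interval stays zero: one-shot */
    timer.it_value.tv_sec = timeout_ms / 1000;
    timer.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    give_up_at = monotonic_ms() + 5.0 * timeout_ms;
    while (!cancel_requested)
    {
        if (monotonic_ms() > give_up_at)
        {
            memset(&timer, 0, sizeof timer);   /* disarm before reporting failure */
            setitimer(ITIMER_REAL, &timer, NULL);
            return 0;
        }
    }
    return 1;
}
```

Installing the handler once and then calling run_with_timeout(50) twice in a row mirrors the "second time in the same session" scenario reported later in the thread; on a conforming system both calls should return 1.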



Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
On Wed, 27 Oct 2004, Bruce Momjian wrote:

> Date: Wed, 27 Oct 2004 14:53:26 -0400 (EDT)
> From: Bruce Momjian <pgman@candle.pha.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unixware 714 pthreads
>
> My guess is that there is some problem with delivering alarm signals
> because that is how the timeout code works.
>
That's my guess too. I've traced that to src/backend/storage/lmgr/proc.c,
where kill is called.

The Unixware documentation says that a kill() sent to the process's own PID
delivers the signal to the thread that called it.

For some reason this backend has two threads (I can't figure out why), and
IMHO kill should be pthread_kill.

I wanted to try that, but found no way to get the other thread's ID.
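For reference, pthread_kill() directs a signal at one particular thread, identified by the pthread_t that pthread_create() returns (a thread can also record its own ID with pthread_self()). The sketch below only illustrates that API; it does not claim PostgreSQL should use pthread_kill(), and signal_worker_thread(), worker_main() and usr1_handler() are hypothetical names, with SIGUSR1 standing in for the timeout signal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t worker_got_signal = 0;

static void usr1_handler(int signo)
{
    (void) signo;
    worker_got_signal = 1;
}

/* Hypothetical worker: spins until the flag is set.  Because the signal is
 * sent with pthread_kill(), the handler runs in this thread. */
static void *worker_main(void *arg)
{
    (void) arg;
    while (!worker_got_signal)
        ;                                  /* busy-wait, like a CPU-bound backend */
    return NULL;
}

/* Returns 1 if SIGUSR1 sent to one specific thread was handled, 0 if not,
 * -1 on a setup error. */
int signal_worker_thread(void)
{
    struct sigaction act;
    pthread_t worker;

    memset(&act, 0, sizeof act);
    act.sa_handler = usr1_handler;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return -1;

    worker_got_signal = 0;
    /* pthread_create() hands back the thread ID that pthread_kill() needs. */
    if (pthread_create(&worker, NULL, worker_main, NULL) != 0)
        return -1;
    if (pthread_kill(worker, SIGUSR1) != 0)
    {
        worker_got_signal = 1;             /* let the worker exit anyway */
        pthread_join(worker, NULL);
        return -1;
    }
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return worker_got_signal ? 1 : 0;
}
```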

I need the help of a PostgreSQL/threads guru here.

Many thanks


Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
Sorry to follow up on my own post.

I ran some more tests:
create table foo (f1 int); -- so that it isn't removed if I kill the process

Each time I do:
psql template1
template1=# set statement_timeout=1000;
SET
template1=# select block_me();

it works OK.

If I do it a second time in the same session, blockme() never returns.

I wonder whether handle_sig_alarm is re-armed after being used, or whether
sleep() is used anywhere in the backend.

The Unixware documentation for setitimer
(http://www.pyrenet.fr:8458/en/man/html.3C/getitimer.3C.html)
says that "A sleep following a setitimer wipes out knowledge of the user
signal handler".

What can I do next?

Regards

On Wed, 27 Oct 2004 ohp@pyrenet.fr wrote:

> Date: Wed, 27 Oct 2004 13:01:45 +0200 (MET DST)
> From: ohp@pyrenet.fr
> Newsgroups: comp.databases.postgresql.hackers
> Subject: Re: Unixware 714 pthreads


Re: Unixware 714 pthreads

From: Tom Lane
Date: Thu, 28 Oct 2004 12:11:12 -0400
ohp@pyrenet.fr writes:
> if i do it a second time in the same session, blockme() never returns

> I wonder if handle_sig_alarm is re-armed after being used

No.  Why should the signal handler need re-arming?

> or if sleep is used anywhere in the backend.

Nope.
        regards, tom lane


Re: Unixware 714 pthreads

From: Andrew Dunstan

Tom Lane wrote:

>ohp@pyrenet.fr writes:
>>if i do it a second time in the same session, blockme() never returns
>
>>I wonder if handle_sig_alarm is re-armed after being used
>
>No.  Why should the signal handler need re-arming?
>
>>or if sleep is used anywhere in the backend.
>
>Nope.

Actually, I just noticed that postmaster/pg_arch.c has a call to sleep() 
that needs to be removed ... I guess it snuck in after all that stuff 
was adjusted.

cheers

andrew


Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
On Thu, 28 Oct 2004, Tom Lane wrote:

> Date: Thu, 28 Oct 2004 12:11:12 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unixware 714 pthreads
>
> ohp@pyrenet.fr writes:
> > if i do it a second time in the same session, blockme() never returns
>
> > I wonder if handle_sig_alarm is re-armed after being used
>
> No.  Why should the signal handler need re-arming?
My impression was that once a signal has been caught, its handler is reset
to SIG_DFL.
>
> > or if sleep is used anywhere in the backend.
>
> Nope.
>
>             regards, tom lane
>
Oh well, bye-bye threads on Unixware; I give up! (Very disappointed.)

cheers


Re: Unixware 714 pthreads

From: Tom Lane
Date: Thu, 28 Oct 2004 13:55:56 -0400
ohp@pyrenet.fr writes:
> On Thu, 28 Oct 2004, Tom Lane wrote:
>> No.  Why should the signal handler need re-arming?

> My impression was that once caught, signal handler for a particular signal
> is reset to SIG-DFL.

No.  If your signal support is POSIX-compatible, it should not do that
because we don't set SA_RESETHAND when calling sigaction(2).  If you
don't have POSIX signals, you had better have BSD-style signal(2),
which doesn't reset either.  If this is not happening as expected,
you will have much worse problems than whether statement_timeout works :-(
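The difference is easy to observe with sigaction(2): with sa_flags = 0 the handler stays installed after it runs, while SA_RESETHAND restores SIG_DFL on entry to the handler, which mimics the old System V signal() behavior. A minimal sketch; handler_survives() and count_handler() are hypothetical, and SIGUSR1 stands in for SIGALRM so that no timer is needed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of times the handler has run. */
static volatile sig_atomic_t hits = 0;

static void count_handler(int signo)
{
    (void) signo;
    hits++;
}

/* Install count_handler for SIGUSR1 with the given sa_flags, raise SIGUSR1
 * once, then report whether count_handler is still the installed handler:
 * 1 = still installed, 0 = reset (to SIG_DFL), -1 = error. */
int handler_survives(int flags)
{
    struct sigaction act, now;

    memset(&act, 0, sizeof act);
    act.sa_handler = count_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = flags;
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return -1;

    hits = 0;
    raise(SIGUSR1);                /* the handler runs before raise() returns */
    assert(hits == 1);             /* it runs once with or without SA_RESETHAND */

    if (sigaction(SIGUSR1, NULL, &now) != 0)   /* query the current disposition */
        return -1;
    return now.sa_handler == count_handler;
}
```

On a POSIX system, handler_survives(0) should return 1 and handler_survives(SA_RESETHAND) should return 0.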
        regards, tom lane


Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
I agree with everything you say, Tom; I'm just asking for some help to debug
this. Now that Larry is a little off the list, I'm feeling really lonely on
UW.
SCO won't do anything until I come up with a test program that fails, and
every test program I've tried so far has worked.

I use other threaded programs, like Postfix and BIND, that never fail.

I'm really at a loss. Would you, or someone, help me?

Best regards

On Thu, 28 Oct 2004, Tom Lane wrote:

> Date: Thu, 28 Oct 2004 13:55:56 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unixware 714 pthreads


Re: Unixware 714 pthreads

From: Kevin Brown
Tom Lane wrote:
> ohp@pyrenet.fr writes:
> > On Thu, 28 Oct 2004, Tom Lane wrote:
> >> No.  Why should the signal handler need re-arming?
>
> > My impression was that once caught, signal handler for a particular signal
> > is reset to SIG-DFL.
>
> No.  If your signal support is POSIX-compatible, it should not do that
> because we don't set SA_RESETHAND when calling sigaction(2).  If you
> don't have POSIX signals, you had better have BSD-style signal(2),
> which doesn't reset either.  If this is not happening as expected,
> you will have much worse problems than whether statement_timeout works :-(

SysV-style signal(2) handling does indeed require that the signal
handler be re-enabled.  The attached program demonstrates this on
Solaris, and probably on Unixware as well (I don't have access to the
latter).  Just run it and interrupt it with Ctrl-C.  It should print
something the first time around, and actually be interrupted the
second time.


So if Unixware doesn't have sigaction(), or it's not being picked up by
autoconf, then yeah, he'll have big problems...
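Kevin's program was not preserved in this archive. A minimal sketch of the configure-time choice he describes, loosely modeled on the idea behind PostgreSQL's pqsignal() (the real function differs in detail): prefer sigaction() without SA_RESETHAND, and fall back to signal() only when sigaction() is unavailable, in which case the handler must re-install itself to survive System V reset semantics. The macro NO_SIGACTION and the names install_handler() and counting_handler() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

typedef void (*sigfunc)(int);

static volatile sig_atomic_t calls = 0;

/* Counts deliveries.  On the signal() fallback path it re-installs itself
 * first, because System V signal() resets the disposition to SIG_DFL. */
static void counting_handler(int signo)
{
    (void) signo;
#ifdef NO_SIGACTION
    signal(signo, counting_handler);
#endif
    calls++;
}

/* Install func for signo; returns the previous handler, or SIG_ERR. */
sigfunc install_handler(int signo, sigfunc func)
{
    assert(func != SIG_ERR);
#ifndef NO_SIGACTION
    struct sigaction act, oact;

    memset(&act, 0, sizeof act);
    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;            /* no SA_RESETHAND: handler stays installed */
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
#else
    return signal(signo, func);
#endif
}
```

With either path, installing counting_handler once and raising the signal twice should count two deliveries; without the self re-install, the second raise() under System V semantics would take the default action instead.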



--
Kevin Brown                          kevin@sysexperts.com


Re: Unixware 714 pthreads

From: Bruce Momjian
Date: Thu, 28 Oct 2004 19:58:56 -0400 (EDT)
ohp@pyrenet.fr wrote:
> I agree with all that you say Tom, I'm just asking for some help to debug
> this, Now that Larry is a litle off the list, I'm feeling really lonely on
> UW.
> SCO won't do anything until I come up with a test program that fails. All
> my tries did work until then.
> 
> I use other threaded progs like postfix or bind that nether fail.
> 
> I'm really at lost. Would you/someone help me?

The problem is that we are then spending our time debugging Unixware
problems rather than focusing on our database software.  I think this is
why few have offered assistance.



Re: Unixware 714 pthreads

From: ohp@pyrenet.fr
On Thu, 28 Oct 2004, Bruce Momjian wrote:

> Date: Thu, 28 Oct 2004 19:58:56 -0400 (EDT)
> From: Bruce Momjian <pgman@candle.pha.pa.us>
> To: ohp@pyrenet.fr
> Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unixware 714 pthreads
>
> The problem is that we are then spending our time debugging Unixware
> problems rather than focusing on our database software.  I think this is
> why few have offered assistance.
>
>
I understand your concerns; OTOH, you all spend quite a lot of time
debugging Windows...

Anyway, the little program attached does more or less what PostgreSQL does
in terms of sigaction and setitimer (yes, Unixware DOES have POSIX signals),
and it works like a charm whether compiled with or without pthreads.
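The attached program was not preserved in this archive. As a rough stand-in (not Olivier's program), here is a sketch of the same sigaction/setitimer idea with a second thread present, as in the threaded backend. SIGALRM is blocked with pthread_sigmask() before the helper thread is created, so the helper inherits the blocked mask and the process-directed timer signal can only be delivered to the main thread, which arms the timer and polls the flag. The timer is re-armed several times, mirroring the "second time in the same session" symptom. All names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t timed_out = 0;

static void on_alarm(int signo)
{
    (void) signo;
    timed_out = 1;
}

/* Idle helper thread; it inherits a signal mask with SIGALRM blocked. */
static pthread_mutex_t helper_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t helper_cv = PTHREAD_COND_INITIALIZER;
static int helper_stop = 0;

static void *helper_main(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&helper_mu);
    while (!helper_stop)
        pthread_cond_wait(&helper_cv, &helper_mu);
    pthread_mutex_unlock(&helper_mu);
    return NULL;
}

static double monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Arm a one-shot timer `rounds` times while a helper thread exists.  Returns
 * how many rounds SIGALRM ended within 5x timeout_ms, or -1 on setup error. */
int timed_rounds_with_helper(long timeout_ms, int rounds)
{
    struct sigaction act;
    sigset_t alrm;
    pthread_t helper;
    int r, fired = 0;

    assert(timeout_ms > 0 && rounds > 0);

    memset(&act, 0, sizeof act);
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGALRM, &act, NULL) != 0)
        return -1;

    /* pthread_sigmask, not sigprocmask, is the portable call with threads. */
    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &alrm, NULL);     /* the helper inherits this */
    helper_stop = 0;
    if (pthread_create(&helper, NULL, helper_main, NULL) != 0)
    {
        pthread_sigmask(SIG_UNBLOCK, &alrm, NULL);
        return -1;
    }
    pthread_sigmask(SIG_UNBLOCK, &alrm, NULL);   /* only main takes SIGALRM */

    for (r = 0; r < rounds; r++)
    {
        struct itimerval it;
        double give_up_at;

        memset(&it, 0, sizeof it);
        it.it_value.tv_sec = timeout_ms / 1000;
        it.it_value.tv_usec = (timeout_ms % 1000) * 1000;
        timed_out = 0;
        setitimer(ITIMER_REAL, &it, NULL);

        give_up_at = monotonic_ms() + 5.0 * timeout_ms;
        while (!timed_out && monotonic_ms() < give_up_at)
            ;                                    /* CPU-bound, like the hung query */
        if (!timed_out)
        {
            memset(&it, 0, sizeof it);           /* disarm and stop early */
            setitimer(ITIMER_REAL, &it, NULL);
            break;
        }
        fired++;
    }

    pthread_mutex_lock(&helper_mu);
    helper_stop = 1;
    pthread_cond_signal(&helper_cv);
    pthread_mutex_unlock(&helper_mu);
    pthread_join(helper, NULL);
    return fired;
}
```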

Regards

Re: Unixware 714 pthreads

From: Bruce Momjian
ohp@pyrenet.fr wrote:
> I understand your concerns, OTOH you all spend quite a lot of time
> debugging windows...
> 
> Anyway, the little program attached does more or less what postgresql does
> in term of sigaction and setitimer (Yes, unixware DOES have posix signals)
> and works like a charm wether compiled with or without pthreads.

That work is done by people who care about the Win32 port and plan to
use it, just as you are debugging Unixware because you use it.  We
don't expect non-Win32 users to fix Win32 problems, and we don't expect
non-Unixware people to fix Unixware problems.  Each platform's users
have to track down its bugs for us to work efficiently.

Also, we haven't spent much time tracking down Win32 bugs; most of the
time has gone into figuring out how to use the Win32 API.
