Thread: Unixware 714 pthreads
Hi everyone,

I need help debugging a problem I have on Unixware 714 with beta3. PostgreSQL's make check hangs on the plpgsql test when compiled with --enable-thread-safety.

It hangs on select block_me();

This select should be canceled by the set statement_timeout=1000. Instead, the backend is 100% CPU bound and only kill -9 can kill it.

It works OK when compiled without --enable-thread-safety.

I've tried almost everything I could think of, but since I don't know much about threads or the PG source code, I'm asking for someone to help me find a way to debug this. It worked up until beta2, but I'm not sure block_me() was there.

I really need someone to tell me where to begin.

TIA

-- 
Olivier PRENANT                 Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)
The only help I can offer is to point out that on Unixware (only) the backend is compiled with threading enabled. This might be exposing some thread bugs.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Tue, 26 Oct 2004 17:59:17 -0400 (EDT), Bruce Momjian wrote:
> The only help I can be is that on Unixware (only) the backend is
> compiled with threading enabled. This might be showing some thread
> bugs.

Dear Bruce,

Thanks for your reply; I was despairing of getting one!

As I said, I'm quite sure there is a bug in the pthread library. Before reporting this to SCO, I have to prove it. PostgreSQL is the way to prove it!

What I need to know is where to start (I'd like to put elogs where statement_timeout is processed to see what really happens and why it doesn't cancel the query).

Could someone tell me where to look? If anyone is interested in debugging this issue with me, I can set up an account on a test Unixware machine.

TIA

-- 
Olivier PRENANT
ohp@pyrenet.fr wrote:
> What I need is to know where to start from (I'd like to put elogs where
> statement_timeout is processed to see what really happens and why it
> doesn't cancel the query).
>
> Could someone tell me where to look for?

My guess is that there is some problem with delivering alarm signals, because that is how the timeout code works.

-- 
Bruce Momjian
On Wed, 27 Oct 2004 14:53:26 -0400 (EDT), Bruce Momjian wrote:
> My guess is that there is some problem with delivering alarm signals
> because that is how the timeout code works.

That's my guess too. I've tracked it to src/backend/storage/lmgr/proc.c, where kill is called.

The Unixware documentation says that a kill sent to the process's own PID delivers the signal to the thread that called it. For some reason this backend has 2 threads (I can't figure out why), and IMHO kill should be pthread_kill. I wanted to try that but found no way to get the other thread's ID.

I need the help of a PostgreSQL/thread guru here.

Many thanks

-- 
Olivier PRENANT
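Editor's sketch: the kill versus pthread_kill distinction raised in this message can be illustrated with a small C program. This is only a minimal sketch assuming a POSIX threads environment; it is not PostgreSQL code, and the names on_usr1, worker and signal_specific_thread are made up for the illustration. It directs SIGUSR1 at one particular thread with pthread_kill() and records whether the handler ran in that thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static pthread_t worker_tid;                       /* thread we want to signal */
static volatile sig_atomic_t got_signal = 0;
static volatile sig_atomic_t handled_in_worker = 0;

/* Handler: note which thread the signal was delivered to. */
static void on_usr1(int signo)
{
    (void) signo;
    handled_in_worker = pthread_equal(pthread_self(), worker_tid) ? 1 : 0;
    got_signal = 1;
}

/* Worker: sleep in 1 ms steps until the handler has run. */
static void *worker(void *arg)
{
    struct timespec ts = {0, 1000000};
    (void) arg;
    while (!got_signal)
        nanosleep(&ts, NULL);
    return NULL;
}

/* Returns 1 if the handler ran in the targeted thread, 0 if it ran
 * elsewhere, and -1 on setup failure. */
int signal_specific_thread(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    handled_in_worker = 0;
    if (pthread_create(&worker_tid, NULL, worker, NULL) != 0)
        return -1;

    /* Unlike kill(getpid(), ...), this names the receiving thread. */
    if (pthread_kill(worker_tid, SIGUSR1) != 0)
        return -1;

    rc = pthread_join(worker_tid, NULL);
    assert(rc == 0);
    (void) rc;
    return handled_in_worker;
}
```

Calling signal_specific_thread() from main() and printing the result is enough to try it; on most systems the program has to be built with the compiler's threads option (for example -pthread).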
On Wed, 27 Oct 2004 13:01:45 +0200 (MET DST), ohp@pyrenet.fr wrote:
> What I need is to know where to start from (I'd like to put elogs where
> statement_timeout is processed to see what really happens and why it
> doesn't cancel the query).

Sorry to follow up on my own post.

I ran some more tests:

    create table foo (f1 int)  -- so that it is not removed if I kill the process

Each time I do:

    psql template1
    template1# set statement_timeout=1000;
    SET
    template1# select block_me();

it works OK.

If I do it a second time in the same session, blockme() never returns.

I wonder whether handle_sig_alarm is re-armed after being used, or whether sleep is used anywhere in the backend.

The Unixware documentation for setitimer (http://www.pyrenet.fr:8458/en/man/html.3C/getitimer.3C.html) says that "A sleep following a setitimer wipes out knowledge of the user signal handler".

What can I do next?

Regards

-- 
Olivier PRENANT
ohp@pyrenet.fr writes:
> if i do it a second time in the same session, blockme() never returns

> I wonder if handle_sig_alarm is re-armed after being used

No. Why should the signal handler need re-arming?

> or if sleep is used anywhere in the backend.

Nope.

regards, tom lane
Tom Lane wrote:
> ohp@pyrenet.fr writes:
>> or if sleep is used anywhere in the backend.
>
> Nope.

Actually, I just noticed that postmaster/pgarch.c has a call to sleep() that needs to be removed ... I guess it snuck in after all that stuff was adjusted.

cheers

andrew
On Thu, 28 Oct 2004 12:11:12 -0400, Tom Lane wrote:
> > I wonder if handle_sig_alarm is re-armed after being used
>
> No. Why should the signal handler need re-arming?

My impression was that once a signal is caught, the handler for that signal is reset to SIG_DFL.

> > or if sleep is used anywhere in the backend.
>
> Nope.

Oh well, bye bye threads on Unixware, I give up! (very disappointed)

cheers

-- 
Olivier PRENANT
ohp@pyrenet.fr writes:
> On Thu, 28 Oct 2004, Tom Lane wrote:
>> No. Why should the signal handler need re-arming?

> My impression was that once caught, signal handler for a particular signal
> is reset to SIG_DFL.

No. If your signal support is POSIX-compatible, it should not do that, because we don't set SA_RESETHAND when calling sigaction(2). If you don't have POSIX signals, you had better have BSD-style signal(2), which doesn't reset either. If this is not happening as expected, you will have much worse problems than whether statement_timeout works :-(

regards, tom lane
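Editor's sketch: the point about sigaction(2) without SA_RESETHAND can be checked with a few lines of C. This is a minimal sketch under the assumption of POSIX signals, not PostgreSQL code; count_hit and count_deliveries are hypothetical names. The handler is installed once with sa_flags = 0 and the signal is raised several times; if the handler stays installed, every delivery is counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits = 0;

/* Handler: count each delivery. */
static void count_hit(int signo)
{
    (void) signo;
    hits++;
}

/* Installs the handler once, without SA_RESETHAND, raises SIGUSR1
 * n times, and returns how many times the handler ran (-1 on error). */
int count_deliveries(int n)
{
    struct sigaction sa;
    int i;

    assert(n >= 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_hit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESETHAND: handler persists */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    hits = 0;
    for (i = 0; i < n; i++)
        raise(SIGUSR1);             /* handler runs before raise() returns */

    return (int) hits;
}
```

On a POSIX-conforming system count_deliveries(3) should report three deliveries. A lower count, or the process dying on the second raise, would suggest that the disposition is being reset.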
On Thu, 28 Oct 2004 13:55:56 -0400, Tom Lane wrote:
> If this is not happening as expected,
> you will have much worse problems than whether statement_timeout works :-(

I agree with everything you say, Tom. I'm just asking for some help to debug this. Now that Larry is somewhat off the list, I'm feeling really lonely on UW.

SCO won't do anything until I come up with a test program that fails. All my attempts so far have worked.

I use other threaded programs, like postfix or bind, that never fail.

I'm really at a loss. Would you or someone else help me?

Best regards

-- 
Olivier PRENANT
Tom Lane wrote:
> No. If your signal support is POSIX-compatible, it should not do that
> because we don't set SA_RESETHAND when calling sigaction(2). If you
> don't have POSIX signals, you had better have BSD-style signal(2),
> which doesn't reset either. If this is not happening as expected,
> you will have much worse problems than whether statement_timeout works :-(

SysV-style signal(2) handling does indeed require that the signal handler be re-enabled. The attached program demonstrates this on Solaris, and probably on Unixware as well (I don't have access to the latter). Just run it and interrupt it with ctrl-c. It should print something the first time around, and actually be interrupted the second time.

So if Unixware doesn't have sigaction() or it's not being picked up by autoconf then yeah, he'll have big problems...

-- 
Kevin Brown kevin@sysexperts.com
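Editor's sketch: Kevin's attached program is not included here, so the following is a stand-in, not his code. Because signal(2) behaves differently from platform to platform, this sketch emulates the SysV "reset on delivery" behavior portably by passing SA_RESETHAND to sigaction(). The names note_delivery and handler_resets_after_first_delivery are hypothetical. Rather than delivering the signal a second time (which would terminate the process under the default action), it queries the current disposition after the first delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t delivered = 0;

static void note_delivery(int signo)
{
    (void) signo;
    delivered = 1;
}

/* Returns 1 if the disposition is back to SIG_DFL after one delivery,
 * 0 if the handler is still installed, and -1 on error. */
int handler_resets_after_first_delivery(void)
{
    struct sigaction sa, cur;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_delivery;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;     /* SysV-like: one shot, then SIG_DFL */
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    delivered = 0;
    raise(SIGUSR2);                 /* first delivery goes to the handler */
    assert(delivered);

    /* Query, without changing, the current disposition. */
    if (sigaction(SIGUSR2, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

With SA_RESETHAND set, a program has to reinstall its handler after every delivery, which is the re-arming being discussed in this part of the thread.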
ohp@pyrenet.fr wrote:
> I'm really at lost. Would you/someone help me?

The problem is that we would then be spending our time debugging Unixware problems rather than focusing on our database software. I think this is why few have offered assistance.

-- 
Bruce Momjian
On Thu, 28 Oct 2004 19:58:56 -0400 (EDT), Bruce Momjian wrote:
> The problem is that we are then spending our time debugging Unixware
> problems rather than focusing on our database software. I think this is
> why few have offered assistance.

I understand your concerns. OTOH, you all spend quite a lot of time debugging Windows...

Anyway, the little program attached does more or less what PostgreSQL does in terms of sigaction and setitimer (yes, Unixware DOES have POSIX signals) and works like a charm whether compiled with or without pthreads.

Regards

-- 
Olivier PRENANT
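Editor's sketch: Olivier's attachment is not included here, so this is not his program. It is a minimal sketch of the same idea, assuming POSIX sigaction() and XSI setitimer(): install a SIGALRM handler once, then arm a one-shot ITIMER_REAL several times in the same process and check that each timeout interrupts a busy loop, mirroring the "second time in the same session" symptom reported earlier. The names handle_alarm, wait_for_timeout and timeouts_that_fired are hypothetical, and the 50 ms and 2000 ms values are arbitrary.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t alarm_fired = 0;

static void handle_alarm(int signo)
{
    (void) signo;
    alarm_fired = 1;
}

/* Arms a one-shot timer and spins until SIGALRM arrives.
 * Returns 1 if the alarm fired, 0 if give_up_ms elapsed first,
 * and -1 if the timer could not be set. */
static int wait_for_timeout(long timeout_ms, long give_up_ms)
{
    struct itimerval tv;
    struct timespec start, now;

    assert(timeout_ms > 0 && give_up_ms > timeout_ms);

    memset(&tv, 0, sizeof(tv));     /* it_interval = 0: fire once */
    tv.it_value.tv_sec = timeout_ms / 1000;
    tv.it_value.tv_usec = (timeout_ms % 1000) * 1000;

    alarm_fired = 0;
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (!alarm_fired) {
        long elapsed_ms;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000
                   + (now.tv_nsec - start.tv_nsec) / 1000000;
        if (elapsed_ms > give_up_ms)
            return 0;               /* the timeout never interrupted us */
    }
    return 1;
}

/* Installs the handler once, then runs several timeouts in a row.
 * Returns how many of them fired (-1 if the handler cannot be set). */
int timeouts_that_fired(int rounds)
{
    struct sigaction sa;
    int i, fired = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    for (i = 0; i < rounds; i++)
        if (wait_for_timeout(50, 2000) == 1)
            fired++;
    return fired;
}
```

If timeouts_that_fired(2) reports fewer than two, the second timer either was not delivered or did not reach the handler. Building the same file with and without the threads option would cover both cases mentioned in the message.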
ohp@pyrenet.fr wrote:
> I understand your concerns, OTOH you all spend quite a lot of time
> debugging windows...
>
> Anyway, the little program attached does more or less what postgresql does
> in term of sigaction and setitimer (Yes, unixware DOES have posix signals)
> and works like a charm wether compiled with or without pthreads.

That work is done by people who care about the Win32 port and plan to use it, just as you are debugging Unixware because you use it. We don't expect non-Win32 users to fix Win32 problems, and we don't expect non-Unixware people to fix Unixware problems. Each platform's users have to track down that platform's bugs for us to work efficiently.

Also, we haven't spent much time tracking down Win32 bugs; most of it has gone into figuring out how to use the Win32 API.

-- 
Bruce Momjian