Re: stuck spinlock - Mailing list pgsql-hackers

From Peter Schindler
Subject Re: stuck spinlock
Date
Msg-id B0021317079@tellurian.net
Whole thread Raw
In response to stuck spinlock  (Peter Schindler <pschindler@synchronicity.com>)
Responses Re: stuck spinlock  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Judging from the line number, this is in CreateCheckPoint.  I'm
> betting that your platform (Solaris 2.7, you said?) has the same odd
> behavior that I discovered a couple days ago on HPUX: a select with
> a delay of tv_sec = 0, tv_usec = 1000000 doesn't delay 1 second like
> a reasonable person would expect, but fails instantly with EINVAL.
After I finally understood what you meant, this behavior looks somehow
reasonable to me as its a struct, but I must admit, that I don't have
to much knowledge in this area.

Anyway, after further thoughts I was curious about this odd behavior on
the different platforms and I used your previously posted program, extended
it a little bit and run it on all platforms I could get a hold of.
Please have a look at the extracted log and comments below about the different
platforms. It seems, that this functions a "good" example of a really
incompatible implementation across platforms, even within the same
across different versions of the OSs.
Happy wondering ;-)


> In short: please try the latest nightly snapshot (this fix is since
> beta5, unfortunately) and let me know if you still see a problem.
I did and I didn't get the error yet, but didn't run as many jobs either.
If I get the error again, I'll post it.

Thanks for your help,
Peter

=====AIX 4.3.3
=========

 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 |  0.0 [msec/loop] | 0.0 [msec/sel]
   500 | 10.3 [msec/loop] | 1.0 [msec/sel]
  1000 | 10.3 [msec/loop] | 1.0 [msec/sel]
  1500 | 15.3 [msec/loop] | 1.5 [msec/sel]
  2000 | 20.3 [msec/loop] | 2.0 [msec/sel]
...
980000 | 9800.3 [msec/loop] | 980.0 [msec/sel]
990000 | 9899.3 [msec/loop] | 989.9 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1000000 |  0.1 [msec/loop] | 0.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1010000 |  0.1 [msec/loop] | 0.0 [msec/sel]


other more granular run with steps of 10 usec after 1000:

NUM_OF_LOOPS: 1000
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 | 2.7090 [msec/loop] | 0.0027 [msec/sel]
   100 | 1024.7370 [msec/loop] | 1.0247 [msec/sel]
   200 | 1024.3160 [msec/loop] | 1.0243 [msec/sel]
   300 | 1024.6510 [msec/loop] | 1.0247 [msec/sel]
   400 | 1024.5030 [msec/loop] | 1.0245 [msec/sel]
   500 | 1024.5400 [msec/loop] | 1.0245 [msec/sel]
   600 | 1024.8340 [msec/loop] | 1.0248 [msec/sel]
   700 | 1024.3110 [msec/loop] | 1.0243 [msec/sel]
   800 | 1024.7030 [msec/loop] | 1.0247 [msec/sel]
   900 | 1024.4560 [msec/loop] | 1.0245 [msec/sel]
  1000 | 1024.2810 [msec/loop] | 1.0243 [msec/sel]
  1010 | 1034.4840 [msec/loop] | 1.0345 [msec/sel]
  1020 | 1044.0490 [msec/loop] | 1.0440 [msec/sel]
  1030 | 1054.3530 [msec/loop] | 1.0544 [msec/sel]
  1040 | 1064.6620 [msec/loop] | 1.0647 [msec/sel]
  1050 | 1074.0980 [msec/loop] | 1.0741 [msec/sel]
  1060 | 1084.4850 [msec/loop] | 1.0845 [msec/sel]
  1070 | 1094.1270 [msec/loop] | 1.0941 [msec/sel]
  1080 | 1104.4080 [msec/loop] | 1.1044 [msec/sel]
  1090 | 1132.8880 [msec/loop] | 1.1329 [msec/sel]
  1100 | 1124.2220 [msec/loop] | 1.1242 [msec/sel]

Comments:
 o minimum is 1 msec until 1000 usec and than it
   tries to respect the actual number in usec
 o usec >= 1 sec not allowed

################################################################################

HP-UX 10.20
===========
NUM_OF_LOOPS: 10
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 |  0.1 [msec/loop] | 0.0 [msec/sel]
   500 | 97.6 [msec/loop] | 9.8 [msec/sel]
  1000 | 100.0 [msec/loop] | 10.0 [msec/sel]
  1500 | 100.0 [msec/loop] | 10.0 [msec/sel]
...
 14000 | 100.0 [msec/loop] | 10.0 [msec/sel]
 14500 | 100.2 [msec/loop] | 10.0 [msec/sel]
 15000 | 199.8 [msec/loop] | 20.0 [msec/sel]
 15500 | 200.0 [msec/loop] | 20.0 [msec/sel]
...
 24000 | 200.0 [msec/loop] | 20.0 [msec/sel]
 24500 | 200.0 [msec/loop] | 20.0 [msec/sel]
 25000 | 300.0 [msec/loop] | 30.0 [msec/sel]
 25500 | 300.0 [msec/loop] | 30.0 [msec/sel]
...
980000 | 9800.1 [msec/loop] | 980.0 [msec/sel]
990000 | 9900.0 [msec/loop] | 990.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1000000 |  0.1 [msec/loop] | 0.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1010000 |  0.1 [msec/loop] | 0.0 [msec/sel]

Comments:
 o minimum is 10 msec until 1000 usec
 o after 1000 it rounds down or up to the next 10msec
 o usec >= 1 sec not allowed

################################################################################

HP-UX 11
===========
NUM_OF_LOOPS: 10
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 | 92.7 [msec/loop] | 9.3 [msec/sel]
   500 | 99.9 [msec/loop] | 10.0 [msec/sel]
  1000 | 99.8 [msec/loop] | 10.0 [msec/sel]
  1500 | 100.0 [msec/loop] | 10.0 [msec/sel]
...
  9000 | 99.9 [msec/loop] | 10.0 [msec/sel]
  9500 | 100.1 [msec/loop] | 10.0 [msec/sel]
 10000 | 199.9 [msec/loop] | 20.0 [msec/sel]
 10500 | 199.9 [msec/loop] | 20.0 [msec/sel]
...
 19000 | 199.9 [msec/loop] | 20.0 [msec/sel]
 19500 | 200.0 [msec/loop] | 20.0 [msec/sel]
 20000 | 300.1 [msec/loop] | 30.0 [msec/sel]
 20500 | 309.9 [msec/loop] | 31.0 [msec/sel]
 21000 | 299.9 [msec/loop] | 30.0 [msec/sel]
...
980000 | 9900.2 [msec/loop] | 990.0 [msec/sel]
990000 | 10000.7 [msec/loop] | 1000.1 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1000000 |  0.1 [msec/loop] | 0.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1010000 |  0.1 [msec/loop] | 0.0 [msec/sel]

Comments:
 o minimum is 10 msec until 1000 usec
 o after 1000 it rounds up to the next 10 msec on each 10000 usec
 o usec >= 1 sec not allowed
 o almost same behavior than Solaris 2.7, but breakeven is handled differently

################################################################################

Solaris 2.7
===========
NUM_OF_LOOPS: 10
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 |  0.0 [msec/loop] | 0.0 [msec/sel]
   500 | 92.7 [msec/loop] | 9.3 [msec/sel]
  1000 | 99.9 [msec/loop] | 10.0 [msec/sel]
  1500 | 100.0 [msec/loop] | 10.0 [msec/sel]
...
  9500 | 100.0 [msec/loop] | 10.0 [msec/sel]
 10000 | 100.0 [msec/loop] | 10.0 [msec/sel]
 10500 | 200.0 [msec/loop] | 20.0 [msec/sel]
 11000 | 200.0 [msec/loop] | 20.0 [msec/sel]
...
 19500 | 200.0 [msec/loop] | 20.0 [msec/sel]
 20000 | 200.0 [msec/loop] | 20.0 [msec/sel]
 20500 | 300.0 [msec/loop] | 30.0 [msec/sel]
 21000 | 300.0 [msec/loop] | 30.0 [msec/sel]
...
980000 | 9800.2 [msec/loop] | 980.0 [msec/sel]
990000 | 9900.1 [msec/loop] | 990.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1000000 |  0.0 [msec/loop] | 0.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1010000 |  0.0 [msec/loop] | 0.0 [msec/sel]

Comments:
 o minimum is 10 msec until 1000 usec
 o after 1000 it rounds up to the next 10 msec on each 10000 usec excluding the 10000
 o usec >= 1 sec not allowed
 o almost same behavior than HP-UX 11, but breakeven is handled differently

################################################################################

Solaris 2.8 (4-CPU)
===================
NUM_OF_LOOPS: 100
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 |  0.2 [msec/loop] | 0.0 [msec/sel]
   100 | 999.5 [msec/loop] | 10.0 [msec/sel]
   200 | 1000.2 [msec/loop] | 10.0 [msec/sel]
   300 | 1000.1 [msec/loop] | 10.0 [msec/sel]
...
  8900 | 1000.2 [msec/loop] | 10.0 [msec/sel]
  9000 | 1000.2 [msec/loop] | 10.0 [msec/sel]
  9100 | 1650.2 [msec/loop] | 16.5 [msec/sel]
  9200 | 1640.3 [msec/loop] | 16.4 [msec/sel]
  9300 | 1650.2 [msec/loop] | 16.5 [msec/sel]
  9400 | 1720.3 [msec/loop] | 17.2 [msec/sel]
  9500 | 1730.3 [msec/loop] | 17.3 [msec/sel]
  9600 | 1630.2 [msec/loop] | 16.3 [msec/sel]
  9700 | 1680.3 [msec/loop] | 16.8 [msec/sel]
  9800 | 1690.3 [msec/loop] | 16.9 [msec/sel]
  9900 | 1660.2 [msec/loop] | 16.6 [msec/sel]
 10000 | 1690.3 [msec/loop] | 16.9 [msec/sel]
 10100 | 2000.3 [msec/loop] | 20.0 [msec/sel]
 10200 | 2000.3 [msec/loop] | 20.0 [msec/sel]
...
 18900 | 2000.3 [msec/loop] | 20.0 [msec/sel]
 19000 | 2000.3 [msec/loop] | 20.0 [msec/sel]
 19100 | 2760.5 [msec/loop] | 27.6 [msec/sel]
 19200 | 2760.4 [msec/loop] | 27.6 [msec/sel]
 19300 | 2760.4 [msec/loop] | 27.6 [msec/sel]
 19400 | 2700.4 [msec/loop] | 27.0 [msec/sel]
 19500 | 2790.5 [msec/loop] | 27.9 [msec/sel]
 19600 | 2740.4 [msec/loop] | 27.4 [msec/sel]
 19700 | 2730.4 [msec/loop] | 27.3 [msec/sel]
 19800 | 2730.4 [msec/loop] | 27.3 [msec/sel]
 19900 | 2700.4 [msec/loop] | 27.0 [msec/sel]
 20000 | 2770.4 [msec/loop] | 27.7 [msec/sel]
 20100 | 3000.5 [msec/loop] | 30.0 [msec/sel]
 20200 | 3000.5 [msec/loop] | 30.0 [msec/sel]
...
980000 | 9800.2 [msec/loop] | 980.0 [msec/sel]
990000 | 9900.1 [msec/loop] | 990.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1000000 |  0.0 [msec/loop] | 0.0 [msec/sel]
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1 1010000 |  0.0 [msec/loop] | 0.0 [msec/sel]


Comments:
 o minimum is 10 msec until 1000 usec
 o after 1000 it rounds up to the next 10 msec on each 10000 usec excluding the 10000
 o usec >= 1 sec not allowed
 o the range from x9000-x0100 is kind of "in between" ??
   different to Solaris 2.7 !!
   I'm not sure if this behavior is coming from the fact, that
   its a 4-CPU machine. On the other hand I got the Solaris 2.7
   results consistently on a 1-CPU and a 2-CPU machine

################################################################################

Cygwin (WinNT)
==============
NUM_OF_LOOPS: 10
 Delay |   elapsed Time    |   actual Wait
---------------------------------------------------
     0 |  0.0 [msec/loop] | 0.0 [msec/sel]
   500 | 100.0 [msec/loop] | 10.0 [msec/sel]
  1000 | 100.0 [msec/loop] | 10.0 [msec/sel]
...
 10000 | 100.0 [msec/loop] | 10.0 [msec/sel]
 10500 | 100.0 [msec/loop] | 10.0 [msec/sel]
 11000 | 200.0 [msec/loop] | 20.0 [msec/sel]
 11500 | 201.0 [msec/loop] | 20.1 [msec/sel]
...
 20000 | 200.0 [msec/loop] | 20.0 [msec/sel]
 20500 | 201.0 [msec/loop] | 20.1 [msec/sel]
 21000 | 300.0 [msec/loop] | 30.0 [msec/sel]
 21500 | 301.0 [msec/loop] | 30.1 [msec/sel]
...
980000 | 9814.0 [msec/loop] | 981.4 [msec/sel]
990000 | 9914.0 [msec/loop] | 991.4 [msec/sel]
1000000 | 10014.0 [msec/loop] | 1001.4 [msec/sel]
1010000 | 10115.0 [msec/loop] | 1011.5 [msec/sel]


Comments:
 o minimum is 10 msec until 1000 usec
 o after 1000 it rounds up to the next 10 msec on each 11000 usec including the 11000
 o usec >= 1 sec IS ALLOWED

################################################################################


pgsql-hackers by date:

Previous
From: Matthew Kirkwood
Date:
Subject: Re: mmap for zeroing WAL log
Next
From: Tatsuo Ishii
Date:
Subject: Re: Idea for reducing planning time