Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6 - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6
Date
Msg-id 9803201915.AA09552@hawk.illustra.com
Whole thread Raw
In response to Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6  (Massimo Dal Zotto <dz@cs.unitn.it>)
List pgsql-hackers
Massimo Dal Zotto writes:
> > David, go for it.  The code is all local in two files, and I think you
> > can basically change all the do{test-and-set} while(lock-is-false)
> > loops to:
> >
> >     do{test-and-set} while(lock-is-false && select ())
> >
> > Pretty easy.  No need to test multiple platforms.  The ones where the
> > loop is integrated into the asm(), leave them for later.
> >
> > --
> > Bruce Momjian                          |  830 Blythe Avenue
>
> I'am against a generic patch using select(). If we have sched_yield() on an
> architecture I don't see why dont't use it. Here is the patch for Linux.
> It has been tested for two months by 100 users without any problem.
> The only thing I would add is a more general configuration test in configure
> to include the proper include files.
>
> *** src/include/storage/s_lock.h.orig    Sat Oct 18 22:39:21 1997
> --- src/include/storage/s_lock.h    Wed Nov 19 23:11:14 1997
> ***************
> *** 294,300 ****
> --- 294,314 ----
>    */
>
>   #if defined(NEED_I386_TAS_ASM)
> + #include <unistd.h>
> + #include <sched.h>
>
> + #ifdef _POSIX_PRIORITY_SCHEDULING
> + #define    S_LOCK(lock)    do \
> +                     { \
> +                         slock_t        _res; \
> +                         do \
> +                         { \
> +                             __asm__("xchgb %0,%1": "=q"(_res), \
> +                                     "=m"(*lock):"0"(0x1)); \
> +                             if (_res) sched_yield(); \
> +                         } while (_res != 0); \
> +                     } while (0)
> + #else
>   #define    S_LOCK(lock)    do \
>                       { \
>                           slock_t        _res; \
> ***************
> *** 303,308 ****
> --- 317,323 ----
>               __asm__("xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
>                           } while (_res != 0); \
>                       } while (0)
> + #endif
>
>   #define    S_UNLOCK(lock)    (*(lock) = 0)
>
> +----------------------------------------------------------------------+
> |  Massimo Dal Zotto                e-mail:  dz@cs.unitn.it            |
> |  Via Marconi, 141                 phone:  ++39-461-534251            |
> |  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
> |  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
> +----------------------------------------------------------------------+



I am perfectly happy to use sched_yield() on Linux. My goal however is to
make the concept work on all platforms.

There was a recent thread on comp.os.linux.system on context switch times
of Linux vs NTthat revealed that sched_yield() is fairly costly as it causes
the scheduler to do a full scan of the process table and recalculate the
the priorities of all processes. Probably not a problem, but it should
be it should probably be benchmarked both ways.

Finally, even though this appears to work, there is a possible stability
problem with both approaches. Here is some of the discussion I had about
that with Bruce:

-----------------------------------------------------------------------
David:
> Right now, there is not much chance of catching a signal while waiting for
> a spinlock. This is good, cause the code that waits for spinlocks tends to
> be doing funny things with buffers and locks and shared stuff like that.
> We don't catch signals because we don't make syscalls. But, once this goes in,
> we will be calling select() a _lot_ and so it kinda becomes the likely place
> for a signal to get delivered. Without thinking about it and looking at the
> code a bit longer, I am not sure it is prudent to rush this in. I still want
> it in as soon a possible, but possible includes free from harmful side effects.
Bruce:
> Well, signals are handled in the backend by tcop/postgres.c.  In
> most/all? cases, a signal causes a longjump() out of the code and
> releases locks and aborts the transaction.

David:
> I was afraid that would be the answer. Basically, this never worked. The
> problem is that in an indeterminate number of places the code manipulates
> multiple distinct but related structures in a sequence. To leave the server
> in a consistant state all these updates need to be done in the right order
> so that if the sequence in interrupted by a signal the partially updated
> state is consistant. Unhappily, the original coders did not always have this
> in mind. An example:
>
>  cleaning up after a scan
>   1 - release buffer pointed by scan descriptor
>   2 - release scan descriptor
>
> If a signal is taken just after one, the abort code will see the scan
> descriptor and call the cleanup for it resulting in:
>
> cleaning up after a scan (take 2)
>   1 - release buffer pointed by scan descriptor
>     - Whoopsie, buffer already released!
>   2 - release scan descriptor
>
> These sequences either _all_ have to identified and fixed, or made atomic
> somehow, which is a biggish job.
>
> Or the system has to acknowledge signals at only clearly defined points.
> My preference is to have signal handlers only set a global flag to indicate
> that a signal was seen. Then one just sprinkles tests of the flag in
> all the likely places: step to next node, fetch next page, call function, etc.
>
> The way this shows up in real life is strange unreproduceable errors on busy
> systems, especially when backends are killed or transactions are forced to
> abort.
>
> Fixing this is a bit of a project, but the result is that a lot of mystery
> bugs go away.

--------------------------------------------------------------------------

-dg


David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - I realize now that irony has no place in business communications.


pgsql-hackers by date:

Previous
From: "Maurice Gittens"
Date:
Subject: Re: [HACKERS] Re: Buffer overruns with the Electric Fence debugging library (Solved?)
Next
From: Bruce Momjian
Date:
Subject: Re: Added Having Clause