Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6 - Mailing list pgsql-hackers
From | dg@illustra.com (David Gould) |
---|---|
Subject | Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6 |
Date | |
Msg-id | 9803201915.AA09552@hawk.illustra.com Whole thread Raw |
In response to | Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6 (Massimo Dal Zotto <dz@cs.unitn.it>) |
List | pgsql-hackers |
Massimo Dal Zotto writes: > > David, go for it. The code is all local in two files, and I think you > > can basically change all the do{test-and-set} while(lock-is-false) > > loops to: > > > > do{test-and-set} while(lock-is-false && select ()) > > > > Pretty easy. No need to test multiple platforms. The ones where the > > loop is integrated into the asm(), leave them for later. > > > > -- > > Bruce Momjian | 830 Blythe Avenue > > I'am against a generic patch using select(). If we have sched_yield() on an > architecture I don't see why dont't use it. Here is the patch for Linux. > It has been tested for two months by 100 users without any problem. > The only thing I would add is a more general configuration test in configure > to include the proper include files. > > *** src/include/storage/s_lock.h.orig Sat Oct 18 22:39:21 1997 > --- src/include/storage/s_lock.h Wed Nov 19 23:11:14 1997 > *************** > *** 294,300 **** > --- 294,314 ---- > */ > > #if defined(NEED_I386_TAS_ASM) > + #include <unistd.h> > + #include <sched.h> > > + #ifdef _POSIX_PRIORITY_SCHEDULING > + #define S_LOCK(lock) do \ > + { \ > + slock_t _res; \ > + do \ > + { \ > + __asm__("xchgb %0,%1": "=q"(_res), \ > + "=m"(*lock):"0"(0x1)); \ > + if (_res) sched_yield(); \ > + } while (_res != 0); \ > + } while (0) > + #else > #define S_LOCK(lock) do \ > { \ > slock_t _res; \ > *************** > *** 303,308 **** > --- 317,323 ---- > __asm__("xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \ > } while (_res != 0); \ > } while (0) > + #endif > > #define S_UNLOCK(lock) (*(lock) = 0) > > +----------------------------------------------------------------------+ > | Massimo Dal Zotto e-mail: dz@cs.unitn.it | > | Via Marconi, 141 phone: ++39-461-534251 | > | 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ | > | Italy pgp: finger dz@tango.cs.unitn.it | > +----------------------------------------------------------------------+ I am perfectly happy to use sched_yield() on Linux. My goal however is to make the concept work on all platforms. There was a recent thread on comp.os.linux.system on context switch times of Linux vs NTthat revealed that sched_yield() is fairly costly as it causes the scheduler to do a full scan of the process table and recalculate the the priorities of all processes. Probably not a problem, but it should be it should probably be benchmarked both ways. Finally, even though this appears to work, there is a possible stability problem with both approaches. Here is some of the discussion I had about that with Bruce: ----------------------------------------------------------------------- David: > Right now, there is not much chance of catching a signal while waiting for > a spinlock. This is good, cause the code that waits for spinlocks tends to > be doing funny things with buffers and locks and shared stuff like that. > We don't catch signals because we don't make syscalls. But, once this goes in, > we will be calling select() a _lot_ and so it kinda becomes the likely place > for a signal to get delivered. Without thinking about it and looking at the > code a bit longer, I am not sure it is prudent to rush this in. I still want > it in as soon a possible, but possible includes free from harmful side effects. Bruce: > Well, signals are handled in the backend by tcop/postgres.c. In > most/all? cases, a signal causes a longjump() out of the code and > releases locks and aborts the transaction. David: > I was afraid that would be the answer. Basically, this never worked. The > problem is that in an indeterminate number of places the code manipulates > multiple distinct but related structures in a sequence. To leave the server > in a consistant state all these updates need to be done in the right order > so that if the sequence in interrupted by a signal the partially updated > state is consistant. Unhappily, the original coders did not always have this > in mind. An example: > > cleaning up after a scan > 1 - release buffer pointed by scan descriptor > 2 - release scan descriptor > > If a signal is taken just after one, the abort code will see the scan > descriptor and call the cleanup for it resulting in: > > cleaning up after a scan (take 2) > 1 - release buffer pointed by scan descriptor > - Whoopsie, buffer already released! > 2 - release scan descriptor > > These sequences either _all_ have to identified and fixed, or made atomic > somehow, which is a biggish job. > > Or the system has to acknowledge signals at only clearly defined points. > My preference is to have signal handlers only set a global flag to indicate > that a signal was seen. Then one just sprinkles tests of the flag in > all the likely places: step to next node, fetch next page, call function, etc. > > The way this shows up in real life is strange unreproduceable errors on busy > systems, especially when backends are killed or transactions are forced to > abort. > > Fixing this is a bit of a project, but the result is that a lot of mystery > bugs go away. -------------------------------------------------------------------------- -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - I realize now that irony has no place in business communications.
pgsql-hackers by date: