Re: [HACKERS] FreeBSD problem under heavy load - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] FreeBSD problem under heavy load
Date
Msg-id 19991210145658J.t-ishii@sra.co.jp
Whole thread Raw
In response to Re: [HACKERS] FreeBSD problem under heavy load  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] FreeBSD problem under heavy load
Re: [HACKERS] FreeBSD problem under heavy load
List pgsql-hackers
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I seem to run into a serious problem. With 6.5.x + FreeBSD 3.2, I get
> > a core under heavy load (16 or more concurrent users). FreeBSD 2.2.x
> > seems more stable but soon or later same thing happens. Examing a
> > core, I found it segfaulted in hash_search().
> 
> I've been looking into this without much success.  I cannot reproduce it
> here under HPUX --- I ran pgbench for several hours without seeing any
> problem.  I also made another pass over the dynahash.c code looking for
> portability bugs, but didn't find anything that looked promising.  (The
> code is ugly and fragile, but AFAICT it will work under existing usage
> patterns.)  It's quite possible the problem is elsewhere and dynahash is
> just on the receiving end of a memory clobber ... but if so, we have
> very little to go on in guessing where to look.
> 
> Can anyone else reproduce the problem?  Does anything show up in the
> postmaster log at or just before the crash?
> 
>             regards, tom lane

I think I got it. in storage/lmgr/lock.c:WaitOnLock:
char        old_status[64],            new_status[64];    :    :
strcpy(old_status, PS_STATUS);strcpy(new_status, PS_STATUS);strcat(new_status, " waiting");PS_SET_STATUS(new_status);
:    :PS_SET_STATUS(old_status);
 

The current status string is copied into old_status, then the pointer
to it is set to a gloable variable ps_status by PS_SET_STATUS
macro. Unfortunately old_status is on the stack, so once WaitOnLock
returns, ps_status would point to a garbage. In the subsequent call to
WaitOnLock,
strcpy(old_status, PS_STATUS);

will copy garbage string into old_status. So if the string is longer
than 63, the stack would be broken. Note that this would not happen on
Linux due to the difference of the definition of the macro. See
include/utils/ps_status.h for more details.

Also, I don't understand why:
strcpy(new_status, PS_STATUS);strcat(new_status, " waiting");PS_SET_STATUS(new_status);

is necessary. Just:
PS_SET_STATUS("waiting");

should be enough. After doing some tests on my FreeBSD and Linux box,
I will commit fixes to both current and 6.5 source tree.

> PS: pgbench's configure fails on HPUX, because HP's compiler doesn't
> like whitespace before #include.  I modified configure.in like this:
> 
> AC_TRY_LINK([#include <sys/time.h>
> #include <sys/resource.h>],
>         [struct rlimit rlim;

Thanks. I will incorporate your fix.

BTW, I think pgbench is usefull to detect this kind of problems. Can I
put it into contrib or somewhere?
--
Tatsuo Ishii


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: 6.6 release
Next
From: Tom Lane
Date:
Subject: OK, what's this LDREL all about?