Thread: --enable-thread-safety bug

--enable-thread-safety bug

From
Steve Clark
Date:
Hello List,

I am running 8.3.1 on FreeBSD 6.2 patch-7.

The FreeBSD ports turn on --enable-thread-safety during configure
of pg.

When running my app, after some time I get a core dump (sig 11).

#0  0x28333b96 in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0x28333b96 in memcpy () from /lib/libc.so.6
#1  0x280d0122 in ecpg_init_sqlca (sqlca=0x0) at misc.c:100
#2  0x280d0264 in ECPGget_sqlca () at misc.c:145
#3  0x280d056c in ecpg_log (
     format=0x280d1d78 "free_params line %d: parameter %d = %s\n") at misc.c:243
#4  0x280c9758 in free_params (paramValues=0x836fe00, nParams=104, print=1 '\001',
     lineno=3303) at execute.c:1045
#5  0x280c9f08 in ecpg_execute (stmt=0xa726f00) at execute.c:1298
#6  0x280ca978 in ECPGdo (lineno=3303, compat=0, force_indicator=1,
     connection_name=0x0, questionmarks=0 '\0', st=0,
     query=0x806023c "update T_UNIT_STATUS_LOG set ip_address  =  $1
:: inet   , last_ip_address  =  $2  :: inet   , unit_date  =  $3  ::
timestamp with time zone  , unit_raw_time  =  $4  , status_date  = now
() , unit_ac"...) at execute.c:1636
#7  0x08057a46 in UpdateTUSL (pCachedUnit=0x807b680, msg=0xbfbf8850 "",
     p_threshold=80, p_actualIP=0xbfbfe880 "24.39.85.226")
     at srm2_monitor_db.pgc:3303
#8  0x0804f174 in main (argc=3, argv=0xbfbf7fc0) at srm2_monitor_server.c:3265
(gdb) f 2
#2  0x280d0264 in ECPGget_sqlca () at misc.c:145
145                     ecpg_init_sqlca(sqlca);
(gdb) p sqlca
$1 = (struct sqlca_t *) 0x0

Looking at the code in misc.c, I see:

struct sqlca_t *
ECPGget_sqlca(void)
{
#ifdef ENABLE_THREAD_SAFETY
    struct sqlca_t *sqlca;

    pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);

    sqlca = pthread_getspecific(sqlca_key);
    if (sqlca == NULL)
    {
        sqlca = malloc(sizeof(struct sqlca_t));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ecpg_init_sqlca(sqlca);
        pthread_setspecific(sqlca_key, sqlca);
    }
    return (sqlca);
#else
    return (&sqlca);
#endif
}

The return from malloc should be checked to make sure it succeeds -
right???

Steve

Re: --enable-thread-safety bug

From
Tom Lane
Date:
Steve Clark <sclark@netwolves.com> writes:
> The return from malloc should be checked to make sure it succeeds -
> right???

Probably, but what do you expect the code to do if it doesn't succeed?
This function seems not to have any defined error-return convention.

            regards, tom lane
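
One way to picture the "error-return convention" being discussed is the minimal sketch below. It is not ecpg code: `struct demo_sqlca`, `new_sqlca`, `use_sqlca` and `failing_alloc` are hypothetical names invented for illustration, and the allocator is passed in only so that the failure path can be exercised. The idea is simply that the allocating function reports failure as NULL and every caller is obliged to check for it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sqlca_t; not the real layout. */
struct demo_sqlca
{
    long sqlcode;
    char sqlstate[5];   /* five characters, no terminating NUL */
};

/* Allocator type, so a failing allocator can be injected for testing. */
typedef void *(*alloc_fn)(size_t);

/* Returns an initialised struct, or NULL if the allocation fails.
   Returning NULL is the convention assumed by this sketch only. */
struct demo_sqlca *new_sqlca(alloc_fn alloc)
{
    assert(alloc != NULL);

    struct demo_sqlca *s = alloc(sizeof *s);
    if (s == NULL)
        return NULL;    /* report failure instead of initialising a NULL pointer */

    memset(s, 0, sizeof *s);
    memcpy(s->sqlstate, "00000", sizeof s->sqlstate);
    return s;
}

/* Allocator that always fails, to simulate an out-of-memory condition. */
void *failing_alloc(size_t n)
{
    (void) n;
    return NULL;
}

/* Example caller: returns 0 on success, -1 if no struct could be obtained. */
int use_sqlca(alloc_fn alloc)
{
    struct demo_sqlca *s = new_sqlca(alloc);
    if (s == NULL)
        return -1;
    free(s);
    return 0;
}
```

With this shape, `use_sqlca(malloc)` takes the normal path and `use_sqlca(failing_alloc)` takes the error path without dereferencing a NULL pointer. Whether callers can do anything useful at that point is a separate question.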

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Tom Lane wrote:
> Steve Clark <sclark@netwolves.com> writes:
>
>>The return from malloc should be checked to make sure it succeeds -
>>right???
>
>
> Probably, but what do you expect the code to do if it doesn't succeed?
> This function seems not to have any defined error-return convention.
>
>             regards, tom lane
>
>
Retry the malloc - maybe there is a memory leak when --enable-thread-safety
is enabled, send an out of memory message to the postgres log, abort the
transaction - I don't know. I am not a postgres developer so I don't know all
the issues. All I know is that, as a user, having a program like postgres just
sig 11 is unacceptable! As a commercial developer of software for over 30
years I would never just do nothing.

My $.02
Steve

Re: --enable-thread-safety bug

From
Martijn van Oosterhout
Date:
On Sat, Mar 22, 2008 at 11:28:24AM -0400, Steve Clark wrote:
> Retry the malloc - maybe there is a memory leak when
> --enable-thread-safety is enabled,
> send an out of memory message to the postgres log, abort the
> transaction - I don't know. I am
> not a postgres developer so I don't know all the issues. All I know is that,
> as a user, having a program
> like postgres just sig 11 is unacceptable! As a commercial developer
> of software for over 30 years
> I would never just do nothing.

Note this is in your application, not the server. Only your program
died. Of course the transaction got aborted, since the client (you)
disconnected. There is no way for this to write to the server log,
since it may be on another machine...

As to the issue at hand: it looks like your program ran out of memory.
Can you confirm the memory was running low? Even if it handled it by
returning NULL, the caller will die because it also needs memory.

Do you create and destroy a lot of threads since it seems this memory
won't be freed?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: --enable-thread-safety bug

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Note this is in your application, not the server. Only your program
> died. Of course the transaction got aborted, since the client (you)
> disconnected. There is no way for this to write to the server log,
> since it may be on another machine...

Right.  And note that if we don't have enough memory for the struct
that was requested, we *certainly* don't have enough to do anything
interesting.  We could try

    fprintf(stderr, "out of memory\n");
    exit(1);

but even that I would give only about 50-50 odds of success; and more
to the point, how is this any better for an application than a core
dump?  It's still summary termination.

> Do you create and destroy a lot of threads since it seems this memory
> won't be freed?

The OP's program isn't threaded at all, since he was apparently running
with a non-threaded ecpg/libpq before.  This means that the proposal of
looping till someone else frees memory is at least as silly as allowing
the core dump to happen.

            regards, tom lane

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Martijn van Oosterhout wrote:
> On Sat, Mar 22, 2008 at 11:28:24AM -0400, Steve Clark wrote:
>
>>Retry the malloc - maybe there is a memory leak when
>>--enable-thread-safety is enabled,
>>send an out of memory message to the postgres log, abort the
>>transaction - I don't know. I am
>>not a postgres developer so I don't know all the issues. All I know is that,
>>as a user, having a program
>>like postgres just sig 11 is unacceptable! As a commercial developer
>>of software for over 30 years
>>I would never just do nothing.
>
>
> Note this is in your application, not the server. Only your program
> died. Of course the transaction got aborted, since the client (you)
> disconnected. There is no way for this to write to the server log,
> since it may be on another machine...
>
> As to the issue at hand: it looks like your program ran out of memory.
> Can you confirm the memory was running low? Even if it handled it by
> returning NULL, the caller will die because it also needs memory.
>
> Do you create and destroy a lot of threads since it seems this memory
> won't be freed?
>
> Have a nice day,
My program has no threads. As I pointed out, if I change the default Makefile
in the FreeBSD ports system to not enable thread safety, my program runs just
fine for days on end. It appears to me, without any kind of close examination,
that there is a memory leak in the ecpg library when thread safety is enabled.

I had an earlier problem in 8.2.6 where if enable-thread-safety was
turned on sqlca would always be zero
no matter if there was an error or not.

This appears to me to be a problem in the ecpg library when thread
safety is enabled.

Have a nice day.

Steve

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>
>>Note this is in your application, not the server. Only your program
>>died. Of course the transaction got aborted, since the client (you)
>>disconnected. There is no way for this to write to the server log,
>>since it may be on another machine...
>
>
> Right.  And note that if we don't have enough memory for the struct
> that was requested, we *certainly* don't have enough to do anything
> interesting.  We could try
>
>     fprintf(stderr, "out of memory\n");
>     exit(1);
>
> but even that I would give only about 50-50 odds of success; and more
> to the point, how is this any better for an application than a core
> dump?  It's still summary termination.
>
>
>>Do you create and destroy a lot of threads since it seems this memory
>>won't be freed?
>
>
> The OP's program isn't threaded at all, since he was apparently running
> with a non-threaded ecpg/libpq before.  This means that the proposal of
> looping till someone else frees memory is at least as silly as allowing
> the core dump to happen.
>
>             regards, tom lane
>
>
I guess the real question is why we are running out of memory when
this option is enabled.
Since my app doesn't use threads that points to a memory leak in the
ecpg library when enable thread
safety is turned on.


Steve

Re: --enable-thread-safety bug

From
Martijn van Oosterhout
Date:
On Sat, Mar 22, 2008 at 12:42:51PM -0400, Tom Lane wrote:
> > Do you create and destroy a lot of threads since it seems this memory
> > won't be freed?
>
> The OP's program isn't threaded at all, since he was apparently running
> with a non-threaded ecpg/libpq before.  This means that the proposal of
> looping till someone else frees memory is at least as silly as allowing
> the core dump to happen.

I found an old report where someone found that the get/setspecific
wasn't working and it was allocating a new version of the structure
each time.

http://www.mail-archive.com/pgsql-general@postgresql.org/msg42918.html

That was on Solaris though. It would be instructive to test that by
calling that function multiple times successively and ensuring it's
returning the same address each time.
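
The failure mode described in that old report can be illustrated with a minimal, self-contained sketch. It does not call ecpg or pthreads at all: `struct fake_sqlca`, `get_sqlca`, `count_allocations` and the `working_*`/`broken_*` functions are hypothetical stand-ins. The getter has the same shape as the ECPGget_sqlca() code quoted earlier in the thread, and the "broken" pair models a get/setspecific implementation that never remembers the pointer, which is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sqlca_t; only its size matters here. */
struct fake_sqlca { char pad[215]; };

/* Pluggable "thread-specific storage" so both behaviours can be compared. */
typedef void *(*get_fn)(void);
typedef int (*set_fn)(void *);

static void *slot;      /* a single slot is enough for a single-threaded demo */

void *working_get(void) { return slot; }
int working_set(void *p) { slot = p; return 0; }

void *broken_get(void) { return NULL; }             /* always reports "not set" */
int broken_set(void *p) { (void) p; return 0; }     /* silently stores nothing */

/* Same shape as the getter quoted in the thread: allocate on first use. */
struct fake_sqlca *get_sqlca(get_fn get, set_fn set, size_t *allocs)
{
    struct fake_sqlca *s = get();
    if (s == NULL)
    {
        s = malloc(sizeof *s);
        assert(s != NULL);      /* demo only: give up if memory really runs out */
        (*allocs)++;
        set(s);
    }
    return s;
}

/* Calls the getter `calls` times and returns how many allocations occurred.
   With the broken pair, every block except none is leaked on purpose. */
size_t count_allocations(get_fn get, set_fn set, int calls)
{
    size_t allocs = 0;

    slot = NULL;
    for (int i = 0; i < calls; i++)
        (void) get_sqlca(get, set, &allocs);

    free(slot);                 /* releases the one block the working pair kept */
    slot = NULL;
    return allocs;
}

void demo(void)
{
    printf("working get/set: %zu allocation(s)\n",
           count_allocations(working_get, working_set, 1000));
    printf("broken get/set:  %zu allocation(s)\n",
           count_allocations(broken_get, broken_set, 1000));
}
```

Calling `demo()` from a `main` shows the difference: with a working get/set pair the struct is allocated once and reused, while with the broken pair every call allocates a fresh struct that is never freed.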

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: --enable-thread-safety bug

From
Michael Meskes
Date:
On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:
> My program had no threads - as I pointed out if I change the default
> Makefile in the FreeBSD ports
> system to not enable thread safety my program runs just fine for days
> on end. It appears to me
> without any kind of close examination that there is a memory leak in the
> ecpg library when enable
> thread safety is turned on.

There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder
how the program manages to spend so much time allocating memory to eat
all of it. Could you give us some more info about your source code? Do
you use descriptors? Auto allocating?

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Michael Meskes wrote:
> On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:
>
>>My program had no threads - as I pointed out if I change the default
>>Makefile in the FreeBSD ports
>>system to not enable thread safety my program runs just fine for days
>>on end. It appears to me
>>without any kind of close examination that there is a memory leak in the
>>ecpg library when enable
>>thread safety is turned on.
>
>
> There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder
> how the program manages to spend so much time allocating memory to eat
> all of it. Could you give us some more info about your source code? Do
> you use descriptors? Auto allocating?
>
> Michael

Hi Michael,

Not exactly sure what you are asking about - descriptors and auto
allocating.

The program processes about 800000 packets a day, which can update
several tables.
It runs continuously, reading UDP packets from systems at remote
locations coming in over the internet.

It has a global
exec sql include sqlca;

then a number of functions that get called, with each function having
its own

xxx( args,... )
{
EXEC SQL BEGIN DECLARE SECTION;
a bunch of variables
EXEC SQL END DECLARE SECTION;

with various EXEC SQL inserts, updates and selects,
with checks of sqlca.sqlcode to determine if the SQL statement succeeded.

}

Steve

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Steve Clark wrote:
> Michael Meskes wrote:
>
>>On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:
>>
>>
>>>My program had no threads - as I pointed out if I change the default
>>>Makefile in the FreeBSD ports
>>>system to not enable thread safety my program runs just fine for days
>>>on end. It appears to me
>>>without any kind of close examination that there is a memory leak in the
>>>ecpg library when enable
>>>thread safety is turned on.
>>
>>
>>There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder
>>how the program manages to spend so much time allocating memory to eat
>>all of it. Could you give us some more info about your source code? Do
>>you use descriptors? Auto allocating?
>>
>>Michael
>
>
> Hi Michael,
>
> Not exactly sure what you are asking about - descriptors and auto
> allocating.
>
> The program processes about 800000 packets a day, which can update
> several tables.
> It runs continuously, reading UDP packets from systems at remote
> locations coming in over the internet.
>
> It has a global
> exec sql include sqlca;
>
> then a number of functions that get called, with each function having
> its own
>
> xxx( args,... )
> {
> EXEC SQL BEGIN DECLARE SECTION;
> a bunch of variables
> EXEC SQL END DECLARE SECTION;
>
> with various EXEC SQL inserts, updates and selects,
> with checks of sqlca.sqlcode to determine if the SQL statement succeeded.
>
> }
>
> Steve
>
To further illustrate our code, below is a typical exec sql statement:
     exec sql insert into t_unit_event_log
            (event_log_no,
             unit_serial_no,
             event_type,
             event_category,
             event_mesg,
             event_severity,
             event_status,
             event_ref_log_no,
             event_logged_by,
             event_date,
             alarm,
             last_updated_by,
             last_updated_date)
     values (nextval('seq_event_log_no'),
             :h_serial_no,
             'ALERT',
             :h_category,
             :h_mesg,
             :h_sev,
             3,
             NULL,
             current_user,
             now(),
             :h_alarm,
             current_user,
             now());

     if (sqlca.sqlcode != 0)
     {
         VARLOG(INFO, LOG_LEVEL_DBG4, "could not insert into T_UNIT_EVENT_LOG\n");
         VARLOG(INFO, LOG_LEVEL_DBG4, "insertTUEL returns %d\n", ret);
         return ret;
     }


Re: --enable-thread-safety bug

From
Craig Ringer
Date:
Steve Clark wrote:

> I guess the real question is why we are running out of memory when
> this option is enabled.
> Since my app doesn't use threads that points to a memory leak in the
> ecpg library when enable thread
> safety is turned on.
>
It might be worth building ecpg with debug symbols then running your
app, linked to that ecpg, under Valgrind. If you are able to produce
more specific information about how the leak occurs in the context of
your application people here may be more able to help you.

--
Craig Ringer


Re: --enable-thread-safety bug

From
Steve Clark
Date:
Craig Ringer wrote:
> Steve Clark wrote:
>
>
>>I guess the real question is why we are running out of memory when
>>this option is enabled.
>>Since my app doesn't use threads that points to a memory leak in the
>>ecpg library when enable thread
>>safety is turned on.
>>
>
> It might be worth building ecpg with debug symbols then running your
> app, linked to that ecpg, under Valgrind. If you are able to produce
> more specific information about how the leak occurs in the context of
> your application people here may be more able to help you.
>
> --
> Craig Ringer
>
>

Hi Craig,

I could do that - but in my situation I am not using threads so I really
don't need --enable-thread-safety turned on. The FreeBSD ports maintainer for
postgresql decided everybody should have it whether they needed it or not. I
simply deleted the option from the FreeBSD makefile, rebuilt the port, and
relinked my app, and no more problem. I just thought the postgresql developers
would want to know there was a bug. If they don't care to investigate or
troubleshoot the bug it is fine by me.

I just find it interesting that a non-threaded program causes a memory leak
when used with postgres libraries that are compiled with
--enable-thread-safety - doesn't seem too safe to me.

Have a nice day.

Steve

Re: --enable-thread-safety bug

From
Tom Lane
Date:
Steve Clark <sclark@netwolves.com> writes:
> I could do that - but in my situation I am not using threads so I
> really don't need --enable-thread-safety
> turned on. The FreeBSD ports maintainer for postgresql decided
> everybody should have it whether they
> needed it or not. I simply deleted the option from the FreeBSD
> makefile, rebuilt the port, and relinked my app,
> and no more problem. I just thought the postgresql developers would
> want to know there was a bug. If
> they don't care to investigate or troubleshoot the bug it is fine by me.

I don't think you grasp the situation, Steve.  Having
enable-thread-safety turned on is standard across a wide swath of the
world, and yet nobody else has reported severe memory leaks in ecpg.
So there's something very specific to what your app is doing that
triggers the problem.  There's little point in anyone else investigating
unless you can give them a test case that reproduces the misbehavior.

I can assure you we would like to fix the problem if we can find it.
But with no cooperation from you, we'll just have to wait until someone
else stumbles across it and can show us exactly how to make it happen.

            regards, tom lane

Re: --enable-thread-safety bug

From
Michael Meskes
Date:
On Sat, Mar 22, 2008 at 04:58:28PM -0400, Steve Clark wrote:
> Not exactly sure what you are asking about - descriptors and auto
> allocating.

So I guess you don't use either feature. :-)

> The program processes about 800000 packets a day, which can update
> several tables.
> It runs continuously, reading UDP packets from systems at remote locations
> coming in over the internet.

But the code for processing all those statements is the same, with and
without threading enabled.

One piece of code that differs is the allocation of sqlca, but this
structure has a mere 215 bytes (about). Even if it was allocated 800000
times, it would amount to a memory loss of only about 164MB. Which brings up
the question of how long the application runs until it segfaults.
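
The estimate above can be reproduced with a short calculation. This is only a sketch of the arithmetic: the 215-byte size is the approximate figure quoted above, one allocation per packet is the worst case assumed there, and `leaked_bytes`, `to_mib` and `print_estimate` are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Bytes lost if one struct is allocated per call and never freed. */
unsigned long long leaked_bytes(unsigned long long calls,
                                unsigned long long struct_size)
{
    /* Guard against overflow in the multiplication. */
    assert(struct_size == 0 || calls <= ULLONG_MAX / struct_size);
    return calls * struct_size;
}

/* Whole mebibytes (1 MiB = 1024 * 1024 bytes), rounded down. */
unsigned long long to_mib(unsigned long long bytes)
{
    return bytes / (1024ULL * 1024ULL);
}

void print_estimate(void)
{
    /* 800000 packets a day, roughly 215 bytes per sqlca struct. */
    unsigned long long bytes = leaked_bytes(800000ULL, 215ULL);

    printf("%llu bytes, about %llu MiB\n", bytes, to_mib(bytes));
}
```

800000 allocations of 215 bytes come to 172000000 bytes, which is the "about 164MB" figure when a megabyte is counted as 1024 * 1024 bytes.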

As Tom already pointed out, without more information there simply is no
way for us to find out what's going on. We are more than willing to dig
into it, but we need more to be able to.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

Re: --enable-thread-safety bug

From
Steve Clark
Date:
Michael Meskes wrote:
> On Sat, Mar 22, 2008 at 04:58:28PM -0400, Steve Clark wrote:
>
>>Not exactly sure what you are asking about - descriptors and auto
>>allocating.
>
>
> So I guess you don't use either feature. :-)
>
>
>>The program processes about 800000 packets a day, which can update
>>several tables.
>>It runs continuously, reading UDP packets from systems at remote locations
>>coming in over the internet.
>
>
> But the code for processing all those statements is the same, with and
> without threading enabled.
>
> One piece of code that differs is the allocation of sqlca, but this
> structure has a mere 215 bytes (about). Even if it was allocated 800000
> times, it would amount to a memory loss of only about 164MB. Which brings up
> the question of how long the application runs until it segfaults.
>
> As Tom already pointed out, without more information there simply is no
> way for us to find out what's going on. We are more than willing to dig
> into it, but we need more to be able to.
>
> Michael

OK, I tried valgrind, and after a while it dies with a valgrind assertion
error before providing any useful data.

So I tried linking with -lc_r and it appears to have stopped the leak.
Without -lc_r, using "top", my app quickly climbed over 150 MB in memory
size; it is now staying steady at about 8 MB, which is about what it ran at
when I compiled the ecpg lib without --enable-thread-safety.

Now why does this make a difference in ecpg?

HTH,
Steve

If anyone cares, below is the valgrind assertion failure:
valgrind: vg_malloc2.c:1008 (vgPlain_arena_malloc): Assertion `new_sb != ((void*)0)' failed.
==4166==    at 0xB802BE1F: (within /usr/local/lib/valgrind/stage2)
==4166==    by 0xB802BE1E: (within /usr/local/lib/valgrind/stage2)
==4166==    by 0xB802BE5D: vgPlain_core_assert_fail (in /usr/local/lib/valgrind/stage2)
==4166==    by 0xB8028091: vgPlain_arena_malloc (in /usr/local/lib/valgrind/stage2)

sched status:

Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==4166==    at 0x3C03894B: calloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)


Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: valgrind.kde.org

In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using.  Thanks.