Thread: Re: PG Seg Faults Performing a Query

Re: PG Seg Faults Performing a Query

From: Tom Lane

Bill Thoen <bthoen@gisnet.com> writes:
> (gdb) bt
> #0  0x0000003054264571 in fputc () from /lib64/libc.so.6
> #1  0x000000000040dbc2 in print_aligned_text (title=0x0, headers=0x5665d0,
>     cells=0x2aaaaf8fc010, footers=0x557c90,
>     opt_align=0x557ef0 'l' <repeats 18 times>, "rr", 'l' <repeats 12
> times>, "rl lllllll", opt_tuples_only=0 '\0', opt_numeric_locale=0 '\0',
> opt_border=1,
>     encoding=8, fout=0x0) at print.c:448
> #2  0x000000000040f0eb in printTable (title=0x0, headers=0x5665d0,
>     cells=0x2aaaaf8fc010, footers=0x557c90,
>     align=0x557ef0 'l' <repeats 18 times>, "rr", 'l' <repeats 12 times>,
> "rlllll lll", opt=0x7fff3e3be8c0, fout=0x3054442760, flog=0x0) at
> print.c:1551

OK, so the problem is that print_aligned_text is being passed fout = NULL.
Since that wasn't what was passed to printTable, the conclusion must be
that PageOutput() was called and returned NULL --- that is, that its
popen() call failed.  Obviously we should put in some sort of check for
that.  I can see three reasonable responses: either make psql abort
entirely (akin to its out-of-memory behavior), or have it fall back to
not using the pager, either silently or after printing an error
message.  Any thoughts which way to jump?
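
For illustration, here is a minimal sketch of the "fall back after printing an error message" option. It is not psql's actual PageOutput() code; the helper names open_pager_or_fallback() and close_pager() and the is_pipe flag are assumptions made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not psql's PageOutput()): try to start a pager
 * with popen() and fall back to `fallback` if that fails, so the caller
 * never receives a NULL stream.  *is_pipe records whether the returned
 * stream must later be closed with pclose().
 */
static FILE *
open_pager_or_fallback(const char *pager_cmd, FILE *fallback, int *is_pipe)
{
    FILE *out;

    assert(fallback != NULL && is_pipe != NULL);

    *is_pipe = 0;
    if (pager_cmd == NULL || pager_cmd[0] == '\0')
        return fallback;        /* no pager requested */

    errno = 0;
    out = popen(pager_cmd, "w");
    if (out == NULL)
    {
        /* Report the failure, then carry on without a pager. */
        fprintf(stderr, "could not start pager \"%s\": %s\n",
                pager_cmd, errno ? strerror(errno) : "unknown error");
        return fallback;
    }

    *is_pipe = 1;
    return out;
}

/* Close the stream only if it really is a pager pipe. */
static void
close_pager(FILE *out, int is_pipe)
{
    if (is_pipe)
        pclose(out);
}
```

A caller would write its table to the returned stream and then call close_pager(). The silent variant simply drops the fprintf(); the abort variant would exit instead of returning the fallback.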

Meanwhile, the question Bill needs to look into is why popen() is
failing for him.  I'm guessing it's a fork() failure at bottom, but
why so consistent?  strace'ing the psql run might provide some more
info.

            regards, tom lane

Re: PG Seg Faults Performing a Query

From: Bill Thoen

I'm a bit out of my depth with using these debugging tools and
interpreting their results, but I think the problem is due to the output
being just too big for interactive display. Using the same query with
tighter limits in the WHERE clause works perfectly. When I changed the
SQL script to write output into a table it worked with the same query
using even looser limits in the WHERE clause. So sending output to a
table instead of to the monitor when the queries produce a large amount
of output is reliable, faster and doesn't tie up the machine.

I tried using strace, but it produced so much output, which I couldn't
understand anyway, that I don't think it would do me any good. I don't
want to bug the PostgreSQL list with a problem that's probably not a
PostgreSQL one, but if someone here would be willing to help me track
down this apparent popen or fork problem, I'd appreciate it. However, I
managed to get the results I needed, so we could also call this "fixed
via workaround."

Thanks for the help, Tom and others!
- Bill Thoen

Re: PG Seg Faults Performing a Query

From: Tom Lane

Bill Thoen <bthoen@gisnet.com> writes:
> I'm a bit out of my depth with using these debugging tools and
> interpreting their results, but I think the problem is due to the output
> being just too big for interactive display.

Well, I can certainly believe it's related to the amount of data
involved, but the exact relationship is far from clear.  popen()
doesn't do any actual data-pushing, it just sets up a pipe and forks
a child process --- so even if the child fails immediately after being
forked, that wouldn't lead to the problem seen here.  The rarity of
a failure here explains why we hadn't noticed the lack of error checking
long ago.
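
To make that concrete, here is a simplified sketch of roughly what a write-mode popen() has to do. It is an assumption-laden stand-in, not the real libc implementation; sketch_popen_write() and sketch_pclose() are hypothetical names. The point it illustrates is that NULL can only come back from a failure in the parent (pipe(), fork() or fdopen()), never from anything the child does afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Simplified, hypothetical stand-in for popen(cmd, "w").  Real libc
 * implementations differ in detail.  Returns NULL only when pipe(),
 * fork() or fdopen() fails in the parent process.
 */
static FILE *
sketch_popen_write(const char *cmd, pid_t *child_pid)
{
    int   fds[2];
    pid_t pid;
    FILE *fp;

    assert(cmd != NULL && child_pid != NULL);

    if (pipe(fds) < 0)
        return NULL;            /* e.g. out of file descriptors */

    pid = fork();
    if (pid < 0)
    {
        /* e.g. a process limit was hit or memory ran short */
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }

    if (pid == 0)
    {
        /* Child: make the read end of the pipe its stdin, run the command. */
        close(fds[1]);
        if (fds[0] != STDIN_FILENO)
        {
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);             /* exec failed; the parent is unaffected */
    }

    /* Parent: keep only the write end and wrap it in a stdio stream. */
    close(fds[0]);
    fp = fdopen(fds[1], "w");
    if (fp == NULL)
    {
        close(fds[1]);
        return NULL;
    }

    *child_pid = pid;
    return fp;
}

/* Close the stream and collect the child's wait status (-1 on error). */
static int
sketch_pclose(FILE *fp, pid_t child_pid)
{
    int status;

    fclose(fp);
    if (waitpid(child_pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Even a command that exits at once, such as "exit 3", still yields a non-NULL stream from this sketch; the child's failure only shows up later in the wait status.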

What I suppose is that you are running into some system-wide resource
constraint.  Exactly which one, and whether it's easy to fix, remain to
be seen.

> I tried using strace, but it produced so much output, which I couldn't
> understand anyway, that I don't think it would do me any good.

Sorry, I should have said: the last few dozen lines before the crash are
all that will be interesting.

            regards, tom lane