Thread: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
"Bernt Marius Johnsen"
Date:
The following bug has been logged online:

Bug reference:      5145
Logged by:          Bernt Marius Johnsen
Email address:      bernt.johnsen@sun.com
PostgreSQL version: 8.3.8
Operating system:   Linux 2.6.27-14 (Ubuntu Interpid)
Description:        Complex query with lots of LEFT JOIN causes segfault
Details:

The below query generated by the Random Query Generator
(https://launchpad.net/randgen) causes a segfault. It was caused running

./gentest.pl --dsn=dbi:Pg:user=bernt --gendata --queries=100000 --threads=1
--grammar=/home/bernt/xx.yy

(I'll follow up with a mail with xx.yy as attachment)

Query:

SELECT * from   B AS alias0  LEFT JOIN        BB AS alias1  LEFT JOIN    B
AS alias2  LEFT JOIN     A AS alias3  LEFT JOIN   AA AS alias4  LEFT JOIN  B
AS alias5 ON alias4.int_key = alias5.int_key  ON alias3.int_key =
alias4.int_key   LEFT JOIN    AA AS alias6  LEFT JOIN  A AS alias7 ON
alias6.int_key = alias7.int_key   LEFT JOIN  BB AS alias8 ON alias7.int_key
= alias8.int_key  ON alias3.int_key = alias8.int_key   LEFT JOIN  AA AS
alias9 ON alias6.int_key = alias9.int_key  ON alias2.int_key =
alias8.int_key   LEFT JOIN   BB AS alias10  LEFT JOIN   AA AS alias11  LEFT
JOIN  B AS alias12 ON alias11.int_key = alias12.int_key  ON alias10.int_key
= alias11.int_key  ON alias9.int_key = alias10.int_key  ON alias1.int_key =
alias8.int_key   LEFT JOIN    BB AS alias13  LEFT JOIN      A AS alias14
LEFT JOIN    AA AS alias15  LEFT JOIN  A AS alias16 ON alias15.int_key =
alias16.int_key   LEFT JOIN  B AS alias17 ON alias15.int_key =
alias17.int_key  ON alias14.int_key = alias16.int_key   LEFT JOIN  AA AS
alias18 ON alias14.int_key = alias18.int_key   LEFT JOIN  B AS alias19 ON
alias15.int_key = alias19.int_key   LEFT JOIN  AA AS alias20 ON
alias16.int_key = alias20.int_key  ON alias13.int_key = alias19.int_key
LEFT JOIN  A AS alias21 ON alias13.int_key = alias21.int_key  ON
alias3.int_key = alias17.int_key   LEFT JOIN  B AS alias22 ON alias7.int_key
= alias22.int_key   LEFT JOIN  A AS alias23 ON alias20.int_key =
alias23.int_key   LEFT JOIN  A AS alias24 ON alias14.int_key =
alias24.int_key   LEFT JOIN     BB AS alias25  LEFT JOIN  BB AS alias26 ON
alias25.int_key = alias26.int_key   LEFT JOIN      A AS alias27  LEFT JOIN
A AS alias28 ON alias27.int_key = alias28.int_key   LEFT JOIN   B AS alias29
 LEFT JOIN   BB AS alias30  LEFT JOIN   B AS alias31  LEFT JOIN    A AS
alias32  LEFT JOIN  B AS alias33 ON alias32.int_key = alias33.int_key   LEFT
JOIN  A AS alias34 ON alias32.int_key = alias34.int_key  ON alias31.int_key
= alias33.int_key  ON alias30.int_key = alias33.int_key  ON alias29.int_key
= alias34.int_key  ON alias27.int_key = alias34.int_key   LEFT JOIN   AA AS
alias35  LEFT JOIN  A AS alias36 ON alias35.int_key = alias36.int_key  ON
alias34.int_key = alias36.int_key   LEFT JOIN  A AS alias37 ON
alias33.int_key = alias37.int_key  ON alias25.int_key = alias32.int_key
LEFT JOIN  A AS alias38 ON alias37.int_key = alias38.int_key  ON
alias15.int_key = alias37.int_key  ON alias0.int_key = alias9.int_key

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Euler Taveira de Oliveira
Date:
Bernt Marius Johnsen escreveu:
> The below query generated by the Random Query Generator
> (https://launchpad.net/randgen) causes a segfault. It was caused running
>
Could you get a core dump and post the gdb backtrace?

$ ulimit -c unlimited
$ pg_ctl start
$ psql -c "<my query goes here>" mydb
$ gdb /path/to/postgres $PGDATA/core
(gdb) bt
.
.
.
(gdb) quit


--
  Euler Taveira de Oliveira
  http://www.timbira.com/

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
"Bernt M. Johnsen"
Date:
>>>>>>>>>>>> Euler Taveira de Oliveira wrote (2009-10-29 12:17:36):
> Bernt Marius Johnsen escreveu:
> > The below query generated by the Random Query Generator
> > (https://launchpad.net/randgen) causes a segfault. It was caused running
> >

Checkout the latest RQG from launchpad and run ./gentest.pl as shown
above (The lastest tarball misses a feature you need). xx.yy is
attached.

Run like this:
./gentest.pl --dsn=dbi:Pg:user=xxxx --gendata --queries=100000 --threads=1 --grammar=/path/to/xx.yy



> Could you get a core dump and post the gdb backtrace?
>
> $ ulimit -c unlimited
> $ pg_ctl start
> $ psql -c "<my query goes here>" mydb
> $ gdb /path/to/postgres $PGDATA/core
> (gdb) bt
> .
> .
> .
> (gdb) quit

We'll see next week If I can spare some time.

>
>
> --
>   Euler Taveira de Oliveira
>   http://www.timbira.com/

--
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway

Attachment

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Bernt M. Johnsen wrote:
>>>>>>>>>>>>> Euler Taveira de Oliveira wrote (2009-10-29 12:17:36):
>> Bernt Marius Johnsen escreveu:
>>> The below query generated by the Random Query Generator
>>> (https://launchpad.net/randgen) causes a segfault. It was caused running
>>>
>
> Checkout the latest RQG from launchpad and run ./gentest.pl as shown
> above (The lastest tarball misses a feature you need). xx.yy is
> attached.
>
> Run like this:
> ./gentest.pl --dsn=dbi:Pg:user=xxxx --gendata --queries=100000 --threads=1 --grammar=/path/to/xx.yy
>
>
>
>> Could you get a core dump and post the gdb backtrace?
>>
>> $ ulimit -c unlimited
>> $ pg_ctl start
>> $ psql -c "<my query goes here>" mydb
>> $ gdb /path/to/postgres $PGDATA/core
>> (gdb) bt
>> .
>> .
>> .
>> (gdb) quit
>
> We'll see next week If I can spare some time.

I can easily reproduce the segfault on 8.4 and 8.5a2:

Program received signal SIGSEGV, Segmentation fault.
ExecHashJoinSaveTuple (tuple=0xb49c8870, hashvalue=3316173823,
fileptr=0x96185a8) at nodeHashjoin.c:775
775        BufFile    *file = *fileptr;
(gdb) bt
#0  ExecHashJoinSaveTuple (tuple=0xb49c8870, hashvalue=3316173823,
fileptr=0x96185a8) at nodeHashjoin.c:775
#1  0x081cf21f in ExecHashJoin (node=0x88c6540) at nodeHashjoin.c:224
#2  0x081bd898 in ExecProcNode (node=0x88c6540) at execProcnode.c:427
#3  0x081bc445 in standard_ExecutorRun (queryDesc=0x875d22c,
direction=ForwardScanDirection, count=0) at execMain.c:1187
#4  0x0828215c in PortalRunSelect (portal=0x879197c, forward=1 '\001',
count=0, dest=0xb4f7bfb8) at pquery.c:953
#5  0x082834be in PortalRun (portal=0x879197c, count=2147483647,
isTopLevel=1 '\001', dest=0xb4f7bfb8, altdest=0xb4f7bfb8,
     completionTag=0xbfe8ff9a "") at pquery.c:807
#6  0x0827f760 in exec_simple_query (
     query_string=0x8751d3c " SELECT * from   B AS alias0  LEFT JOIN
     BB AS alias1  LEFT JOIN    B AS alias2  LEFT JOIN     A AS alias3
LEFT JOIN   AA AS alias4  LEFT JOIN  B AS alias5 ON alias4.int_key =
alias5.int_key  O"...) at postgres.c:1000
#7  0x08280b8e in PostgresMain (argc=2, argv=0x86d7960,
username=0x86d7928 "mastermind") at postgres.c:3573
#8  0x082499be in ServerLoop () at postmaster.c:3366
#9  0x0824a9db in PostmasterMain (argc=3, argv=0x86d5a98) at
postmaster.c:1064
#10 0x081ea466 in main (argc=3, argv=0x86d5a98) at main.c:188


Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> I can easily reproduce the segfault on 8.4 and 8.5a2:

Doesn't crash here ... could we see the specific test data being used,
please?

            regards, tom lane

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Greg Stark
Date:
On Thu, Oct 29, 2009 at 2:14 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> ExecHashJoinSaveTuple (tuple=3D0xb49c8870, hashvalue=3D3316173823,
> fileptr=3D0x96185a8) at nodeHashjoin.c:775
> 775 =A0 =A0 =A0 =A0 =A0 =A0 BufFile =A0 =A0*file =3D *fileptr;
>

That back trace doesn't make much sense. That function only has one
call site and it's called with fileptr set to an expression starting
with &. Ie, there's no way it can get null or an invalid pointer.

--=20
greg

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> I can easily reproduce the segfault on 8.4 and 8.5a2:
>
> Doesn't crash here ... could we see the specific test data being used,
> please?

uploaded a dump of the dataset here:

http://www.kaltenbrunner.cc/files/rand_gen_data.sql

and the query that causes the segfault:

http://www.kaltenbrunner.cc/files/rand_gen_query.sql


Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> uploaded a dump of the dataset here:
> http://www.kaltenbrunner.cc/files/rand_gen_data.sql
> and the query that causes the segfault:
> http://www.kaltenbrunner.cc/files/rand_gen_query.sql

[ scratches head... ]  Still no crash here, and I tried it on a couple
different types of hardware.  What configure parameters are you using,
and what non-default postgresql.conf settings?

            regards, tom lane

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> uploaded a dump of the dataset here:
>> http://www.kaltenbrunner.cc/files/rand_gen_data.sql
>> and the query that causes the segfault:
>> http://www.kaltenbrunner.cc/files/rand_gen_query.sql
>
> [ scratches head... ]  Still no crash here, and I tried it on a couple
> different types of hardware.  What configure parameters are you using,
> and what non-default postgresql.conf settings?

this is 8.5a2 configured with ./configure --enable-cassert
--enable-debug and  just default settings(ie plain initdb with default
settings).
The OS is Debian Lenny/AMD64.


Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> this is 8.5a2 configured with ./configure --enable-cassert
> --enable-debug and  just default settings(ie plain initdb with default
> settings).
> The OS is Debian Lenny/AMD64.

Huh.  That should not be noticeably different from my F11/Xeon64
machine ... but I still can't make it crash.  Somebody else is gonna
have to debug this one.

            regards, tom lane

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> this is 8.5a2 configured with ./configure --enable-cassert
>> --enable-debug and  just default settings(ie plain initdb with default
>> settings).
>> The OS is Debian Lenny/AMD64.
>
> Huh.  That should not be noticeably different from my F11/Xeon64
> machine ... but I still can't make it crash.  Somebody else is gonna
> have to debug this one.

hmm sorry - that was actually Lenny/i386 however I fail to reproduce
this on some of my other test boxes as well though I don't have an idea
why it crashes on my laptop(and for the original reporter) but not
elsewhere :(

Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> this is 8.5a2 configured with ./configure --enable-cassert
>>> --enable-debug and  just default settings(ie plain initdb with
>>> default settings).
>>> The OS is Debian Lenny/AMD64.
>>
>> Huh.  That should not be noticeably different from my F11/Xeon64
>> machine ... but I still can't make it crash.  Somebody else is gonna
>> have to debug this one.
>
> hmm sorry - that was actually Lenny/i386 however I fail to reproduce
> this on some of my other test boxes as well though I don't have an idea
> why it crashes on my laptop(and for the original reporter) but not
> elsewhere :(

ok I now see why you (and I) failed to reproduce the problem - it only
causes clusters/databases to crash that were actually generated using
the upthread mentioned script. it does NOT fail using a dump generated
by a database that fails(!).
So the issue must be a bit more complex and somehow relate to some prior
stuff the script does.


Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Greg Stark
Date:
On Fri, Oct 30, 2009 at 11:22 AM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> ok I now see why you (and I) failed to reproduce the problem - it only
> causes clusters/databases to crash that were actually generated using the
> upthread mentioned script. it does NOT fail using a dump generated by a
> database that fails(!).
> So the issue must be a bit more complex and somehow relate to some prior
> stuff the script does.

Does it still crash if you compile with CFLAGS='-O0 -g' ? Could you
send a backtrace from that?

--
greg

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Stefan Kaltenbrunner
Date:
Greg Stark wrote:
> On Fri, Oct 30, 2009 at 11:22 AM, Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>> ok I now see why you (and I) failed to reproduce the problem - it only
>> causes clusters/databases to crash that were actually generated using the
>> upthread mentioned script. it does NOT fail using a dump generated by a
>> database that fails(!).
>> So the issue must be a bit more complex and somehow relate to some prior
>> stuff the script does.
>
> Does it still crash if you compile with CFLAGS='-O0 -g' ? Could you
> send a backtrace from that?

ok just assembled a new testcase from the querylog of the tool:


http://www.kaltenbrunner.cc/files/rand_gen_crash.sql

this crashes on i386 and results in something like:

ERROR:  invalid memory alloc request size 8589934592

on an AMD64 host for me.

However the error seems to go away after an ANALYZE... so I wonder if
this is just another case of "if we missestimated the size of the
hashtable we are doomed"


Stefan

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> ok just assembled a new testcase from the querylog of the tool:
> http://www.kaltenbrunner.cc/files/rand_gen_crash.sql

Ah, that works (or should I say fails).  Will take a look.

> However the error seems to go away after an ANALYZE... so I wonder if
> this is just another case of "if we missestimated the size of the
> hashtable we are doomed"

Well, we shouldn't dump core in any case.

            regards, tom lane

Re: BUG #5145: Complex query with lots of LEFT JOIN causes segfault

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> However the error seems to go away after an ANALYZE... so I wonder if
> this is just another case of "if we missestimated the size of the
> hashtable we are doomed"

Actually, what seems to be happening on a 32-bit machine is that
ExecChooseHashTableSize sets nbatch = INT_MAX/2, and then when we
try to do

        hashtable->outerBatchFile = (BufFile **)
            palloc0(nbatch * sizeof(BufFile *));

the memory size calculation overflows to zero, so we get an empty
outerBatchFile array.  So the fix is to make sure we limit the
number of batches to something sane, perhaps work_mem / sizeof(pointer).

            regards, tom lane