Thread: lots of values for IN() clause

lots of values for IN() clause

From
Markus Schiltknecht
Date:
Hi,

One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
query (backend SEGFAULTs). I've figured the crashes were caused by a
very long IN() clause.

You can easily reproduce the crash by feeding the output of the python
script below to your database.

Fortunately, 8.2 (as of 09/19/2006) no longer crashes. Anyway, I think
it would be helpful to have at least a better error message instead of a
SEGFAULT for 8.1. I didn't test earlier versions.

Regards

Markus


python script mentioned:

#!/usr/bin/python

values = 100000

print "CREATE TABLE test (a INT, b INT);"
print "SELECT a, b FROM test WHERE b IN (" + \
    ",".join([str(x) for x in range(values)]) + \
    ");"
print "DROP TABLE test;"

Re: lots of values for IN() clause

From
Shelby Cain
Date:
I don't get a segfault on 8.0.8 under linux or 8.1.4 under win32.  The backend (correctly I assume) issues a hint to
increasemax_stack_depth in both cases. 

$ psql -h localhost -p 5432 -d test -U readwrite < script.sql

Password:

CREATE TABLE

ERROR:  stack depth limit exceeded

HINT:  Increase the configuration parameter "max_stack_depth".

DROP TABLE

Regards,

Shelby Cain

----- Original Message ----
From: Markus Schiltknecht <markus@bluegap.ch>
To: pgsql general <pgsql-general@postgresql.org>
Sent: Thursday, November 2, 2006 9:21:52 AM
Subject: [GENERAL] lots of values for IN() clause

Hi,

One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
query (backend SEGFAULTs). I've figured the crashes were caused by a
very long IN() clause.

You can easily reproduce the crash by feeding the output of the python
script below to your database.

Fortunately, 8.2 (as of 09/19/2006) no longer crashes. Anyway, I think
it would be helpful to have at least a better error message instead of a
SEGFAULT for 8.1. I didn't test earlier versions.

Regards

Markus


python script mentioned:

#!/usr/bin/python

values = 100000

print "CREATE TABLE test (a INT, b INT);"
print "SELECT a, b FROM test WHERE b IN (" + \
    ",".join([str(x) for x in range(values)]) + \
    ");"
print "DROP TABLE test;"

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match






Re: lots of values for IN() clause

From
Markus Schiltknecht
Date:
Hi,

thanks for testing, unfortunately I don't have a 8.0 around. And as 8.2
works and is probably coming very soon...

Regards

Markus

Shelby Cain wrote:
> I don't get a segfault on 8.0.8 under linux or 8.1.4 under win32.  The backend (correctly I assume) issues a hint to
increasemax_stack_depth in both cases. 
>
> $ psql -h localhost -p 5432 -d test -U readwrite < script.sql
>
> Password:
>
> CREATE TABLE
>
> ERROR:  stack depth limit exceeded
>
> HINT:  Increase the configuration parameter "max_stack_depth".
>
> DROP TABLE
>
> Regards,
>
> Shelby Cain


Re: lots of values for IN() clause

From
Shelby Cain
Date:
No segfault in 8.1.5 on win32 either...

----- Original Message ----
From: Markus Schiltknecht <markus@bluegap.ch>
To: Shelby Cain <alyandon@yahoo.com>
Cc: pgsql general <pgsql-general@postgresql.org>
Sent: Thursday, November 2, 2006 9:55:08 AM
Subject: Re: [GENERAL] lots of values for IN() clause

Hi,

thanks for testing, unfortunately I don't have a 8.0 around. And as 8.2
works and is probably coming very soon...

Regards

Markus








Re: lots of values for IN() clause

From
Alvaro Herrera
Date:
Markus Schiltknecht wrote:
> Hi,
>
> One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
> query (backend SEGFAULTs). I've figured the crashes were caused by a
> very long IN() clause.
>
> You can easily reproduce the crash by feeding the output of the python
> script below to your database.

I'd argue that you have max_stack_depth set to an invalid value (higher
than your true stack limit).  I tried your example here on 8.1.5 and got
this:

alvherre=# \i foo
CREATE TABLE
psql:foo:2: ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth".
DROP TABLE

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: lots of values for IN() clause

From
Richard Huxton
Date:
Markus Schiltknecht wrote:
> Hi,
>
> thanks for testing, unfortunately I don't have a 8.0 around. And as 8.2
> works and is probably coming very soon...

If you can reliably reproduce it (I can't here - Debian on x86) - a
bug-report on the bugs mailing list or the website would probably be
appreciated by the developers. PG version, OS version, method of install
etc.

--
   Richard Huxton
   Archonet Ltd

Re: lots of values for IN() clause

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Markus Schiltknecht wrote:
> > Hi,
> >
> > One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
> > query (backend SEGFAULTs). I've figured the crashes were caused by a
> > very long IN() clause.
> >
> > You can easily reproduce the crash by feeding the output of the python
> > script below to your database.
>
> I'd argue that you have max_stack_depth set to an invalid value (higher
> than your true stack limit).  I tried your example here on 8.1.5 and got
> this:

Forgot to quote the value:

alvherre=# show max_stack_depth ;
 max_stack_depth
-----------------
 2048
(1 fila)

This is a stock ix86 machine running Linux 2.6.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)

Re: lots of values for IN() clause

From
Richard Huxton
Date:
Markus Schiltknecht wrote:
> Hi,
>
> One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
> query (backend SEGFAULTs). I've figured the crashes were caused by a
> very long IN() clause.

(Sorry for replying twice in a row)

Could it be out-of-memory rather than a segfault you're seeing? That is,
you've got max_stack_depth set high enough that you're exhausting system
RAM (Or some other limit).

--
   Richard Huxton
   Archonet Ltd

Re: lots of values for IN() clause

From
Markus Schiltknecht
Date:
Hello Alvaro,

yeah, thanks, that's it. postgresql.conf had:

max_stack_depth = 8192                  # min 100, size in KB

I don't know who put it at 8192. According to the fine manual at [1], it
should be set to something below 'ulimit -s', which gives 8192 on the
machine in question. I've now set it to 7000 and I also get a warning
instead of a SEGFAULT.

Thank you!

Markus

[1]:
http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html

Alvaro Herrera wrote:
> Markus Schiltknecht wrote:
>> Hi,
>>
>> One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
>> query (backend SEGFAULTs). I've figured the crashes were caused by a
>> very long IN() clause.
>>
>> You can easily reproduce the crash by feeding the output of the python
>> script below to your database.
>
> I'd argue that you have max_stack_depth set to an invalid value (higher
> than your true stack limit).  I tried your example here on 8.1.5 and got
> this:
>
> alvherre=# \i foo
> CREATE TABLE
> psql:foo:2: ERROR:  stack depth limit exceeded
> HINT:  Increase the configuration parameter "max_stack_depth".
> DROP TABLE
>


Re: lots of values for IN() clause

From
Markus Schiltknecht
Date:
Hi,

Richard Huxton wrote:
> If you can reliably reproduce it (I can't here - Debian on x86) - a
> bug-report on the bugs mailing list or the website would probably be
> appreciated by the developers. PG version, OS version, method of install
> etc.

I've thought about that, but I somehow just *knew* it was my fault and
not a bug in PostgreSQL ;-)

OTOH, having to configure such things is not exactly user friendly. I
guess it's difficult to determine the stack limit in a cross-platform
way. Or does having that configuration option other reasons for existence?

Regards

Markus

Re: lots of values for IN() clause

From
Martijn van Oosterhout
Date:
On Thu, Nov 02, 2006 at 05:16:40PM +0100, Markus Schiltknecht wrote:
> OTOH, having to configure such things is not exactly user friendly. I
> guess it's difficult to determine the stack limit in a cross-platform
> way. Or does having that configuration option other reasons for existence?

A patch went in recently that (on platforms where it's possible) tries
to determine the maximum stack depth and complains if you set it too
large. So this problem should go away in the future...

There might be situations where you want to be able to use a larger
stack, that's why it's configurable. PostgreSQL now actually checks
whether the number you provided makes sense.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: lots of values for IN() clause

From
Alvaro Herrera
Date:
Markus Schiltknecht wrote:
> Hello Alvaro,
>
> yeah, thanks, that's it. postgresql.conf had:
>
> max_stack_depth = 8192                  # min 100, size in KB
>
> I don't know who put it at 8192. According to the fine manual at [1], it
> should be set to something below 'ulimit -s', which gives 8192 on the
> machine in question. I've now set it to 7000 and I also get a warning
> instead of a SEGFAULT.

I notice it crashes with max_stack_depth set to 8173, but correctly
detects the error with max_stack_depth set to 8172.  The doc suggests a
safety margin of "a megabyte or so", so I think we are conforming to our
docs here.

Tom recently added a check for getrlimit(RLIMIT_STACK), but I don't know
if that considered the "megabyte or so".  *peeks the code*  Yeah,
there's a 512 kb "daylight", but there's also an absolute maximum of
2MB.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: lots of values for IN() clause

From
"Gurjeet Singh"
Date:
But I do not understand why the IN list has to make so many recursive calls???

I know if it was possible, it'd have been done already... but can 'making it iterative' (whatever 'it' stands for) be reconsidered?

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

On 11/2/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
I notice it crashes with max_stack_depth set to 8173, but correctly
detects the error with max_stack_depth set to 8172.  The doc suggests a
safety margin of "a megabyte or so", so I think we are conforming to our
docs here.

Tom recently added a check for getrlimit(RLIMIT_STACK), but I don't know
if that considered the "megabyte or so".  *peeks the code*  Yeah,
there's a 512 kb "daylight", but there's also an absolute maximum of
2MB.

Re: lots of values for IN() clause

From
Tom Lane
Date:
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
> But I do not understand why the IN list has to make so many recursive
> calls???

Existing releases effectively expand "foo IN (1,2,3,...)" to
"(((foo = 1) OR foo = 2) OR foo = 3) ..." which can be a deeply
nested OR structure.  IIRC this is changed in 8.2.

            regards, tom lane

Re: lots of values for IN() clause

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom recently added a check for getrlimit(RLIMIT_STACK), but I don't know
> if that considered the "megabyte or so".  *peeks the code*  Yeah,
> there's a 512 kb "daylight", but there's also an absolute maximum of
> 2MB.

No, there's no absolute maximum (unless I blew the logic).  The
*default* if you do not specify max_stack_depth at all is capped at 2MB,
which is the same as the old default.  I did that to avoid creating any
new failure if getrlimit lies to us for some reason.  Possibly once we
have more confidence in that code, we can be more aggressive about
setting max_stack_depth to getrlimit(RLIMIT_STACK) minus 512K or so.

            regards, tom lane