Re: IN list processing performance (yet again) - Mailing list pgsql-performance

From Tom Lane
Subject Re: IN list processing performance (yet again)
Date
Msg-id 9207.1054165441@sss.pgh.pa.us
Whole thread Raw
In response to IN list processing performance (yet again)  (Dave Tenny <tenny@attbi.com>)
Responses Re: IN list processing performance (yet again)  (Dave Tenny <tenny@attbi.com>)
List pgsql-performance
Dave Tenny <tenny@attbi.com> writes:
> My application relies heavily on IN lists.  The lists are primarily
> constant integers, so queries look like:
> SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...)

> 1) PostgreSQL exhibits worse-than-linear performance behavior with
> respect to IN list size.

Yeah.  There are a couple of places in the planner that have O(N^2)
behavior on sufficiently large WHERE clauses, due to building lists
in a naive way (repeated lappend() operations).  The inner loop is
very tight, but nonetheless when you do it tens of millions of times
it adds up :-(

I have just committed some fixes into CVS tip for this --- I see about
a 10x speedup in planning time on test cases involving 10000-OR-item
WHERE clauses.  We looked at this once before; the test cases I'm using
actually date back to Jan 2000.  But it seems some slowness has crept
in due to subsequent planning improvements.


> 4)  God help you if you haven't vacuum/analyzed that the newly loaded
> table.

Without knowledge that the id field is unique, the planner is likely to
tilt away from an indexscan with not too many IN items.  I don't
consider this a bug.


>       PostgreSQL  craps out trying to process 8000 elements with the error:
>       out of free buffers: time to abort!

This is a known bug in 7.3.2; it's fixed in 7.3.3.

            regards, tom lane

pgsql-performance by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: Wildcard searches & performance question
Next
From: Dave Tenny
Date:
Subject: Re: IN list processing performance (yet again)