Re: performance tuning in large function / transaction - Mailing list pgsql-sql

From Tom Lane
Subject Re: performance tuning in large function / transaction
Date
Msg-id 29547.1008570076@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance tuning in large function / transaction  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses 'IN' performance
List pgsql-sql
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Is it true that the IN command is implemented sort of as a linked list
> linear time search?  Is there any plan for a super-fast implementation of
> 'IN'?

This deserves a somewhat long-winded answer.

Postgres presently supports two kinds of IN (I'm not sure whether SQL92
allows any additional kinds):

1. Scalar-list IN:  foo IN ('bar', 'baz', 'quux', ...)

2. Sub-select IN:   foo IN (SELECT bar FROM ...)

In the scalar-list form, a variable is compared to an explicit list of
constants or expressions.  This form is exactly equivalent tofoo = 'bar' OR foo = 'baz' OR foo = 'quux' OR ...
and is converted into that form by the parser.  The planner is capable
of converting a WHERE clause of this kind into multiple passes of
indexscan, when foo is an indexed column and all the IN-list elements
are constants.  Whether it actually will make that conversion depends
on the usual vagaries of pg_statistic entries, etc.  But if it's a
unique or fairly-selective index, and there aren't a huge number of
entries in the IN list, a multiple indexscan should be a good plan.

In the sub-select form, we pretty much suck: for each tuple in the outer
query, we run the inner query until we find a matching value or the
inner query ends.  This is basically a nested-loop scenario, with the
only (minimally) redeeming social value being that the planner realizes
it should pick a fast-start plan for the inner query.  I think it should
be possible to convert this form into a modified kind of join (sort of
the reverse of an outer join: rather than at least one result per
lefthand row, at most one result per lefthand row), and then we could
use join methods that are more efficient than nested-loop.  But no one's
tried to make that happen yet.
        regards, tom lane


pgsql-sql by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: performance tuning in large function / transaction
Next
From: "Christopher Kings-Lynne"
Date:
Subject: 'IN' performance