Re: Recursively traversing a partially ordered set - Mailing list pgsql-sql

From Gregory Stark
Subject Re: Recursively traversing a partially ordered set
Date
Msg-id 87zm3nocn2.fsf@oxford.xeocode.com
Whole thread Raw
In response to Recursively traversing a partially ordered set  (Jason Grout <jason-postgresql@creativetrax.com>)
List pgsql-sql
"Jason Grout" <jason-postgresql@creativetrax.com> writes:

> 2. Is there a big difference in speed between using an array versus
> using a SELECT in a WHERE condition?  In other words, which is generally
> going to be faster:
>
> SELECT * from table where field IN (some function returning a SETOF);
>
> or
>
> SELECT * from table where field = ANY(some function returning an array);

In theory since these are equivalent (well nearly. To be equivalent the
optimizer would need to know whether the array could possibly have NULLs in
it) the optimizer ought to produce the same plan for each. In practice it's
not clear where the optimizer would get the information to decide what plan to
use for these two cases and whether it would have the same kinds of
information available.

In any case in practice the plans available in each of these cases are not the
same so you'll have to try them and see which one works better for you. I
think there are more plans available for the first case so it may work out
better if you're returning quite large sets where those plans help. If you're
returning quite small sets where you just need a simple bitmap index scan then
the second will be less likely to pick (or actually I think it's incapable of
picking) some other plan which works poorly.

There was some discussion recently on what to do about exactly this type of
case. I would be interested to hear about what plans you got from each and
which plan ended up being best.

> 3. Is there a strong reason I should strip out duplicates in either of
> the two cases in question 2?  Or is the performance about the same when
> doing the queries whether or not the SETOF or arrays contain duplicates?

The plans where it matters will remove the duplicates anyways, but I don't
think the array version does if you're not using an bitmap index scan.

> 4. Can you see any obvious optimizations to the above functions
> (particularly the last one)?
>
> Thanks for your help. Thanks for the absolutely wonderful database and
> solid documentation.  I originally did this project in MySQL and had the
> weirdest errors (the errors turned out to be due to the default
> case-insensitive collation of MySQL!).  That's when I decided to move to
> postgresql when I updated the project.

Well, unfortunately collocation support isn't exactly a strong point in
Postgres either. Plenty of people get bitten by their database being initdb'd
in a locale they didn't expect.


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



pgsql-sql by date:

Previous
From: Jason Grout
Date:
Subject: Recursively traversing a partially ordered set
Next
From: Joshua
Date:
Subject: problems with SELECT query results