Re: Query plan for "heavy" SELECT with "lite" sub-SELECTs - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Query plan for "heavy" SELECT with "lite" sub-SELECTs
Date
Msg-id 4549E729.1080302@archonet.com
Whole thread Raw
In response to Query plan for "heavy" SELECT with "lite" sub-SELECTs  ("Nikolay Samokhvalov" <samokhvalov@gmail.com>)
List pgsql-performance
Nikolay Samokhvalov wrote:
> 2. explain analyze select
>  *,
>  (select typname from pg_type where pg_type.oid=pg_proc.prorettype limit 1)
> from pg_proc offset 1500 limit 1;
> "Limit  (cost=8983.31..8989.30 rows=1 width=365) (actual
> time=17.648..17.649 rows=1 loops=1)"
> "  ->  Seq Scan on pg_proc  (cost=0.00..13486.95 rows=2252 width=365)
> (actual time=0.100..16.851 rows=1501 loops=1)"
> "        SubPlan"
> "          ->  Limit  (cost=0.00..5.91 rows=1 width=64) (actual
> time=0.006..0.007 rows=1 loops=1501)"
> "                ->  Index Scan using pg_type_oid_index on pg_type
> (cost=0.00..5.91 rows=1 width=64) (actual time=0.004..0.004 rows=1
> loops=1501)"
> "                      Index Cond: (oid = $0)"
> "Total runtime: 17.784 ms"
>
> We see that in the 2nd example Postgres starts with "Index Scan using
> pg_type_oid_index" (1501 iterations!).

No, what you see here is that the inner loop is the index-scan over
pg_type_oid. It's running a sequential scan on pg_proc and then runs
1501 index scans against pg_type.

> My understanding of SQL says me
> that the simplest (and, in this case - and probably in *most* cases -
> fastest) way to perform such queries is to start from main SELECT and
> then, when we already have rows from "main" table, perform "lite"
> sub-selects. So, I expected  smth near 2.156 ms + 0.079 ms, but obtain
> 17.784 ms... For large table this is killing behaviour.

You've forgotten about the cost of matching up the two sets of rows.
Now, if the first part of the query outputs only one row then you might
be right, but I'm not sure that the SQL standard allows the subquery to
be delayed to that stage without explicitly organising the query that
way. From memory, the OFFSET/LIMIT takes place at the very end of the
query processing.

> What should I do to make Postgres work properly in such cases (I have
> a lot of similar queries; surely, they are executed w/o seqscans, but
> overall picture is the same - I see that starting from sub-selects
> dramatically decrease performance)?

Do you have a real example? That might be more practical.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "Nikolay Samokhvalov"
Date:
Subject: Query plan for "heavy" SELECT with "lite" sub-SELECTs
Next
From: Robins
Date:
Subject: Locking vs. Exceptions