Re: Postgresql 9.1 - select statement with multiple "with-clauses" becomes very slow - Mailing list pgsql-general

From Opel Fahrer
Subject Re: Postgresql 9.1 - select statement with multiple "with-clauses" becomes very slow
Date
Msg-id 1357218083.81382.YahooMailNeo@web171402.mail.ir2.yahoo.com
Whole thread Raw
In response to Re: Postgresql 9.1 - select statement with multiple "with-clauses" becomes very slow  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
Hi Alban,

thanks a lot - didn't realize it was so simple. It works like a charm!

Cheers


Von: Alban Hertroys <haramrae@gmail.com>
An: Opel Fahrer <opelfahrer79@yahoo.de>
CC: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Gesendet: 13:25 Donnerstag, 3.Januar 2013
Betreff: Re: [GENERAL] Postgresql 9.1 - select statement with multiple "with-clauses" becomes very slow

You're ending up with something that's basically a carthesian product of closebuildings and closebuildingdescriptions.

Your query looks like a simple join would serve just fine, something like:

prepare getmydata(real,real,real) AS (
select image, data
from info
inner join buildings on (buildings.id = info.building_id)
inner join buildingdescriptions on (buildingdescriptions.id = buildings.description_id)
where ST_DWithin(position, 'POINT($1 $2)', $3)
)


On 3 January 2013 12:43, Opel Fahrer <opelfahrer79@yahoo.de> wrote:
I'm a noob in writing efficient Postgres queries, so I wrote a first function to query multiple linked tables using the PostGIS extension. The query should fetch data from multiple tables and finally give me a table with two columns. Here's the code:

[code]
    prepare getmydata(real,real,real) AS (
     with
     closeby(id) AS (
      select buildingid from info where ST_DWithin(position, 'POINT($1 $2)', $3)   
     ),
     closebuildings(descriptionid,image) AS (
      select descriptionid,image from buildings where id IN (select * from closeby)
     ),
     closebuildingdescriptions(data) AS (
      select data from buildingdescriptions where id IN (select descriptionid from closebuildings)
     )
     select image,data from closebuildings,closebuildingdescriptions;
    );
    execute getmydata(0.0,0.0,10.0);
[/code]

Actually the problem is that this query is VERY slow, even if the database content is small (taking around 15 minutes or so). The problem seems to be that postgres has to make sure that for the select statement both columns have equal length. If I only do "select image from closebuildings", the results are delivered in 0.1 secs, a "select data from closebuildingdescriptions" is delivered in 7.8 secs.

I ran an "explain analyze" call, but I can't make any sense from the output:

[code]
"Nested Loop  (cost=7816.51..2636821.06 rows=131352326 width=36) (actual time=117.125..6723.014 rows=12845056 loops=1)"
"  CTE closeby"
"    ->  Seq Scan on info  (cost=0.00..1753.11 rows=186 width=4) (actual time=0.022..5.821 rows=1579 loops=1)"
"          Filter: (("position" && '0103000020797F000001000000050000007D3F35DEAC512041E3A59BBC9BE153417D3F35DEAC512041E3A59BBCA0E153417D3F35DED4512041E3A59BBCA0E153417D3F35DED4512041E3A59BBC9BE153417D3F35DEAC512041E3A59BBC9BE15341'::geometry) AND ('0101000020797F00007D3F35DEC0512041E3A59B3C9EE15341'::geometry && st_expand("position", 10::double precision)) AND _st_dwithin("position", '0101000020797F00007D3F35DEC0512041E3A59B3C9EE15341'::geometry, 10::double precision))"
"  CTE closebuildings"
"    ->  Hash Semi Join  (cost=6.04..4890.03 rows=1351 width=8) (actual time=54.743..61.025 rows=3584 loops=1)"
"          Hash Cond: (closebuildings.id = closeby.buildingid)"
"          ->  Seq Scan on closebuildings  (cost=0.00..4358.52 rows=194452 width=12) (actual time=0.042..31.646 rows=194452 loops=1)"
"          ->  Hash  (cost=3.72..3.72 rows=186 width=4) (actual time=7.073..7.073 rows=1579 loops=1)"
"                Buckets: 1024  Batches: 1  Memory Usage: 56kB"
"                ->  CTE Scan on closeby  (cost=0.00..3.72 rows=186 width=4) (actual time=0.023..6.591 rows=1579 loops=1)"
"  CTE closebuildingdescriptions"
"    ->  Nested Loop  (cost=30.40..1173.37 rows=97226 width=516) (actual time=117.103..1890.902 rows=3584 loops=1)"
"          ->  HashAggregate  (cost=30.40..32.40 rows=200 width=4) (actual time=63.529..66.176 rows=3584 loops=1)"
"                ->  CTE Scan on closebuildings  (cost=0.00..27.02 rows=1351 width=4) (actual time=54.746..62.316 rows=3584 loops=1)"
"          ->  Index Scan using buildingdescriptions_pkey on buildingdescriptions  (cost=0.00..5.69 rows=1 width=520) (actual time=0.506..0.507 rows=1 loops=3584)"
"                Index Cond: (id = closebuildings.descriptionid)"
"  ->  CTE Scan on closebuildingdescriptions  (cost=0.00..1944.52 rows=97226 width=32) (actual time=117.115..1901.993 rows=3584 loops=1)"
"  ->  CTE Scan on closebuildings  (cost=0.00..27.02 rows=1351 width=4) (actual time=0.000..0.536 rows=3584 loops=3584)"
"Total runtime: 7870.567 ms"
[/code]


If anyone can come up with a solution or a suggestion how to solve this, I would highly appreciate it.

Cheers



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


pgsql-general by date:

Previous
From: Alban Hertroys
Date:
Subject: Re: Postgresql 9.1 - select statement with multiple "with-clauses" becomes very slow
Next
From: Rich Shepard
Date:
Subject: Database Design: Maintain Audit Trail of Changes