Re: full outer performance problem - Mailing list pgsql-performance
From | Kim Bisgaard |
---|---|
Subject | Re: full outer performance problem |
Date | |
Msg-id | 42A6E55C.4030203@dmi.dk Whole thread Raw |
In response to | Re: full outer performance problem (Bruno Wolff III <bruno@wolff.to>) |
Responses |
Re: full outer performance problem
|
List | pgsql-performance |
Hi Bruno, Thanks for the moral support! I feel so too - but I am confident it will show up soon. W.r.t. your rewrite of the query, I get this "ERROR: could not devise a query plan for the given query" but no further details - I will try google Regards, Kim. Bruno Wolff III wrote: >On Wed, Jun 08, 2005 at 11:37:40 +0200, > Kim Bisgaard <kib+pg@dmi.dk> wrote: > > >>Hi, >> >>I'm having problems with the query optimizer and FULL OUTER JOIN on >>PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. >>I might be naive, but I think that it should be possible? >> >>I have two BIG tables (virtually identical) with 3 NOT NULL columns >>Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id, >>TimeObs) and valid ANALYSE (set statistics=100). I want to join the two >>tables with a FULL OUTER JOIN. >> >>When I specify the query as: >> >>SELECT station_id, timeobs,temp_grass, temp_dry_at_2m >> FROM temp_dry_at_2m a >> FULL OUTER JOIN temp_grass b >> USING (station_id, timeobs) >> WHERE station_id = 52981 >> AND timeobs = '2004-1-1 0:0:0' >> >>I get the correct results >> >>station_id | timeobs | temp_grass | temp_dry_at_2m >>------------+---------------------+------------+---------------- >> 52944 | 2004-01-01 00:10:00 | | -1.1 >>(1 row) >> >>BUT LOUSY performance, and the following EXPLAIN: >> >> QUERY PLAN >>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >>Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32) >>(actual time=187176.408..201436.264 rows=1 loops=1) >> Merge Cond: (("outer".station_id = "inner".station_id) AND >> ("outer".timeobs = "inner".timeobs)) >> Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND >> (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 >> 00:00:00'::timestamp without time zone)) >> -> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual >> time=145748.253..153851.607 rows=6956994 loops=1) >> Sort Key: a.station_id, a.timeobs >> -> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 >> rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 >> loops=1) >> -> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual >> time=31668.876..34491.123 rows=2406292 loops=1) >> Sort Key: b.station_id, b.timeobs >> -> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292 >> width=16) (actual time=0.052..5484.489 rows=2406292 loops=1) >>Total runtime: 201795.989 ms >>(10 rows) >> >> > >Someone else will need to comment on why Postgres can't use a more >efficient plan. What I think will work for you is to restrict >the station_id and timeobs on each side and then do a full join. >You can try something like the sample query below (which hasn't been tested): >SELECT station_id, timeobs, temp_grass, temp_dry_at_2m > FROM > (SELECT station_id, timeobs, temp_dry_at_2m > FROM temp_dry_at_2m > WHERE > station_id = 52981 > AND > timeobs = '2004-1-1 0:0:0') a > FULL OUTER JOIN > (SELECT station_id, timeobs, temp_grass > FROM temp_grass > WHERE > station_id = 52981 > AND > timeobs = '2004-1-1 0:0:0') b > USING (station_id, timeobs) > >---------------------------(end of broadcast)--------------------------- >TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > > >
pgsql-performance by date: