full outer performance problem - Mailing list pgsql-performance

From Kim Bisgaard
Subject full outer performance problem
Date
Msg-id 428078E6.90007@dmi.dk
Whole thread Raw
Responses Re: full outer performance problem
Re: full outer performance problem
List pgsql-performance
Hi,

I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible?

I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_XXXX, with indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN.

When I specify the query as:
select temp_max_60min,temp_dry_at_2m
from station s natural join
temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs)
where s.wmo_id=6065 
and timeobs='2004-1-1 0:0:0'
and '2004-1-1 0:0:0' between s.startdate and s.enddate;
I get the correct results, BUT LOUSY performance, and the following explain:
 Nested Loop Left Join  (cost=5.84..163484.08 rows=1349 width=12) (actual time=66146.815..119005.381 rows=1 loops=1)  Filter: (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone)  ->  Hash Join  (cost=5.84..155420.24 rows=1349 width=16) (actual time=8644.449..110836.038 rows=109826 loops=1)        Hash Cond: ("outer".station_id = "inner".station_id)        ->  Seq Scan on temp_dry_at_2m a  (cost=0.00..120615.94 rows=6956994 width=16) (actual time=0.024..104548.515 rows=6956994 loops=1)        ->  Hash  (cost=5.84..5.84 rows=1 width=4) (actual time=0.114..0.114 rows=0 loops=1)              ->  Index Scan using wmo_idx on station  (cost=0.00..5.84 rows=1 width=4) (actual time=0.105..0.108 rows=1 loops=1)                    Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate))  ->  Index Scan using temp_max_60min_idx on temp_max_60min b  (cost=0.00..5.96 rows=1 width=20) (actual time=0.071..0.071 rows=0 loops=109826)        Index Cond: (("outer".station_id = b.station_id) AND ("outer".timeobs = b.timeobs))Total runtime: 119005.499 ms
(11 rows)
If I change the query to (and thus negates the full outer join):
select temp_max_60min,temp_dry_at_2m
from station s natural join
temp_dry_at_2m a full outer join temp_max_60min b using (station_id, timeobs)
where s.wmo_id=6065 
and a.timeobs='2004-1-1 0:0:0' and b.timeobs='2004-1-1 0:0:0' 
and '2004-1-1 0:0:0' between s.startdate and s.enddate;

I get wrong results (In the case where one of the records is missing in one of the tables), BUT GOOD performance, and this query plan:
 Nested Loop  (cost=0.00..17.83 rows=1 width=12) (actual time=79.221..79.236 rows=1 loops=1)  ->  Nested Loop  (cost=0.00..11.82 rows=1 width=24) (actual time=65.517..65.526 rows=1 loops=1)        ->  Index Scan using wmo_idx on station  (cost=0.00..5.83 rows=1 width=4) (actual time=0.022..0.026 rows=1 loops=1)              Index Cond: ((wmo_id = 6065) AND ('2004-01-01 00:00:00'::timestamp without time zone >= startdate) AND ('2004-01-01 00:00:00'::timestamp without time zone <= enddate))        ->  Index Scan using temp_max_60min_idx on temp_max_60min b  (cost=0.00..5.97 rows=1 width=20) (actual time=65.483..65.486 rows=1 loops=1)              Index Cond: (("outer".station_id = b.station_id) AND (b.timeobs = '2004-01-01 00:00:00'::timestamp without time zone))  ->  Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m a  (cost=0.00..6.00 rows=1 width=16) (actual time=13.694..13.698 rows=1 loops=1)        Index Cond: (("outer".station_id = a.station_id) AND (a.timeobs = '2004-01-01 00:00:00'::timestamp without time zone))Total runtime: 79.340 ms
(9 rows)

If further info like EXPLAIN VERBOSE is useful please say so and I will provide it.

Thanks in advance!
Kim Bisgaard.

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
Next
From: Alex Stapleton
Date:
Subject: Partitioning / Clustering