Re: Propogating conditions into a query - Mailing list pgsql-general

From Kim Bisgaard
Subject Re: Propogating conditions into a query
Date
Msg-id 42BA6653.2090201@dmi.dk
Whole thread Raw
In response to Re: Propogating conditions into a query  (Kim Bisgaard <kib+pg@dmi.dk>)
Responses Re: Propogating conditions into a query
List pgsql-general
Hi Tom,

I have now completed the move to PG8.0.3, and feel that I have confirmed
that this problem is related to the problem I'm having:

Formulated like this, it is not performing:

SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
       FROM temp_dry_at_2m a
       FULL OUTER JOIN temp_grass b
       USING (station_id, timeobs)
       WHERE station_id = 52981
         AND timeobs = '2004-1-1 0:0:0';

Merge Full Join  (cost=1598312.83..11032924.48 rows=6956994 width=32) (actual time=119061.098..133314.306 rows=1
loops=1)
  Merge Cond: (("outer".timeobs = "inner".timeobs) AND ("outer".station_id = "inner".station_id))
  Filter: ((COALESCE("inner".station_id, "outer".station_id) = 52981) AND (COALESCE("inner".timeobs, "outer".timeobs) =
'2004-01-0100:00:00'::timestamp without time zone)) 
  ->  Sort  (cost=346429.38..352445.11 rows=2406292 width=16) (actual time=20315.241..23850.529 rows=2406292 loops=1)
        Sort Key: b.timeobs, b.station_id
        ->  Seq Scan on temp_grass b  (cost=0.00..41756.92 rows=2406292 width=16) (actual time=10.517..7003.468
rows=2406292loops=1) 
  ->  Sort  (cost=1251883.44..1269275.93 rows=6956994 width=16) (actual time=82122.354..92027.850 rows=6956994 loops=1)
        Sort Key: a.timeobs, a.station_id
        ->  Seq Scan on temp_dry_at_2m a  (cost=0.00..117549.94 rows=6956994 width=16) (actual time=23.759..39930.741
rows=6956994loops=1) 
Total runtime: 133623.422 ms

But Postgresql can do the work, if it is reformulated into:

SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
 FROM
   (SELECT station_id, timeobs, temp_dry_at_2m
     FROM temp_dry_at_2m
     WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a
   FULL OUTER JOIN
   (SELECT station_id, timeobs, temp_grass
     FROM temp_grass
     WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b
   USING (station_id, timeobs)

Merge Full Join  (cost=0.00..43023.64 rows=10614 width=32) (actual time=0.056..0.064 rows=1 loops=1)
  ->  Index Scan using temp_grass_idx on temp_grass  (cost=0.00..246.55 rows=61 width=16) (actual time=0.029..0.031
rows=1loops=1) 
        Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
  ->  Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m  (cost=0.00..699.52 rows=174 width=16) (actual
time=0.017..0.020rows=1 loops=1) 
        Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
Total runtime: 0.163 ms


The reason the first query is not performing is because the query
optimizer does not push the conditions down into the sub-queries - right??

The reason that I do not just use the reformulated query, is that e.g.
the station_id comes from another table (and there can be more of them),
so it is bloody inconvenient to first select them, and then repeat them
a number of time in the above transformation (I need to outer join more
than two tables) ........

Best regards,


Kim Bisgaard wrote:

> Hi Tom,
>
> This sounds like the same "problem" which prevented PG from using the
> indices, and thus giving abyssmal performance in this other thread:
>
>> I have two BIG tables (virtually identical) with 3 NOT NULL columns
>> Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id,
>> TimeObs) and valid ANALYSE (set statistics=100). I want to join the
>> two tables with a FULL OUTER JOIN.
>>
>> When I specify the query as:
>>
>> SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
>>        FROM temp_dry_at_2m a
>>        FULL OUTER JOIN temp_grass b        USING (station_id, timeobs)
>>        WHERE station_id = 52981
>>          AND timeobs = '2004-1-1 0:0:0'
>
>
> Then I would also vote for improving the inteligence of the optimizer!
> :-)
>
> Regards,
> Kim.
>
> Tom Lane wrote:
>
>> Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
>>
>>
>>> I don't see anything in there about LEFT OUTER JOIN though.  Any ideas?
>>>
>>
>>
>> Oh, I missed that part of your message.  Hmm, I think the issue is
>> that in
>>
>>
>>
>>>> D join (M join G on (M.g=G.id)) on (D.id=M.b) where D.id=nnn
>>>>
>>>
>>
>> the planner deduces M.b=nnn by transitivity, but when the join is an
>> outer join it can't make the same deduction.
>>
>> [ thinks some more... ]  If we distinguished conditions that hold below
>> the join from those that hold above it, we could deduce that M.b=nnn can
>> be enforced below the join even though it might not be true above it.
>> There's no such mechanism in existence now, though.
>>
>> A possible workaround is to generate your query like
>>
>> D left join (M join G on (M.g=G.id)) on (D.id=M.b AND M.b=nnn) where
>> D.id=nnn
>>
>> but I don't know how practical that is for you.
>>
>>             regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>>               http://www.postgresql.org/docs/faq
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

pgsql-general by date:

Previous
From: Akash Garg
Date:
Subject: Corrupted index
Next
From: Zlatko Matić
Date:
Subject: help about the function