Re: slow joining very large table to smaller ones - Mailing list pgsql-performance

From John A Meinel
Subject Re: slow joining very large table to smaller ones
Date
Msg-id 42D6F7CE.5010007@arbash-meinel.com
Whole thread Raw
In response to Re: slow joining very large table to smaller ones  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> John A Meinel <john@arbash-meinel.com> writes:
>
>>What I don't understand is that the planner is actually estimating that
>>joining against the new table is going to *increase* the number of
>>returned rows.
>
>
> It evidently thinks that incidentid in the k_r table is pretty
> nonunique.  We really need to look at the statistics data to
> see what's going on.
>
>             regards, tom lane
>

Okay, sure. What about doing this, then:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
  JOIN (SELECT DISTINCT incidentid FROM k_r JOIN k_b USING (incidentid)
     WHERE k_r.id = ?? AND k_b.id = ??)
 USING (incidentid)
;

Since I assume that eventactivity is the only table with "recordtext",
and that you don't get any columns from k_r and k_b, meaning it would be
pointless to get duplicate incidentids.

I may be misunderstanding what the query is trying to do, but depending
on what is in k_r and k_b, is it possible to use a UNIQUE INDEX rather
than just an index on incidentid?

There is also the possibility of
EXPLAIN ANALYZE
SELECT recordtext FROM eventactivtity
  JOIN (SELECT incidentid FROM k_r WHERE k_r.id = ??
         UNION SELECT incidentid FROM k_b WHERE k_b.id = ??)
 USING (incidentid)
;

But both of these would mean that you don't actually want columns from
k_r or k_b, just a unique list of incident ids.

But first, I agree, we should make sure the pg_stats values are reasonable.

John
=:->


Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: slow joining very large table to smaller ones
Next
From: alison@mirrabooka.com (Alison Winters)
Date:
Subject: Re: lots of updates on small table