Re: slow joining very large table to smaller ones - Mailing list pgsql-performance

From John A Meinel
Subject Re: slow joining very large table to smaller ones
Date
Msg-id 42D6F177.3000002@arbash-meinel.com
In response to Re: slow joining very large table to smaller ones  (Dan Harris <fbsd@drivefaster.net>)
Responses Re: slow joining very large table to smaller ones
Re: slow joining very large table to smaller ones
List pgsql-performance
Dan Harris wrote:
>
> On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:
>
>>
>>
>> You might try giving it a little bit more freedom with:
>>
>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity, k_r, k_b
>>  WHERE eventactivity.incidentid = k_r.incidentid
>>    AND eventactivity.incidentid = k_b.incidentid
>>    AND k_r.id = 94
>>    AND k_b.id = 107
>> -- AND k_r.incidentid = k_b.incidentid
>> ;
>>
>> I'm pretty sure that would give identical results, just let the  planner
>> have a little bit more freedom about how it does it.
>> Also the last line is commented out, because I think it is redundant.
>>
>
> Ok, I tried this one.  My ssh keeps getting cut off by a router
> somewhere between me and the server due to inactivity timeouts, so  all
> I know is that both the select and explain analyze are taking  over an
> hour to run.  Here's the explain select for that one, since  that's the
> best I can get.
>
> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and  eventactivity.incidentid
> = k_b.incidentid and k_r.id = 94 and k_b.id  = 107;
>                                                   QUERY PLAN
> ------------------------------------------------------------------------
> --------------------------------------
> Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
> (cost=0.00..4186753.16 rows=46029271 width=49)
>          ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b   (cost=0.00..4668.31
> rows=1821 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52
> rows=1879 width=68)
>                Index Cond: (id = 94)
> (13 rows)
>

If anything, the estimates have gotten worse. It now thinks around 1800 rows
will come back from each of k_b and k_r, whereas you were expecting it to be
more like 100.

Since you didn't say, you did VACUUM ANALYZE recently, right?
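If not, that is the first thing to try; a minimal sketch, using the three
tables from the query above:

-- refresh the planner statistics (and clean up dead tuples) on the tables involved
VACUUM ANALYZE eventactivity;
VACUUM ANALYZE k_r;
VACUUM ANALYZE k_b;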

>

...

>>
>> You can also try disabling merge joins, and see how that changes  things.
>>
>
> Are there any negative side effects of doing this?

If the planner is estimating things correctly, you want to give it the
most flexibility of plans to pick from, because sometimes a merge join
is faster (postgres doesn't pick things because it wants to go slower).
The only reason for the disable flags is that sometimes the planner
doesn't estimate correctly. Usually disabling a method is not the final
solution, but a way to try out different methods, and see what happens
to the results.

With SET enable_mergejoin TO off; you can disable it just for the current
session (not for the entire database), which is the recommended way when
postgres is messing up one particular query. (It usually gets the others right.)
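A quick sketch of that workflow, reusing the query from earlier in the thread:

-- only affects the current session; it resets when you disconnect
SET enable_mergejoin TO off;

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity, k_r, k_b
 WHERE eventactivity.incidentid = k_r.incidentid
   AND eventactivity.incidentid = k_b.incidentid
   AND k_r.id = 94
   AND k_b.id = 107;

-- put it back afterwards
RESET enable_mergejoin;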


>>
>> Well, postgres is estimating around 500 rows each, is that way off?  Try
>> just doing:
>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly, and we
don't need to do anything else.

>>
>> And see if postgres estimates the number of rows properly.
>>
>> I assume you have recently VACUUM ANALYZEd, which means you might need
>> to update the statistics target (ALTER TABLE k_b ALTER COLUMN
>> incidientid SET STATISTICS 100) default is IIRC 10, ranges from  1-1000,
>> higher is more accurate, but makes ANALYZE slower.
>>
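As a sketch of that sequence (assuming the column is actually spelled
incidentid, and that 100 is just a starting point for the target):

-- raise the statistics target on the join columns, then re-ANALYZE so it takes effect
ALTER TABLE k_b ALTER COLUMN incidentid SET STATISTICS 100;
ALTER TABLE k_r ALTER COLUMN incidentid SET STATISTICS 100;
ANALYZE k_b;
ANALYZE k_r;

-- then re-check the row estimates against reality
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

(The id columns could be given the same treatment if their estimates stay off.)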


...

>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity
>>  JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
>>     WHERE k_r.id = 94 AND k_b.id = 107) AS a
>>  USING (incidentid);
>>
>
> This one looks like the same plan as the others:
>
> explain select recordtext from eventactivity join ( select incidentid
> from k_r join k_b using (incidentid) where k_r.id = 94 and k_b.id =  107
> ) a  using (incidentid );

Well, the planner is powerful enough to flatten nested selects. To make
it less "intelligent" you can do:
SET join_collapse_limit TO 1;
(1 is the minimum allowed value.) That should tell postgres not to try and
get tricky with your query.
Again, *usually* the planner knows better than you do. So again just do
it to see what you get.

The problem is that if you only run plain EXPLAIN, you will probably get
something which *looks* worse, because if it looked better the planner
would have used it. That is why you really need EXPLAIN ANALYZE, so that
you can see where the planner's estimates are incorrect.
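A sketch of that experiment, using the subselect form of the query:

-- session-local: join in the order the query is written instead of flattening/reordering
SET join_collapse_limit TO 1;

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
  JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
         WHERE k_r.id = 94 AND k_b.id = 107) AS a
  USING (incidentid);

RESET join_collapse_limit;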


>                                                   QUERY PLAN
> ------------------------------------------------------------------------
> --------------------------------------
> Merge Join  (cost=9793.33..4693149.15 rows=156544758 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4847.75..4557237.59 rows=16365843 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
> (cost=0.00..4191691.79 rows=46084161 width=49)
>          ->  Sort  (cost=4847.75..4852.38 rows=1852 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b   (cost=0.00..4747.24
> rows=1852 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4945.58..4950.36 rows=1913 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4841.30
> rows=1913 width=68)
>                Index Cond: (id = 94)
> (13 rows)
>

What I don't understand is that the planner is actually estimating that
joining against the second small table is going to *increase* the number
of returned rows: the final estimate here is 156M rows, while the inner
join shows only 16M. And it also thinks that it will only grab 46M rows
from eventactivity.

If you have analyzed recently, can you do:
SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';

That is a much cheaper way than "SELECT count(*) FROM eventactivity" to get
an approximate row count. But if it isn't too expensive, please also give
the value from SELECT count(*) FROM eventactivity.

Again, that helps us know whether your statistics are up-to-date.
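If it is easier, both numbers can be pulled in one statement; just a sketch
(reltuples is only the planner's estimate, refreshed by VACUUM/ANALYZE):

SELECT c.reltuples::bigint AS estimated_rows,
       (SELECT count(*) FROM eventactivity) AS actual_rows
  FROM pg_class c
 WHERE c.relname = 'eventactivity';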


>
>
>> I don't know how selective your keys are, but one of these queries
>> should probably structure it better for the planner. It depends a  lot on
>> how selective your query is.
>
>
> eventactivity currently has around 36 million rows in it. There  should
> only be maybe 200-300 incidentids at most that will be matched  with the
> combination of k_b and k_r.  That's why I was thinking I  could somehow
> get a list of just the incidentids that matched the id  = 94 and id =
> 107 in k_b and k_r first. Then, I would only need to  grab a few hundred
> out of 36 million rows from eventactivity.
>

Well, you can also try:
SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id=?? AND k_r.id=??
;

That will tell you how many rows they have in common.
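With the ids from this thread filled in, that would be:

SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id = 107 AND k_r.id = 94;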

>> If you have 100M rows, the above query looks like it expects k_r to
>> restrict it to 44M rows, and k_r + k_b down to 11M rows, which really
>> should be a seq scan (> 10% of the rows = seq scan). But if you are
>> saying the selectivity is mis-estimated it could be different.
>>
>
> Yeah, if I understand you correctly, I think the previous paragraph
> shows this is a significant misestimate.
>

Well, if you look at the latest plans, the estimate has gone up from 44M to
156M. I don't know why it has gotten worse, but it clearly has.

John
=:->

