Re: How to force Nested Loop plan? - Mailing list pgsql-performance

From Tom Lane
Subject Re: How to force Nested Loop plan?
Date
Msg-id 24637.1062255721@sss.pgh.pa.us
Whole thread Raw
In response to How to force Nested Loop plan?  (Rob Nagler <nagler@bivio.biz>)
Responses Re: How to force Nested Loop plan?  (Rob Nagler <nagler@bivio.biz>)
List pgsql-performance
Rob Nagler <nagler@bivio.biz> writes:
> I'm trying to understand how I can get the planner to always do the
> right thing with this query:

>     SELECT
>     aa_t.min_date_time
>     FROM
>     aa_t
>     , bb_t
>     , cc_t
>     WHERE bb_t.bb_id = aa_t.bb_id
>     AND aa_t.realm_id = cc_t.realm_id
>     AND aa_t.server_id = 21
>     ORDER BY aa_t.min_date_time desc
>     LIMIT 1
>     OFFSET 674


>                ->  Index Scan Backward using aa_t20 on aa_t  (cost=0.00..76738.77 rows=3454 width=46) (actual
time=0.10..31.30rows=676 loops=1) 
>                      Filter: (server_id = 21::numeric)

The reason the planner does not much like this plan is that it's
estimating that quite a lot of rows will have to be hit in min_date_time
order before it finds enough rows with server_id = 21.  Thus the high
cost estimate for the above step.

I suspect that the reason you like this plan is that there's actually
substantial correlation between server_id and min_date_time, such that
the required rows are found quickly.  Before trying to force the planner
into what you consider an optimal plan, you had better ask yourself
whether you can expect that correlation to hold up in the future.
If not, your plan could become pessimal pretty quickly.

I'd suggest creating a double-column index:

    create index aa_ti on aa_t(server_id, min_date_time);

and altering the query to read

    ORDER BY aa_t.server_id DESC, aa_t.min_date_time DESC

(you need this kluge to make sure the planner recognizes that the new
index matches the ORDER BY request).  Then you should get a plan with
a much smaller cost coefficient for this step.

            regards, tom lane

PS: does server_id really need to be NUMERIC?  Why not integer, or at
worst bigint?

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Selecting random rows efficiently
Next
From: Tom Lane
Date:
Subject: Re: Selecting random rows efficiently