Re: How to force Nested Loop plan? - Mailing list pgsql-performance
From | Rob Nagler |
---|---|
Subject | Re: How to force Nested Loop plan? |
Date | |
Msg-id | 16208.50934.351000.90704@gargle.gargle.HOWL Whole thread Raw |
In response to | Re: How to force Nested Loop plan? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: How to force Nested Loop plan?
Re: How to force Nested Loop plan? |
List | pgsql-performance |
Tom Lane writes: > The reason the planner does not much like this plan is that it's > estimating that quite a lot of rows will have to be hit in min_date_time > order before it finds enough rows with server_id = 21. Thus the high > cost estimate for the above step. Thanks for the speedy and useful reply! More questions follow. :) Very interesting. How does it know "quite a lot"? Is there something I can do to get the planner to analyze the data better? > I suspect that the reason you like this plan is that there's actually > substantial correlation between server_id and min_date_time, such that > the required rows are found quickly. Before trying to force the planner > into what you consider an optimal plan, you had better ask yourself > whether you can expect that correlation to hold up in the future. > If not, your plan could become pessimal pretty quickly. The correlation holds. min_date_time increases over time as records are inserted. server_id is uniformly distributed over time. There's no randomness. There is at least one 21 record for every value of min_date_time. 21 is a special server_id containing aggregate (denormalized) data for the other servers. I thought about putting it in a separate table, but this would complicate the code as the data is identical to the non-aggregated case. Do you have any suggestions for organizing the data/query now that you know this? > I'd suggest creating a double-column index: Thanks. I'll try this. I'm a very big fan of declarative programming. However, there's a danger in declarative programming when the interperter isn't smart enough. When I add this index, I will slow down inserts (about 20K/day) and increase data size (this is the second largest table in the database). Moreover, if the planner is improved, I've should fix my code, delete the index, etc. Is there a way of giving the planner direct hints as in Oracle? They can be ignored when the optimizer is improved, just as "register" is ignored by C compilers nowadays. Adding the extra index and ORDER BY is also not easy in our case. The query is dynamically generated. I can force the query ORDER BY to be whatever I want, but I would lose the ability to do interesting things, like the automatic generation of ORDER BY when someone clicks on a column header in the application. Indeed there are times when people want to sort on other columns in the query. I reduced the problem to the salient details for my post to this board. What if the ORDER BY was: ORDER BY aa_t.server_id DESC, cc_t.name ASC Would the planner do the right thing? > PS: does server_id really need to be NUMERIC? Why not integer, or at > worst bigint? It is a NUMERIC(18). It could be a bigint. What would be the change in performance of this query if we changed it to bigint? BTW, my customer is probably going to be switching to Oracle. This particular query has been one of the reasons. Maybe this change will help us stay with Postgres. Thanks, Rob
pgsql-performance by date: