Thread: JIT compilation per plan node
Hi hackers,
After discussing this with David offlist, I decided to revive this discussion, which has already been raised and discussed several times in the past. [1] [2]
Currently, if JIT is enabled, the decision to JIT compile is tied purely to the total cost of the query. The number of expressions to be JIT compiled is not taken into account, even though the time spent JITing also depends on that number. This can make the cost of JITing high enough that it hurts performance rather than improving it.
An example case is a table with many partitions where a query touches only one, or a few, of those partitions. If no partition pruning is done at planning time, all 1000 partitions are JIT compiled even though most of them will never be executed.
The proposed patch (based on the patch from [1]) simply moves the JIT decision from the plan level to the plan-node level. Instead of depending on the total cost of the query, we decide whether to JIT compile a node by considering only that node's cost. This allows us to JIT compile only the plan nodes with high costs.
Here is a small test case to see the issue and the benefit of the patch:
CREATE TABLE listp(a int, b int) PARTITION BY LIST(a);
SELECT 'CREATE TABLE listp'|| x || ' PARTITION OF listp FOR VALUES IN ('||x||');' FROM generate_Series(1,1000) x; \gexec
INSERT INTO listp SELECT 1,x FROM generate_series(1,10000000) x;
EXPLAIN (VERBOSE, ANALYZE) SELECT COUNT(*) FROM listp WHERE b < 0;
master jit=off:
Planning Time: 25.113 ms
Execution Time: 315.896 ms
master jit=on:
Planning Time: 24.664 ms
JIT:
Functions: 9008
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 290.705 ms (Deform 108.274 ms), Inlining 0.000 ms, Optimization 165.991 ms, Emission 3232.775 ms, Total 3689.472 ms
Execution Time: 1612.817 ms
patch jit=on:
Planning Time: 24.055 ms
JIT:
Functions: 17
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.463 ms (Deform 0.232 ms), Inlining 0.000 ms, Optimization 0.766 ms, Emission 11.609 ms, Total 13.837 ms
Execution Time: 299.721 ms
A bit more on what this patch does:
- It introduces a new context to keep track of the estimated number of calls and whether JIT is chosen, for each node the context applies to.
- The estimated number of calls is especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on the node's total cost multiplied by the estimated calls, instead of the total cost alone.
- For each node, the planner considers whether the node should be JITed. If the cost of the node multiplied by the number of estimated calls is greater than jit_above_cost, the node is JIT compiled (a sketch of this rule follows the list below). Note that this changes the meaning of jit_above_cost: it is now a threshold for a single plan node, not for the whole query. Additionally, this change applies only to JIT compilation; inlining and optimization are still decided for the whole query, based on the query's overall cost.
- EXPLAIN shows the estimated number of "loops" and whether JIT is used for each node. For text format, the JIT=true/false information is shown only with VERBOSE. (There is no reason not to show this even without VERBOSE; showing it only with VERBOSE simply requires fewer changes in tests, so I did that for simplicity for now.)
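To make the rule in the third bullet concrete, here is a minimal, self-contained sketch of the per-node decision. The struct, field names, and constants are simplified stand-ins for illustration only; they are not the actual planner data structures or the patch's code:

#include <stdbool.h>
#include <stdio.h>

typedef struct PlanNodeInfo
{
    double total_cost;   /* planner's total cost estimate for the node */
    double est_calls;    /* how many times the node is expected to be (re)started */
} PlanNodeInfo;

static const double jit_above_cost = 100000.0;  /* PostgreSQL's default threshold */

/*
 * Per-node JIT decision as described above: compile the node's expressions
 * if its cost, scaled by the estimated number of calls, exceeds the
 * threshold.  Inlining/optimization decisions stay query-wide.
 */
static bool
consider_jit_for_node(const PlanNodeInfo *node)
{
    return node->total_cost * node->est_calls >= jit_above_cost;
}

int
main(void)
{
    /* inner side of a nested loop: cheap per scan, but restarted many times */
    PlanNodeInfo inner = {250.0, 1000.0};
    /* one of many partitions, scanned only once */
    PlanNodeInfo partition = {180.0, 1.0};

    printf("inner: %s\n", consider_jit_for_node(&inner) ? "JIT" : "no JIT");
    printf("partition: %s\n", consider_jit_for_node(&partition) ? "JIT" : "no JIT");
    return 0;
}

With these numbers, the frequently rescanned inner node crosses the threshold while the single-scan partition does not, which is the behaviour the test case above relies on.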
There are also some things that I'm not sure of:
- What are other places where a node is likely to be rescanned, so that we need to take estimated calls into account properly? Maybe recursive CTEs?
- This change can make jit_above_cost mean something different. Should we rename it or introduce a new GUC? If it's kept as it is now, it would probably be better to at least change its default value.
- What can we do for inlining and optimization? AFAIU, performing those per node may be very costly and may not make much sense. But I'm not sure how to handle those operations.
- What about parallel queries? The total cost of a node is divided by the number of workers, which can make the cost look much lower than it really is. The patch amplifies the cost by the number of workers (by setting the estimated calls to the number of workers) to make it more likely to JIT Gather/Gather Merge nodes. OTOH, JIT compilation is performed per worker, and this can make workers decide to JIT compile when it's not really needed.
I'd appreciate any thoughts/feedback.
Thanks,
Hi Melih,

On 1/2/24 20:50, Melih Mutlu wrote:
> Hi hackers,
> [...]

Thanks for the updated / refreshed patch.

I think one of the main challenges this patch faces is that there's a couple of old threads with previous attempts, and the current thread simply builds on top of them, without explaining stuff fully. But people either don't realize that, or don't have time to read old threads just in case, so they can't follow some of the decisions :-( I think it'd be good to maybe try to explain some of the problems and solutions more thoroughly, or at least point to the relevant places in the old threads ...

> A bit more on what this patch does:
> - It introduces a new context to keep track of the number of estimated calls and if JIT is decided for each node that the context applies.

AFAIK this is an attempt to deal with passing the necessary information while constructing the plan, which David originally tried [1] doing by passing est_calls during create_plan ...

I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.

FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?

Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.

Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...

(usual caveat: I haven't tried, maybe there's something that means this can't work)

> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.

Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?

> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. Note that this changes the meaning of jit_above_cost, it's now a threshold for a single plan node and not the whole query. Additionally, this change in JIT consideration is only for JIT compilations. Inlining and optimizations continue to be for the whole query and based on the overall cost of the query.

It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...

> - EXPLAIN shows estimated number of "loops" and whether JIT is true or not for the node. For text format, JIT=true/false information is shown only if it's VERBOSE. (no reason to not show this info even if not VERBOSE. Showing for only VERBOSE just requires less changes in tests, so I did this for simplicity at the moment).

typo in sgml docs: ovarall

> There are also some things that I'm not sure of:
> - What are other places where a node is likely to be rescanned, thus we need to take estimated calls into account properly? Maybe recursive CTEs?

Why would it matter if a node is more/less likely to be rescanned? Either the node is rescanned in the plan or not, and we have nloops to know how many rescans to expect.

> - This change can make jit_above_cost mean something different. Should we rename it or introduce a new GUC? If it'll be kept as it is now, then it would probably be better to change its default value at least.

You mean it changes from a per-query to a per-node threshold? In general I think we should not repurpose GUCs (because people are likely to copy and keep the old config, not realizing it works differently). But I'm not sure this change is different enough to be an issue. And it's in the opposite direction than what usually causes problems (i.e. it would disable JIT in cases where it was enabled before).

> - What can we do for inlining and optimization? AFAIU performing those per node may be very costly and not make that much sense. But I'm not sure about how to handle those operations.

Not sure. I don't think I understand JIT details enough to have a good opinion on this.

> - What about parallel queries? Total cost of the node is divided by the number of workers, which can seem like the cost reduced quite a bit. The patch amplifies the cost by the number of workers (by setting estimated calls to the number of workers) to make it more likely to perform JIT for Gather/Gather Merge nodes. OTOH JIT compilations are performed per worker and this can make workers decide JIT compile when it's not really needed.

Using the number of workers as "number of calls" seems wrong to me. Why shouldn't each worker do the JIT decision on its own, as if it was the only worker running (but seeing only its fraction of the data)? Kinda as if there were multiple independent backends running "small" queries?

regards

[1] https://www.postgresql.org/message-id/CAApHDvoq5VhV%3D2euyjgBN2bC8Bds9Dtr0bG7R%3DreeefJWKJRXA%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
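For illustration, here is a minimal sketch of the kind of top-down walk over a finished plan that Tomas describes, using simplified stand-in types rather than the real Plan tree or walker APIs; the per-node use_jit flag and all names here are hypothetical:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a plan node; not PostgreSQL's Plan struct. */
typedef struct PlanStub
{
    double           total_cost;
    double           est_calls;     /* how many times this node is expected to be started */
    bool             use_jit;       /* hypothetical per-node flag the walk would set */
    struct PlanStub *outer;
    struct PlanStub *inner;
} PlanStub;

/*
 * Walk the finished plan top-down and flag nodes for JIT.  A real
 * implementation would walk Plan/PlanState trees and would adjust the
 * call estimate when recursing into e.g. a Nested Loop's inner side.
 */
static void
jit_walk(PlanStub *node, double threshold)
{
    if (node == NULL)
        return;

    node->use_jit = (node->total_cost * node->est_calls >= threshold);

    jit_walk(node->outer, threshold);
    jit_walk(node->inner, threshold);
}

int
main(void)
{
    PlanStub scan = {.total_cost = 500.0, .est_calls = 1000.0};
    PlanStub join = {.total_cost = 600000.0, .est_calls = 1.0, .inner = &scan};

    jit_walk(&join, 100000.0);
    printf("join: %s, inner scan: %s\n",
           join.use_jit ? "JIT" : "no JIT",
           scan.use_jit ? "JIT" : "no JIT");
    return 0;
}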
On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.

I suggested this to Melih as a way to do this based on what Andy wrote in [1]. I agree with Andy that it's not great to add est_calls to every function in createplan.c. I feel that CreatePlanContext is a way to take the hit of adding a parameter once and hopefully never having to do it again to this degree. I wondered if PlannerInfo could be a field in CreatePlanContext.

> FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?
>
> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.

Why walk the entire plan tree again to do something we could do when building it in the first place? Recursively walking trees isn't great from a performance point of view. It would be nice to avoid this if we can find some other way to handle subplans. I do have a few other reasons up my sleeve that subplan creation should be delayed until later, so maybe we should fix that to unblock those issues.

> Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...
>
> (usual caveat: I haven't tried, maybe there's something that means this can't work)

It's not like we can look at the top node's cost as a pre-check to skip the recursive step for cheap plans, as it's perfectly valid for a node closer to the root of the plan tree to have a lower total cost than that node's subnodes, e.g. LIMIT 1.

>> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.
>
> Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?

I think Melih is listing nodes that can change the est_calls. Any node can be rescanned, but only a subset of nodes can adjust the number of times they call their subnode vs how many times they themselves are called.

>> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. Note that this changes the meaning of jit_above_cost, it's now a threshold for a single plan node and not the whole query. Additionally, this change in JIT consideration is only for JIT compilations. Inlining and optimizations continue to be for the whole query and based on the overall cost of the query.
>
> It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...

This is a problem with LLVM, IIRC. The problem is it's a decision that has to be made for an entire compilation unit and it can't be decided at the expression level. This is pretty annoying as it's pretty hard to decide the best logic to use to enable optimisations and inlining :-(

I think the best thing we could discuss right now is: is this the best way to fix the JIT costing problem? In [2] I did link to a complaint about the JIT costings. See [3]. The OP there wanted to keep the plan private, but I did get to see it and described the problem on the list.

Also, I don't happen to think the decision about JITting per plan node is perfect, as the node's total costs can be high for reasons other than the cost of evaluating expressions. Also, the number of times a given expression is evaluated can vary quite a bit based on when the expression is evaluated. For example, a foreign table scan that does most of the filtering remotely, but has a non-shippable expr that needs to be evaluated locally. The foreign scan might be very expensive, especially if lots of filtering is done by a Seq Scan, and not many rows might make it back to the local server to benefit from JITting the non-shippable expression.

A counter-example is the join condition of a non-parameterized nested loop. Those get evaluated n_outer_rows * n_inner_rows times.

I think the savings JIT gives us on evaluation of expressions is going to be more closely tied to the number of times an expression is evaluated than to the total cost of the node. However, it's likely more complex for optimisations and inlining, as I imagine the size and complexity of the comparison function matters too.

It would be good to all agree on how we're going to fix this problem exactly before Melih gets in too deep fixing the finer details of the patch. If anyone isn't convinced enough there's a problem with the JIT costings then I can see if I can dig up other threads where this is being complained about.

Does anyone want to disagree with the general idea of making the compilation decision based on the total cost of the node? Or have a better idea?

David

[1] https://postgr.es/m/CAKU4AWqqSAi%2B-1ZaFawY300WknH79J9dhx%3DpU5%2BbyAbShjUjCQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/7736C40E-6DB5-4E7A-8FE3-4B2AB8E22793@elevated-dev.com
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.
>
> Why walk the entire plan tree again to do something we could do when building it in the first place?

FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.

regards, tom lane
On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.

The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

David
On 2/20/24 06:38, David Rowley wrote:
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

I believe Tom was talking about non-JIT-enabled builds, i.e. builds that either don't support JIT at all, or where jit=off. Those would certainly not need the extra walk.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 20 Feb 2024 at 23:04, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/20/24 06:38, David Rowley wrote:
>> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>>
>> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.
>
> I believe Tom was talking about non-JIT-enabled builds, i.e. builds that either don't support JIT at all, or where jit=off. Those would certainly not need the extra walk.

I don't believe so as he talked about the fact that the JIT cycles would drown out the tree walk. There are no JIT cycles when the cost threshold isn't met, but we still incur the cost of walking the plan tree.

David
On 2/20/24 06:14, David Rowley wrote:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.
>
> I suggested this to Melih as a way to do this based on what Andy wrote in [1]. I agree with Andy that it's not great to add est_calls to every function in createplan.c. I feel that CreatePlanContext is a way to take the hit of adding a parameter once and hopefully never having to do it again to this degree. I wondered if PlannerInfo could be a field in CreatePlanContext.

You mean we'd be adding more parameters to CreatePlanContext in the future? Not sure that's a great idea, there are reasons why we have separate arguments in function signatures and not a single struct. I think contexts are nice to group attributes with a particular purpose, not as a replacement for arguments to make function signatures simpler.

>> FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?
>>
>> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.
>
> Why walk the entire plan tree again to do something we could do when building it in the first place? Recursively walking trees isn't great from a performance point of view. It would be nice to avoid this if we can find some other way to handle subplans. I do have a few other reasons up my sleeve that subplan creation should be delayed until later, so maybe we should fix that to unblock those issues.
>
>> Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...
>>
>> (usual caveat: I haven't tried, maybe there's something that means this can't work)
>
> It's not like we can look at the top node's cost as a pre-check to skip the recursive step for cheap plans, as it's perfectly valid for a node closer to the root of the plan tree to have a lower total cost than that node's subnodes, e.g. LIMIT 1.

I'd argue that's actually a reason to do the precheck, exactly because of the LIMIT. The fact that some node has a high total cost does not matter if there is LIMIT 1 above it. What matters is which fraction of the plan we execute, not the total cost. Imagine you have something like

  ->  Limit 1  (cost=0..1 rows=1 ...)
        ->  Seqscan  (cost=0..100000000 rows=1000000 ...)

I'd argue JIT-ing the seqscan is likely pointless, because on average we'll execute ~1/1000000 of the scan, and the actual cost will be ~100.

>>> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.
>>
>> Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?
>
> I think Melih is listing nodes that can change the est_calls. Any node can be rescanned, but only a subset of nodes can adjust the number of times they call their subnode vs how many times they themselves are called.

OK

>>> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. [...]
>>
>> It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...
>
> This is a problem with LLVM, IIRC. The problem is it's a decision that has to be made for an entire compilation unit and it can't be decided at the expression level. This is pretty annoying as it's pretty hard to decide the best logic to use to enable optimisations and inlining :-(
>
> [...]
>
> Does anyone want to disagree with the general idea of making the compilation decision based on the total cost of the node? Or have a better idea?

I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.

And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.

I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.

Seems reasonable to me also.

> And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.
>
> I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...

I'm not sure I understand whether you (Tomas) think that this patch is a good idea or a bad idea as it stands. I read the first of these two paragraphs to suggest that the patch hasn't really evolved much in the last few years, perhaps suggesting that if it wasn't good enough to commit back then, it still isn't now. But the second of these two paragraphs seems more supportive.

From my own point of view, I definitely agree with David's statement that what we really want to know is how many times each expression will be evaluated. If we had that information, or just an estimate, I think we could make much better decisions in this area. But we don't have that infrastructure now, and it doesn't seem easy to create, so it seems to me that what we have to decide now is whether applying a cost threshold on a per-plan-node basis will produce better or worse results than making one decision for the whole plan. David's provided an example of where it does indeed work better back in https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com - but could there be enough cases where the opposite happens to make us think that the patch is overall a bad idea?

I personally find that a bit unlikely, although not impossible. I see a couple of ways that using the per-node cost can distort things -- it seems like it will tend to heavily feature JIT for "interior" plan nodes because the cost of a plan node includes its children -- and as was mentioned previously, it doesn't really care whether the node cost is high because of expression evaluation or something else. But neither of those things seem like they'd be bad enough to make this a bad way forward over all. For the patch to lose, it seems like we'd need a case where the overall plan cost would have been high enough to trigger JIT pre-patch, but most of the benefit would have come from relatively low-cost nodes that don't get JITted post-patch. The easiest way for that to happen is if the planner's estimates are off, but that's not really an argument against this patch as much as it is an argument that query planning is hard in general.

A slightly subtler way the patch could lose is if the new threshold is harder to adjust than the old one. For example, imagine that you have a query that does a Cartesian join. That makes the cost of the input nodes rather small compared to the cost of the join node, and it also means that JITting the inner join child in particular is probably rather important. But if you set join_above_cost low enough to JIT that node post-patch, then maybe you'll also JIT a bunch of things that aren't on the inner side of a nested loop and which might therefore not really need JIT. Unless I'm missing something, this is a fairly realistic example of where this patch's approach to costing could turn out to be painful ... but it's not like the current system is pain-free either.

I don't really know what's best here, but I'm mildly inclined to believe that the patch might be a change for the better. I have not reviewed the implementation and have no comment on whether it's good or bad from that point of view.

--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for chipping in here.

On Fri, 15 Mar 2024 at 08:14, Robert Haas <robertmhaas@gmail.com> wrote:
> A slightly subtler way the patch could lose is if the new threshold is harder to adjust than the old one. For example, imagine that you have a query that does a Cartesian join. That makes the cost of the input nodes rather small compared to the cost of the join node, and it also means that JITting the inner join child in particular is probably rather important. But if you set join_above_cost low enough to JIT that node post-patch, then maybe you'll also JIT a bunch of things that aren't on the inner side of a nested loop and which might therefore not really need JIT. Unless I'm missing something, this is a fairly realistic example of where this patch's approach to costing could turn out to be painful ... but it's not like the current system is pain-free either.

I think this case would be covered as the cost of the inner side of the join would be multiplied by the estimated outer-side rows. Effectively, making this part work is the bulk of the patch as we currently don't know the estimated number of loops of a node during create plan.

David
On 3/14/24 20:14, Robert Haas wrote:
> On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.
>
> Seems reasonable to me also.
>
>> And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.
>>
>> I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...
>
> I'm not sure I understand whether you (Tomas) think that this patch is a good idea or a bad idea as it stands. I read the first of these two paragraphs to suggest that the patch hasn't really evolved much in the last few years, perhaps suggesting that if it wasn't good enough to commit back then, it still isn't now. But the second of these two paragraphs seems more supportive.

To clarify, I think the patch is a step in the right direction, and a meaningful improvement. It may not be the perfect solution we imagine (but who knows how far we are from that), but AFAIK moving these decisions to the node level is something the ideal solution would need to do too.

The reference to the 4y old patches was meant to support this patch as an improvement - perhaps incomplete, but still an improvement. We keep imagining "perfect solutions" and then end up doing nothing. I recognize there's a risk we may never get to have the ideal solution (e.g. because it requires information we don't possess). But I still think moving the decision to the node level would allow us to make better decisions compared to just doing it for the query as a whole.

> From my own point of view, I definitely agree with David's statement that what we really want to know is how many times each expression will be evaluated. If we had that information, or just an estimate, I think we could make much better decisions in this area. But we don't have that infrastructure now, and it doesn't seem easy to create, so it seems to me that what we have to decide now is whether applying a cost threshold on a per-plan-node basis will produce better or worse results than making one decision for the whole plan. David's provided an example of where it does indeed work better back in https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com - but could there be enough cases where the opposite happens to make us think that the patch is overall a bad idea?

Right, this risk of regression is always there, and I'm sure it'd be possible to construct such cases. But considering how crude the current costing is, I'd be surprised if this ends up being a net negative.

Also, is the number of executions really the thing we're missing? Surely we know the number of rows the node is dealing with, so we could use this (yes, I realize there are issues, but we deal with that when costing quals too). Isn't it a much bigger issue that we have pretty much no cost model for the actual JIT (compilation/optimization) depending on how many expressions it deals with?

> I personally find that a bit unlikely, although not impossible. I see a couple of ways that using the per-node cost can distort things [...] I don't really know what's best here, but I'm mildly inclined to believe that the patch might be a change for the better. I have not reviewed the implementation and have no comment on whether it's good or bad from that point of view.

I think it would be good to construct a bunch of cases where we think this approach would misbehave, and see if the current code handles that any better and/or if there's a way to improve that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> To clarify, I think the patch is a step in the right direction, and a meaningful improvement. It may not be the perfect solution we imagine (but who knows how far we are from that), but AFAIK moving these decisions to the node level is something the ideal solution would need to do too.

I got thinking about this patch again while working on [1]. I want to write this down as I don't quite have time to get fully back into this right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the Node tree of the Expr to figure out the max varattno for each slot. That's done so all of the tuple deforming happens at once rather than incrementally. Figuring out the max varattno is a price we have to pay for every execution of the query. I think we'd be better off doing that in the planner.

To do this, I thought that setrefs.c could do this processing in fix_join_expr / fix_upper_expr and wrap up the expression in a new Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that could be stored in the very same struct is the "num_exec" value. I feel the number of executions of an ExprState is a better gauge of how useful JIT will be than the cost of the plan node.

Now, looking at set_join_references(), the execution estimates are not exactly perfect. For example:

#define NUM_EXEC_QUAL(parentplan) ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could easily make efforts to improve those and that could likely be done independently and concurrently with other work to make JIT more granular.

The problem with doing this is that there's just a huge amount of code churn in the executor. I am keen to come up with a prototype so I can get a better understanding of if this is a practical solution. I don't want to go down that path if it's just me that thinks the number of times an ExprState is evaluated is a better measure to go on for JIT vs no JIT than the plan node's total cost.

Does anyone have any thoughts on that idea?

David

[1] https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=kEa4g@mail.gmail.com
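For illustration, here is a rough sketch of a per-expression decision driven by the estimated number of evaluations, along the lines David suggests; the function, constants, and numbers below are invented for the example and are not part of the patch or of PostgreSQL:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustration only: decide JIT per expression from how often it is
 * expected to be evaluated, instead of from the owning node's total cost.
 * The cost constants are made up for the example, not real GUCs.
 */
static bool
jit_expr_worthwhile(double est_evals, double interp_cost_per_eval,
                    double jit_cost_per_eval, double compile_overhead)
{
    double saving = (interp_cost_per_eval - jit_cost_per_eval) * est_evals;

    return saving > compile_overhead;
}

int
main(void)
{
    /* qual on the inner side of a nested loop: evaluated outer_rows * inner_rows times */
    double nested_loop_evals = 1000.0 * 5000.0;
    /* non-shippable qual under a foreign scan that filters remotely: few local evaluations */
    double foreign_scan_evals = 200.0;

    printf("nested loop qual: %s\n",
           jit_expr_worthwhile(nested_loop_evals, 0.01, 0.002, 1000.0) ? "JIT" : "no JIT");
    printf("foreign scan qual: %s\n",
           jit_expr_worthwhile(foreign_scan_evals, 0.01, 0.002, 1000.0) ? "JIT" : "no JIT");
    return 0;
}

The point of the sketch is that the frequently evaluated nested-loop qual pays for the compilation overhead while the rarely evaluated foreign-scan qual does not, even if the foreign scan node itself is very expensive.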
On Tue, 20 Feb 2024 at 06:38, David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

I'm not saying I'd prefer the extra walk, but I don't think you'd need to do this extra walk for all plans. Afaict you could skip the extra walk when top_plan->total_cost < jit_above_cost, i.e. only doing the extra walk to determine which exact nodes to JIT for cases where we currently JIT all nodes. That would limit the extra walk overhead to cases where we currently already spend significant resources on JITing stuff.
On Tue, 14 May 2024 at 19:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'm not saying I'd prefer the extra walk, but I don't think you'd need to do this extra walk for all plans. Afaict you could skip the extra walk when top_plan->total_cost < jit_above_cost, i.e. only doing the extra walk to determine which exact nodes to JIT for cases where we currently JIT all nodes. That would limit the extra walk overhead to cases where we currently already spend significant resources on JITing stuff.

You could do that, but wouldn't it just cause us to sometimes miss doing JIT for plan nodes that have a total cost above the top node's? To me, it seems like a shortcut that someone might complain about one day, and fixing it might require removing the shortcut, which would lead to traversing the whole plan tree.

Here's a plan where the total cost of a subnode is greater than the total cost of the top node:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select * from t0 order by a limit 1;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=19480.00..19480.00 rows=1 width=4)
   ->  Sort  (cost=19480.00..21980.00 rows=1000000 width=4)
         Sort Key: a
         ->  Seq Scan on t0  (cost=0.00..14480.00 rows=1000000 width=4)

Anyway, I don't think it's worth talking in detail about specifics of the implementation for the total cost of the node idea when the whole replacement costing model design is still undecided. It feels like we're trying to decide what colour to paint the bathroom when we haven't even come up with a design for the house yet.

I'd be interested to hear your thoughts on using the estimated number of invocations of the function to drive the JIT flags on a per-expression level.

David
On Tue, 14 May 2024 at 10:19, David Rowley <dgrowleyml@gmail.com> wrote:
> Here's a plan where the total cost of a subnode is greater than the total cost of the top node:

Ah, I didn't realize it was possible for that to happen. **reads up on plan costs**

This actually makes me think that using the total_cost of the sub-nodes is not enough to determine whether the node should be JITed. We wouldn't want to start jitting plans like this, i.e. introducing all the JIT setup overhead for just a single row:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select a+a*a+a*a+a from t0 limit 1;

                        QUERY PLAN
-----------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Seq Scan on t0  (cost=0.00..26980.00 rows=1000000 width=4)

An easy way to work around that issue, I guess, is by using the minimum total_cost of all the total_costs from the current sub-node up to the root node. The current minimum could be passed along as part of the context, I guess.
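For illustration, a minimal sketch of that idea with stand-in types: a top-down walk carries the smallest total_cost seen on the path from the root and compares it, rather than the node's own cost, against the threshold. None of these names come from the actual patch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a plan node; not PostgreSQL's Plan struct. */
typedef struct PlanStub
{
    double           total_cost;
    bool             use_jit;
    struct PlanStub *outer;
    struct PlanStub *inner;
} PlanStub;

static double
dmin(double a, double b)
{
    return (a < b) ? a : b;
}

/*
 * min_cost_to_root is the minimum total_cost over the node itself and all
 * of its ancestors.  A cheap Limit above an expensive scan caps the value,
 * so the scan below it does not get flagged for JIT.
 */
static void
jit_walk(PlanStub *node, double min_cost_to_root, double threshold)
{
    if (node == NULL)
        return;

    min_cost_to_root = dmin(min_cost_to_root, node->total_cost);
    node->use_jit = (min_cost_to_root >= threshold);

    jit_walk(node->outer, min_cost_to_root, threshold);
    jit_walk(node->inner, min_cost_to_root, threshold);
}

int
main(void)
{
    /* mimics the Limit -> Seq Scan plan shown above */
    PlanStub seqscan = {.total_cost = 26980.0};
    PlanStub limit = {.total_cost = 0.03, .outer = &seqscan};

    jit_walk(&limit, 1e300, 100000.0);

    printf("limit: %s, seq scan: %s\n",
           limit.use_jit ? "JIT" : "no JIT",
           seqscan.use_jit ? "JIT" : "no JIT");
    return 0;
}

With these numbers neither node is flagged, because the cheap Limit caps min_cost_to_root at 0.03 even though the Seq Scan's own total cost is large.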