Thread: JIT compilation per plan node
Hi hackers,
After discussing this with David offlist, I decided to revive this discussion, which has already been raised and discussed several times in the past. [1] [2]
Currently, if JIT is enabled, the decision to JIT compile is tied purely to the total cost of the query. The number of expressions to be JIT compiled is not taken into account, even though the time spent JITing also depends on that number. This can make the cost of JITing high enough that it hurts performance rather than improving it.
An example case is a table with many partitions where a query touches only one, or a few, of those partitions. If no partition pruning is done at planning time, all 1000 partitions are JIT compiled even though most of them will never be executed.
The proposed patch (based on the patch from [1]) simply moves the JIT decision from the plan level to the plan-node level. Instead of depending on the total cost of the query, we decide whether to JIT compile a node by considering only that node's cost. This allows us to JIT compile only the plan nodes with high costs.
Here is a small test case to see the issue and the benefit of the patch:
CREATE TABLE listp(a int, b int) PARTITION BY LIST(a);
SELECT 'CREATE TABLE listp'|| x || ' PARTITION OF listp FOR VALUES IN ('||x||');' FROM generate_Series(1,1000) x; \gexec
INSERT INTO listp SELECT 1,x FROM generate_series(1,10000000) x;
EXPLAIN (VERBOSE, ANALYZE) SELECT COUNT(*) FROM listp WHERE b < 0;
master jit=off:
Planning Time: 25.113 ms
Execution Time: 315.896 ms
master jit=on:
Planning Time: 24.664 ms
JIT:
Functions: 9008
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 290.705 ms (Deform 108.274 ms), Inlining 0.000 ms, Optimization 165.991 ms, Emission 3232.775 ms, Total 3689.472 ms
Execution Time: 1612.817 ms
patch jit=on:
Planning Time: 24.055 ms
JIT:
Functions: 17
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.463 ms (Deform 0.232 ms), Inlining 0.000 ms, Optimization 0.766 ms, Emission 11.609 ms, Total 13.837 ms
Execution Time: 299.721 ms
A bit more on what this patch does:
- It introduces a new context to keep track of the estimated number of calls and whether JIT is chosen, for each node the context applies to.
- The estimated number of calls is especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on the node's total cost multiplied by the estimated calls, instead of the total cost alone.
- For each node, the planner considers whether the node should be JITed. If the cost of the node multiplied by the number of estimated calls is greater than jit_above_cost, the node is JIT compiled (a sketch of this rule follows the list below). Note that this changes the meaning of jit_above_cost: it is now a threshold for a single plan node, not for the whole query. Additionally, this change applies only to JIT compilation; inlining and optimization are still decided for the whole query, based on the query's overall cost.
- EXPLAIN shows the estimated number of "loops" and whether JIT is used for each node. For text format, the JIT=true/false information is shown only with VERBOSE. (There is no reason not to show this even without VERBOSE; showing it only with VERBOSE simply requires fewer changes in tests, so I did that for simplicity for now.)
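To make the rule in the third bullet concrete, here is a minimal, self-contained sketch of the per-node decision. The struct, field names, and constants are simplified stand-ins for illustration only; they are not the actual planner data structures or the patch's code:

#include <stdbool.h>
#include <stdio.h>

typedef struct PlanNodeInfo
{
    double total_cost;   /* planner's total cost estimate for the node */
    double est_calls;    /* how many times the node is expected to be (re)started */
} PlanNodeInfo;

static const double jit_above_cost = 100000.0;  /* PostgreSQL's default threshold */

/*
 * Per-node JIT decision as described above: compile the node's expressions
 * if its cost, scaled by the estimated number of calls, exceeds the
 * threshold.  Inlining/optimization decisions stay query-wide.
 */
static bool
consider_jit_for_node(const PlanNodeInfo *node)
{
    return node->total_cost * node->est_calls >= jit_above_cost;
}

int
main(void)
{
    /* inner side of a nested loop: cheap per scan, but restarted many times */
    PlanNodeInfo inner = {250.0, 1000.0};
    /* one of many partitions, scanned only once */
    PlanNodeInfo partition = {180.0, 1.0};

    printf("inner: %s\n", consider_jit_for_node(&inner) ? "JIT" : "no JIT");
    printf("partition: %s\n", consider_jit_for_node(&partition) ? "JIT" : "no JIT");
    return 0;
}

With these numbers, the frequently rescanned inner node crosses the threshold while the single-scan partition does not, which is the behaviour the test case above relies on.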
There are also some things that I'm not sure of:
- What are other places where a node is likely to be rescanned, so that we need to take estimated calls into account properly? Maybe recursive CTEs?
- This change can make jit_above_cost mean something different. Should we rename it or introduce a new GUC? If it's kept as it is now, it would probably be better to at least change its default value.
- What can we do for inlining and optimization? AFAIU, performing those per node may be very costly and may not make much sense. But I'm not sure how to handle those operations.
- What about parallel queries? The total cost of a node is divided by the number of workers, which can make the cost look much lower than it really is. The patch amplifies the cost by the number of workers (by setting the estimated calls to the number of workers) to make it more likely to JIT Gather/Gather Merge nodes. OTOH, JIT compilation is performed per worker, and this can make workers decide to JIT compile when it's not really needed.
I'd appreciate any thoughts/feedback.
Thanks,
Hi Melih,

On 1/2/24 20:50, Melih Mutlu wrote:
> Hi hackers,
> [...]

Thanks for the updated / refreshed patch.

I think one of the main challenges this patch faces is that there's a couple of old threads with previous attempts, and the current thread simply builds on top of them, without explaining stuff fully. But people either don't realize that, or don't have time to read old threads just in case, so they can't follow some of the decisions :-( I think it'd be good to maybe try to explain some of the problems and solutions more thoroughly, or at least point to the relevant places in the old threads ...

> A bit more on what this patch does:
> - It introduces a new context to keep track of the number of estimated calls and if JIT is decided for each node that the context applies.

AFAIK this is an attempt to deal with passing the necessary information while constructing the plan, which David originally tried [1] doing by passing est_calls during create_plan ...

I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.

FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?

Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.

Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...

(usual caveat: I haven't tried, maybe there's something that means this can't work)

> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.

Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?

> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. Note that this changes the meaning of jit_above_cost, it's now a threshold for a single plan node and not the whole query. Additionally, this change in JIT consideration is only for JIT compilations. Inlining and optimizations continue to be for the whole query and based on the overall cost of the query.

It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...

> - EXPLAIN shows estimated number of "loops" and whether JIT is true or not for the node. For text format, JIT=true/false information is shown only if it's VERBOSE. (no reason to not show this info even if not VERBOSE. Showing for only VERBOSE just requires less changes in tests, so I did this for simplicity at the moment).

typo in sgml docs: ovarall

> There are also some things that I'm not sure of:
> - What are other places where a node is likely to be rescanned, thus we need to take estimated calls into account properly? Maybe recursive CTEs?

Why would it matter if a node is more/less likely to be rescanned? Either the node is rescanned in the plan or not, and we have nloops to know how many rescans to expect.

> - This change can make jit_above_cost mean something different. Should we rename it or introduce a new GUC? If it'll be kept as it is now, then it would probably be better to change its default value at least.

You mean it changes from a per-query to a per-node threshold? In general I think we should not repurpose GUCs (because people are likely to copy and keep the old config, not realizing it works differently). But I'm not sure this change is different enough to be an issue. And it's in the opposite direction than what usually causes problems (i.e. it would disable JIT in cases where it was enabled before).

> - What can we do for inlining and optimization? AFAIU performing those per node may be very costly and not make that much sense. But I'm not sure about how to handle those operations.

Not sure. I don't think I understand JIT details enough to have a good opinion on this.

> - What about parallel queries? Total cost of the node is divided by the number of workers, which can seem like the cost reduced quite a bit. The patch amplifies the cost by the number of workers (by setting estimated calls to the number of workers) to make it more likely to perform JIT for Gather/Gather Merge nodes. OTOH JIT compilations are performed per worker and this can make workers decide JIT compile when it's not really needed.

Using the number of workers as "number of calls" seems wrong to me. Why shouldn't each worker do the JIT decision on its own, as if it was the only worker running (but seeing only its fraction of the data)? Kinda as if there were multiple independent backends running "small" queries?

regards

[1] https://www.postgresql.org/message-id/CAApHDvoq5VhV%3D2euyjgBN2bC8Bds9Dtr0bG7R%3DreeefJWKJRXA%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
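For illustration, here is a minimal sketch of the kind of top-down walk over a finished plan that Tomas describes, using simplified stand-in types rather than the real Plan tree or walker APIs; the per-node use_jit flag and all names here are hypothetical:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a plan node; not PostgreSQL's Plan struct. */
typedef struct PlanStub
{
    double           total_cost;
    double           est_calls;     /* how many times this node is expected to be started */
    bool             use_jit;       /* hypothetical per-node flag the walk would set */
    struct PlanStub *outer;
    struct PlanStub *inner;
} PlanStub;

/*
 * Walk the finished plan top-down and flag nodes for JIT.  A real
 * implementation would walk Plan/PlanState trees and would adjust the
 * call estimate when recursing into e.g. a Nested Loop's inner side.
 */
static void
jit_walk(PlanStub *node, double threshold)
{
    if (node == NULL)
        return;

    node->use_jit = (node->total_cost * node->est_calls >= threshold);

    jit_walk(node->outer, threshold);
    jit_walk(node->inner, threshold);
}

int
main(void)
{
    PlanStub scan = {.total_cost = 500.0, .est_calls = 1000.0};
    PlanStub join = {.total_cost = 600000.0, .est_calls = 1.0, .inner = &scan};

    jit_walk(&join, 100000.0);
    printf("join: %s, inner scan: %s\n",
           join.use_jit ? "JIT" : "no JIT",
           scan.use_jit ? "JIT" : "no JIT");
    return 0;
}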
On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.

I suggested this to Melih as a way to do this based on what Andy wrote in [1]. I agree with Andy that it's not great to add est_calls to every function in createplan.c. I feel that CreatePlanContext is a way to take the hit of adding a parameter once and hopefully never having to do it again to this degree. I wondered if PlannerInfo could be a field in CreatePlanContext.

> FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?
>
> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.

Why walk the entire plan tree again to do something we could do when building it in the first place? Recursively walking trees isn't great from a performance point of view. It would be nice to avoid this if we can find some other way to handle subplans. I do have a few other reasons up my sleeve that subplan creation should be delayed until later, so maybe we should fix that to unblock those issues.

> Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...
>
> (usual caveat: I haven't tried, maybe there's something that means this can't work)

It's not like we can look at the top node's cost as a pre-check to skip the recursive step for cheap plans, as it's perfectly valid for a node closer to the root of the plan tree to have a lower total cost than that node's subnodes, e.g. LIMIT 1.

>> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.
>
> Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?

I think Melih is listing nodes that can change the est_calls. Any node can be rescanned, but only a subset of nodes can adjust the number of times they call their subnode vs how many times they themselves are called.

>> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. Note that this changes the meaning of jit_above_cost, it's now a threshold for a single plan node and not the whole query. Additionally, this change in JIT consideration is only for JIT compilations. Inlining and optimizations continue to be for the whole query and based on the overall cost of the query.
>
> It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...

This is a problem with LLVM, IIRC. The problem is it's a decision that has to be made for an entire compilation unit and it can't be decided at the expression level. This is pretty annoying as it's pretty hard to decide the best logic to use to enable optimisations and inlining :-(

I think the best thing we could discuss right now is: is this the best way to fix the JIT costing problem? In [2] I did link to a complaint about the JIT costings. See [3]. The OP there wanted to keep the plan private, but I did get to see it and described the problem on the list.

Also, I don't happen to think the decision about JITting per plan node is perfect, as the node's total costs can be high for reasons other than the cost of evaluating expressions. Also, the number of times a given expression is evaluated can vary quite a bit based on when the expression is evaluated. For example, a foreign table scan that does most of the filtering remotely, but has a non-shippable expr that needs to be evaluated locally. The foreign scan might be very expensive, especially if lots of filtering is done by a Seq Scan, and not many rows might make it back to the local server to benefit from JITting the non-shippable expression.

A counter-example is the join condition of a non-parameterized nested loop. Those get evaluated n_outer_rows * n_inner_rows times.

I think the savings JIT gives us on evaluation of expressions is going to be more closely tied to the number of times an expression is evaluated than to the total cost of the node. However, it's likely more complex for optimisations and inlining, as I imagine the size and complexity of the comparison function matters too.

It would be good to all agree on how we're going to fix this problem exactly before Melih gets in too deep fixing the finer details of the patch. If anyone isn't convinced enough there's a problem with the JIT costings then I can see if I can dig up other threads where this is being complained about.

Does anyone want to disagree with the general idea of making the compilation decision based on the total cost of the node? Or have a better idea?

David

[1] https://postgr.es/m/CAKU4AWqqSAi%2B-1ZaFawY300WknH79J9dhx%3DpU5%2BbyAbShjUjCQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/7736C40E-6DB5-4E7A-8FE3-4B2AB8E22793@elevated-dev.com
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.
>
> Why walk the entire plan tree again to do something we could do when building it in the first place?

FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.

regards, tom lane
On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.

The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

David
On 2/20/24 06:38, David Rowley wrote:
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

I believe Tom was talking about non-JIT-enabled builds, i.e. builds that either don't support JIT at all, or where jit=off. Those would certainly not need the extra walk.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 20 Feb 2024 at 23:04, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/20/24 06:38, David Rowley wrote:
>> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>>
>> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.
>
> I believe Tom was talking about non-JIT-enabled builds, i.e. builds that either don't support JIT at all, or where jit=off. Those would certainly not need the extra walk.

I don't believe so as he talked about the fact that the JIT cycles would drown out the tree walk. There are no JIT cycles when the cost threshold isn't met, but we still incur the cost of walking the plan tree.

David
On 2/20/24 06:14, David Rowley wrote:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> I doubt CreatePlanContext is a great way to achieve this. For one, it breaks the long-standing custom that PlannerInfo is the first parameter, usually followed by RelOptInfo, etc. CreatePlanContext is added to some functions (but not all), which makes it ... unpredictable.
>
> I suggested this to Melih as a way to do this based on what Andy wrote in [1]. I agree with Andy that it's not great to add est_calls to every function in createplan.c. I feel that CreatePlanContext is a way to take the hit of adding a parameter once and hopefully never having to do it again to this degree. I wondered if PlannerInfo could be a field in CreatePlanContext.

You mean we'd be adding more parameters to CreatePlanContext in the future? Not sure that's a great idea, there are reasons why we have separate arguments in function signatures and not a single struct. I think contexts are nice to group attributes with a particular purpose, not as a replacement for arguments to make function signatures simpler.

>> FWIW it's not clear to me if/how this solves the problem with early create_plan() calls for subplans. Or is it still the same?
>>
>> Wouldn't it be simpler to just build the plan as we do now, and then have an expression_tree_walker that walks the complete plan top-down, inspects the nodes, enables JIT where appropriate and so on? That can have arbitrary context, no problem with that.
>
> Why walk the entire plan tree again to do something we could do when building it in the first place? Recursively walking trees isn't great from a performance point of view. It would be nice to avoid this if we can find some other way to handle subplans. I do have a few other reasons up my sleeve that subplan creation should be delayed until later, so maybe we should fix that to unblock those issues.
>
>> Considering we decide JIT pretty late anyway (long after costing and other stuff that might affect the plan selection), the result should be exactly the same, without the extensive createplan.c disruption ...
>>
>> (usual caveat: I haven't tried, maybe there's something that means this can't work)
>
> It's not like we can look at the top node's cost as a pre-check to skip the recursive step for cheap plans, as it's perfectly valid for a node closer to the root of the plan tree to have a lower total cost than that node's subnodes, e.g. LIMIT 1.

I'd argue that's actually a reason to do the precheck, exactly because of the LIMIT. The fact that some node has a high total cost does not matter if there is LIMIT 1 above it. What matters is which fraction of the plan we execute, not the total cost. Imagine you have something like

  ->  Limit 1  (cost=0..1 rows=1 ...)
        ->  Seqscan  (cost=0..100000000 rows=1000000 ...)

I'd argue JIT-ing the seqscan is likely pointless, because on average we'll execute ~1/1000000 of the scan, and the actual cost will be ~100.

>>> - The number of estimated calls are especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on total cost multiplied by the estimated calls instead of only total cost for the node.
>>
>> Not sure I follow. Why would these nodes (Gather, Gather Merge, ...) be more likely to be rescanned compared to other nodes?
>
> I think Melih is listing nodes that can change the est_calls. Any node can be rescanned, but only a subset of nodes can adjust the number of times they call their subnode vs how many times they themselves are called.

OK

>>> - For each node, the planner considers if the node should be JITed. If the cost of the node * the number of estimated calls is greater than jit_above_cost, it's decided to be JIT compiled. [...]
>>
>> It's not clear to me why JIT compilation is decided for each node, while the inlining/optimization is decided for the plan as a whole. I'm not familiar with the JIT stuff, so maybe it's obvious to others ...
>
> This is a problem with LLVM, IIRC. The problem is it's a decision that has to be made for an entire compilation unit and it can't be decided at the expression level. This is pretty annoying as it's pretty hard to decide the best logic to use to enable optimisations and inlining :-(
>
> [...]
>
> Does anyone want to disagree with the general idea of making the compilation decision based on the total cost of the node? Or have a better idea?

I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.

And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.

I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.

Seems reasonable to me also.

> And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.
>
> I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...

I'm not sure I understand whether you (Tomas) think that this patch is a good idea or a bad idea as it stands. I read the first of these two paragraphs to suggest that the patch hasn't really evolved much in the last few years, perhaps suggesting that if it wasn't good enough to commit back then, it still isn't now. But the second of these two paragraphs seems more supportive.

From my own point of view, I definitely agree with David's statement that what we really want to know is how many times each expression will be evaluated. If we had that information, or just an estimate, I think we could make much better decisions in this area. But we don't have that infrastructure now, and it doesn't seem easy to create, so it seems to me that what we have to decide now is whether applying a cost threshold on a per-plan-node basis will produce better or worse results than making one decision for the whole plan. David's provided an example of where it does indeed work better back in https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com - but could there be enough cases where the opposite happens to make us think that the patch is overall a bad idea?

I personally find that a bit unlikely, although not impossible. I see a couple of ways that using the per-node cost can distort things -- it seems like it will tend to heavily feature JIT for "interior" plan nodes because the cost of a plan node includes its children -- and as was mentioned previously, it doesn't really care whether the node cost is high because of expression evaluation or something else. But neither of those things seem like they'd be bad enough to make this a bad way forward over all. For the patch to lose, it seems like we'd need a case where the overall plan cost would have been high enough to trigger JIT pre-patch, but most of the benefit would have come from relatively low-cost nodes that don't get JITted post-patch. The easiest way for that to happen is if the planner's estimates are off, but that's not really an argument against this patch as much as it is an argument that query planning is hard in general.

A slightly subtler way the patch could lose is if the new threshold is harder to adjust than the old one. For example, imagine that you have a query that does a Cartesian join. That makes the cost of the input nodes rather small compared to the cost of the join node, and it also means that JITting the inner join child in particular is probably rather important. But if you set join_above_cost low enough to JIT that node post-patch, then maybe you'll also JIT a bunch of things that aren't on the inner side of a nested loop and which might therefore not really need JIT. Unless I'm missing something, this is a fairly realistic example of where this patch's approach to costing could turn out to be painful ... but it's not like the current system is pain-free either.

I don't really know what's best here, but I'm mildly inclined to believe that the patch might be a change for the better. I have not reviewed the implementation and have no comment on whether it's good or bad from that point of view.

--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for chipping in here.

On Fri, 15 Mar 2024 at 08:14, Robert Haas <robertmhaas@gmail.com> wrote:
> A slightly subtler way the patch could lose is if the new threshold is harder to adjust than the old one. For example, imagine that you have a query that does a Cartesian join. That makes the cost of the input nodes rather small compared to the cost of the join node, and it also means that JITting the inner join child in particular is probably rather important. But if you set join_above_cost low enough to JIT that node post-patch, then maybe you'll also JIT a bunch of things that aren't on the inner side of a nested loop and which might therefore not really need JIT. Unless I'm missing something, this is a fairly realistic example of where this patch's approach to costing could turn out to be painful ... but it's not like the current system is pain-free either.

I think this case would be covered as the cost of the inner side of the join would be multiplied by the estimated outer-side rows. Effectively, making this part work is the bulk of the patch as we currently don't know the estimated number of loops of a node during create plan.

David
On 3/14/24 20:14, Robert Haas wrote:
> On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> I certainly agree that the current JIT costing is quite crude, and we've all seen cases where the decision turns out to not be great. And I think the plan to make the decisions at the node level makes sense, so +1 to that in general.
>
> Seems reasonable to me also.
>
>> And I think you're right that looking just at the node total cost may not be sufficient - that we may need a better cost model, considering how many times an expression is executed and so on. But I think we should try to do this in smaller steps, meaningful on their own, otherwise we won't move at all. The two threads linked by Melih are ~4y old and *nothing* changed since then, AFAIK.
>>
>> I think it's reasonable to start by moving the decision to the node level - it's where the JIT happens, anyway. It may not be perfect, but it seems like a clear improvement. And if we then choose to improve the "JIT cost model" to address some of the issues you pointed out, surely that would need to happen at the node level too ...
>
> I'm not sure I understand whether you (Tomas) think that this patch is a good idea or a bad idea as it stands. I read the first of these two paragraphs to suggest that the patch hasn't really evolved much in the last few years, perhaps suggesting that if it wasn't good enough to commit back then, it still isn't now. But the second of these two paragraphs seems more supportive.

To clarify, I think the patch is a step in the right direction, and a meaningful improvement. It may not be the perfect solution we imagine (but who knows how far we are from that), but AFAIK moving these decisions to the node level is something the ideal solution would need to do too.

The reference to the 4y old patches was meant to support this patch as an improvement - perhaps incomplete, but still an improvement. We keep imagining "perfect solutions" and then end up doing nothing. I recognize there's a risk we may never get to have the ideal solution (e.g. because it requires information we don't possess). But I still think moving the decision to the node level would allow us to make better decisions compared to just doing it for the query as a whole.

> From my own point of view, I definitely agree with David's statement that what we really want to know is how many times each expression will be evaluated. If we had that information, or just an estimate, I think we could make much better decisions in this area. But we don't have that infrastructure now, and it doesn't seem easy to create, so it seems to me that what we have to decide now is whether applying a cost threshold on a per-plan-node basis will produce better or worse results than making one decision for the whole plan. David's provided an example of where it does indeed work better back in https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com - but could there be enough cases where the opposite happens to make us think that the patch is overall a bad idea?

Right, this risk of regression is always there, and I'm sure it'd be possible to construct such cases. But considering how crude the current costing is, I'd be surprised if this ends up being a net negative.

Also, is the number of executions really the thing we're missing? Surely we know the number of rows the node is dealing with, so we could use this (yes, I realize there are issues, but we deal with that when costing quals too). Isn't it a much bigger issue that we have pretty much no cost model for the actual JIT (compilation/optimization) depending on how many expressions it deals with?

> I personally find that a bit unlikely, although not impossible. I see a couple of ways that using the per-node cost can distort things [...] I don't really know what's best here, but I'm mildly inclined to believe that the patch might be a change for the better. I have not reviewed the implementation and have no comment on whether it's good or bad from that point of view.

I think it would be good to construct a bunch of cases where we think this approach would misbehave, and see if the current code handles that any better and/or if there's a way to improve that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> To clarify, I think the patch is a step in the right direction, and a meaningful improvement. It may not be the perfect solution we imagine (but who knows how far we are from that), but AFAIK moving these decisions to the node level is something the ideal solution would need to do too.

I got thinking about this patch again while working on [1]. I want to write this down as I don't quite have time to get fully back into this right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the Node tree of the Expr to figure out the max varattno for each slot. That's done so all of the tuple deforming happens at once rather than incrementally. Figuring out the max varattno is a price we have to pay for every execution of the query. I think we'd be better off doing that in the planner.

To do this, I thought that setrefs.c could do this processing in fix_join_expr / fix_upper_expr and wrap up the expression in a new Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that could be stored in the very same struct is the "num_exec" value. I feel the number of executions of an ExprState is a better gauge of how useful JIT will be than the cost of the plan node.

Now, looking at set_join_references(), the execution estimates are not exactly perfect. For example:

#define NUM_EXEC_QUAL(parentplan) ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could easily make efforts to improve those and that could likely be done independently and concurrently with other work to make JIT more granular.

The problem with doing this is that there's just a huge amount of code churn in the executor. I am keen to come up with a prototype so I can get a better understanding of if this is a practical solution. I don't want to go down that path if it's just me that thinks the number of times an ExprState is evaluated is a better measure to go on for JIT vs no JIT than the plan node's total cost.

Does anyone have any thoughts on that idea?

David

[1] https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=kEa4g@mail.gmail.com
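For illustration, here is a rough sketch of a per-expression decision driven by the estimated number of evaluations, along the lines David suggests; the function, constants, and numbers below are invented for the example and are not part of the patch or of PostgreSQL:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustration only: decide JIT per expression from how often it is
 * expected to be evaluated, instead of from the owning node's total cost.
 * The cost constants are made up for the example, not real GUCs.
 */
static bool
jit_expr_worthwhile(double est_evals, double interp_cost_per_eval,
                    double jit_cost_per_eval, double compile_overhead)
{
    double saving = (interp_cost_per_eval - jit_cost_per_eval) * est_evals;

    return saving > compile_overhead;
}

int
main(void)
{
    /* qual on the inner side of a nested loop: evaluated outer_rows * inner_rows times */
    double nested_loop_evals = 1000.0 * 5000.0;
    /* non-shippable qual under a foreign scan that filters remotely: few local evaluations */
    double foreign_scan_evals = 200.0;

    printf("nested loop qual: %s\n",
           jit_expr_worthwhile(nested_loop_evals, 0.01, 0.002, 1000.0) ? "JIT" : "no JIT");
    printf("foreign scan qual: %s\n",
           jit_expr_worthwhile(foreign_scan_evals, 0.01, 0.002, 1000.0) ? "JIT" : "no JIT");
    return 0;
}

The point of the sketch is that the frequently evaluated nested-loop qual pays for the compilation overhead while the rarely evaluated foreign-scan qual does not, even if the foreign scan node itself is very expensive.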
On Tue, 20 Feb 2024 at 06:38, David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I seriously doubt that an extra walk of the plan tree is even measurable compared to the number of cycles JIT compilation will expend if it's called. So I don't buy your argument here. We would be better off to do this in a way that's clean and doesn't add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not just the ones where we do JIT. I'd rather find a way to not add this extra plan tree walk, especially since the vast majority of cases on an average instance won't be doing any JIT.

I'm not saying I'd prefer the extra walk, but I don't think you'd need to do this extra walk for all plans. Afaict you could skip the extra walk when top_plan->total_cost < jit_above_cost, i.e. only doing the extra walk to determine which exact nodes to JIT for cases where we currently JIT all nodes. That would limit the extra walk overhead to cases where we currently already spend significant resources on JITing stuff.
On Tue, 14 May 2024 at 19:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'm not saying I'd prefer the extra walk, but I don't think you'd need to do this extra walk for all plans. Afaict you could skip the extra walk when top_plan->total_cost < jit_above_cost, i.e. only doing the extra walk to determine which exact nodes to JIT for cases where we currently JIT all nodes. That would limit the extra walk overhead to cases where we currently already spend significant resources on JITing stuff.

You could do that, but wouldn't it just cause us to sometimes miss doing JIT for plan nodes that have a total cost above the top node's? To me, it seems like a shortcut that someone might complain about one day, and fixing it might require removing the shortcut, which would lead to traversing the whole plan tree.

Here's a plan where the total cost of a subnode is greater than the total cost of the top node:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select * from t0 order by a limit 1;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=19480.00..19480.00 rows=1 width=4)
   ->  Sort  (cost=19480.00..21980.00 rows=1000000 width=4)
         Sort Key: a
         ->  Seq Scan on t0  (cost=0.00..14480.00 rows=1000000 width=4)

Anyway, I don't think it's worth talking in detail about specifics of the implementation for the total cost of the node idea when the whole replacement costing model design is still undecided. It feels like we're trying to decide what colour to paint the bathroom when we haven't even come up with a design for the house yet.

I'd be interested to hear your thoughts on using the estimated number of invocations of the function to drive the JIT flags on a per-expression level.

David
On Tue, 14 May 2024 at 10:19, David Rowley <dgrowleyml@gmail.com> wrote:
> Here's a plan where the total cost of a subnode is greater than the total cost of the top node:

Ah, I didn't realize it was possible for that to happen. **reads up on plan costs**

This actually makes me think that using the total_cost of the sub-nodes is not enough to determine whether the node should be JITed. We wouldn't want to start jitting plans like this, i.e. introducing all the JIT setup overhead for just a single row:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select a+a*a+a*a+a from t0 limit 1;

                        QUERY PLAN
-----------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Seq Scan on t0  (cost=0.00..26980.00 rows=1000000 width=4)

An easy way to work around that issue, I guess, is by using the minimum total_cost of all the total_costs from the current sub-node up to the root node. The current minimum could be passed along as part of the context, I guess.
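For illustration, a minimal sketch of that idea with stand-in types: a top-down walk carries the smallest total_cost seen on the path from the root and compares it, rather than the node's own cost, against the threshold. None of these names come from the actual patch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a plan node; not PostgreSQL's Plan struct. */
typedef struct PlanStub
{
    double           total_cost;
    bool             use_jit;
    struct PlanStub *outer;
    struct PlanStub *inner;
} PlanStub;

static double
dmin(double a, double b)
{
    return (a < b) ? a : b;
}

/*
 * min_cost_to_root is the minimum total_cost over the node itself and all
 * of its ancestors.  A cheap Limit above an expensive scan caps the value,
 * so the scan below it does not get flagged for JIT.
 */
static void
jit_walk(PlanStub *node, double min_cost_to_root, double threshold)
{
    if (node == NULL)
        return;

    min_cost_to_root = dmin(min_cost_to_root, node->total_cost);
    node->use_jit = (min_cost_to_root >= threshold);

    jit_walk(node->outer, min_cost_to_root, threshold);
    jit_walk(node->inner, min_cost_to_root, threshold);
}

int
main(void)
{
    /* mimics the Limit -> Seq Scan plan shown above */
    PlanStub seqscan = {.total_cost = 26980.0};
    PlanStub limit = {.total_cost = 0.03, .outer = &seqscan};

    jit_walk(&limit, 1e300, 100000.0);

    printf("limit: %s, seq scan: %s\n",
           limit.use_jit ? "JIT" : "no JIT",
           seqscan.use_jit ? "JIT" : "no JIT");
    return 0;
}

With these numbers neither node is flagged, because the cheap Limit caps min_cost_to_root at 0.03 even though the Seq Scan's own total cost is large.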