Re: Lazy JIT IR code generation to increase JIT speed with partitions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Lazy JIT IR code generation to increase JIT speed with partitions
Msg-id 20201230015757.2hlnyp5k2ww5hjyf@alap3.anarazel.de
In response to Lazy JIT IR code generation to increase JIT speed with partitions  (Luc Vlaming <luc@swarm64.com>)
Responses Re: Lazy JIT IR code generation to increase JIT speed with partitions
List pgsql-hackers
Hi,

Great to see work in this area!

On 2020-12-28 09:44:26 +0100, Luc Vlaming wrote:
> I would like to propose a small patch to the JIT machinery which makes the
> IR code generation lazy. The reason for postponing the generation of the IR
> code is that with partitions we get an explosion in the number of JIT
> functions generated as many child tables are involved, each with their own
> JITted functions, especially when e.g. partition-aware joins/aggregates are
> enabled. However, only a fraction of those functions is actually executed
> because the Parallel Append node distributes the workers among the nodes.
> With the attached patch we get lazy generation, so this is no
> longer a problem.
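To make the idea concrete, here is a minimal C sketch of compile-on-first-call. It is not the patch's code and uses no LLVM: `LazyExpr`, `lazy_stub`, `compiled_eval` and `run_worker` are hypothetical names, and "code generation" is simulated by bumping a counter. Each expression starts out pointing at a stub; the first call does the (expensive) work and swaps in the real function pointer, so partitions a worker never visits cost nothing.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for an expression state:
 * evalfunc is what the executor calls; it starts out pointing at a stub. */
typedef struct LazyExpr LazyExpr;
typedef int (*EvalFunc)(LazyExpr *expr, int input);

struct LazyExpr {
    EvalFunc evalfunc;   /* current entry point */
    int      factor;     /* the "expression" is simply input * factor */
};

/* Counts how often simulated code generation actually ran. */
static int n_generated = 0;

/* Stand-in for the JIT-compiled function for "input * factor". */
static int compiled_eval(LazyExpr *expr, int input)
{
    return input * expr->factor;
}

/* Installed at init time: on the first call, "generate" the code and
 * swap the pointer so later calls skip the stub entirely. */
static int lazy_stub(LazyExpr *expr, int input)
{
    assert(expr->evalfunc == lazy_stub);  /* runs at most once per expr */
    n_generated++;                        /* a real JIT would build IR here */
    expr->evalfunc = compiled_eval;
    return expr->evalfunc(expr, input);
}

static void lazy_expr_init(LazyExpr *expr, int factor)
{
    expr->evalfunc = lazy_stub;           /* nothing generated yet */
    expr->factor = factor;
}

/* Simulates one worker of a Parallel Append over nchildren partitions that
 * only ever evaluates the children listed in visited[]. Returns the sum of
 * the results so the calls are not dead code. */
static int run_worker(LazyExpr *children, int nchildren,
                      const int *visited, int nvisited)
{
    int sum = 0;

    for (int i = 0; i < nchildren; i++)
        lazy_expr_init(&children[i], i + 1);
    for (int v = 0; v < nvisited; v++) {
        LazyExpr *e = &children[visited[v]];
        sum += e->evalfunc(e, 10);        /* first call generates code */
        sum += e->evalfunc(e, 10);        /* later calls reuse it */
    }
    return sum;
}
```

For example, a worker over 8 children that only visits children 0 and 2 runs code generation twice, not eight times; the other six children keep pointing at the stub.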

I unfortunately don't think this is quite good enough, because it'll
lead to emitting all functions separately, which can also lead to very
substantial increases in the required time (as emitting code is an
expensive step). Obviously that is only relevant in cases where the
generated functions actually end up being used, which isn't the case in
your example.

If you look at, e.g., a query like
  SELECT blub, count(*), sum(zap) FROM foo WHERE blarg = 3 GROUP BY blub;
on a table without indexes, you would end up with functions for

- WHERE clause (including deforming)
- projection (including deforming)
- grouping key
- aggregate transition
- aggregate result projection

With your patch, each of these would be emitted separately instead of
in one go, which IIRC increases the required time significantly,
especially if inlining is done (each separate code generation then ends
up with its own copies of the inlined code).
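The cost difference can be illustrated with a toy C model. Again this is hypothetical, not LLVM or PostgreSQL code: `simulate`, `EmitMode`, `EmitStats` and `query_funcs` are invented names, and the model only counts how often the expensive emit step runs when every one of the five functions listed above is evaluated. Batched emission (all pending functions emitted the first time any of them is needed) pays once; per-function emission pays five times. With inlining, each of those separate emissions would additionally repeat work on its own copy of the inlined code, which this model does not try to quantify.

```c
#include <assert.h>

/* Toy model only: counts "emit" steps, no real code generation happens. */
#define MAX_FUNCS 16

/* Labels for the five functions in the GROUP BY example above. */
static const char *const query_funcs[] = {
    "WHERE clause (including deforming)",
    "projection (including deforming)",
    "grouping key",
    "aggregate transition",
    "aggregate result projection",
};
#define NQUERY_FUNCS ((int) (sizeof query_funcs / sizeof query_funcs[0]))

typedef enum {
    EMIT_BATCHED,     /* first use emits every pending function at once */
    EMIT_SEPARATELY   /* each function is generated and emitted on its own */
} EmitMode;

typedef struct {
    int emissions;    /* how many times the expensive emit step ran */
} EmitStats;

/* Evaluates each of nfuncs expressions in order (all of them get used, as
 * in the GROUP BY query) and counts emissions under the given mode. */
static EmitStats simulate(int nfuncs, EmitMode mode)
{
    EmitStats st = {0};
    int callable[MAX_FUNCS] = {0};   /* 1 once function f has been emitted */

    assert(nfuncs >= 0 && nfuncs <= MAX_FUNCS);
    for (int f = 0; f < nfuncs; f++) {
        if (callable[f])
            continue;                /* already emitted earlier, just call it */
        st.emissions++;              /* stands in for the expensive emit step */
        if (mode == EMIT_BATCHED) {
            for (int g = 0; g < nfuncs; g++)
                callable[g] = 1;     /* whole module becomes callable */
        } else {
            callable[f] = 1;         /* only this one function */
        }
    }
    return st;
}
```

The model deliberately ignores per-function code size; it only shows why the number of emit steps under per-function emission grows with the number of functions actually used, while batching keeps it at one.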


As far as I can see, you've basically falsified the second part of this
comment (which you moved), since with the patch functions are no longer
emitted together:

> +
> +    /*
> +     * Don't immediately emit nor actually generate the function.
> +     * instead do so the first time the expression is actually evaluated.
> +     * That allows to emit a lot of functions together, avoiding a lot of
> +     * repeated llvm and memory remapping overhead. It also helps with not
> +     * compiling functions that will never be evaluated, as can be the case
> +     * if e.g. a parallel append node is distributing workers between its
> +     * child nodes.
> +     */

> -    /*
> -     * Don't immediately emit function, instead do so the first time the
> -     * expression is actually evaluated. That allows to emit a lot of
> -     * functions together, avoiding a lot of repeated llvm and memory
> -     * remapping overhead.
> -     */

Greetings,

Andres Freund


