Re: JIT compiling with LLVM v12 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: JIT compiling with LLVM v12 |
Date | |
Msg-id | 20180822224320.duuquijw46vfdlzm@alap3.anarazel.de Whole thread Raw |
In response to | Re: JIT compiling with LLVM v12 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: JIT compiling with LLVM v12
|
List | pgsql-hackers |
Hi, On 2018-08-22 18:15:29 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-08-22 06:20:21 +0000, Noah Misch wrote: > >> Regardless of the choice of jit={on|off} default, these numbers tell me that > >> some or all of jit_*_cost defaults are too low. > > > I don't think it really shows that. The reason that JITing gets started > > there is that the tables aren't analyzed and we end up with crazy ass > > estimates about the cost of the queries. No useful setting of the cost > > limits will protect against that... :( > > I don't buy that line of argument one bit. No, we generally don't > analyze most of the regression test tables, but the planner still > knows that they're not very large. If JIT is kicking in for those > queries, the defaults are set wrong. I looked at the queries that get JITed, I didn't just make that claim up out of thin air. The first query that's JITed e.g. is: +explain analyze SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.* + FROM BOOLTBL1, BOOLTBL2 + WHERE BOOLTBL2.f1 <> BOOLTBL1.f1; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Nested Loop (cost=0.00..118524.73 rows=3948050 width=34) (actual time=8.376..8.390 rows=12 loops=1) + Join Filter: (booltbl2.f1 <> booltbl1.f1) + Rows Removed by Join Filter: 4 + -> Seq Scan on booltbl1 (cost=0.00..38.10 rows=2810 width=1) (actual time=0.018..0.019 rows=4 loops=1) + -> Materialize (cost=0.00..52.15 rows=2810 width=1) (actual time=0.004..0.005 rows=4 loops=4) + -> Seq Scan on booltbl2 (cost=0.00..38.10 rows=2810 width=1) (actual time=0.007..0.009 rows=4 loops=1) + Planning Time: 0.074 ms + JIT: + Functions: 6 + Generation Time: 0.935 ms + Inlining: false + Inlining Time: 0.000 ms + Optimization: false + Optimization Time: 0.451 ms + Emission Time: 7.716 ms + Execution Time: 43.466 ms +(16 rows) Now you can say that'd be solved by bumping the cost up, sure. But obviously the row / cost model is pretty much out of whack here, I don't see how we can make reasonable decisions in a trivial query that has a misestimation by five orders of magnitude. Another subsequent case is: set enable_sort = off; -- try to make it pick a hash setop implementation select '(2,5)'::cashrange except select '(5,6)'::cashrange; which is expensive because a sort is chosen even though sort is disabled (yes, this might be a bug in the test): EXPLAIN select '(2,5)'::cashrange except select '(5,6)'::cashrange; ┌────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────┤ │ SetOp Except (cost=10000000000.06..10000000000.07 rows=1 width=36) │ │ -> Sort (cost=10000000000.06..10000000000.06 rows=2 width=36) │ │ Sort Key: ('($2.00,$5.00)'::cashrange) │ │ -> Append (cost=0.00..0.05 rows=2 width=36) │ │ -> Subquery Scan on "*SELECT* 1" (cost=0.00..0.02 rows=1 width=36) │ │ -> Result (cost=0.00..0.01 rows=1 width=32) │ │ -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=36) │ │ -> Result (cost=0.00..0.01 rows=1 width=32) │ │ JIT: │ │ Functions: 7 │ │ Inlining: true │ │ Optimization: true │ └────────────────────────────────────────────────────────────────────────────────────┘ (12 rows) Obviously the high costing here distorts things. Many of the other cases here are along similar lines as the two cases before. > Additional evidence for the > defaults being wrong is the number of reports we've had of JIT making > things slower. Maybe. Greetings, Andres Freund
pgsql-hackers by date: