Re: JIT compiling with LLVM v12 - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: JIT compiling with LLVM v12
Msg-id alpine.DEB.2.21.1808260800360.11066@lancre
In response to Re: JIT compiling with LLVM v12  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
>> Now you can say that'd be solved by bumping the cost up, sure. But
>> obviously the row / cost model is pretty much out of whack here, I don't
>> see how we can make reasonable decisions in a trivial query that has a
>> misestimation by five orders of magnitude.
>
> Before JIT, it didn't matter whether the costing was wrong, provided
> that the path with the lowest cost was the cheapest path (or at least
> close enough to the cheapest path not to bother anyone).  Now it does.
> If the intended path is chosen but the costing is higher than it
> should be, JIT will erroneously activate.  If you had designed this in
> such a way that we added separate paths for the JIT and non-JIT
> versions and the JIT version had a bigger startup cost but a reduced
> runtime cost, then you probably would not have run into this issue, or
> at least not to the same degree.  But as it is, JIT activates when the
> plan looks expensive, regardless of whether activating JIT will do
> anything to make it cheaper.  As a blindingly obvious example, turning
> on JIT to mitigate the effects of disable_cost is senseless, but as
> you point out, that's exactly what happens right now.
>
> I'd guess that, as you read this, you're thinking, well, but if I'd
> added JIT and non-JIT paths for every option, it would have doubled
> the number of paths, and that would have slowed the planner down way
> too much.  That's certainly true, but my point is just that the
> problem is probably not as simple as "the defaults are too low".  I
> think the problem is more fundamentally that the model you've chosen
> is kinda broken.  I'm not saying I know how you could have done any
> better, but I do think we're going to have to try to figure out
> something to do about it, because saying, "check-pg_upgrade is 4x
> slower, but that's just because of all those bad estimates" is not
> going to fly.  Those bad estimates were harmlessly bad before, and now
> they are harmfully bad, and similar bad estimates are going to exist
> in real-world queries, and those are going to be harmful now too.
>
> Blaming the bad costing is a red herring.  The problem is that you've
> made the costing matter in a way that it previously didn't.

My 0.02€ on this interesting subject.

Historically, external IOs, i.e. rotating disk accesses, have been the main 
cost of executing database queries (by several orders of magnitude), and 
CPU costs are comparatively very low in most queries. The point of the query 
planner is mostly to avoid paths that are very bad with respect to IOs.

Now, even with significantly faster IOs, e.g. SSDs, IOs are still a few 
orders of magnitude slower, but less so, so CPU may matter more.

Also, for small databases the data are often in memory and stay there, in 
which case CPU is the only cost.

This would suggest the following approach to evaluating costs in the 
planner:

(1) are the needed data already in memory? If so, use CPU-only costs. This 
implies that the planner would know about it... which is probably not the 
case.

(2) if not, then optimise for IOs first, because they are likely to
be the main cost driver anyway.

(3) once an "IO-optimal" (i.e. not too bad) plan is selected, consider 
whether to apply JIT to parts of it: do so if CPU costs are significant and 
some parts are likely to be executed many times, with a comfortable margin 
to cover the JIT compilation costs.

Basically, I'm suggesting to re-evaluate the selected plan as a second 
stage, without changing it, to decide whether JIT would improve it.
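To make the idea concrete, here is a minimal sketch of the two-stage decision in Python. All names, costs, and thresholds are purely illustrative assumptions for this sketch (they are not PostgreSQL's actual API or parameters): plan selection ignores JIT entirely, and only the already-chosen plan is then re-evaluated against a JIT compilation cost.

```python
# Hypothetical sketch of the proposed two-stage decision.
# Stage 1 picks the plan on IO+CPU cost alone; stage 2 asks, for that
# plan only, whether the expected CPU savings from JIT exceed its
# compilation overhead. All numbers below are made up.

def choose_plan(paths, jit_compile_cost=100_000.0, jit_cpu_speedup=3.0):
    # Stage 1: select the cheapest path; JIT plays no role here,
    # so a misestimated total cost cannot "turn JIT on" by itself.
    best = min(paths, key=lambda p: p["io_cost"] + p["cpu_cost"])

    # Stage 2: re-evaluate the *selected* plan without changing it.
    # JIT pays off only when CPU savings clearly cover compilation.
    cpu_savings = best["cpu_cost"] * (1 - 1 / jit_cpu_speedup)
    use_jit = cpu_savings > jit_compile_cost
    return best, use_jit

plans = [
    {"name": "seqscan", "io_cost": 500_000.0, "cpu_cost": 400_000.0},
    {"name": "idxscan", "io_cost": 20_000.0,  "cpu_cost": 350_000.0},
]
best, use_jit = choose_plan(plans)
print(best["name"], use_jit)  # -> idxscan True
```

Note how this separates the two concerns Robert raises: a wrong total cost can still pick the wrong path, but it can no longer trigger JIT on a plan whose CPU share does not justify it.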

-- 
Fabien.
