Re: JIT compiling with LLVM v12 - Mailing list pgsql-hackers

From Tels
Subject Re: JIT compiling with LLVM v12
Msg-id 5f395081219a955151d730ff21938a7e.squirrel@sm.webmail.pair.com
In response to Re: JIT compiling with LLVM v12  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Moin,

On Sat, August 25, 2018 9:34 pm, Robert Haas wrote:
> On Wed, Aug 22, 2018 at 6:43 PM, Andres Freund <andres@anarazel.de> wrote:
>> Now you can say that'd be solved by bumping the cost up, sure. But
>> obviously the row / cost model is pretty much out of whack here, I don't
>> see how we can make reasonable decisions in a trivial query that has a
>> misestimation by five orders of magnitude.
>
> Before JIT, it didn't matter whether the costing was wrong, provided
> that the path with the lowest cost was the cheapest path (or at least
> close enough to the cheapest path not to bother anyone).  Now it does.
> If the intended path is chosen but the costing is higher than it
> should be, JIT will erroneously activate.  If you had designed this in
> such a way that we added separate paths for the JIT and non-JIT
> versions and the JIT version had a bigger startup cost but a reduced
> runtime cost, then you probably would not have run into this issue, or
> at least not to the same degree.  But as it is, JIT activates when the
> plan looks expensive, regardless of whether activating JIT will do
> anything to make it cheaper.  As a blindingly obvious example, turning
> on JIT to mitigate the effects of disable_cost is senseless, but as
> you point out, that's exactly what happens right now.
>
> I'd guess that, as you read this, you're thinking, well, but if I'd
> added JIT and non-JIT paths for every option, it would have doubled
> the number of paths, and that would have slowed the planner down way
> too much.  That's certainly true, but my point is just that the
> problem is probably not as simple as "the defaults are too low".  I
> think the problem is more fundamentally that the model you've chosen
> is kinda broken.  I'm not saying I know how you could have done any
> better, but I do think we're going to have to try to figure out
> something to do about it, because saying, "check-pg_upgrade is 4x
> slower, but that's just because of all those bad estimates" is not
> going to fly.  Those bad estimates were harmlessly bad before, and now
> they are harmfully bad, and similar bad estimates are going to exist
> in real-world queries, and those are going to be harmful now too.
>
> Blaming the bad costing is a red herring.  The problem is that you've
> made the costing matter in a way that it previously didn't.

Hm, no, I don't quite follow this argument. Isn't trying to avoid "bad
costing having bad consequences" just hiding the symptoms instead of
curing them? It would have a high development cost, and bad estimates
could still ruin your day in other places.

Wouldn't it be much smarter to look at why and how the bad costing appears
and try to fix that? If a query that returns 12 rows is estimated to
return about 4 million, something is wrong on a ridiculous scale.

If the costing didn't produce such "to the moon" values, then it wouldn't
matter so much what later decisions do depending on it. I mean, JIT is not
the only consumer here; even choosing the wrong plan can lead to large
runtime differences (think of a sort that spills to disk, etc.).
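To make the threshold problem concrete, here is a toy sketch (the cost
numbers are made up; the single-threshold decision is loosely modeled on
PostgreSQL's jit_above_cost setting, whose default is 100000) of how an
inflated row estimate flips the JIT decision even though the plan itself
is unchanged:

```python
# Toy model: JIT is enabled purely by comparing the plan's estimated
# total cost against a fixed threshold (as with jit_above_cost).
JIT_ABOVE_COST = 100_000

def use_jit(estimated_total_cost: float) -> bool:
    """JIT activates whenever the plan *looks* expensive, regardless of
    whether compiling would actually make this plan cheaper."""
    return estimated_total_cost >= JIT_ABOVE_COST

# The same trivial query, costed once with a sane estimate (12 rows)
# and once with a five-orders-of-magnitude misestimate (~4M rows).
COST_PER_ROW = 0.05                          # made-up per-row cost
print(use_jit(12 * COST_PER_ROW))            # sane estimate: False
print(use_jit(4_000_000 * COST_PER_ROW))     # inflated estimate: True
```

The point being: the decision is driven entirely by the estimate, so an
estimate that is wrong by orders of magnitude turns JIT on for a query
that finishes in microseconds anyway.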

So, is there a limit on how many rows can be estimated? Maybe the estimate
could be capped based on things like:

* How big the table is: a table with 2 pages can't contain a million rows.
* What the column types and constraints are. E.g. if you do:

  SELECT * FROM table WHERE id >= 100 AND id < 200;

  you cannot get more than 100 rows back if "id" is a unique integer
  column.
* Index size: you can't pull more rows out of an index than it contains;
  maybe this helps to limit the "worst estimate"?
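The clamps above could be sketched roughly as taking the minimum of
several independent upper bounds (everything here is hypothetical: the
function name, the tuples-per-page constant, and the overall shape are
mine for illustration, not an existing PostgreSQL API):

```python
# Hypothetical sketch: cap a row estimate by whichever physical or
# logical upper bound is tightest.

MAX_TUPLES_PER_PAGE = 291  # rough heap ceiling for narrow rows on 8 kB pages

def clamp_estimate(estimated_rows, table_pages,
                   range_width=None, index_entries=None):
    """Return the estimate capped by independent upper bounds."""
    bounds = [table_pages * MAX_TUPLES_PER_PAGE]  # table-size bound
    if range_width is not None:       # e.g. unique int id in [100, 200)
        bounds.append(range_width)
    if index_entries is not None:     # index can't yield more than it holds
        bounds.append(index_entries)
    return min(estimated_rows, *bounds)

# A 2-page table can't produce a 4-million-row result:
print(clamp_estimate(4_000_000, table_pages=2))                    # 582
# A unique integer id in [100, 200) caps the result at 100 rows:
print(clamp_estimate(4_000_000, table_pages=10_000, range_width=100))  # 100
```

A sane estimate below every bound passes through unchanged, so the clamp
only bites on the pathological cases.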

These things might also be cheaper to implement than rewriting the entire
JIT costing model.

Also, why does PG allow the stats to be that outdated (or missing; I'm not
sure which case applies in this example)? Shouldn't the system aim to have
at least some basic stats, even if the user never runs ANALYZE? Or is this
done on purpose in these tests, to see what happens?

Best regards,

Tels

