Re: a path towards replacing GEQO with something better - Mailing list pgsql-hackers

From AJG
Subject Re: a path towards replacing GEQO with something better
Date
Msg-id 1624135138590-0.post@n3.nabble.com
Whole thread Raw
In response to a path towards replacing GEQO with something better  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
Hi,

I stumbled across this which may be of interest to this topic and GEQO
alternative.

The main creator/author of Neo and Bao (ML for Query Optimizer) Ryan Marcus
(finishing Postdoc and looking for job) recently posted [1] about Bao for
distributed systems.

But what was interesting was the links he had to a 2020 Neo YouTube [2]
which discussed better cardinality estimation / 90% less errors (vs.
Postgres 10) only improved query latency by 2-3%, and other MLs made worse
in other scenarios.

Other interesting takeaways from the video (summary):

PostgreSQL Query Optimizer – 40k LOC. PG10 70% worse/slower than Oracle. PG
has 3 major flaws in QO, 1 fixed in PG11. Neo 10-25% better than PG QO after
30hr training (using GPU). Neo drops to 10% better if 3 flaws were / could
be fixed.

MS SQL – 1 million LOC.

Oracle – 45-55 FTEs working on it. No LOC given by Oracle. Appear to focus
on TPC-DS. NEO better than Oracle after 60hr training (using GPU).

Humans and hand tuning will always beat ML. I.e. Neo (and Bao) good for
those who cannot afford a fulltime DBA doing query optimizing.


Bao – follow-on work from Neo.
“This is a prototype implementation of Bao for PostgreSQL. Bao is a learned
query optimizer that learns to "steer" the PostgreSQL optimizer by issuing
coarse-grained query hints. For more information about Bao”

BAO GitHub here [3] and is AGPLv3 license (not sure if that’s good or bad).

Bao drawbacks… (but may not matter from a GEQO perspective??)

“Of course, Bao does come with some drawbacks. Bao causes query optimization
to take a little bit more time (~300ms), requiring quite a bit more
computation. We studied this overhead in our SIGMOD paper. For data
warehouse workloads, which largely consists of long-running, resource
intensive queries, Bao’s increased overhead is hardly noticeable. However,
for workloads with a lot of short running queries, like OLTP workloads, this
might not be the case. We are currently working on new approaches to
mitigate that problem – so stay tuned!”


[1] https://rmarcus.info/blog/2021/06/17/bao-distributed.html
[2] https://www.youtube.com/watch?v=lMb1yNbIopc
Cardinality errors impact on latency - Starting at 8:00, interesting at
10:10 approx.
[3] https://github.com/learnedsystems/baoforpostgresql




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Race condition in InvalidateObsoleteReplicationSlots()
Next
From: Noah Misch
Date:
Subject: Re: Race between KeepFileRestoredFromArchive() and restartpoint