Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) - Mailing list pgsql-hackers

From Stefan Keller
Subject Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Msg-id CAFcOn29U7X14d9ZcZDcsNJe6GadRtFTmeh7JkJJSG+rLTMuG7g@mail.gmail.com
In response to Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Just for the record: a learned index has no more foreknowledge about
the dataset than other indexes do.
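
To make that concrete, here is a minimal sketch (my own illustration in
Python, not code from the paper) of the learned-index idea: fit a trivial
model to the CDF of the sorted keys, predict a position, and correct the
prediction within the worst error observed while fitting. The model is
trained on exactly the same keys a B-tree would store, so it assumes no
extra foreknowledge of the dataset:

import bisect

class LearnedIndexSketch:
    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        # Approximate position ~ slope * key + intercept; fitting the two
        # endpoints is enough for this sketch (least squares would do too).
        k0, k1 = sorted_keys[0], sorted_keys[-1]
        self.slope = (n - 1) / (k1 - k0) if k1 != k0 else 0.0
        self.intercept = -self.slope * k0
        # Record the worst prediction error seen while "training".
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        # Only search within [prediction - max_err, prediction + max_err].
        p = self._predict(key)
        lo = max(0, p - self.max_err)
        hi = min(len(self.keys), p + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

keys = sorted([3, 8, 21, 34, 55, 89, 144, 233])
idx = LearnedIndexSketch(keys)
print(idx.lookup(55))   # -> 4
print(idx.lookup(100))  # -> None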

I'd give learned indexes at least a chance to provide a
proof of concept. And I want to learn more about the requirements for
being accepted as a new index (before undergoing months of code
sprints).

As you may have seen, the "Stonebraker paper" I cited [1] is also
sceptical, requiring full parity on features (like "concurrency control,
recovery, non main memory, and multi-user settings")! The non-main-memory
code I understand.
=> But index read/write operations and multi-user settings are handled by
separate software (the transaction manager), aren't they?

~Stefan

[1] https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf


On Tue, Apr 20, 2021 at 10:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris <jonah.harris@gmail.com> wrote:
> >> Maybe I'll be wrong about learned indexes - who knows? But the burden
> >> of proof is not mine. I prefer to spend my time on things that I am
> >> reasonably confident will work out well ahead of time.
> >
> >
> > Agreed on all of your takes, Peter. In time, they will probably be more realistic.
>
> A big problem when critically evaluating any complicated top-down
> model in the abstract is that it's too easy for the designer to hide
> *risk* (perhaps inadvertently). If you are allowed to make what
> amounts to an assumption that you have perfect foreknowledge of the
> dataset, then sure, you can do a lot with that certainty. You can
> easily find a way to make things faster or more space efficient by
> some ridiculous multiple that way (like 10x, 100x, whatever).
>
> None of these papers ever get around to explaining why what they've
> come up with is not simply fool's gold. The assumption that you can
> have robust foreknowledge of the dataset seems incredibly fragile,
> even if your model is *almost* miraculously good. I have no idea how
> fair that is. But my job is to make Postgres better, not to judge
> papers. My mindset is very matter of fact and practical.
>
> --
> Peter Geoghegan


