Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date
Msg-id CAH2-Wzm8-QfCH=74JrHSZQVzUv62cNhjM8Vvr8zWRDvigcJbSg@mail.gmail.com
In response to Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Stefan Keller <sfkeller@gmail.com>)
Responses Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Stefan Keller <sfkeller@gmail.com>)
List pgsql-hackers
On Tue, Apr 20, 2021 at 2:29 PM Stefan Keller <sfkeller@gmail.com> wrote:
> Just for the record: A learned index has no more foreknowledge about
> the dataset than other indices.

Maybe. ML models are famously prone to over-interpreting training
data. In any case I am simply not competent to assess how true this
is.

> I'd give learned indexes at least a chance to provide a
> proof-of-concept. And I want to learn more about the requirements to
> be accepted as a new index (before undergoing months of code
> sprints).

I have everything to gain and nothing to lose by giving them a chance
-- I'm not required to do anything to give them a chance, after all. I
just want to be clear now, rather than later, that I'm a skeptic. I'm
not the one making a big investment of my time here.

> As you may have seen, the "Stonebraker paper" I cited [1] is also
> sceptical, requiring full parity on features (like "concurrency control,
> recovery, non main memory, and multi-user settings")! Non main memory
> code I understand.
> => But index read/write operations and multi-user settings are part of
> a separate piece of software (transaction manager), aren't they?

It's easy for me to be a skeptic -- again, what do I have to lose by
freely expressing my opinion? Mostly I'm just saying that I wouldn't
work on this because ISTM that there is significant uncertainty about
the outcome, but much less uncertainty about the outcome of
alternative projects of comparable difficulty. That's fundamentally
how I assess what to work on. There is plenty of uncertainty on my end
-- but that's beside the point.

-- 
Peter Geoghegan


