Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) - Mailing list pgsql-hackers

From Stefan Keller
Subject Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Msg-id CAFcOn29OBRezBGVHTZJkhpAGQJ9ZADyR_NhBYBtsgrTTW8rF=g@mail.gmail.com
In response to Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, 20 Apr 2021 at 23:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There's enough support these days that you can build a new index
> type as an extension, without touching the core code at all.

Thanks. I'm getting up to speed on extending PG with C++.

I'm still interested in understanding, in principle, what an index has
to do with concurrency control, so that I can separate the
concerns/responsibilities in the code.

On Tue, 20 Apr 2021 at 23:51, Peter Geoghegan <pg@bowt.ie> wrote:
> It's easy for me to be a skeptic

Isn't being a skeptic a requirement for all of us db engineers? :-)

> but much less uncertainty about the outcome of alternative projects of comparable difficulty

Oh. As mentioned above, I'm trying to get an overview of indices. So
if you have pointers to other new indexes (like PGM, VODKA for
text/ts, or Hippo), I'm interested.

 ~Stefan

On Tue, 20 Apr 2021 at 23:51, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Apr 20, 2021 at 2:29 PM Stefan Keller <sfkeller@gmail.com> wrote:
> > Just for the record: a learned index has no more foreknowledge about
> > the dataset than other indices.
>
> Maybe. ML models are famously prone to over-interpreting training
> data. In any case I am simply not competent to assess how true this
> is.
>
> > I'd give learned indexes at least a chance to provide a
> > proof-of-concept. And I want to learn more about the requirements to
> > be accepted as a new index (before undergoing months of code
> > sprints).
>
> I have everything to gain and nothing to lose by giving them a chance
> -- I'm not required to do anything to give them a chance, after all. I
> just want to be clear that I'm a skeptic now rather than later. I'm
> not the one making a big investment of my time here.
>
> > As you may have seen, the "Stonebraker paper" I cited [1] is also
> > sceptical, requiring full parity on features (like "concurrency control,
> > recovery, non main memory, and multi-user settings")! Non main memory
> > code I understand.
> > => But index read/write operations and multi-user settings are part of
> > a separate piece of software (the transaction manager), aren't they?
>
> It's easy for me to be a skeptic -- again, what do I have to lose by
> freely expressing my opinion? Mostly I'm just saying that I wouldn't
> work on this because ISTM that there is significant uncertainty about
> the outcome, but much less uncertainty about the outcome of
> alternative projects of comparable difficulty. That's fundamentally
> how I assess what to work on. There is plenty of uncertainty on my end
> -- but that's beside the point.
>
> --
> Peter Geoghegan


