Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date
Msg-id CAH2-WzmHq5JrJrJtumj3q_U9SfTkNRyDnpUhXMCL6J3pAhZ=UA@mail.gmail.com
Whole thread Raw
In response to Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)  (Chapman Flack <chap@anastigmatix.net>)
List pgsql-hackers
On Tue, Apr 20, 2021 at 11:18 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> BTW take a look into PGM [0]. I'm slowly working on implementing it.
> I think it is kind of straightforward to implement it as extension.
> I've started from forking B-tree[1]. I've removed support of anything that is not int4.
> Then I plan to remove opclass\comparator abstraction layer.
> And then add interpolation search over the page before binsearch.

FWIW I'm a skeptic of learned indexes. There are lots of reasons why I
don't think that they're practical, but it boils down to this: in
general data structures that work well don't need anybody to make a
case for them. They simply work well for the task they were designed
for.

Anything is possible, but I strongly doubt that learned indexes are
the exception to that general rule. Why hasn't anybody been able to
apply the techniques in any real world database system to date? I'm
sure that it's possible to make things much faster if it's possible to
make certain wide-reaching assumptions. They talk about big
improvements in space utilization compared to B-Trees as if there was
some very fixed idea of how that works in B-Trees. Why haven't they
compared their model to grotty heuristics that practical experience
has shown to work, such as the rightmost leaf page split heuristic?
Any paper about learned indexes that I've ever read doesn't even
acknowledge the existence of these heuristics.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Erik Rijkers
Date:
Subject: Re: fix old confusing JSON example
Next
From: Chapman Flack
Date:
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)