
Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: MaxOffsetNumber for Table AMs
Date	May 5, 2021 17:15:16
Msg-id	CAH2-WzmDCbnFn87ODpu0zGwP+PZXWtMGsT8Ds_eLAPrzYP-L-Q@mail.gmail.com
In response to	Re: MaxOffsetNumber for Table AMs (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: MaxOffsetNumber for Table AMs Re: MaxOffsetNumber for Table AMs
List	pgsql-hackers


On Wed, May 5, 2021 at 9:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 5, 2021 at 11:50 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I'm being very vocal here because I'm concerned that we're going about
> > generalizing TIDs in the wrong way. To me it feels like there is a
> > loss of perspective about what really matters.
>
> Well, which things matter is a question of opinion, not fact.

I'm not trying to win an argument here. I am giving an opinion in the
hopes that it leads to some kind of useful synthesis, based on all of
our opinions.

> > No other database system has something like indirect indexes. They
> > have clustered indexes, but that's rather different.
>
> I don't think this is true at all. If you have a clustered index -
> i.e. the table is physically arranged according to the index ordering
> - then your secondary indexes all pretty much have to be what we're
> calling indirect indexes. They can hardly point to a physical
> identifier if rows are being moved around. I believe InnoDB works this
> way, and I think Oracle's index-organized tables do too. I suspect
> there are other examples.

But these systems don't have indirect indexes *on a heap table*! Why
would they ever do it that way? They already have rowid/TID as a
stable identifier of logical rows, so having indirect indexes that
point to a heap table's rows would be strictly worse than the generic
approach for indexes on a heap table.

What we call indirect indexes seem to me to be a failed attempt to
solve the "TID is not a stable identifier of logical row" issue that
is baked into Postgres. If I thought it was worth solving that
problem then I suppose I'd solve it directly. The "indirection" of
indirect indexes actually buys you nothing! It just moves some of the
problem somewhere else, at the cost of even more complexity. Indirect
indexes (without a clustered index) are a muddled idea.
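
To make the two lookup paths concrete, here is a minimal sketch in C.
Every name in it (ToyTid, HeapRow, PkEntry, DirectEntry, IndirectEntry,
lookup_direct, lookup_indirect) is hypothetical, and linear scans stand
in for real index descents. It assumes an indirect index stores the
primary key value instead of a TID, so a lookup has to resolve that key
through the primary key index before it can reach the heap:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy physical row locator: just a slot number in heap[] (an assumption). */
typedef uint32_t ToyTid;

typedef struct HeapRow { int32_t pk; int32_t val; } HeapRow;

/* Primary key index entry: pk value -> physical location. */
typedef struct PkEntry { int32_t pk; ToyTid tid; } PkEntry;

/* Direct secondary index entry: indexed value -> physical location. */
typedef struct DirectEntry { int32_t key; ToyTid tid; } DirectEntry;

/* Indirect secondary index entry: indexed value -> pk value. */
typedef struct IndirectEntry { int32_t key; int32_t pk; } IndirectEntry;

/* One probe: secondary index -> heap. */
static const HeapRow *
lookup_direct(const DirectEntry *idx, size_t n,
              const HeapRow *heap, int32_t key)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i].key == key)
            return &heap[idx[i].tid];
    return NULL;
}

/* Two probes: secondary index -> primary key index -> heap. */
static const HeapRow *
lookup_indirect(const IndirectEntry *idx, size_t n,
                const PkEntry *pkidx, size_t npk,
                const HeapRow *heap, int32_t key)
{
    for (size_t i = 0; i < n; i++)
    {
        if (idx[i].key != key)
            continue;
        for (size_t j = 0; j < npk; j++)
            if (pkidx[j].pk == idx[i].pk)
                return &heap[pkidx[j].tid];
        return NULL;            /* dangling pk: no such row */
    }
    return NULL;
}
```

In this toy model the direct lookup costs one index probe and the
indirect one costs two. The primary key index still maps to a TID, so
it still has to be kept current whenever a row version moves.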

Of course I accept that clustered indexes make sense in general
(though less and less these days). But the fact that these systems
"use indirect indexes" for secondary indexes is precisely why
clustered indexes don't seem like a great design with modern hardware!
Should we invest a huge amount of work in order to have all of the
disadvantages, and none of the advantages?

> My point is that so far I am not seeing a whole lot of value of this
> proposed approach. For a 64-bit TID to be valuable to you, one of two
> things has to be true: you either don't care about having indexes that
> store TIDs on your new table type, or the index types you want to use
> can store those 64-bit TIDs. Now, I have not yet heard of anyone
> working on a table AM who does not want to be able to support adding
> btree indexes. There may be someone that I don't know about, and if
> so, fine. But otherwise, we need a way to store them. And that
> requires changing the page format for btree indexes. But surely we do
> not want to make all TIDs everywhere wider in future btree versions,
> so at least two TID widths - 6 bytes and 8 bytes - would have to be
> supported.

I agree that we don't want a performance/space overhead for simple
cases that are quite happy with the current format.

> And if we're at all going to do that, I think it's
> certainly worth asking whether supporting varlena TIDs would really be
> all that much harder. You seem to think it is, and you might be right,
> but I'm not ready to give up, because I do not see how we are ever
> going to get global indexes or indirect indexes without doing it, and
> those would be good features to have.

I think that global indexes are well worth having, and should be
solved some completely different way. The partition key can be an
additive thing. I strongly suspect that indirect indexes (without a
clustered index) are 100% useless in both theory and practice, so
naturally I have little to no interest.

The difficulty of supporting (say) 6 byte and 8 byte TIDs together is
vastly lower than variable-width TIDs, for all kinds of reasons. See
my remarks to Andres upthread about deduplication.
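
For a sense of why two fixed widths are the easier case, here is a
minimal sketch in C. Tid6 mirrors the three 16-bit fields of
PostgreSQL's ItemPointerData. Tid8, tid6_widen, tid8_fits_in_tid6, and
posting_list_offset are hypothetical names invented for illustration,
and the offset arithmetic is one assumed reason that fixed widths are
simpler, not a summary of the upthread remarks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as ItemPointerData: block number in two halves, then offset. */
typedef struct Tid6
{
    uint16_t bi_hi;     /* high 16 bits of the block number */
    uint16_t bi_lo;     /* low 16 bits of the block number */
    uint16_t offset;    /* offset number within the block */
} Tid6;

static_assert(sizeof(Tid6) == 6, "Tid6 must occupy exactly 6 bytes");

/* Hypothetical wide TID; not an existing PostgreSQL type. */
typedef uint64_t Tid8;

/* Widen a 6-byte TID: the 48 payload bits land in the low bits. */
static Tid8
tid6_widen(Tid6 t)
{
    uint64_t block = ((uint64_t) t.bi_hi << 16) | t.bi_lo;

    return (block << 16) | t.offset;
}

/* A wide TID can use the narrow format only if its top 16 bits are zero. */
static bool
tid8_fits_in_tid6(Tid8 t)
{
    return (t >> 48) == 0;
}

/*
 * With a fixed width, the n-th TID in a packed array sits at a byte
 * offset that can be computed directly, so binary search stays possible.
 */
static size_t
posting_list_offset(size_t tid_width, size_t n)
{
    return tid_width * n;
}
```

With either fixed width the n-th entry of a packed TID array is found
by multiplication. A variable-width encoding would need a walk over the
preceding entries or a separate offset table to do the same.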

> If we can't ever get them, so be it, but you seem to kind of be saying
> that things like global indexes and indirect indexes are hard, and
> therefore they don't count as reasons why we might want variable-width
> TIDs. But one very large reason why those things are hard is that they
> require variable-width TIDs, so AFAICS this boils down to saying that
> we don't want the feature because it's hard to implement.

More like very hard to implement for a very low benefit.

> But we
> should not conflate feasibility with desirability. I am quite sure
> that lots of people want global indexes.

I do too!

-- 
Peter Geoghegan
