Thread: Legacy GiST invalid tuples

Legacy GiST invalid tuples

From
Andrey Borodin
Date:
Hi, hackers!

There is bunch of code in current GiST implementation checking for GistTupleIsInvalid(). PostgreSQL since 9.1 do not
createinvalid tuples. Should we support this tuples forever? 

Invalid tuples were created in case of crash during GiST page split. They were used as a recovery measure. During index
scaninvalid tuples are treated as unconditional match. Any update in subtree of such tuple will error with
errhint("PleaseREINDEX it."). Any VACUUM will log same complaint (but succeed). 

I think that there are two options:
1. Bold one: just assume that there are no more invalid tuples.
2. For v12 raise the severity to ERROR on any encounter, then goto option 1 for vNext

What is the purpose of dropping invalid tuples?
1. Make code cleaner
2. Free some spare bytes in t_tid for advancing GiST (for example, see my other patch on GiST intrapage indexing [0] )

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/7780A07B-4D04-41E2-B228-166B41D07EEE@yandex-team.ru

Re: Legacy GiST invalid tuples

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> There is bunch of code in current GiST implementation checking for GistTupleIsInvalid(). PostgreSQL since 9.1 do not
createinvalid tuples. Should we support this tuples forever? 

The question is not whether we still create such tuples.  The reason
the code is still there is that an index that's been pg_upgraded from
before 9.1 might still contain such tuples.  We can't drop the support
unless either we provide logic to clean up invalid entries, or we're
willing to force users to REINDEX old GiST indexes to get rid of them
that way.  The latter seems like a pretty high price just to get rid of
some crufty old code.

            regards, tom lane


Re: Legacy GiST invalid tuples

From
Andrey Borodin
Date:

> 4 июля 2018 г., в 19:22, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> There is bunch of code in current GiST implementation checking for GistTupleIsInvalid(). PostgreSQL since 9.1 do not
createinvalid tuples. Should we support this tuples forever? 
>
> The question is not whether we still create such tuples.  The reason
> the code is still there is that an index that's been pg_upgraded from
> before 9.1 might still contain such tuples.  We can't drop the support
> unless either we provide logic to clean up invalid entries, or we're
> willing to force users to REINDEX old GiST indexes to get rid of them
> that way.  The latter seems like a pretty high price just to get rid of
> some crufty old code.

Thanks, Tom!

So, I can create the script for pg_upgrade that will walk through each old enough[0] GiST index, scan for invalid
tuplesand repair them. This procedure seems quite trivial, but there will be more code that we have now. Does it sound
reasonable?

[0] Actually, I do not know how to understand which index is old enough.

Best regards, Andrey Borodin.



Re: Legacy GiST invalid tuples

From
Alvaro Herrera
Date:
On 2018-Jul-04, Andrey Borodin wrote:

> Thanks, Tom!
> 
> So, I can create the script for pg_upgrade that will walk through each old enough[0] GiST index, scan for invalid
tuplesand repair them. This procedure seems quite trivial, but there will be more code that we have now. Does it sound
reasonable?
> 
> [0] Actually, I do not know how to understand which index is old enough.

Requiring a scan of all indexes during pg_upgrade might increase the
upgrade time prohibitively for some sites, so I don't think that's a
good solution.

I think keeping track of which indexes might be old enough not to have
invalid tuples anymore is a good idea in the long run.  If we start
doing it in pg12, then by the time pg17 comes about and we abandon pg11
(the one without the cataloguing) then we can retire the code to support
invalid tuples.  Most people, come this point, say "naaah this too long,
this project is useless" so the cataloguing is never done :-)

*If* we make it to 2023, that is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Legacy GiST invalid tuples

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Requiring a scan of all indexes during pg_upgrade might increase the
> upgrade time prohibitively for some sites, so I don't think that's a
> good solution.

Perhaps VACUUM could be taught to clean up invalid entries?  That
wouldn't help Andrey's unstated goal of being able to re-use the bits
for some other purpose in v12, but it might be practical to re-use
them sometime sooner than v17.

            regards, tom lane


Re: Legacy GiST invalid tuples

From
Alvaro Herrera
Date:
On 2018-Jul-04, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Requiring a scan of all indexes during pg_upgrade might increase the
> > upgrade time prohibitively for some sites, so I don't think that's a
> > good solution.
> 
> Perhaps VACUUM could be taught to clean up invalid entries?  That
> wouldn't help Andrey's unstated goal of being able to re-use the bits
> for some other purpose in v12, but it might be practical to re-use
> them sometime sooner than v17.

We tried to clean up HEAP_MOVED_IN / HEAP_MOVED_OFF a long time ago, but
gave up :-)  I recall Andres posted a write-up about adding columns to
pg_class to indicate "what version did last vacuum this whole table", as
a marker for features that are no longer needed.  Maybe that idea can be
reused for this purpose.  I'm thinking pg_upgrade can use its --check
phase to look for indexes marked as older than X (possibly containing
invalid tuples), so it instructs the user to run vacuum on it prior to
the upgrade.  I think the soonest this can work is to add the column in
pg12 so that it can be used to upgrade to pg13.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Legacy GiST invalid tuples

From
Andres Freund
Date:
Hi,

On 2018-07-04 16:43:19 -0400, Alvaro Herrera wrote:
> On 2018-Jul-04, Tom Lane wrote:
> 
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > Requiring a scan of all indexes during pg_upgrade might increase the
> > > upgrade time prohibitively for some sites, so I don't think that's a
> > > good solution.
> > 
> > Perhaps VACUUM could be taught to clean up invalid entries?  That
> > wouldn't help Andrey's unstated goal of being able to re-use the bits
> > for some other purpose in v12, but it might be practical to re-use
> > them sometime sooner than v17.
> 
> We tried to clean up HEAP_MOVED_IN / HEAP_MOVED_OFF a long time ago, but
> gave up :-)  I recall Andres posted a write-up about adding columns to
> pg_class to indicate "what version did last vacuum this whole table", as
> a marker for features that are no longer needed.

Right. I plan to pick that up as soon as I've some free cycles. I'd not
object to somebody else working on it first though.


> Maybe that idea can be reused for this purpose.

Yes, I think that's explicitly in scope.


> I'm thinking pg_upgrade can use its --check phase to look for indexes
> marked as older than X (possibly containing invalid tuples), so it
> instructs the user to run vacuum on it prior to the upgrade.

I kinda wondered about making the feature back-patchable (by storing the
version in reloptions or such), but I think that'd just make it less
likely to get in.


> I think the soonest this can work is to add the column in pg12 so that
> it can be used to upgrade to pg13.

I don't think we can easily require that everyone pg_upgrading to v13+
upgrades to v12 first?

Greetings,

Andres Freund


Re: Legacy GiST invalid tuples

From
Alvaro Herrera
Date:
On 2018-Jul-04, Andres Freund wrote:

> > I think the soonest this can work is to add the column in pg12 so that
> > it can be used to upgrade to pg13.
> 
> I don't think we can easily require that everyone pg_upgrading to v13+
> upgrades to v12 first?

We've never done that, I don't know if we can get away with it.
Upgrades with pg_upgrade are not cheap enough for that, methinks.
Maybe I'm wrong, but people complain even at a *single* pg_upgrade --
seems everybody wants in-place upgrades nowadays ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Legacy GiST invalid tuples

From
Andres Freund
Date:
On 2018-07-04 17:02:01 -0400, Alvaro Herrera wrote:
> On 2018-Jul-04, Andres Freund wrote:
> 
> > > I think the soonest this can work is to add the column in pg12 so that
> > > it can be used to upgrade to pg13.
> > 
> > I don't think we can easily require that everyone pg_upgrading to v13+
> > upgrades to v12 first?
> 
> We've never done that, I don't know if we can get away with it.
> Upgrades with pg_upgrade are not cheap enough for that, methinks.
> Maybe I'm wrong, but people complain even at a *single* pg_upgrade --
> seems everybody wants in-place upgrades nowadays ...

Right. That's why I was wondering about making the "last full scan"
feature backpatchable...

Greetings,

Andres Freund


Re: Legacy GiST invalid tuples

From
Alvaro Herrera
Date:
On 2018-Jul-04, Andres Freund wrote:

> On 2018-07-04 17:02:01 -0400, Alvaro Herrera wrote:
> > On 2018-Jul-04, Andres Freund wrote:
> > 
> > > > I think the soonest this can work is to add the column in pg12 so that
> > > > it can be used to upgrade to pg13.
> > > 
> > > I don't think we can easily require that everyone pg_upgrading to v13+
> > > upgrades to v12 first?
> > 
> > We've never done that, I don't know if we can get away with it.
> > Upgrades with pg_upgrade are not cheap enough for that, methinks.
> > Maybe I'm wrong, but people complain even at a *single* pg_upgrade --
> > seems everybody wants in-place upgrades nowadays ...
> 
> Right. That's why I was wondering about making the "last full scan"
> feature backpatchable...

Maybe it's not so bad to make it a reloption in back branches and a
full-fledged column going forward?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Legacy GiST invalid tuples

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-07-04 17:02:01 -0400, Alvaro Herrera wrote:
>> On 2018-Jul-04, Andres Freund wrote:
>>> I don't think we can easily require that everyone pg_upgrading to v13+
>>> upgrades to v12 first?

>> We've never done that, I don't know if we can get away with it.
>> Upgrades with pg_upgrade are not cheap enough for that, methinks.
>> Maybe I'm wrong, but people complain even at a *single* pg_upgrade --
>> seems everybody wants in-place upgrades nowadays ...

> Right. That's why I was wondering about making the "last full scan"
> feature backpatchable...

ISTM this is closely related to the business about on-line checksum
enabling that Magnus et al have been working on.  In general, we have a
lot of pent-up demand for changes that require on-disk data modification,
so it seems like it's time to design a generic mechanism for doing that
in a nice way (ie without downtime or huge performance penalties).

            regards, tom lane