Re: Toast issues with OldestXmin going backwards - Mailing list pgsql-hackers
From | Andrew Gierth |
---|---|
Subject | Re: Toast issues with OldestXmin going backwards |
Date | |
Msg-id | 87d0yscktp.fsf@news-spur.riddles.org.uk Whole thread Raw |
In response to | Re: Toast issues with OldestXmin going backwards (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Toast issues with OldestXmin going backwards
Re: Toast issues with OldestXmin going backwards |
List | pgsql-hackers |
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: Andres> Please wait for a bit. This isn't a trivial change, and I would Andres> like to wrap my head around it. Sure. Andres> At the very least this seems like it could cause extreme Andres> slowdowns for large tables that have the right update/delete Andres> patterns? I'm working on the basis that fetching the same set of toast rows twice in succession will at least mean they are likely in cache for the second time, so we shouldn't be doing any more actual I/O than before. (This is assuming that a system with really big toast values is going to also have plenty of RAM, which is a generally safe assumption since such values are copied many times in memory by pg_dump.) There is still some overhead in the check, of course; I will see about doing some benchmarks. >> + /* >> + * OldestXmin going backwards can mean we have a RECENTLY_DEAD row that >> + * contains toast pointers whose toast rows have already been vacuumed >> + * away (or in the worst but unlikely case, recycled). If so, then the >> + * row must really be dead to all snapshots that could access it, so >> + * treat it as DEAD instead. >> + */ Andres> How is it guaranteed that the recheck doesn't find a different Andres> toast tuple at the relevant position? It's not, but it doesn't matter as long as the new toast rows can pass through toast_fetch_datum without throwing an error (which is why toast_validate_datum has to scan all the entries and check all their lengths). Firstly, we can assume the following statements (or at least if we can't, we have far worse problems than this): 1. OldestXmin going backwards cannot make any tuple visible in any (non-vacuum) snapshot that it wasn't visible in before; 2. If a tuple is non-vacuumable itself, it won't have its toast entries vacuumed away; 3. Once a tuple is vacuumable at any single instant, it will never become visible in any snapshot. So the only cases where a recently-dead tuple can have pointers to a now-recycled toast entry OID is if it was at some point vacuumable, such that a toast-table-only vacuum removed its toast entries, and a new insert or update subsequently inserted a value of the same length that was assigned the OID. Since the tuple was vacuumable at that point, it will never again become visible in any snapshot, so the only code that should ever access its toast values is vacuum full/cluster, which only ever copies the value and does not attempt to decompress or otherwise interpret it. So the only possible consequences for copying the wrong value, as long as we don't throw an error in the process, are that (a) some space is possibly wasted in the toast table - which actually happens already in far more common circumstances - and (b) someone doing forensics or other non-standard access to dead tuples might (in this vanishingly rare case) see something unexpected. At no time would incorrect data be visible in any query. (regarding wasted space in the toast table: vacuum full / cluster will insert live toast entries for copied recently-dead tuples, and if those toast entries are not also referenced from any live tuple, they will never be removed by subsequent normal vacuums, since nothing can ever mark those entries dead; only a later vacuum full / cluster can remove them.) -- Andrew (irc:RhodiumToad)
pgsql-hackers by date: