Thread: Fix brin_form_tuple to properly detoast data
Hi, As pointed out in [1], BRIN is not properly handling toasted data, which may easily lead to index tuples referencing TOAST-ed values. Which is clearly wrong - it's trivial to trigger failues after a DELETE. Attached is a patch that aims to fix this - AFAIK the brin_form_tuple was simply missing the TOAST_INDEX_HACK stuff from index_form_tuple, which ensures the data is detoasted and (possibly) re-compressed. The code is mostly the same, with some BRIN-specific tweaks (looking at oi_typecache instead of the index descriptor, etc.). I also attach a simple SQL script that I used to trigger the issue. This needs to be turned into a regression test, I'll work on that tomorrow. A separate question is what to do about existing indexes - ISTM the only thing we can do is to tell the users to reindex all BRIN indexes on varlena values. Something like this: select * from pg_class where relam = (select oid from pg_am where amname = 'brin') and oid in (select attrelid from pg_attribute where attlen = -1 and attstorage in ('e', 'x')); regards [1] https://www.postgresql.org/message-id/20201001184133.oq5uq75sb45pu3aw%40development -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 11/4/20 2:05 AM, Tomas Vondra wrote: > Hi, > > As pointed out in [1], BRIN is not properly handling toasted data, which > may easily lead to index tuples referencing TOAST-ed values. Which is > clearly wrong - it's trivial to trigger failues after a DELETE. > > Attached is a patch that aims to fix this - AFAIK the brin_form_tuple > was simply missing the TOAST_INDEX_HACK stuff from index_form_tuple, > which ensures the data is detoasted and (possibly) re-compressed. The > code is mostly the same, with some BRIN-specific tweaks (looking at > oi_typecache instead of the index descriptor, etc.). > > I also attach a simple SQL script that I used to trigger the issue. This > needs to be turned into a regression test, I'll work on that tomorrow. > OK, so here's an improved version of the fix - aside from the code (same as in v1), there are two patches with regression tests. Ultimately those should be merged with the fix, but this way it's possible to apply the regression tests to trigger the issue. The first test is fairly trivial - it simply builds index on toasted data and then shows how an insert and select fail. There's a caveat, that this requires a DELETE + VACUUM, and the VACUUM actually has to cleanup the rows. So there must be no concurrent transactions that might need the rows, which is unlikely in regression tests. So this requires waiting for all running transactions to finish - I did that by building an index concurrently. It's a bit strange, but it's better than any other solution I could think of (timeout or some custom wait for xacts). The second test is a bit redundant - it merely checks that both CREATE INDEX and INSERT INTO fail the same way when the index tuple gets too large. Before the fix there were some inconsistencies - the CREATE INDEX succeeded because it used TOASTed data. So ultimately this tests the same thing, but from a different perspective. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2020-Nov-04, Tomas Vondra wrote: > The first test is fairly trivial - it simply builds index on toasted data > and then shows how an insert and select fail. There's a caveat, that this > requires a DELETE + VACUUM, and the VACUUM actually has to cleanup the rows. > So there must be no concurrent transactions that might need the rows, which > is unlikely in regression tests. So this requires waiting for all running > transactions to finish - I did that by building an index concurrently. It's > a bit strange, but it's better than any other solution I could think of > (timeout or some custom wait for xacts). There are recent changes in vacuum for temp tables (commit 94bc27b57680?) that would maybe make this stable enough, without having to have the CIC there. At least, I tried it locally a few times and it appears to work well. This won't work for older releases though, just master. This is patch 0001 attached here. > The second test is a bit redundant - it merely checks that both CREATE INDEX > and INSERT INTO fail the same way when the index tuple gets too large. > Before the fix there were some inconsistencies - the CREATE INDEX succeeded > because it used TOASTed data. So ultimately this tests the same thing, but > from a different perspective. Hmm. This one shows page size in the error messages, so it'll fail on nonstandard builds. I think we try to stay away from introducing those, so I'd leave this test out. The code fix looks all right -- I'd just move the #include lines to their place. Patch 0002. You add this comment: > + /* > + * Do nothing if value is not of varlena type. We don't need to > + * care about NULL values here, thanks to bv_allnulls above. > + * > + * If value is stored EXTERNAL, must fetch it so we are not > + * depending on outside storage. > + * > + * XXX Is this actually true? Could it be that the summary is > + * NULL even for range with non-NULL data? E.g. degenerate bloom > + * filter may be thrown away, etc. > + */ I think the XXX comment points to a bug that we don't have right now, since neither minmax nor inclusion can end up with a NULL summary starting from non-NULL data. But if the comment about bloom is correct, then surely it'll become a bug when bloom is added. I don't think we need the second part of this comment: > +/* > + * This enables de-toasting of index entries. Needed until VACUUM is > + * smart enough to rebuild indexes from scratch. > + */ ... because, surely, we're now never working on having VACUUM rebuild indexes from scratch. In fact, I wonder if we need the #define at all. I propose to remove all those #ifdef lines in your patch. The fix looks good to me. I just added a comment in 0003.
Attachment
On 11/5/20 6:17 PM, Alvaro Herrera wrote: > On 2020-Nov-04, Tomas Vondra wrote: > >> The first test is fairly trivial - it simply builds index on toasted data >> and then shows how an insert and select fail. There's a caveat, that this >> requires a DELETE + VACUUM, and the VACUUM actually has to cleanup the rows. >> So there must be no concurrent transactions that might need the rows, which >> is unlikely in regression tests. So this requires waiting for all running >> transactions to finish - I did that by building an index concurrently. It's >> a bit strange, but it's better than any other solution I could think of >> (timeout or some custom wait for xacts). > > There are recent changes in vacuum for temp tables (commit 94bc27b57680?) > that would maybe make this stable enough, without having to have the CIC > there. At least, I tried it locally a few times and it appears to work well. > This won't work for older releases though, just master. This is patch > 0001 attached here. > IIUC you're suggesting to use a temporary table in the test? Unfortunately, this does not work on older releases, and IMHO the test should be backpatched too. IMHO the CIC "hack" is acceptable, unless there's a better solution that I'm not aware of. >> The second test is a bit redundant - it merely checks that both CREATE INDEX >> and INSERT INTO fail the same way when the index tuple gets too large. >> Before the fix there were some inconsistencies - the CREATE INDEX succeeded >> because it used TOASTed data. So ultimately this tests the same thing, but >> from a different perspective. > > Hmm. This one shows page size in the error messages, so it'll fail on > nonstandard builds. I think we try to stay away from introducing those, > so I'd leave this test out. Hmm, OK. I don't think having "redundant" test is a big deal, but I haven't thought about builds with different block sizes. I'll leave this out. > > The code fix looks all right -- I'd just move the #include lines to > their place. Patch 0002. > OK > You add this comment: > >> + /* >> + * Do nothing if value is not of varlena type. We don't need to >> + * care about NULL values here, thanks to bv_allnulls above. >> + * >> + * If value is stored EXTERNAL, must fetch it so we are not >> + * depending on outside storage. >> + * >> + * XXX Is this actually true? Could it be that the summary is >> + * NULL even for range with non-NULL data? E.g. degenerate bloom >> + * filter may be thrown away, etc. >> + */ > > I think the XXX comment points to a bug that we don't have right now, > since neither minmax nor inclusion can end up with a NULL summary > starting from non-NULL data. But if the comment about bloom is correct, > then surely it'll become a bug when bloom is added. > Yeah, but that'd be something for the bloom patch to fix, I think. This got me thinking though - wouldn't it be better to handle too large values by treating the range as "degenerate" (i.e. store NULL and consider it as matching all queries), instead of failing the CREATE INDEX or DML? I find the current behavior rather annoying, because it depends on the other rows in the page range, not just on the one row the user deals with. Perhaps this might even be considered an information leak about the other data. Of course, not something this patch should deal with. > I don't think we need the second part of this comment: > >> +/* >> + * This enables de-toasting of index entries. Needed until VACUUM is >> + * smart enough to rebuild indexes from scratch. >> + */ > > ... because, surely, we're now never working on having VACUUM rebuild > indexes from scratch. In fact, I wonder if we need the #define at > all. I propose to remove all those #ifdef lines in your patch. > That's a verbatim copy of a comment from indextuple.c. IMHO we should keep it the same in both places. > The fix looks good to me. I just added a comment in 0003. > Thanks. Any opinions on fixing this in existing clusters? Any better ideas than just giving users the SQL query to list possibly-affected indexes? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 5 Nov 2020 18:16:04 -0300 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2020-Nov-05, Tomas Vondra wrote: > > > On 11/5/20 6:17 PM, Alvaro Herrera wrote: > > > > There are recent changes in vacuum for temp tables (commit > > > 94bc27b57680?) that would maybe make this stable enough, without > > > having to have the CIC there. At least, I tried it locally a few > > > times and it appears to work well. This won't work for older > > > releases though, just master. This is patch 0001 attached here. > > > > IIUC you're suggesting to use a temporary table in the test? > > Unfortunately, this does not work on older releases, and IMHO the > > test should be backpatched too. IMHO the CIC "hack" is acceptable, > > unless there's a better solution that I'm not aware of. > > Oh, sure, the CIC hack is acceptable for the older branches. I'm just > saying that you can use a temp table everywhere, and keep CIC in the > old branches and no CIC in master. > > > This got me thinking though - wouldn't it be better to handle too > > large values by treating the range as "degenerate" (i.e. store NULL > > and consider it as matching all queries), instead of failing the > > CREATE INDEX or DML? I find the current behavior rather annoying, > > because it depends on the other rows in the page range, not just on > > the one row the user deals with. Perhaps this might even be > > considered an information leak about the other data. Of course, not > > something this patch should deal with. > > Hmm. Regarding text I remember thinking we could just truncate values > (as we do for LIKE, as I recall). I suppose that strategy would work > even for bytea. > > > > > > +/* > > > > + * This enables de-toasting of index entries. Needed until > > > > VACUUM is > > > > + * smart enough to rebuild indexes from scratch. > > > > + */ > > > > > > ... because, surely, we're now never working on having VACUUM > > > rebuild indexes from scratch. In fact, I wonder if we need the > > > #define at all. I propose to remove all those #ifdef lines in > > > your patch. > > > > That's a verbatim copy of a comment from indextuple.c. IMHO we > > should keep it the same in both places. > > Sure, if you want to. > > > > The fix looks good to me. I just added a comment in 0003. > > > > Thanks. Any opinions on fixing this in existing clusters? Any > > better ideas than just giving users the SQL query to list > > possibly-affected indexes? > > None here. > OK, pushed and backpatched all the way back to 9.5. I decided not to use the temporary table - I'd still need to use the CIC trick on older releases, and there were enough differences already. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company