Thread: Assert in pageinspect with NULL pages

Assert in pageinspect with NULL pages

From
Daria Lepikhova
Date:
Hi, hackers!

If we trying to call pageinspect functions for pages which are filled 
with nulls, we will get core dump. It happens with null pages for all 
indexes in pageinspect and for page_checksum. This problem was founded 
firstly by Anastasia Lubennikova, and now I continue this task.

For example, next script leads to fail.
CREATE TABLE test1 (
     x bigserial,
     y bigint DEFAULT 0
);
INSERT INTO test1(y) SELECT 0 FROM generate_series(1,1E6) AS x;
SELECT page_checksum(repeat(E'\\000', 8192)::bytea, 1);

server closed the connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.
fatal: connection to server was lost
LOG:  server process (PID 16465) was terminated by signal 6
DETAIL:  Failed process was running: select 
page_checksum(repeat(E'\\000', 8192)::bytea, 1);
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2022-02-16 
14:03:16 +05
LOG:  database system was not properly shut down; automatic recovery in 
progress
LOG:  redo starts at 0/14F1B20
LOG:  invalid record length at 0/5D40CD8: wanted 24, got 0
LOG:  redo done at 0/5D40BC0 system usage: CPU: user: 0.98 s, system: 
0.02 s, elapsed: 1.01 s
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  checkpoint complete: wrote 5500 buffers (33.6%); 0 WAL file(s) 
added, 0 removed, 4 recycled; write=0.064 s, sync=0.007 s, total=0.080 
s; sync files=45, longest=0.004 s, average=0.001 s; distance=74044 kB, 
estimate=74044 kB
LOG:  database system is ready to accept connections

Also it is possible to use
     select brin_metapage_info(repeat(E'\\000', 8192)::bytea);
or
     select bt_page_items(repeat(E'\\000', 8192)::bytea);
for getting fail.

I tried to make some additional checks for null pages into pageinspect' 
functions. The attached patch also contains tests for this case.
What do you think?

-- 
Daria Lepikhova
Postgres Professional

Attachment

Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Thu, Feb 17, 2022 at 01:46:40PM +0500, Daria Lepikhova wrote:
> INSERT INTO test1(y) SELECT 0 FROM generate_series(1,1E6) AS x;
> SELECT page_checksum(repeat(E'\\000', 8192)::bytea, 1);

Indeed.  Good catch, and that seems pretty old at quick glance for the
checksum part.  I'll try to look at all that tomorrow.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Justin Pryzby
Date:
BRIN can also crash if passed a non-brin index.

I've been sitting on this one for awhile.  Feel free to include it in your
patchset.

commit 08010a6037fc4e24a9ba05e5386e766f4310d35e
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Tue Jan 19 00:25:15 2021 -0600

    pageinspect: brin_page_items(): check that given relation is not only an index, but a brin one

diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index f1e64a39ef2..3de6dd943ef 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -16,6 +16,7 @@
 #include "access/brin_tuple.h"
 #include "access/htup_details.h"
 #include "catalog/index.h"
+#include "catalog/pg_am.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -59,7 +60,7 @@ brin_page_type(PG_FUNCTION_ARGS)
     if (raw_page_size != BLCKSZ)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("input page too small"),
+                 errmsg("input page wrong size"),
                  errdetail("Expected size %d, got %d",
                            BLCKSZ, raw_page_size)));
 
@@ -97,7 +98,7 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
     if (raw_page_size != BLCKSZ)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("input page too small"),
+                 errmsg("input page wrong size"),
                  errdetail("Expected size %d, got %d",
                            BLCKSZ, raw_page_size)));
 
@@ -169,7 +170,14 @@ brin_page_items(PG_FUNCTION_ARGS)
     MemoryContextSwitchTo(oldcontext);
 
     indexRel = index_open(indexRelid, AccessShareLock);
-    bdesc = brin_build_desc(indexRel);
+
+    /* Must be a BRIN index */
+    if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
+            indexRel->rd_rel->relam != BRIN_AM_OID)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("\"%s\" is not a BRIN index",
+                     RelationGetRelationName(indexRel))));
 
     /* minimally verify the page we got */
     page = verify_brin_page(raw_page, BRIN_PAGETYPE_REGULAR, "regular");
@@ -178,6 +186,7 @@ brin_page_items(PG_FUNCTION_ARGS)
      * Initialize output functions for all indexed datatypes; simplifies
      * calling them later.
      */
+    bdesc = brin_build_desc(indexRel);
     columns = palloc(sizeof(brin_column_state *) * RelationGetDescr(indexRel)->natts);
     for (attno = 1; attno <= bdesc->bd_tupdesc->natts; attno++)
     {



Re: Assert in pageinspect with NULL pages

From
Alexander Lakhin
Date:
Hello Michael,

18.02.2022 06:07, Michael Paquier wrpte:
> On Thu, Feb 17, 2022 at 09:00:20PM -0600, Justin Pryzby wrote:
>> BRIN can also crash if passed a non-brin index.
>>
>> I've been sitting on this one for awhile.  Feel free to include it in your
>> patchset.
> Ugh.  Thanks!  I am keeping a note about this one.
Could you please confirm before committing the patchset that it fixes
the bug #16527 [1]? Or maybe I could check it?
(Original patch proposed by Daria doesn't cover that case, but if the
patch going to be improved, probably it's worth fixing that bug too.)

[1]
https://www.postgresql.org/message-id/flat/16527-ef7606186f0610a1%40postgresql.org

Best regards,
Alexander



Re: Assert in pageinspect with NULL pages

From
Daria Lepikhova
Date:
First of all, thanks for the review and remarks!

18.02.2022 08:02, Michael Paquier пишет:
> On Thu, Feb 17, 2022 at 05:40:41PM +0800, Julien Rouhaud wrote:
>> About the patch, it's incorrectly using a hardcoded 8192 block-size rather than
>> using the computed :block_size (or computing one when it's not already the
>> case).
> Well, the tests of pageinspect fail would already fail when using a
> different page size than 8k, like the checksum ones :)
>
> Anywa, I agree with your point that if this is not a reason to not
> make the tests more portable if we can do easily.

Fixed.

>> I'm also wondering if it wouldn't be better to return NULL rather than throwing
>> an error for uninitialized pages.
> Agreed that this is a sensible choice.  NULL would be helpful for the
> case where one compiles all the checksums of a relation with a full
> scan based on the relation size, for example.  All these behaviors
> ought to be documented properly, as well.  For SRFs, this should just
> return no rows rather than generating an error.

So, I tried to implement this remark. However, further getting rid of 
throwing an error and replacing it with a NULL return led to reproduce 
the problem that Alexander Lakhin mentioned And here I need little more 
time to figure it out.


And one more addition. In the previous version of the patch, I forgot to 
add tests for the gist index, but the described problem is also relevant 
for it.

-- 
Daria Lepikhova
Postgres Professional

Attachment

Re: Assert in pageinspect with NULL pages

From
Daria Lepikhova
Date:
18.02.2022 08:00, Justin Pryzby пишет:
> BRIN can also crash if passed a non-brin index.
>
> I've been sitting on this one for awhile.  Feel free to include it in your
> patchset.
>
> commit 08010a6037fc4e24a9ba05e5386e766f4310d35e
> Author: Justin Pryzby <pryzbyj@telsasoft.com>
> Date:   Tue Jan 19 00:25:15 2021 -0600
>
>      pageinspect: brin_page_items(): check that given relation is not only an index, but a brin one
>
> diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
> index f1e64a39ef2..3de6dd943ef 100644
> --- a/contrib/pageinspect/brinfuncs.c
> +++ b/contrib/pageinspect/brinfuncs.c
> @@ -16,6 +16,7 @@
>   #include "access/brin_tuple.h"
>   #include "access/htup_details.h"
>   #include "catalog/index.h"
> +#include "catalog/pg_am.h"
>   #include "catalog/pg_type.h"
>   #include "funcapi.h"
>   #include "lib/stringinfo.h"
> @@ -59,7 +60,7 @@ brin_page_type(PG_FUNCTION_ARGS)
>       if (raw_page_size != BLCKSZ)
>           ereport(ERROR,
>                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -                 errmsg("input page too small"),
> +                 errmsg("input page wrong size"),
>                    errdetail("Expected size %d, got %d",
>                              BLCKSZ, raw_page_size)));
>   
> @@ -97,7 +98,7 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
>       if (raw_page_size != BLCKSZ)
>           ereport(ERROR,
>                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -                 errmsg("input page too small"),
> +                 errmsg("input page wrong size"),
>                    errdetail("Expected size %d, got %d",
>                              BLCKSZ, raw_page_size)));
>   
> @@ -169,7 +170,14 @@ brin_page_items(PG_FUNCTION_ARGS)
>       MemoryContextSwitchTo(oldcontext);
>   
>       indexRel = index_open(indexRelid, AccessShareLock);
> -    bdesc = brin_build_desc(indexRel);
> +
> +    /* Must be a BRIN index */
> +    if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
> +            indexRel->rd_rel->relam != BRIN_AM_OID)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                 errmsg("\"%s\" is not a BRIN index",
> +                     RelationGetRelationName(indexRel))));
>   
>       /* minimally verify the page we got */
>       page = verify_brin_page(raw_page, BRIN_PAGETYPE_REGULAR, "regular");
> @@ -178,6 +186,7 @@ brin_page_items(PG_FUNCTION_ARGS)
>        * Initialize output functions for all indexed datatypes; simplifies
>        * calling them later.
>        */
> +    bdesc = brin_build_desc(indexRel);
>       columns = palloc(sizeof(brin_column_state *) * RelationGetDescr(indexRel)->natts);
>       for (attno = 1; attno <= bdesc->bd_tupdesc->natts; attno++)
>       {


Thanks! This is a very valuable note. I think I will definitely add it 
to the next version of the patch, as soon as I deal with the error 
exception.

-- 

Daria Lepikhova
Postgres Professional




Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Thu, Feb 17, 2022 at 09:00:20PM -0600, Justin Pryzby wrote:
> BRIN can also crash if passed a non-brin index.
>
> I've been sitting on this one for awhile.  Feel free to include it in your
> patchset.

So, I have begun a close lookup of this thread, and the problem you
are reporting here is worth fixing on its own, before we do something
for new pages as reported originally.  There is more that caught my
attention than the brin AM not being check properly, because a couple
of code paths are fuzzy with the checks on the page sizes.  My
impression of the whole thing is that we'd better generalize the use
of get_page_from_raw(), improving the code on many sides when the user
can provide a raw page in input (also I'd like to think that it is a
better practice anyway as any of those functions may finish to look
8-byte fields, but the current coding would silently break in
alignment-picky environments):
- Some code paths (hash, btree) would trigger elog(ERROR) but we
cannot use that for errors that can be triggered by the user.
- bt_page_items_bytea(), page_header() and fsm_page_contents() were
fuzzy on the page size check, so it was rather easy to generate
garbage with random data.
- page_checksum_internal() used a PageHeader, where I would have
expected a Page.
- More consistency in the error strings, meaning less contents to
translate in the long-term.

This first batch leads me to the attached, with tests to stress all
that for all the functions taking raw pages in input.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Justin Pryzby
Date:
On Tue, Mar 15, 2022 at 06:32:44PM +0900, Michael Paquier wrote:

> +    if (!IS_BRIN(indexRel))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                 errmsg("\"%s\" is not a %s index",
> +                        RelationGetRelationName(indexRel), "BRIN")));

If it were me, I'd write this without the extra parens around (errcode()).

> +-- Suppress the DETAIL message, to allow the tests to work across various
> +-- default page sizes.

I think you mean "various non-default page sizes" or "various page sizes".

-- 
Justin



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Tue, Mar 15, 2022 at 06:56:46AM -0500, Justin Pryzby wrote:
> On Tue, Mar 15, 2022 at 06:32:44PM +0900, Michael Paquier wrote:
>> +-- Suppress the DETAIL message, to allow the tests to work across various
>> +-- default page sizes.
>
> I think you mean "various non-default page sizes" or "various page sizes".

Thanks.  Indeed, this sounded a bit weird.  I have been able to look
at all that this morning and backpatched this first part.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Mon, Feb 21, 2022 at 10:00:00AM +0300, Alexander Lakhin wrote:
> Could you please confirm before committing the patchset that it fixes
> the bug #16527 [1]? Or maybe I could check it?
> (Original patch proposed by Daria doesn't cover that case, but if the
> patch going to be improved, probably it's worth fixing that bug too.)
>
> [1]
> https://www.postgresql.org/message-id/flat/16527-ef7606186f0610a1%40postgresql.org

FWIW, I think that this one has been fixed by 076f4d9, where we make
sure that the page is correctly aligned.  Are you still seeing this
problem?
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Wed, Feb 23, 2022 at 12:09:02PM +0500, Daria Lepikhova wrote:
> And one more addition. In the previous version of the patch, I forgot to add
> tests for the gist index, but the described problem is also relevant for it.

So, I have looked at this second part of the thread, and concluded
that we should not fail for empty pages.  First, we fetch pages from
the buffer pool in normal mode, where empty pages are valid.  There is
also a second point in favor of doing so: code paths dedicated to hash
indexes already do that, marking such pages as simply "unused".  The
proposal from Julien upthread sounds cleaner to me though in the long
run, as NULL gives the user the possibility to do a full-table scan
with simple clauses to filter out anything found as NULL.

Painting more PageIsNew() across the place requires a bit more work
than a simple ereport(ERROR) in get_page_from_raw(), of course, but
the gain is the portability of the functions.

(One can have a lot of fun playing with random inputs and breaking
most code paths, but that's not new.)
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Alexander Lakhin
Date:
Hello Michael,
16.03.2022 10:39, Michael Paquier wrote:
> On Mon, Feb 21, 2022 at 10:00:00AM +0300, Alexander Lakhin wrote:
>> Could you please confirm before committing the patchset that it fixes
>> the bug #16527 [1]? Or maybe I could check it?
>> (Original patch proposed by Daria doesn't cover that case, but if the
>> patch going to be improved, probably it's worth fixing that bug too.)
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/16527-ef7606186f0610a1%40postgresql.org
> FWIW, I think that this one has been fixed by 076f4d9, where we make
> sure that the page is correctly aligned.  Are you still seeing this
> problem?
>
Yes, I've reproduced that issue on master (46d9bfb0). 
pageinspect-zeros-v2.patch doesn't help too.

Best regards.
Alexander



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Wed, Mar 16, 2022 at 08:35:59PM +0300, Alexander Lakhin wrote:
> Yes, I've reproduced that issue on master (46d9bfb0).

Okay.  I'll try to look more closely at that, then.  It feels like we
are just missing a MAXALIGN() in those code paths.  Are you using any
specific CFLAGS or something environment-sensitive (macos, 32b, etc.)?

> pageinspect-zeros-v2.patch doesn't help too.

Well, the scope is different, so that's not a surprise.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Alexander Lakhin
Date:
Hello Michael,
18.03.2022 05:06, Michael Paquier wrote:
> Okay.  I'll try to look more closely at that, then.  It feels like we
> are just missing a MAXALIGN() in those code paths.  Are you using any
> specific CFLAGS or something environment-sensitive (macos, 32b, etc.)?
No, just x86_64, Ubuntu 20.04, gcc 11, valgrind 3.15. I just put that 
query in page.sql and see the server abort.

Best regards,
Alexander



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Fri, Mar 18, 2022 at 06:43:39AM +0300, Alexander Lakhin wrote:
> Hello Michael,
> No, just x86_64, Ubuntu 20.04, gcc 11, valgrind 3.15. I just put that query
> in page.sql and see the server abort.

Bah, of course, I have missed the valgrind part of the problem.

The problem is that we attempt to verify a heap page here but its
pd_special is BLCKSZ.  This causes BrinPageType() to grab back a
position of the area at the exact end of the page, via
PageGetSpecialPointer(), hence the failure in reading two bytes
outside the page.  The result here is that the set of defenses in
verify_brin_page() is poor: we should at least make sure that the
special area is available for a read.  As far as I can see, this is
also possible in bt_page_items_bytea(), gist_page_opaque_info(),
gin_metapage_info() and gin_page_opaque_info().  All those code paths
should be protected with some checks on PageGetSpecialSize(), I
guess, before attempting to read the special area of the page.  Hash
indexes are protected by checking the expected size of the special
area, and one code path of btree relies on the opened relation to be a
btree index.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Julien Rouhaud
Date:
Hi,

On Wed, Mar 23, 2022 at 05:16:41PM +0900, Michael Paquier wrote:
> On Fri, Mar 18, 2022 at 06:43:39AM +0300, Alexander Lakhin wrote:
> > Hello Michael,
> > No, just x86_64, Ubuntu 20.04, gcc 11, valgrind 3.15. I just put that query
> > in page.sql and see the server abort.
> 
> Bah, of course, I have missed the valgrind part of the problem.
> 
> The problem is that we attempt to verify a heap page here but its
> pd_special is BLCKSZ.  This causes BrinPageType() to grab back a
> position of the area at the exact end of the page, via
> PageGetSpecialPointer(), hence the failure in reading two bytes
> outside the page.  The result here is that the set of defenses in
> verify_brin_page() is poor: we should at least make sure that the
> special area is available for a read.  As far as I can see, this is
> also possible in bt_page_items_bytea(), gist_page_opaque_info(),
> gin_metapage_info() and gin_page_opaque_info().  All those code paths
> should be protected with some checks on PageGetSpecialSize(), I
> guess, before attempting to read the special area of the page.  Hash
> indexes are protected by checking the expected size of the special
> area, and one code path of btree relies on the opened relation to be a
> btree index.

I worked on a patch to fix the problem.  The functions you mention were indeed
missing some check, but after digging a bit more I found that other problem
existed.  For instance, feeding a btree page to a gist_page_items_bytea() (both
pages have the same special size) can be quite problematic too, as you can end
up accessing bogus data when retrieving the items.  I also added additional
sanity checks with what is available (gist_page_id for gist, zero-level for
btree leaf pages and so on), and tried to cover everything with tests.

I'm a bit worried about the btree tests stability.  I avoid emitting the level
found to help with that, but it still depends on what other AM will put in
their special page.

Attachment

Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Thu, Mar 24, 2022 at 04:27:40PM +0800, Julien Rouhaud wrote:
> I worked on a patch to fix the problem.  The functions you mention were indeed
> missing some check, but after digging a bit more I found that other problem
> existed.  For instance, feeding a btree page to a gist_page_items_bytea() (both
> pages have the same special size) can be quite problematic too, as you can end
> up accessing bogus data when retrieving the items.  I also added additional
> sanity checks with what is available (gist_page_id for gist, zero-level for
> btree leaf pages and so on), and tried to cover everything with tests.

Thanks for the patch!

I have reviewed what you have sent, bumping on a couple of issues:
- The tests of btree and BRIN failed with 32-bit builds, because
MAXALIGN returns shorter special area sizes in those cases.  This can
be fixed by abusing of \set VERBOSITY to mask the error details.  We
already do that in some of the tests to make them portable.
- Some of the tests were not necessary, overlapping with stuff that
already existed.
- Some checks missed a MAXALIGN().
- Did some tweaks with the error messages.
- Some error messages used an incorrect term to define the index AM
type, aka s/gist/GiST/ or s/brin/BRIN/.
- errdetail() requires a sentence, finishing with a dot (some places
of hashfuncs.c missed that.
- Not your fault, but hash indexes would complain about corrupted
indexes which could be misleading for the user if passing down a
correct page from an incorrect index AM.
- While on it, I have made the error messages more generic in the
places where I could do so.  What I have finished with seems to have a
good balance.

> I'm a bit worried about the btree tests stability.  I avoid emitting the level
> found to help with that, but it still depends on what other AM will put in
> their special page.

Well, the limit of the pageinspect model comes from the fact that it
is possible to pass down any bytea and all those code paths would
happily process the blobs as long as they are 8kB.  Pages can be
crafted as well to bypass some of the checks.  This is superuser-only,
so people have to be careful, but preventing out-of-bound reads is a
different class of problem, as long as these come from valid pages.

With all that in place, I get the attached.  It is Friday, so I am not
going to take my bets on the buildfarm today with a rather-risky
patch.  Monday/Tuesday will be fine.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Julien Rouhaud
Date:
On Fri, Mar 25, 2022 at 11:44:26AM +0900, Michael Paquier wrote:
> I have reviewed what you have sent, bumping on a couple of issues:

Thanks!

I'm happy with all the changes, except:

+        if (P_ISLEAF(opaque) && opaque->btpo_level != 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("block is not a valid leaf page")));

All other messages specify which kind of page it's about, so I think it would
be better to specify "btree" leaf page here, especially since some other AMs
also have leaf pages.

> - The tests of btree and BRIN failed with 32-bit builds, because
> MAXALIGN returns shorter special area sizes in those cases.  This can
> be fixed by abusing of \set VERBOSITY to mask the error details.  We
> already do that in some of the tests to make them portable.

Yeah, that's the other stability problem I was worried about.  I should have
tried to compile with -m32.

> > I'm a bit worried about the btree tests stability.  I avoid emitting the level
> > found to help with that, but it still depends on what other AM will put in
> > their special page.
> 
> Well, the limit of the pageinspect model comes from the fact that it
> is possible to pass down any bytea and all those code paths would
> happily process the blobs as long as they are 8kB.  Pages can be
> crafted as well to bypass some of the checks.  This is superuser-only,
> so people have to be careful, but preventing out-of-bound reads is a
> different class of problem, as long as these come from valid pages.

Agreed.  Also pageinspect can be handy when debugging corruption, so I think it
shouldn't try too hard to discard buggy pages.



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Fri, Mar 25, 2022 at 11:03:47AM +0800, Julien Rouhaud wrote:
> I'm happy with all the changes, except:
>
> +        if (P_ISLEAF(opaque) && opaque->btpo_level != 0)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                     errmsg("block is not a valid leaf page")));
>
> All other messages specify which kind of page it's about, so I think it would
> be better to specify "btree" leaf page here, especially since some other AMs
> also have leaf pages.

Makes sense.  Fine by me to stick an extra "btree" here.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Peter Geoghegan
Date:
On Wed, Mar 23, 2022 at 1:16 AM Michael Paquier <michael@paquier.xyz> wrote:
> As far as I can see, this is
> also possible in bt_page_items_bytea(), gist_page_opaque_info(),
> gin_metapage_info() and gin_page_opaque_info().  All those code paths
> should be protected with some checks on PageGetSpecialSize(), I
> guess, before attempting to read the special area of the page.  Hash
> indexes are protected by checking the expected size of the special
> area, and one code path of btree relies on the opened relation to be a
> btree index.

amcheck's palloc_btree_page() function validates that an 8KiB page is
in fact an nbtree page, in a maximally paranoid way. Might be an
example worth following here.

-- 
Peter Geoghegan



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Thu, Mar 24, 2022 at 08:54:14PM -0700, Peter Geoghegan wrote:
> amcheck's palloc_btree_page() function validates that an 8KiB page is
> in fact an nbtree page, in a maximally paranoid way. Might be an
> example worth following here.

Sure, we could do that.  However, I don't think that going down to
that is something we have any need for in pageinspect, as the module
is also useful to look at the contents of the full page, even if
slightly corrupted, and too many checks would prevent a lookup at the
internal contents of a page.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Fri, Mar 25, 2022 at 12:27:24PM +0900, Michael Paquier wrote:
> Makes sense.  Fine by me to stick an extra "btree" here.

I have been able to look at that again today (earlier than expected),
and except for one incorrect thing in the GIN tests where we were not
testing the correct pattern, the rest was correct.  So applied and
backpatched.  The buildfarm is not complaining based on the first
reports.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Robert Haas
Date:
On Fri, Mar 25, 2022 at 12:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Mar 24, 2022 at 08:54:14PM -0700, Peter Geoghegan wrote:
> > amcheck's palloc_btree_page() function validates that an 8KiB page is
> > in fact an nbtree page, in a maximally paranoid way. Might be an
> > example worth following here.
>
> Sure, we could do that.  However, I don't think that going down to
> that is something we have any need for in pageinspect, as the module
> is also useful to look at the contents of the full page, even if
> slightly corrupted, and too many checks would prevent a lookup at the
> internal contents of a page.

It's my impression that there are many ways of crashing the system
using pageinspect right now. We aren't very careful about making sure
that our code that reads from disk doesn't crash if the data on disk
is corrupted, and all of that systemic weakness is inherited by
pageinspect. But, with pageinspect, you can supply your own pages, not
just use the ones that actually exist on disk.

Fixing this seems like a lot of work and problematic for various
reasons. And I do agree that if we can allow people to look at what's
going on with a corrupt page, that's useful. On the other hand, if by
"looking" they crash the system, that sucks a lot. So overall I think
we need a lot more sanity checks here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Assert in pageinspect with NULL pages

From
Matthias van de Meent
Date:
On Sun, 27 Mar 2022 at 16:24, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 25, 2022 at 12:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Thu, Mar 24, 2022 at 08:54:14PM -0700, Peter Geoghegan wrote:
> > > amcheck's palloc_btree_page() function validates that an 8KiB page is
> > > in fact an nbtree page, in a maximally paranoid way. Might be an
> > > example worth following here.
> >
> > Sure, we could do that.  However, I don't think that going down to
> > that is something we have any need for in pageinspect, as the module
> > is also useful to look at the contents of the full page, even if
> > slightly corrupted, and too many checks would prevent a lookup at the
> > internal contents of a page.
>
> It's my impression that there are many ways of crashing the system
> using pageinspect right now. We aren't very careful about making sure
> that our code that reads from disk doesn't crash if the data on disk
> is corrupted, and all of that systemic weakness is inherited by
> pageinspect. But, with pageinspect, you can supply your own pages, not
> just use the ones that actually exist on disk.
>
> Fixing this seems like a lot of work and problematic for various
> reasons. And I do agree that if we can allow people to look at what's
> going on with a corrupt page, that's useful. On the other hand, if by
> "looking" they crash the system, that sucks a lot. So overall I think
> we need a lot more sanity checks here.

I noticed this thread due to recent commit 291e517a, and noticed that
it has some overlap with one of my patches [0], in which I fix the
corresponding issue in core postgres by assuming (and in
assert-enabled builds, verifying) the size and location of the special
area. As such, you might be interested in that patch.

Note that currently in core postgres a corrupted special area pointer
will likely target neighbouring blocks in the buffer pool; resulting
in spreading corruption when the special area is updated. This
spreading corruption should be limited to only the corrupted block
with my patch.

Kind regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/37/3543/
https://www.postgresql.org/message-id/CAEze2WjE3+tGO9Fs9+iZMU+z6mMZKo54W1Zt98WKqbEUHbHOBg@mail.gmail.com



Re: Assert in pageinspect with NULL pages

From
Peter Geoghegan
Date:
On Sun, Mar 27, 2022 at 7:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
> It's my impression that there are many ways of crashing the system
> using pageinspect right now. We aren't very careful about making sure
> that our code that reads from disk doesn't crash if the data on disk
> is corrupted, and all of that systemic weakness is inherited by
> pageinspect.

It's significantly worse than the core code, I'd say (before we even
consider the fact that you can supply your own pages). At least with
index AMs, where functions like _bt_checkpage() and gistcheckpage()
provide the most basic sanity checks for any page that is read from
disk.

> Fixing this seems like a lot of work and problematic for various
> reasons. And I do agree that if we can allow people to look at what's
> going on with a corrupt page, that's useful. On the other hand, if by
> "looking" they crash the system, that sucks a lot. So overall I think
> we need a lot more sanity checks here.

I don't think it's particularly hard to do a little more. That's all
it would take to prevent practically all crashes. We're not dealing
with adversarial page images here.

Sure, it would be overkill to fully adapt something like
palloc_btree_page() for pageinspect, for the reason Michael gave. But
there are a couple of checks that it does that would avoid practically
all real world hard crashes, without any plausible downside:

* The basic _bt_checkpage() call that every nbtree page uses in the
core code (as well as in amcheck).

* The maxoffset > MaxIndexTuplesPerPage check.

You could even go a bit further, and care about individual index
tuples. My commit 93ee38eade added some basic sanity checks for index
tuples to bt_page_items(). That could go a bit further as well. In
particular, the sanity checks from amcheck's PageGetItemIdCareful()
function could be carried over. That would make it impossible for
bt_page_items() to read past the end of the page when given a page
that has an index tuple whose item pointer offset has some wildly
unreasonable value.

I'm not volunteering. Just saying that this is quite possible.

-- 
Peter Geoghegan



Re: Assert in pageinspect with NULL pages

From
Robert Haas
Date:
On Sun, Mar 27, 2022 at 4:26 PM Peter Geoghegan <pg@bowt.ie> wrote:
> We're not dealing
> with adversarial page images here.

I think it's bad that we have to make that assumption, considering
that there's nothing whatever to keep users from supplying arbitrary
page images to pageinspect. But I also agree that if we're unable or
unwilling to make things perfect, it's still good to make them better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Assert in pageinspect with NULL pages

From
Peter Geoghegan
Date:
On Sun, Mar 27, 2022 at 2:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Mar 27, 2022 at 4:26 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > We're not dealing
> > with adversarial page images here.
>
> I think it's bad that we have to make that assumption, considering
> that there's nothing whatever to keep users from supplying arbitrary
> page images to pageinspect.

Maybe it isn't strictly necessary for bt_page_items(), but making that
level of guarantee is really hard, and not particularly useful. And
that's the easy case for pageinspect: gist_page_items() takes a raw
bytea, and puts it through the underlying types output functions.

I think that it might actually be fundamentally impossible to
guarantee that that'll be safe, because we have no idea what the
output function might be doing. It's arbitrary user-defined code that
could easily be implemented in C. Combined with an arbitrary page
image.

> But I also agree that if we're unable or
> unwilling to make things perfect, it's still good to make them better.

I think that most of the functions can approach being perfectly
robust, with a little work. In practical terms they can almost
certainly be made so robust that no real user of bt_page_items() will
ever crash the server. Somebody that goes out of their way to do that
*might* find a way (even with the easier cases), but that doesn't
particularly concern me.

-- 
Peter Geoghegan



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Sun, Mar 27, 2022 at 02:36:54PM -0700, Peter Geoghegan wrote:
> I think that it might actually be fundamentally impossible to
> guarantee that that'll be safe, because we have no idea what the
> output function might be doing. It's arbitrary user-defined code that
> could easily be implemented in C. Combined with an arbitrary page
> image.

My guess that you'd bring in a lot more safety if we completely cut
the capacity to fetch and pass down raw pages across the SQL calls
because you can add checks for the wanted AM, replacing all that with
a (regclass,blkno) pair.  I am however ready to buy that this may not
be worth rewriting 10~15 years after the fact.

> I think that most of the functions can approach being perfectly
> robust, with a little work. In practical terms they can almost
> certainly be made so robust that no real user of bt_page_items() will
> ever crash the server. Somebody that goes out of their way to do that
> *might* find a way (even with the easier cases), but that doesn't
> particularly concern me.

That does not concern me either, and my own take is to take as
realistic things that can be fetched from a get_raw_page() call rather
than a bytea specifically crafted.  Now, I have found myself in cases
where it was useful to see the contents of a page, as long as you can
go through the initial checks, particularly in cases where corruptions
involved unaligned contents.  Trigerring an error on a sanity check
for a specific block may be fine, now I have also found myself doing
some full scans to see in one shot the extent of a damaged relation
file using the functions of pageinspect.  Fun times.
--
Michael

Attachment

Re: Assert in pageinspect with NULL pages

From
Maxim Orlov
Date:

I've suddenly found that the test in this patch is based on a fact that heap pages don't have PageSpecial or it is of different size with btree pages Special area:

CREATE TABLE test1 (a int8, b int4range);

SELECT bt_page_items(get_raw_page('test1', 0));


In the current state is is so, but it is not guaranteed. For example, in the proposed 64xid patch [1] we introduce PageSpecial into heap pages and its size on occasion equals to the size of btree page special so check from above is false. Even if we pass heap page into pageinspect pretending it is btree page. So the test will fail.    


Generally it seems not only a wrong test but the consequence of a bigger scale fact that we can not always distinguish different AM's pages. So I'd propose at least that we don't come to a conclusion that a page is valid based on PageSpecial size is right.

 
--
Best regards,
Maxim Orlov.

Re: Assert in pageinspect with NULL pages

From
Julien Rouhaud
Date:
Hi,

On Mon, Mar 28, 2022 at 08:29:29PM +0300, Maxim Orlov wrote:
> I've suddenly found that the test in this patch is based on a fact that
> heap pages don't have PageSpecial or it is of different size with btree
> pages Special area:
>
> CREATE TABLE test1 (a int8, b int4range);
>
> SELECT bt_page_items(get_raw_page('test1', 0));
>
>
> In the current state is is so, but it is not guaranteed. For example, in
> the proposed 64xid patch [1] we introduce PageSpecial into heap pages and
> its size on occasion equals to the size of btree page special so check from
> above is false. Even if we pass heap page into pageinspect pretending it is
> btree page. So the test will fail.
>
>
> Generally it seems not only a wrong test but the consequence of a bigger
> scale fact that we can not always distinguish different AM's pages. So I'd
> propose at least that we don't come to a conclusion that a page is valid
> based on PageSpecial size is right.

We don't assume that the page is valid, or of the correct AM, just based on the
special area size.  We do reject it if the size is invalid, but we also have
other tests based on what's actually possible to test (checking flag sanity,
magic values and so on).

Note that Peter G. suggested to add more checks for btree based on
palloc_btree_page(), did you check if those were enough to fix this test with
the 64bits xid patchset applied?



Re: Assert in pageinspect with NULL pages

From
Michael Paquier
Date:
On Wed, Mar 16, 2022 at 05:43:48PM +0900, Michael Paquier wrote:
> So, I have looked at this second part of the thread, and concluded
> that we should not fail for empty pages.  First, we fetch pages from
> the buffer pool in normal mode, where empty pages are valid.  There is
> also a second point in favor of doing so: code paths dedicated to hash
> indexes already do that, marking such pages as simply "unused".  The
> proposal from Julien upthread sounds cleaner to me though in the long
> run, as NULL gives the user the possibility to do a full-table scan
> with simple clauses to filter out anything found as NULL.

It has been a couple of weeks, but I have been able to come back to
this set of issues for all-zero pages, double-checked the whole and
applied a set of fixes down to 10.  So we should be completely done
here.
--
Michael

Attachment