Re: pageinspect patch, for showing tuple data - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pageinspect patch, for showing tuple data
Date
Msg-id CAB7nPqR2OGCVtO4wS+D6X8AND16wZHLcKa9SmwKo_YY+e95EMA@mail.gmail.com
Whole thread Raw
In response to Re: pageinspect patch, for showing tuple data  (Nikolay Shaplov <n.shaplov@postgrespro.ru>)
Responses Re: pageinspect patch, for showing tuple data  (Nikolay Shaplov <n.shaplov@postgrespro.ru>)
List pgsql-hackers


On Wed, Sep 2, 2015 at 6:58 PM, Nikolay Shaplov <n.shaplov@postgrespro.ru> wrote:
В письме от 3 августа 2015 15:35:23 пользователь Alvaro Herrera написал:
> Nikolay Shaplov wrote:
> > This patch adds several new functions, available from SQL queries. All
> > these functions are based on heap_page_items, but accept slightly
> > different arguments and has one additional column at the result set:
> >
> > heap_page_tuples - accepts relation name, and bulkno, and returns usual
> > heap_page_items set with additional column that contain snapshot of tuple
> > data area stored in bytea.
>
> I think the API around get_raw_page is more useful generally.  You might
> have table blocks stored in a bytea column for instance, because you
> extracted from some remote server and inserted into a local table for
> examination.  If you only accept relname/blkno, there's no way to
> examine data other than blocks directly in the database dir, which is
> limiting.
>
> > There is also one strange function: _heap_page_items it is useless for
> > practical purposes. As heap_page_items it accepts page data as bytea, but
> > also get a relation name. It tries to apply tuple descriptor of that
> > relation to that page data.
>
> For BRIN, I added something similar (brin_page_items), but it receives
> the index OID rather than name, and constructs a tuple descriptor to
> read the data.  I think OID is better than name generally.  (You can
> cast the relation name to regclass).
>
> It's easy to misuse, but these functions are superuser-only, so there
> should be no security issue at least.  The possibility of a crash
> remains real, though, so maybe we should try to come up with a way to
> harden that.

I've done as you've said: Now patch adds two functions heap_page_item_attrs
and heap_page_item_detoast_attrs and these function takes raw_page and
relation OID as arguments. They also have third optional parameter that allows
to change error mode from ERROR to WARNING.

Is it ok now?

Yeah, I think that's acceptable to have a switch, defaulting to ERROR if caller specifies nothing.

Here are some observations after looking at the code, no functional testing.

+       int             error_mode = PG_NARGS()==3 ? PG_GETARG_BOOL(2) :NULL;

When handling additional arguments, it seems to me that it is more readable to use something like that:
if (PG_NARGS >= 3)
{
     arg = PG_GETARG_XXX(2);
     //etc.
}

+       error_mode = error_mode?WARNING:ERROR;

This generates warnings at compilation.

+       if (SRF_IS_FIRSTCALL() && (error_mode == WARNING))
+       {
+               ereport(WARNING,
+                       (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                               (errmsg("Runnung heap_page_item_attrs in WARNING mode."))));
+       }

I am not convinced that this is necessary.

+               inter_call_data->raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+               if (inter_call_data->raw_page_size < SizeOfPageHeaderData)
Including raw_page_size in the status data does not seem necessary to me. The page data is already available for each loop so you could just extract it from it.

+               ereport(inter_call_data->error_level,
+                               (errcode(ERRCODE_DATA_CORRUPTED),
+                                       errmsg("Data corruption: number of attributes in tuple header is greater than number of attributes in tuple descripor")));
I'd rather switch the error reports related to data corruption from ereport to elog, which is more suited for internal errors, and it seems to me that it is the case here.

heapfuncs.c:370:7: warning: variable 'raw_attr' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                if (!is_null)
                    ^~~~~~~~
heapfuncs.c:419:43: note: uninitialized use occurs here
                raw_attrs = accumArrayResult(raw_attrs, raw_attr, is_null, BYTEAOID, fctx->multi_call_memory_ctx);
                                                        ^~~~~~~~
heapfuncs.c:370:3: note: remove the 'if' if its condition is always true
                if (!is_null)
                ^~~~~~~~~~~~~
heapfuncs.c:357:17: note: initialize the variable 'raw_attr' to silence this warning
                Datum raw_attr;
My compiler complains about uninitialized variables.

+--
+-- _heap_page_items()
+--
+CREATE FUNCTION _heap_page_items(IN page bytea, IN relname text,
I am not sure why this is necessary and visibly it overlaps with the existing declaration of heap_page_items. The last argument is different though, and I recall that we decided not to use anymore the relation name specified as text, but its OID instead in this module.

Documentation is missing, that would be good to have to understand what each function is intended to do. Code has some whitespaces.
--
Michael

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Freeze avoidance of very large table.
Next
From: Tatsuo Ishii
Date:
Subject: Re: BRIN INDEX value