Re: pageinspect patch, for showing tuple data - Mailing list pgsql-hackers

From Nikolay Shaplov
Subject Re: pageinspect patch, for showing tuple data
Date
Msg-id 2932370.e39aerQGYn@nataraj-amd64
Whole thread Raw
In response to Re: pageinspect patch, for showing tuple data  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: pageinspect patch, for showing tuple data  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
В письме от 8 сентября 2015 11:53:24 Вы написали:
> On Sat, Sep 5, 2015 at 1:05 AM, Nikolay Shaplov wrote:
> > В письме от 4 сентября 2015 14:58:29 пользователь Michael Paquier написал:
> > > Documentation is missing, that would be good to have to understand what
> > > each function is intended to do.
> >
> > I were going to add documentation when this patch is commited, or at least
> > approved for commit. So I would know for sure that function definition
> > would
> > not change, so had not to rewrite it again and again.
>
> I doubt that this patch would be committed without documentation, this is a
> requirement for any patch.
Ok. I can do it.

> > So if it is possible I would write documentation and test when this patch
> > is
> > already approved.
>
> I'm fine with that. Let's discuss its shape and come up with an agreement.
> It would have been good to post test cases of what this stuff actually does
> though, people reviewing this thread are limited to guess on what is tried
> to be achieved just by reading the code.

To test non detoasted functions one should do:

CREATE EXTENSION pageinspect;

create table test (a int, b smallint,c varchar);
insert into test VALUES
(-1,2,'111'),
(2,null,'222'),
(3,8,'333'),
(null,16,null);

Then

# select * from heap_page_items(get_raw_page('test', 0));lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |
t_ctid| t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid |         t_data          

----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+----------+-------+------------------------
1|   8152 |        1 |     34 |    763 |      0 |        0 | (0,1)  |           3 |       2050 |     24 |          |
  | \xffffffff020009313131 2 |   8120 |        1 |     32 |    763 |      0 |        0 | (0,2)  |           3 |
2051|     24 | 10100000 |       | \x0200000009323232 3 |   8080 |        1 |     34 |    763 |      0 |        0 |
(0,3) |           3 |       2050 |     24 |          |       | \x03000000080009333333 4 |   8048 |        1 |     26 |
 763 |      0 |        0 | (0,4)  |           3 |       2049 |     24 | 01000000 |       | \x1000 

or

# select * from heap_page_item_attrs(get_raw_page('test', 0),'test'::regclass);lp | lp_off | lp_flags | lp_len | t_xmin
|t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid |                 t_attrs
     

----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+----------+-------+-----------------------------------------
1|   8152 |        1 |     34 |    763 |      0 |        0 | (0,1)  |           3 |       2050 |     24 |          |
  | {"\\xffffffff","\\x0200","\\x09313131"} 2 |   8120 |        1 |     32 |    763 |      0 |        0 | (0,2)  |
    3 |       2051 |     24 | 10100000 |       | {"\\x02000000",NULL,"\\x09323232"} 3 |   8080 |        1 |     34 |
763|      0 |        0 | (0,3)  |           3 |       2050 |     24 |          |       |
{"\\x03000000","\\x0800","\\x09333333"}4 |   8048 |        1 |     26 |    763 |      0 |        0 | (0,4)  |
3|       2049 |     24 | 01000000 |       | {NULL,"\\x1000",NULL} 
(4 rows)

For detoasted function you should do something like this:

CREATE EXTENSION pageinspect;

create table test2 (c varchar);

insert into test2 VALUES ('++');
insert into test2 VALUES (repeat('+',2100));

Then  heap_page_item_attrs will show you compressed attr data:

# select * from heap_page_item_attrs(get_raw_page('test2', 0),'test2'::regclass);lp | lp_off | lp_flags | lp_len |
t_xmin| t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
    t_attrs                                     

----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+-------------------------------------------------------------------------------
1|   8160 |        1 |     27 |    768 |      0 |        0 | (0,1)  |           1 |       2050 |     24 |        |
| {"\\x072b2b"} 2 |   8096 |        1 |     59 |    769 |      0 |        0 | (0,2)  |           1 |       2050 |
24|        |       | {"\\x8e00000034080000fe2b0f01ff0f01ff0f01ff0f01ff0f01ff0f01ff0f01ff010f01aa"} 
(2 rows)

and heap_page_item_detoast_attrs will show you original data

# select * from heap_page_item_detoast_attrs(get_raw_page('test2', 0),'test2'::regclass);
[will not paste output here it is too long, see it for yourself]


> That's actually where
> documentation, even a draft of documentation helps a lot in the review to
> see if what is expected by the developer matches what the code actually
> does.
>
> > Code has some whitespaces.
> > I've found and removed some. Hope that was all of them...
>
> Yeah, it looks that you took completely rid of them.
>
> In details, this patch introduces two independent concepts:
> - add tuple data as a new bytea column of heap_page_items. This is indeed
> where it should be, and note that this is where the several corruption
> checks are done by checking the state of the tuple data.
Sorry do not understand what do you want to say in the second part of the last
sentence. About corruption checks.

> - add heap_page_item_attrs and heap_page_item_detoast_attrs, which is very
> similar to heap_page_items, at the difference that it can take an OID to be
> able to use a TupleDesc and return a bytea array with the data of each
> attribute.
That's right.

And more:

- heap_page_item_attrs and heap_page_item_detoast_attrs has third optional
attribute, set it to true if you want these functions to report about problems in
WARNING mode. This will allow to force attribute parsing even if some consistency
checks are failed. This is necessary for educational purposes, so everyone can
make fake data and see how postgres will try to parse it. Also this will be needed
for test. So we can test that pageinspect will is properly doing all checks. And it is
much easier to do this when it is done in warning mode.

> Honestly, heap_page_item_attrs and heap_page_item_detoast_attrs are way too
> similar to what heap_page_items does, leading to a code maze that is going
> to make future extensions more difficult, which is what lead to the
> refactoring your patch does.
> Hence, instead of all those complications, why not simply introducing two
> functions that take as input the tuple data and the OID of the relation,
> returning those bytea arrays? It seems to be that this would be a more
> handy interface, and as this is for educational purposes I guess that the
> addition of the overhead induced by the extra function call would be
> acceptable.

When I looked at this task I considered doing the same thing. Bun unfortunately it is
not possible. (Or if be more correct it is possible, but if I do so, it would be hard to us
e it)

The thing is, that to properly split tuple data into attributes, we need some values from
tuple header:

t_infomask2: where postgres store actual number of stored attributes, that may differ
from one in tuple data. That will allow to properly parse tuples after
ALTER TABLE ADD COLUMN when it was done without SET DEFAULT option
t_infomask: has bit that indicates that there is some null attributes in tuple.
t_bits: has a bit mask that shows what attributes were set to null.

So if move tuple data parsing into separate function, then we have to pass these
values alongside the tuple data. This is not convenient at all.
So I did it in a way you see in the patch.


> Actually not two functions, but just one, with an extra flag to be
> able to enforce detoasting on the field values where this can be done.

Yeah, I also thought about it. But did not come to any final conclusion. Should
we add forth argument, that enables detoasting, instead of adding separate
function?

--
Nikolay Shaplov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Waits monitoring
Next
From: Ashutosh Bapat
Date:
Subject: Re: Dependency between bgw_notify_pid and bgw_flags