Thread: [BUGS] BUG #14687: pg_xlogdump does only count "main data" for record lengthand leading to incorrect statistics

The following bug has been logged on the website:

Bug reference:      14687
Logged by:          Chen huajun
Email address:      chjischj@163.com
PostgreSQL version: 9.5.0
Operating system:   ALL
Description:

Currently, record length in pg_xlogdump/pg_waldump's output only count
length of "main data",
This results in pg_xlogdump/pg_waldump can not accurately count the size of
the various types of WAL records.

for example
   [postgres@node1 ~]$ pg_waldump --stats=record
pg10data/pg_wal/00000001000000000000001F   Type                                           N      (%)
Record size      (%)             FPI size      (%)        Combined size
(%)   ----                                           -      ---
-----------      ---             --------      ---        -------------
---   XLOG/FPI                                     939 ( 34.68)
22536 ( 27.15)              6897764 ( 99.75)              6920300 ( 98.89)   Transaction/COMMIT
   6 (  0.22)                976 (  1.18)                    0 (  0.00)                  976 (  0.01)   Storage/CREATE
                              3 (  0.11)                120 (  0.14)                    0 (  0.00)                  120
( 0.00)   Standby/LOCK                                   6 (  0.22)                240 (  0.29)                    0 (
0.00)                 240 (  0.00)   Standby/INVALIDATIONS                          3 (  0.11)                264 (
0.32)                   0 (  0.00)                  264 (  0.00)   Heap2/VISIBLE                                  1 (
0.04)                29 (  0.03)                 8192 (  0.12)                 8221 (  0.12)   Heap2/MULTI_INSERT
                   165 (  6.09)                
14606 ( 17.60)                    0 (  0.00)                14606 (  0.21)   Heap2/MULTI_INSERT+INIT
1493 ( 55.13)                
41804 ( 50.37)                    0 (  0.00)                41804 (  0.60)   Heap/INSERT
  29 (  1.07)                783 (  0.94)                    0 (  0.00)                  783 (  0.01)   Heap/INPLACE
                              7 (  0.26)                182 (  0.22)                    0 (  0.00)                  182
( 0.00)   Btree/INSERT_LEAF                             56 (  2.07)               1456 (  1.75)                 8888 (
0.13)               10344 (  0.15)                                           --------                      
--------                      --------                      --------   Total                                       2708
                       
82996 [1.19%]               6914844 [98.81%]              6997840 [100%]

Total Combined size above is only 6997840 byte,but should be 16MB.


So, I modified one line of code in pg_waldump.c

src/bin/pg_waldump/pg_waldump.c:390
       rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
       /*        * Calculate the amount of FPI data in the record.        *        * XXX: We peek into xlogreader's
privatedecoded backup blocks for 
the        * bimg_len indicating the length of FPI data. It doesn't seem worth
it to        * add an accessor macro for this.        */       fpi_len = 0;       for (block_id = 0; block_id <=
record->max_block_id;block_id++)       {               if (XLogRecHasBlockImage(record, block_id))
fpi_len += record->blocks[block_id].bimg_len;       }
 
==>              /*        * Calculate the amount of FPI data in the record.        *        * XXX: We peek into
xlogreader'sprivate decoded backup blocks for 
the        * bimg_len indicating the length of FPI data. It doesn't seem worth
it to        * add an accessor macro for this.        */       fpi_len = 0;       for (block_id = 0; block_id <=
record->max_block_id;block_id++)       {               if (XLogRecHasBlockImage(record, block_id))
fpi_len += record->blocks[block_id].bimg_len;       }       rec_len = XLogRecGetTotalLen(record) - fpi_len;       
 

and got more precise output(correct FPI ratio is 41.37% instead of 98.81%)
   [postgres@node1 ~]$ pg_waldump  --stats=record
pg10data/pg_wal/00000001000000000000001F   Type                                           N      (%)
Record size      (%)             FPI size      (%)        Combined size
(%)   ----                                           -      ---
-----------      ---             --------      ---        -------------
---   XLOG/FPI                                     939 ( 34.68)
46011 (  0.47)              6897764 ( 99.75)              6943775 ( 41.54)   Transaction/COMMIT
   6 (  0.22)                988 (  0.01)                    0 (  0.00)                  988 (  0.01)   Storage/CREATE
                              3 (  0.11)                126 (  0.00)                    0 (  0.00)                  126
( 0.00)   Standby/LOCK                                   6 (  0.22)                252 (  0.00)                    0 (
0.00)                 252 (  0.00)   Standby/INVALIDATIONS                          3 (  0.11)                270 (
0.00)                   0 (  0.00)                  270 (  0.00)   Heap2/VISIBLE                                  1 (
0.04)                64 (  0.00)                 8192 (  0.12)                 8256 (  0.05)   Heap2/MULTI_INSERT
                   165 (  6.09)               
547329 (  5.59)                    0 (  0.00)               547329 (
3.27)   Heap2/MULTI_INSERT+INIT                     1493 ( 55.13)
9194033 ( 93.82)                    0 (  0.00)              9194033 (
55.01)   Heap/INSERT                                   29 (  1.07)               5655 (  0.06)                    0 (
0.00)                5655 (  0.03)   Heap/INPLACE                                   7 (  0.26)               1493 (
0.02)                   0 (  0.00)                 1493 (  0.01)   Btree/INSERT_LEAF                             56 (
2.07)              3723 (  0.04)                 8888 (  0.13)                12611 (  0.08)
              --------                      
--------                      --------                      --------   Total                                       2708
                     
9799944 [58.63%]              6914844 [41.37%]             16714788 [100%]



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Hi,

On 2017-06-03 16:59:39 +0000, chjischj@163.com wrote:
> Currently, record length in pg_xlogdump/pg_waldump's output only count
> length of "main data",

Good catch!  Will fix.

- Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Hi Heikki,

On 2017-06-04 16:10:12 -0700, Andres Freund wrote:
> On 2017-06-03 16:59:39 +0000, chjischj@163.com wrote:
> > Currently, record length in pg_xlogdump/pg_waldump's output only count
> > length of "main data",
> 
> Good catch!  Will fix.

I notice that this seems to originate in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebc2b681b8ffa597137aa7a19619e6a21176e928
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2c03216d831160bedd72d45f712601b6f7d03f1c
and that it's not just stats, but also plain xlogdump displaying wrong
data.

Do you see any better way to get at the !fpi len than to iterate over
the blocks and sum up the image length?  It's not really accurate to
display len (rec/tot) as total-fpi, total, but I don't have a better
backward-compatible idea?  Actually, I don't even have a better idea for
master, so maybe it's ok as proposed?

- Andres

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment
On 06/05/2017 05:25 AM, Andres Freund wrote:
> Hi Heikki,
>
> On 2017-06-04 16:10:12 -0700, Andres Freund wrote:
>> On 2017-06-03 16:59:39 +0000, chjischj@163.com wrote:
>>> Currently, record length in pg_xlogdump/pg_waldump's output only count
>>> length of "main data",
>>
>> Good catch!  Will fix.
>
> I notice that this seems to originate in
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebc2b681b8ffa597137aa7a19619e6a21176e928
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2c03216d831160bedd72d45f712601b6f7d03f1c
> and that it's not just stats, but also plain xlogdump displaying wrong
> data.
>
> Do you see any better way to get at the !fpi len than to iterate over
> the blocks and sum up the image length?  It's not really accurate to
> display len (rec/tot) as total-fpi, total, but I don't have a better
> backward-compatible idea?  Actually, I don't even have a better idea for
> master, so maybe it's ok as proposed?

Your patch seems OK to me. Defining non-FPI size as (total_len - 
fpi_len) seems correct to me, I'm not sure what other definition you 
could have.

> +    /*
> +     * Calculate the length of the record as the total length - the length of
> +     * all the block images.  That's commonly more accurate than just using
> +     * XLogRecGetDataLen(), because that's only the main record, not per-block
> +     * data registered wity XLogRegisterBufData().
> +     */

s/wity/with/. But I would leave out the whole sentence starting with 
"That's commonly more accurate ...". Using XLogRecGetDataLen() would  be 
(and was) just plain wrong.

- Heikki



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

On 2017-06-05 11:23:43 +0300, Heikki Linnakangas wrote:
> On 06/05/2017 05:25 AM, Andres Freund wrote:
> > Hi Heikki,
> > Do you see any better way to get at the !fpi len than to iterate over
> > the blocks and sum up the image length?  It's not really accurate to
> > display len (rec/tot) as total-fpi, total, but I don't have a better
> > backward-compatible idea?  Actually, I don't even have a better idea for
> > master, so maybe it's ok as proposed?
> 
> Your patch seems OK to me. Defining non-FPI size as (total_len - fpi_len)
> seems correct to me, I'm not sure what other definition you could have.

I'm not sure, that's why I chose the above.  But you could argue that
the block ids etc. shouldn't be counted as part of either, or that the
previous displaying of record len / total len doesn't make much sense
anymore.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Hi,

On 2017-06-03 16:59:39 +0000, chjischj@163.com wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      14687
> Logged by:          Chen huajun
> Email address:      chjischj@163.com
> PostgreSQL version: 9.5.0
> Operating system:   ALL
> Description:        

Thanks for the report, I've pushed a fix to the affected releases.
It'll be included in the next set of back-branch releases.

Regards,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs