Thread: Multiple pg_waldump --rmgr options

Multiple pg_waldump --rmgr options

From
Heikki Linnakangas
Date:
I wanted to dump all heap WAL records with pg_waldump, so I did this:

> $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record
> Type                                           N      (%)          Record size      (%)             FPI size      (%)
      Combined size      (%)
 
> ----                                           -      ---          -----------      ---             --------      ---
      -------------      ---
 
> Heap2/PRUNE                                  268 (  8.74)                18192 (  2.73)                    0 (  0.00)
              18192 (  1.74)
 
> Heap2/VACUUM                                  55 (  1.79)                 4940 (  0.74)                    0 (  0.00)
               4940 (  0.47)
 
> Heap2/FREEZE_PAGE                            277 (  9.03)               186868 ( 28.03)                    0 (  0.00)
             186868 ( 17.86)
 
> Heap2/VISIBLE                                467 ( 15.23)                27783 (  4.17)               376832 ( 99.34)
             404615 ( 38.68)
 
> Heap2/MULTI_INSERT                          1944 ( 63.38)               354800 ( 53.21)                 2520 (  0.66)
             357320 ( 34.16)
 
> Heap2/MULTI_INSERT+INIT                       56 (  1.83)                74152 ( 11.12)                    0 (  0.00)
              74152 (  7.09)
 
>                                         --------                      --------                      --------
           --------
 
> Total                                       3067                        666735 [63.74%]               379352 [36.26%]
            1046087 [100%]
 
> pg_waldump: fatal: error in WAL record at 0/1680118: invalid record length at 0/1680150: wanted 24, got 0

That didn't do what I wanted. It only printed the Heap2 records, not 
Heap, even though I specified both. The reason is that if you specify 
multiple --rmgr options, only the last one takes effect.

I propose the attached to allow selecting multiple rmgrs, by giving 
multiple --rmgr options. With that, it works the way I expected:

> $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record
> Type                                           N      (%)          Record size      (%)             FPI size      (%)
      Combined size      (%)
 
> ----                                           -      ---          -----------      ---             --------      ---
      -------------      ---
 
> Heap2/PRUNE                                  268 (  1.77)                18192 (  0.71)                    0 (  0.00)
              18192 (  0.55)
 
> Heap2/VACUUM                                  55 (  0.36)                 4940 (  0.19)                    0 (  0.00)
               4940 (  0.15)
 
> Heap2/FREEZE_PAGE                            277 (  1.83)               186868 (  7.33)                    0 (  0.00)
             186868 (  5.67)
 
> Heap2/VISIBLE                                467 (  3.09)                27783 (  1.09)               376832 ( 50.37)
             404615 ( 12.27)
 
> Heap2/MULTI_INSERT                          1944 ( 12.86)               354800 ( 13.91)                 2520 (  0.34)
             357320 ( 10.83)
 
> Heap2/MULTI_INSERT+INIT                       56 (  0.37)                74152 (  2.91)                    0 (  0.00)
              74152 (  2.25)
 
> Heap/INSERT                                 9948 ( 65.80)              1433891 ( 56.22)                 8612 (  1.15)
            1442503 ( 43.73)
 
> Heap/DELETE                                  942 (  6.23)                50868 (  1.99)                    0 (  0.00)
              50868 (  1.54)
 
> Heap/UPDATE                                  193 (  1.28)               101265 (  3.97)                 9556 (  1.28)
             110821 (  3.36)
 
> Heap/HOT_UPDATE                              349 (  2.31)                36910 (  1.45)                 1300 (  0.17)
              38210 (  1.16)
 
> Heap/LOCK                                    209 (  1.38)                11481 (  0.45)               316828 ( 42.35)
             328309 (  9.95)
 
> Heap/INPLACE                                 212 (  1.40)                44279 (  1.74)                32444 (  4.34)
              76723 (  2.33)
 
> Heap/INSERT+INIT                             184 (  1.22)               188803 (  7.40)                    0 (  0.00)
             188803 (  5.72)
 
> Heap/UPDATE+INIT                              15 (  0.10)                16273 (  0.64)                    0 (  0.00)
              16273 (  0.49)
 
>                                         --------                      --------                      --------
           --------
 
> Total                                      15119                       2550505 [77.32%]               748092 [22.68%]
            3298597 [100%]
 
> pg_waldump: fatal: error in WAL record at 0/1680150: invalid record length at 0/16801C8: wanted 24, got 0

- Heikki

Attachment

Re: Multiple pg_waldump --rmgr options

From
Julien Rouhaud
Date:
On Tue, May 18, 2021 at 04:50:31PM +0300, Heikki Linnakangas wrote:
> I wanted to dump all heap WAL records with pg_waldump, so I did this:
> 
> > $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record
> > Type                                           N      (%)          Record size      (%)             FPI size
(%)       Combined size      (%)
 
> > ----                                           -      ---          -----------      ---             --------
---       -------------      ---
 
> > Heap2/PRUNE                                  268 (  8.74)                18192 (  2.73)                    0 (
0.00)               18192 (  1.74)
 
> > Heap2/VACUUM                                  55 (  1.79)                 4940 (  0.74)                    0 (
0.00)                4940 (  0.47)
 
> > Heap2/FREEZE_PAGE                            277 (  9.03)               186868 ( 28.03)                    0 (
0.00)              186868 ( 17.86)
 
> > Heap2/VISIBLE                                467 ( 15.23)                27783 (  4.17)               376832 (
99.34)              404615 ( 38.68)
 
> > Heap2/MULTI_INSERT                          1944 ( 63.38)               354800 ( 53.21)                 2520 (
0.66)              357320 ( 34.16)
 
> > Heap2/MULTI_INSERT+INIT                       56 (  1.83)                74152 ( 11.12)                    0 (
0.00)               74152 (  7.09)
 
> >                                         --------                      --------                      --------
             --------
 
> > Total                                       3067                        666735 [63.74%]               379352
[36.26%]             1046087 [100%]
 
> > pg_waldump: fatal: error in WAL record at 0/1680118: invalid record length at 0/1680150: wanted 24, got 0
> 
> That didn't do what I wanted. It only printed the Heap2 records, not Heap,
> even though I specified both. The reason is that if you specify multiple
> --rmgr options, only the last one takes effect.
> 
> I propose the attached to allow selecting multiple rmgrs, by giving multiple
> --rmgr options. With that, it works the way I expected:

The change and the patch look sensible to me.



Re: Multiple pg_waldump --rmgr options

From
Kyotaro Horiguchi
Date:
At Tue, 18 May 2021 23:23:02 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Tue, May 18, 2021 at 04:50:31PM +0300, Heikki Linnakangas wrote:
> > That didn't do what I wanted. It only printed the Heap2 records, not Heap,
> > even though I specified both. The reason is that if you specify multiple
> > --rmgr options, only the last one takes effect.
> > 
> > I propose the attached to allow selecting multiple rmgrs, by giving multiple
> > --rmgr options. With that, it works the way I expected:
> 
> The change and the patch look sensible to me.

+1.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Multiple pg_waldump --rmgr options

From
Michael Paquier
Date:
On Wed, May 19, 2021 at 11:50:52AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 18 May 2021 23:23:02 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
>> The change and the patch look sensible to me.
>
> +1.

Agreed.
--
Michael

Attachment

Re: Multiple pg_waldump --rmgr options

From
Daniel Gustafsson
Date:
> On 18 May 2021, at 15:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> The reason is that if you specify multiple --rmgr options, only the last one takes effect.

That's in line with how options are handled for most binaries, so this will go
against that.  That being said, I don't think thats a problem here really given
what this tool is and it's intended usecase.

This patch makes the special case "--rmgr=list" a bit more awkward than before
IMO, as it breaks the list processing, but nothing we can't live with.

> I propose the attached to allow selecting multiple rmgrs

I agree with the other +1's in this thread, and am marking this as ready for
committer.

As a tiny nitpick for readability, I would move this line inside the string
comparison case where the rmgr is selected.  Not that it makes any difference
in practice, but since that's where the filtering is set it seems a hair
tidier.
+      config.filter_by_rmgr_enabled = true;

--
Daniel Gustafsson        https://vmware.com/




Re: Multiple pg_waldump --rmgr options

From
Heikki Linnakangas
Date:
On 28/06/2021 13:34, Daniel Gustafsson wrote:
>> On 18 May 2021, at 15:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 
>> The reason is that if you specify multiple --rmgr options, only the last one takes effect.
> 
> That's in line with how options are handled for most binaries, so this will go
> against that.  That being said, I don't think thats a problem here really given
> what this tool is and it's intended usecase.

There is some precedent for this with the pg_dump --table option, for 
example.

In general, I think it's weird that the latest option wins. If you 
specify the same option multiple times, and it's not something like 
--rmgr or --table where it makes sense, it's most likely user error. 
Printing an error would be nicer than ignoring all but the last 
instance. But I'm not going to try changing that now.

>> I propose the attached to allow selecting multiple rmgrs
> 
> I agree with the other +1's in this thread, and am marking this as ready for
> committer.
> 
> As a tiny nitpick for readability, I would move this line inside the string
> comparison case where the rmgr is selected.  Not that it makes any difference
> in practice, but since that's where the filtering is set it seems a hair
> tidier.
> +      config.filter_by_rmgr_enabled = true;

Ok, changed it that way.

I tried to be defensive against WAL records with bogus xl_rmid values here:

> @@ -1098,8 +1100,9 @@ main(int argc, char **argv)
>                 }
>  
>                 /* apply all specified filters */
> -               if (config.filter_by_rmgr != -1 &&
> -                       config.filter_by_rmgr != record->xl_rmid)
> +               if (config.filter_by_rmgr_enabled &&
> +                       (record->xl_rmid < 0 || record->xl_rmid > RM_MAX_ID ||
> +                        !config.filter_by_rmgr[record->xl_rmid]))
>                         continue;

But looking closer, that's pointless. We use record->xl_rmid directly as 
array index elsewhere, and that's OK because ValidXLogRecordHeader() 
checks that xl_rmid <= RM_MAX_ID. And the 'xl_rmid < 0' check is 
unnecessary because the field is unsigned. So I'll remove those, and 
commit this tomorrow.

- Heikki



Re: Multiple pg_waldump --rmgr options

From
Daniel Gustafsson
Date:
> On 30 Jun 2021, at 22:39, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> In general, I think it's weird that the latest option wins. If you specify the same option multiple times, and it's
notsomething like --rmgr or --table where it makes sense, it's most likely user error. Printing an error would be nicer
thanignoring all but the last instance. But I'm not going to try changing that now. 

AFAIK, the traditional "defense" for it when building a commandline with
scripts which loop over input, to avoid the need for any data structure holding
the options for deduplication.  No idea how common that is these days, but I've
seen it in production in the past for sure.

> I tried to be defensive against WAL records with bogus xl_rmid values here:
>
>> @@ -1098,8 +1100,9 @@ main(int argc, char **argv)
>>                }
>>                 /* apply all specified filters */
>> -               if (config.filter_by_rmgr != -1 &&
>> -                       config.filter_by_rmgr != record->xl_rmid)
>> +               if (config.filter_by_rmgr_enabled &&
>> +                       (record->xl_rmid < 0 || record->xl_rmid > RM_MAX_ID ||
>> +                        !config.filter_by_rmgr[record->xl_rmid]))
>>                        continue;
>
> But looking closer, that's pointless. We use record->xl_rmid directly as array index elsewhere, and that's OK because
ValidXLogRecordHeader()checks that xl_rmid <= RM_MAX_ID. And the 'xl_rmid < 0' check is unnecessary because the field
isunsigned. So I'll remove those, and commit this tomorrow. 

+1

--
Daniel Gustafsson        https://vmware.com/