Thread: Multiple pg_waldump --rmgr options
I wanted to dump all heap WAL records with pg_waldump, so I did this: > $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record > Type N (%) Record size (%) FPI size (%) Combined size (%) > ---- - --- ----------- --- -------- --- ------------- --- > Heap2/PRUNE 268 ( 8.74) 18192 ( 2.73) 0 ( 0.00) 18192 ( 1.74) > Heap2/VACUUM 55 ( 1.79) 4940 ( 0.74) 0 ( 0.00) 4940 ( 0.47) > Heap2/FREEZE_PAGE 277 ( 9.03) 186868 ( 28.03) 0 ( 0.00) 186868 ( 17.86) > Heap2/VISIBLE 467 ( 15.23) 27783 ( 4.17) 376832 ( 99.34) 404615 ( 38.68) > Heap2/MULTI_INSERT 1944 ( 63.38) 354800 ( 53.21) 2520 ( 0.66) 357320 ( 34.16) > Heap2/MULTI_INSERT+INIT 56 ( 1.83) 74152 ( 11.12) 0 ( 0.00) 74152 ( 7.09) > -------- -------- -------- -------- > Total 3067 666735 [63.74%] 379352 [36.26%] 1046087 [100%] > pg_waldump: fatal: error in WAL record at 0/1680118: invalid record length at 0/1680150: wanted 24, got 0 That didn't do what I wanted. It only printed the Heap2 records, not Heap, even though I specified both. The reason is that if you specify multiple --rmgr options, only the last one takes effect. I propose the attached to allow selecting multiple rmgrs, by giving multiple --rmgr options. With that, it works the way I expected: > $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record > Type N (%) Record size (%) FPI size (%) Combined size (%) > ---- - --- ----------- --- -------- --- ------------- --- > Heap2/PRUNE 268 ( 1.77) 18192 ( 0.71) 0 ( 0.00) 18192 ( 0.55) > Heap2/VACUUM 55 ( 0.36) 4940 ( 0.19) 0 ( 0.00) 4940 ( 0.15) > Heap2/FREEZE_PAGE 277 ( 1.83) 186868 ( 7.33) 0 ( 0.00) 186868 ( 5.67) > Heap2/VISIBLE 467 ( 3.09) 27783 ( 1.09) 376832 ( 50.37) 404615 ( 12.27) > Heap2/MULTI_INSERT 1944 ( 12.86) 354800 ( 13.91) 2520 ( 0.34) 357320 ( 10.83) > Heap2/MULTI_INSERT+INIT 56 ( 0.37) 74152 ( 2.91) 0 ( 0.00) 74152 ( 2.25) > Heap/INSERT 9948 ( 65.80) 1433891 ( 56.22) 8612 ( 1.15) 1442503 ( 43.73) > Heap/DELETE 942 ( 6.23) 50868 ( 1.99) 0 ( 0.00) 50868 ( 1.54) > Heap/UPDATE 193 ( 1.28) 101265 ( 3.97) 9556 ( 1.28) 110821 ( 3.36) > Heap/HOT_UPDATE 349 ( 2.31) 36910 ( 1.45) 1300 ( 0.17) 38210 ( 1.16) > Heap/LOCK 209 ( 1.38) 11481 ( 0.45) 316828 ( 42.35) 328309 ( 9.95) > Heap/INPLACE 212 ( 1.40) 44279 ( 1.74) 32444 ( 4.34) 76723 ( 2.33) > Heap/INSERT+INIT 184 ( 1.22) 188803 ( 7.40) 0 ( 0.00) 188803 ( 5.72) > Heap/UPDATE+INIT 15 ( 0.10) 16273 ( 0.64) 0 ( 0.00) 16273 ( 0.49) > -------- -------- -------- -------- > Total 15119 2550505 [77.32%] 748092 [22.68%] 3298597 [100%] > pg_waldump: fatal: error in WAL record at 0/1680150: invalid record length at 0/16801C8: wanted 24, got 0 - Heikki
Attachment
On Tue, May 18, 2021 at 04:50:31PM +0300, Heikki Linnakangas wrote: > I wanted to dump all heap WAL records with pg_waldump, so I did this: > > > $ pg_waldump --rmgr=heap --rmgr=heap2 data/pg_wal/000000010000000000000001 --stat=record > > Type N (%) Record size (%) FPI size (%) Combined size (%) > > ---- - --- ----------- --- -------- --- ------------- --- > > Heap2/PRUNE 268 ( 8.74) 18192 ( 2.73) 0 ( 0.00) 18192 ( 1.74) > > Heap2/VACUUM 55 ( 1.79) 4940 ( 0.74) 0 ( 0.00) 4940 ( 0.47) > > Heap2/FREEZE_PAGE 277 ( 9.03) 186868 ( 28.03) 0 ( 0.00) 186868 ( 17.86) > > Heap2/VISIBLE 467 ( 15.23) 27783 ( 4.17) 376832 ( 99.34) 404615 ( 38.68) > > Heap2/MULTI_INSERT 1944 ( 63.38) 354800 ( 53.21) 2520 ( 0.66) 357320 ( 34.16) > > Heap2/MULTI_INSERT+INIT 56 ( 1.83) 74152 ( 11.12) 0 ( 0.00) 74152 ( 7.09) > > -------- -------- -------- -------- > > Total 3067 666735 [63.74%] 379352 [36.26%] 1046087 [100%] > > pg_waldump: fatal: error in WAL record at 0/1680118: invalid record length at 0/1680150: wanted 24, got 0 > > That didn't do what I wanted. It only printed the Heap2 records, not Heap, > even though I specified both. The reason is that if you specify multiple > --rmgr options, only the last one takes effect. > > I propose the attached to allow selecting multiple rmgrs, by giving multiple > --rmgr options. With that, it works the way I expected: The change and the patch look sensible to me.
At Tue, 18 May 2021 23:23:02 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Tue, May 18, 2021 at 04:50:31PM +0300, Heikki Linnakangas wrote: > > That didn't do what I wanted. It only printed the Heap2 records, not Heap, > > even though I specified both. The reason is that if you specify multiple > > --rmgr options, only the last one takes effect. > > > > I propose the attached to allow selecting multiple rmgrs, by giving multiple > > --rmgr options. With that, it works the way I expected: > > The change and the patch look sensible to me. +1. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, May 19, 2021 at 11:50:52AM +0900, Kyotaro Horiguchi wrote: > At Tue, 18 May 2021 23:23:02 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in >> The change and the patch look sensible to me. > > +1. Agreed. -- Michael
Attachment
> On 18 May 2021, at 15:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > The reason is that if you specify multiple --rmgr options, only the last one takes effect. That's in line with how options are handled for most binaries, so this will go against that. That being said, I don't think thats a problem here really given what this tool is and it's intended usecase. This patch makes the special case "--rmgr=list" a bit more awkward than before IMO, as it breaks the list processing, but nothing we can't live with. > I propose the attached to allow selecting multiple rmgrs I agree with the other +1's in this thread, and am marking this as ready for committer. As a tiny nitpick for readability, I would move this line inside the string comparison case where the rmgr is selected. Not that it makes any difference in practice, but since that's where the filtering is set it seems a hair tidier. + config.filter_by_rmgr_enabled = true; -- Daniel Gustafsson https://vmware.com/
On 28/06/2021 13:34, Daniel Gustafsson wrote: >> On 18 May 2021, at 15:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> The reason is that if you specify multiple --rmgr options, only the last one takes effect. > > That's in line with how options are handled for most binaries, so this will go > against that. That being said, I don't think thats a problem here really given > what this tool is and it's intended usecase. There is some precedent for this with the pg_dump --table option, for example. In general, I think it's weird that the latest option wins. If you specify the same option multiple times, and it's not something like --rmgr or --table where it makes sense, it's most likely user error. Printing an error would be nicer than ignoring all but the last instance. But I'm not going to try changing that now. >> I propose the attached to allow selecting multiple rmgrs > > I agree with the other +1's in this thread, and am marking this as ready for > committer. > > As a tiny nitpick for readability, I would move this line inside the string > comparison case where the rmgr is selected. Not that it makes any difference > in practice, but since that's where the filtering is set it seems a hair > tidier. > + config.filter_by_rmgr_enabled = true; Ok, changed it that way. I tried to be defensive against WAL records with bogus xl_rmid values here: > @@ -1098,8 +1100,9 @@ main(int argc, char **argv) > } > > /* apply all specified filters */ > - if (config.filter_by_rmgr != -1 && > - config.filter_by_rmgr != record->xl_rmid) > + if (config.filter_by_rmgr_enabled && > + (record->xl_rmid < 0 || record->xl_rmid > RM_MAX_ID || > + !config.filter_by_rmgr[record->xl_rmid])) > continue; But looking closer, that's pointless. We use record->xl_rmid directly as array index elsewhere, and that's OK because ValidXLogRecordHeader() checks that xl_rmid <= RM_MAX_ID. And the 'xl_rmid < 0' check is unnecessary because the field is unsigned. So I'll remove those, and commit this tomorrow. - Heikki
> On 30 Jun 2021, at 22:39, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > In general, I think it's weird that the latest option wins. If you specify the same option multiple times, and it's notsomething like --rmgr or --table where it makes sense, it's most likely user error. Printing an error would be nicer thanignoring all but the last instance. But I'm not going to try changing that now. AFAIK, the traditional "defense" for it when building a commandline with scripts which loop over input, to avoid the need for any data structure holding the options for deduplication. No idea how common that is these days, but I've seen it in production in the past for sure. > I tried to be defensive against WAL records with bogus xl_rmid values here: > >> @@ -1098,8 +1100,9 @@ main(int argc, char **argv) >> } >> /* apply all specified filters */ >> - if (config.filter_by_rmgr != -1 && >> - config.filter_by_rmgr != record->xl_rmid) >> + if (config.filter_by_rmgr_enabled && >> + (record->xl_rmid < 0 || record->xl_rmid > RM_MAX_ID || >> + !config.filter_by_rmgr[record->xl_rmid])) >> continue; > > But looking closer, that's pointless. We use record->xl_rmid directly as array index elsewhere, and that's OK because ValidXLogRecordHeader()checks that xl_rmid <= RM_MAX_ID. And the 'xl_rmid < 0' check is unnecessary because the field isunsigned. So I'll remove those, and commit this tomorrow. +1 -- Daniel Gustafsson https://vmware.com/