Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: WIP: WAL prefetch (another approach) |
Date | |
Msg-id | 20200605204057.anopw4rntkg6hsag@development Whole thread Raw |
In response to | Re: WIP: WAL prefetch (another approach) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: WIP: WAL prefetch (another approach)
|
List | pgsql-hackers |
On Fri, Jun 05, 2020 at 10:04:14PM +0200, Tomas Vondra wrote: >On Fri, Jun 05, 2020 at 05:20:52PM +0200, Tomas Vondra wrote: >> >>... >> >>which is not particularly great, I guess. There however seems to be >>something wrong, because with the prefetching I see this in the log: >> >>prefetch: >>2020-06-05 02:47:25.970 CEST 1591318045.970 [22961] LOG: recovery no >>longer prefetching: unexpected pageaddr 108/E8000000 in log segment >>0000000100000108000000FF, offset 0 >> >>prefetch2: >>2020-06-05 15:29:23.895 CEST 1591363763.895 [26676] LOG: recovery no >>longer prefetching: unexpected pageaddr 108/E8000000 in log segment >>000000010000010900000001, offset 0 >> >>Which seems pretty suspicious, but I have no idea what's wrong. I admit >>the archive/restore commands are a bit hacky, but I've only seen this >>with prefetching on the SATA storage, while all other cases seem to be >>just fine. I haven't seen in on NVME (which processes much more WAL). >>And the SATA baseline (no prefetching) also worked fine. >> >>Moreover, the pageaddr value is the same in both cases, but the WAL >>segments are different (but just one segment apart). Seems strange. >> > >I suspected it might be due to a somewhat hackish restore_command that >prefetches some of the WAL segments, so I tried again with a much >simpler restore_command - essentially just: > > restore_command = 'cp /archive/%f %p.tmp && mv %p.tmp %p' > >which I think should be fine for testing purposes. And I got this: > > LOG: recovery no longer prefetching: unexpected pageaddr 108/57000000 > in log segment 0000000100000108000000FF, offset 0 > LOG: restored log file "0000000100000108000000FF" from archive > >which is the same segment as in the earlier examples, but with a >different pageaddr value. Of course, there's no such pageaddr in the WAL >segment (and recovery of that segment succeeds). > >So I think there's something broken ... > BTW in all three cases it happens right after the first restart point in the WAL stream: LOG: restored log file "0000000100000108000000FD" from archive LOG: restartpoint starting: time LOG: restored log file "0000000100000108000000FE" from archive LOG: restartpoint complete: wrote 236092 buffers (22.5%); 0 WAL ... LOG: recovery restart point at 108/FC000028 DETAIL: Last completed transaction was at log time 2020-06-04 15:27:00.95139+02. LOG: recovery no longer prefetching: unexpected pageaddr 108/57000000 in log segment 0000000100000108000000FF, offset 0 LOG: restored log file "0000000100000108000000FF" from archive It looks exactly like this in case of all 3 failures ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: