On 29.10.2021 1:01, Andres Freund wrote:
>> The issue manifested again earlier today *after* a REINDEX followed by
>> enabling WAL replica logging on the 24th of October. I saved a snapshot of
>> the filesystem holding the data directory. Would that be useful for further
>> analysis?
> Yes, that's *quite* useful. I assume you can't just share that snapshot?
I am afraid it contains personal data (the mwuser table with e-mail
addresses, passwords, and so on) from the databases of several MediaWiki
instances. I will look into scrubbing that data out later today. I
assume dropping the other databases from the cluster is fine and will
not affect further analysis?
With the personal data scrubbed, I will likely be able to provide SSH
access (with su/sudo available) to the VM if needed, though this will
take some time (I will need to set up a DMZ for that VM). Please let me
know if that would be helpful.
> Once we identified an affected heap and index page with the corruption, we
> should use pg_waldump to scan for all changes to that table.
>
> Do you have the log file(s) from between the 24th and now? That might give us
> a good starting point for the LSN range to scan.
There are multiple WAL files; the earliest has a timestamp of
Oct 25 09:45.
I am currently transferring the snapshot from my server to the VM I set
up for this investigation. I will read the pg_waldump documentation as
soon as possible; I have not had to work with WAL files before.
P.S. To make coordination easier: I am on #postgresql on Libera as
Remilia (or IijimaYun in case of disconnects) and am generally available
from 06:30 UTC to around 21:00 UTC.
--
K. R.