On Thu, Nov 4, 2021 at 12:47 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> now... and yes during the time of error page 59561917 was very close
> to the end of the table.
> There was a high chance (but not 100%) that the corresponding main
> table entry had been inserted during reindex CONCURRENTLY of the toast
> index run.
It might be useful if you located the leaf page that the missing index
tuple is supposed to be on. It's possible that there is a recognizable
pattern. If you knew the block number of the relevant leaf page in the
index already, then you could easily dump the relevant page to a small
file, and share it with us here. The usual procedure is described
here:
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
The tricky part is figuring out which block number is the one of
interest. I can't think of any easy way of doing that in a production
database. The easiest approach I can think of is to use
pg_buffercache. Restart the database (or more likely an instance of
the database that has the problem, but isn't the live production
database). Then write a query like this:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_toast_2624976286 WHERE
chunk_id = 4040061139;
(The BUFFERS stuff is to verify that you got buffer hits.)
Next query the blocks that you see in pg_buffercache:
SELECT relblocknumber FROM pg_buffercache WHERE relfilenode =
pg_relation_filenode('pg_toast.pg_toast_2624976286_index');
Finally, note down the block numbers that the query returns. There
will probably be 3 or 4. Just send me any that are non-0 (that's just
the metapage). I only really care about the leaf page, but I can
figure that part out myself when I have the pages you access.
(It's a pity there isn't a less cumbersome procedure here.)
--
Peter Geoghegan