回复:POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From 孔凡深(云梳)
Subject 回复:POC: Cleaning up orphaned files using undo logs
Date
Msg-id 09930dc0-aa70-43a2-8b8f-1ea534e1b473.fanshen.kfs@alibaba-inc.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
Responses Re: 回复:POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
Hi, Antonin. I am more interested in zheap. Recently reviewing the patch you submitted.
When I use pg_undodump-tool to dump the undo page chunk, I found that some chunk header is abnormal. 
After reading the relevant codes in 0006-The-initial-implementation-of-the-pg_undodump-tool.patch, 
I feel that there is a bug  in the function parse_undo_page. 
According to my understanding The size in chunk Header includes chunk header + type-specific header + undo record. 
If the entire chunk spans pages,  also need to add the size of the page header. 
But I found Now only the scenario of chunk header spanning pages is considered, and the scenario of type-specific header is not considered.
    /*
    * The page header size must eventually be subtracted from
    * chunk_bytes_left because it's included in the chunk size.  However,
    * since chunk_bytes_left is unsigned, we do not subtract anything from it
    * if it's still zero.  This can happen if we're still reading the chunk
    * header or the type-specific header. (The underflow should not be a
    * problem because the chunk size will eventually be added, but it seems
    * ugly and it makes debugging less convenient.)
    */
    if (s->chunk_bytes_left > 0)
    {
        /* Chunk should not end within page header. */
        Assert(s->chunk_bytes_left >= SizeOfUndoPageHeaderData);
        s->chunk_bytes_left -= SizeOfUndoPageHeaderData;
        s->chunk_bytes_to_skip = 0;
    }
    /* Processing the chunk header?
    */
    else if (s->chunk_hdr_bytes_left > 0 )
        s->chunk_bytes_to_skip = SizeOfUndoPageHeaderData;
    ------------------------------------------------
Should this code be fixed as this?When the type-specific header spans the undo page, the page header should be skipped.

else if (s->chunk_hdr_bytes_left > 0 || s->type_hdr_bytes_left > 0)
    s->chunk_bytes_to_skip = SizeOfUndoPageHeaderData;

------------------------------------------------------------------
发件人:Antonin Houska <ah@cybertec.at>
发送时间:2022年3月29日(星期二) 17:25
收件人:Dmitry Dolgov <9erthalion6@gmail.com>; pgsql-hackers <pgsql-hackers@postgresql.org>
主 题:Re: POC: Cleaning up orphaned files using undo logs


The cfbot complained that the patch series no longer applies, so I've rebased
it and also tried to make sure that the other flags become green.

One particular problem was that pg_upgrade complained that "live undo data"
remains in the old cluster. I found out that the temporary undo log causes the
problem, so I've adjusted the query in check_for_undo_data() accordingly until
the problem gets fixed properly.

The problem of the temporary undo log is that it's loaded into local buffers
and that backend can exit w/o flushing local buffers to disk, and thus we are
not guaranteed to find enough information when trying to discard the undo log
the backend wrote. I'm thinking about the following solutions:

1. Let the backend manage temporary undo log on its own (even the slot
  metadata would stay outside the shared memory, and in particular the
  insertion pointer could start from 1 for each session) and remove the
  segment files at the same moment the temporary relations are removed.

  However, by moving the temporary undo slots away from the shared memory,
  computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
  be affected. It might seem that a transaction which only writes undo log
  for temporary relations does not need to affect oldestFullXidHavingUndo,
  but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
  prevents transactions to be truncated from the CLOG too early, I wonder if
  the following is possible (This scenario is only applicable to the zheap
  storage engine [1], which is not included in this patch, but should already
  be considered.):

  A transaction creates a temporary table, does some (many) changes and then
  gets rolled back. The undo records are being applied and it takes some
  time. Since XID of the transaction did not affect oldestFullXidHavingUndo,
  the XID can disappear from the CLOG due to truncation. However zundo.c in
  [1] indicates that the transaction status *is* checked during undo
  execution, so we might have a problem.

  Or do I miss something? UndoDiscard() in zheap seems to ignore temporary
  undo:

    /* We can't process temporary undo logs. */
  if (log->meta.persistence == UNDO_TEMP)
  continue;

2. Do not load the temporary undo into local buffers. If it's always in the
  shared buffers, we should never see incomplete data when trying to discard
  undo. In this case, persistence levels UNDOPERSISTENCE_UNLOGGED and
  UNDOPERSISTENCE_TEMP could be merged into a single level.

3. Implement the discarding in another way, but I don't have new idea right
  now.

Suggestions are welcome.

[1] https://github.com/EnterpriseDB/zheap/tree/master

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: Julien Rouhaud
Date:
Subject: Re: Add parameter jit_warn_above_fraction