Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Kuntal Ghosh
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAGz5QCL15Mm55NumtnkhbRHC_Mvr=XenNbSVFHZy+EC+P1BPPQ@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (vignesh C <vignesh21@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hello Thomas,

Here are some review comments on 0003-Add-undo-log-manager.patch. I've
tried to avoid duplicate comments as much as possible.

1. In UndoLogAllocate,
+ * time this backend as needed to write to an undo log at all or because
s/as/has

+ * Maintain our tracking of the and the previous transaction start
Do you mean current log's transaction start as well?

2. In UndoLogAllocateInRecovery,
we try to find the current log from the first undo buffer. So, after a
log switch, we always have to register at least one buffer from the
current undo log first. If we're updating something in the previous
log, the respective buffer should be registered after that. I think we
should document this in the comments.

3. In UndoLogGetOldestRecord(UndoLogNumber logno, bool *full),
it seems the 'full' parameter is not used anywhere. Do we still need this?

+ /* It's been recycled.  SO it must have been entirely discarded. */
s/SO/So

4. In CleanUpUndoCheckPointFiles,
we can emit a debug2 message with something similar to : 'removed
unreachable undo metadata files'

+ if (unlink(path) != 0)
+ elog(ERROR, "could not unlink file \"%s\": %m", path);
according to my observation, whenever we deal with a file operation,
we usually emit a ereport message with errcode_for_file_access().
Should we change it to ereport? There are other file operations as
well including read(), OpenTransientFile() etc.

5. In CheckPointUndoLogs,
+ /* Capture snapshot while holding each mutex. */
+ LWLockAcquire(&slot->mutex, LW_EXCLUSIVE);
+ serialized[num_logs++] = slot->meta;
+ LWLockRelease(&slot->mutex);
why do we need an exclusive lock to read something from the slot? A
share lock seems to be sufficient.

pgstat_report_wait_start(WAIT_EVENT_UNDO_CHECKPOINT_SYNC) is called
after pgstat_report_wait_start(WAIT_EVENT_UNDO_CHECKPOINT_WRITE)
without calling     pgstat_report_wait_end(). I think you've done the
same to avoid an extra function call. But, it differs from other
places in the PG code. Perhaps, we should follow this approach
everywhere.

6. In StartupUndoLogs,
+ if (fd < 0)
+ elog(ERROR, "cannot open undo checkpoint snapshot \"%s\": %m", path);
assuming your agreement upon changing above elog to ereport, the
message should be more user friendly. May be something like 'cannot
open pg_undo file'.

+ if ((size = read(fd, &slot->meta, sizeof(slot->meta))) != sizeof(slot->meta))
The usage of size of doesn't look like a problem. But, we can save
some extra padding bytes at the end if we use (offsetof + sizeof)
approach similar to other places in PG.

7. In free_undo_log_slot,
+ /*
+ * When removing an undo log from a slot in shared memory, we acquire
+ * UndoLogLock, log->mutex and log->discard_lock, so that other code can
+ * hold any one of those locks to prevent the slot from being recycled.
+ */
+ LWLockAcquire(UndoLogLock, LW_EXCLUSIVE);
+ LWLockAcquire(&slot->mutex, LW_EXCLUSIVE);
+ Assert(slot->logno != InvalidUndoLogNumber);
+ slot->logno = InvalidUndoLogNumber;
+ memset(&slot->meta, 0, sizeof(slot->meta));
+ LWLockRelease(&slot->mutex);
+ LWLockRelease(UndoLogLock);
you've not taken the discard_lock as mentioned in the comment.

8. In find_undo_log_slot,
+ * 1.  If the calling code knows that it is attached to this lock or is the
s/lock/slot

+ * 2.  All other code should acquire log->mutex before accessing any members,
+ * and after doing so, check that the logno hasn't moved.  If it is not, the
+ * entire undo log must be assumed to be discarded (as if this function
+ * returned NULL) and the caller must behave accordingly.
Perhaps, you meant '..check that the logno remains same. If it is not..'.

+ /*
+ * If we didn't find it, then it must already have been entirely
+ * discarded.  We create a negative cache entry so that we can answer
+ * this question quickly next time.
+ *
+ * TODO: We could track the lowest known undo log number, to reduce
+ * the negative cache entry bloat.
+ */
This is an interesting thought. But, I'm wondering how we are going to
search the discarded logno in the simple hash. I guess that's why it's
in the TODO list.

9. In attach_undo_log,
+ * For now we have a simple linked list of unattached undo logs for each
+ * persistence level.  We'll grovel though it to find something for the
+ * tablespace you asked for.  If you're not using multiple tablespaces
s/though/through

+ if (slot == NULL)
+ {
+ if (UndoLogShared->next_logno > MaxUndoLogNumber)
+ {
+ /*
+ * You've used up all 16 exabytes of undo log addressing space.
+ * This is a difficult state to reach using only 16 exabytes of
+ * WAL.
+ */
+ elog(ERROR, "undo log address space exhausted");
+ }
looks like a potential unlikely() condition.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Initdb failure
Next
From: Thomas Munro
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs