Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAA4eK1KhcJi22uu2dg6yXw8-THBVJaWHpSqPWXp-ija4NCF_pg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
On Tue, Jul 30, 2019 at 12:18 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jul 23, 2019 at 10:42 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Yes, I also think that the function would error out only because of > > can't-happen cases, like "too many locks taken" or "out of binary heap > > slots" or "out of memory" (this last one is not such a can't happen > > case). These cases happen probably due to some bugs, I suppose. But I > > was wondering : Generally when the code errors out with such > > can't-happen elog() calls, worst thing that happens is that the > > transaction gets aborted. Whereas, in this case, the worst thing that > > could happen is : the undo action would never get executed, which > > means selects for this tuple will keep on accessing the undo log ? > > This does not sound like any data consistency issue, so we should be > > fine after all ? > > I don't think so. Every XID present in undo has to be something we > can look up in CLOG to figure out which transactions are aborted and > which transactions are committed, so that we know which transactions > need undo. If we forget to undo the transaction, we can't discard it, > which means we can't advance the CLOG transaction horizon, which means > we'll eventually start failing to assign XIDs, leading to a refusal of > all write transactions. Oops. > > More generally, it's not OK for the generic undo layer to make > assumptions about whether the operations performed by the undo > handlers are essential or not. We don't want to impose a design > constraint the undo can only be used for things that are not actually > critical, because that will make it hard to write AMs that use it. > And there's no reason to live with such a design constraint anyway, > because, as noted above, CLOG truncation requires it. > > More generally still, some can't-happen situations should be checked > via Assert() and others via elog(). For example, consider some code > that looks up a syscache tuple and pulls data from the returned tuple. > If the code that handles DDL is written in such a way that the tuple > should always exist, then this is a can't-happen situation, but > generally the code checks this via elog(), not Assert(), because it > could also happen due to the catalog contents being corrupted. If > Assert() were used, the checks would not run in production builds, and > a corrupt catalog would lead to a seg fault. An elog() is much > friendlier. As a general principle, when a certain thing ought to > always be true, but it being true depends on a whole lot of > assumptions elsewhere in the code, and especially if it also depends > on assumptions like "the database is not corrupted," I think elog() is > preferable. Assert() is better for things that are more localized and > that really can't go wrong for any reason other than a bug. In this > case, I think I would tend towards elog(PANIC), but it's arguable. > Agreed, elog(PANIC) seems like a better way for this as compared to Assert. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: