Re: Is it OK to perform logging while holding a LWLock? - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Is it OK to perform logging while holding a LWLock?
Msg-id CAD21AoBSABL10fdNniumcWAGuVGY8=+mNQd1whn6FGH-UP834A@mail.gmail.com
In response to Re: Is it OK to perform logging while holding a LWLock?  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
Hi,

On Wed, Feb 11, 2026 at 5:07 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Feb 11, 2026, at 11:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Chao Li <li.evan.chao@gmail.com> writes:
> >> As $SUBJECT says, my understanding is no.
> >
> > It's not a great idea, but I'm not sure it's fatal.  There are places
> > that hold LWLocks for awhile.
> >
> >> I think LWLocks are generally only held for a very short duration,
> >> like a few CPU cycles to read or modify some shared data,
> >
> > Spinlocks are treated that way, but we're willing to hold LWLocks
> > longer.  The main thing I'd be concerned about is that there is no
> > deadlock-detection infrastructure for LWLocks, so you'd better be
> > darn certain there is no possibility of deadlock.  That usually
> > means you want to limit the extent of code that could run while
> > you're holding the lock.
> >
> > In your specific example, the thing I'd be afraid of is that an
> > errcontext callback might do something you're not expecting.
> > We don't forbid errcontext callbacks from doing catalog lookups,
> > for instance.  So on the whole I agree with this patch, with
> > or without any concrete example that fails.
> >
> > regards, tom lane
>
> As Tom has agreed with this patch, added to CF for tracking: https://commitfest.postgresql.org/patch/6477/
>

While I agree with the concern Tom pointed out, I found other places
where we call ereport() while holding an LWLock. For instance:

src/backend/access/transam/multixact.c:
    else if (!find_multixact_start(newOldestMulti, &newOldestOffset))
    {
        ereport(LOG,
                (errmsg("cannot truncate up to MultiXact %u because it does not exist on disk, skipping truncation",
                        newOldestMulti)));
        LWLockRelease(MultiXactTruncationLock);
        return;
    }

src/backend/commands/vacuum.c:
    if (frozenAlreadyWrapped)
    {
        ereport(WARNING,
                (errmsg("some databases have not been vacuumed in over 2 billion transactions"),
                 errdetail("You might have already suffered transaction-wraparound data loss.")));
        LWLockRelease(WrapLimitsVacuumLock);
        return;
    }

src/backend/postmaster/autovacuum.c:
                LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);

                /*
                 * No other process can put a worker in starting mode, so if
                 * startingWorker is still INVALID after exchanging our lock,
                 * we assume it's the same one we saw above (so we don't
                 * recheck the launch time).
                 */
                if (AutoVacuumShmem->av_startingWorker != NULL)
                {
                    worker = AutoVacuumShmem->av_startingWorker;
                    worker->wi_dboid = InvalidOid;
                    worker->wi_tableoid = InvalidOid;
                    worker->wi_sharedrel = false;
                    worker->wi_proc = NULL;
                    worker->wi_launchtime = 0;
                    dclist_push_head(&AutoVacuumShmem->av_freeWorkers,
                                     &worker->wi_links);
                    AutoVacuumShmem->av_startingWorker = NULL;
                    ereport(WARNING,
                            errmsg("autovacuum worker took too long to start; canceled"));
                }

Should we fix these places as well?
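
If we did, the usual shape would be to record the decision while holding
the lock and emit the message only after releasing it. Below is a minimal
sketch of that shape in plain C. It is not PostgreSQL code: a pthread
mutex stands in for the LWLock, and every demo_* name is hypothetical.
The logger asserts that the lock is not held, which is the property the
reordering is meant to guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for illustration only; none of these are PostgreSQL APIs. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool demo_lock_held = false; /* per-thread bookkeeping */
static int demo_starting_worker = 7;  /* hypothetical shared state; 0 = none */

static void demo_acquire(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_lock_held = true;
}

static void demo_release(void)
{
    demo_lock_held = false;
    pthread_mutex_unlock(&demo_lock);
}

/* Hypothetical logger: refuses to run while the caller holds the lock. */
static void demo_log(const char *msg)
{
    assert(!demo_lock_held);
    fprintf(stderr, "WARNING: %s\n", msg);
}

/* Returns true if a stuck worker slot was reset. */
bool demo_cancel_stuck_worker(void)
{
    bool canceled = false;

    demo_acquire();
    if (demo_starting_worker != 0)
    {
        demo_starting_worker = 0;   /* modify shared state under the lock */
        canceled = true;            /* remember the decision for later */
    }
    demo_release();

    /* Log only after the lock is gone. */
    if (canceled)
        demo_log("autovacuum worker took too long to start; canceled");

    return canceled;
}
```

The first call to demo_cancel_stuck_worker() resets the slot and logs;
a second call finds nothing to do and stays silent.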

Also, if we swap the ereport() and LWLockRelease() calls in the specific
example in logicalctl.c, a concurrent logical decoding activation could
write "logical decoding is enabled upon creating a new logical
replication slot" to the log before the deactivation writes "logical
decoding is disabled because there are no valid logical replication
slots". That would confuse users, since logical decoding would be active
even though the last log message says it is disabled.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


