Thread: Mailing list subscription's mail delivery delays?
Hi,

For lack of a better place to ask: I've recently noticed that in several of the email threads that I follow over on -hackers@, some of the email messages have a very high time-to-delivery, and thus mails from the same thread arrive out of order. I've seen several occurrences of this with very long delays of over 10 hours, with at least one larger than 19 hours, assuming mail server clocks are accurate and receipt dates are correctly included in the mail headers.

I'm not sure if the issue is on my side (mail servers are gmail's) or on the mailing list server - all traces I've checked indicate that the delay is somewhere in the delivery from postgres' last mail server to the first gmail mail server.

I've only really noticed this sometime in the past few weeks. After sampling my mails, I found other examples of significant delays (>1h) for mails from well-respected hackers dating back to at least 2023-08-28.

Would you happen to know why this could be the case, and what I can do to fix it if it's something on my side?

I've attached three recently received mails from -hackers as .eml, to help with any debugging: one was delivered relatively quickly (91s), one for which the delivery took a long time (11h+), and one more with a very long delivery time (19h+). I haven't yet noticed any specific differences or commonalities between the fast and slow mails.

Kind regards,

Matthias van de Meent

[Attachment 1 of 3]

Hi,

On 2023-09-27 17:43:04 -0700, Peter Geoghegan wrote:
> On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > > Can you define "unfreeze"? I don't know if this newly invented term
> > > refers to unsetting a page that was marked all-frozen following (say)
> > > an UPDATE, or if it refers to choosing to not freeze when the option
> > > was available (in the sense that it was possible to do it and fully
> > > mark the page all-frozen in the VM). Or something else.
> > By "unfreeze", I mean unsetting a page all frozen in the visibility
> > map when modifying the page for the first time after it was last
> > frozen.
>
> I see. So I guess that Andres meant that you'd track that within all
> backends, using pgstats infrastructure (when he summarized your call
> earlier today)?

That call was just between Robert and me (and not dedicated just to this topic, fwiw). Yes, I was thinking of tracking that in pgstat. I can imagine occasionally rolling it over into pg_class, to better deal with crashes / failovers, but am fairly agnostic on whether that's really useful / necessary.

> And that that information would be an important input for VACUUM, as opposed
> to something that it maintained itself?

Yes. If the ratio of opportunistically frozen pages (which I'd define as pages that were frozen not because they strictly needed to be) vs the number of unfrozen pages increases, we need to make opportunistic freezing less aggressive, and vice versa.

> ISTM that the concept of "unfreezing" a page is equivalent to
> "opening" the page that was "closed" at some point (by VACUUM). It's
> not limited to freezing per se -- it's "closed for business until
> further notice", which is a slightly broader concept (and one not
> unique to Postgres). You don't just need to be concerned about updates
> and deletes -- inserts are also a concern.
>
> I would be sure to look out for new inserts that "unfreeze" pages, too
> -- ideally you'd have instrumentation that caught that, in order to
> get a general sense of the extent of the problem in each of your
> chosen representative workloads. This is particularly likely to be a
> concern when there is enough space on a heap page to fit one more heap
> tuple, that's smaller than most other tuples. The FSM will "helpfully"
> make sure of it. This problem isn't rare at all, unfortunately.

I'm not as convinced as you are that this is a problem / that the solution won't cause more problems than it solves.
Users are concerned when free space can't be used - you don't have to look further than the discussion in the last weeks about adding the ability to disable HOT to fight bloat.

I do agree that the FSM code tries way too hard to fit things onto early pages - it e.g. can slow down concurrent copy workloads by 3-4x due to contention in the FSM - and that it has more size classes than necessary, but I do think that just closing frozen pages against further insertions of small tuples will cause its own set of issues. I think at the very least there'd need to be something causing pages to reopen once the aggregate unused space in the table reaches some threshold.

Greetings,

Andres Freund

[Attachment 2 of 3]

On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> > Can you define "unfreeze"? I don't know if this newly invented term
> > refers to unsetting a page that was marked all-frozen following (say)
> > an UPDATE, or if it refers to choosing to not freeze when the option
> > was available (in the sense that it was possible to do it and fully
> > mark the page all-frozen in the VM). Or something else.
>
> By "unfreeze", I mean unsetting a page all frozen in the visibility
> map when modifying the page for the first time after it was last
> frozen.

I see. So I guess that Andres meant that you'd track that within all backends, using pgstats infrastructure (when he summarized your call earlier today)? And that that information would be an important input for VACUUM, as opposed to something that it maintained itself?

> I would probably call choosing not to freeze when the option is
> available "no freeze". I have been thinking of what to call it because
> I want to add some developer stats for myself indicating why a page
> that was freezable was not frozen.

I think that having that sort of information available via custom instrumentation (just for the performance validation side) makes a lot of sense.
ISTM that the concept of "unfreezing" a page is equivalent to "opening" the page that was "closed" at some point (by VACUUM). It's not limited to freezing per se -- it's "closed for business until further notice", which is a slightly broader concept (and one not unique to Postgres). You don't just need to be concerned about updates and deletes -- inserts are also a concern.

I would be sure to look out for new inserts that "unfreeze" pages, too -- ideally you'd have instrumentation that caught that, in order to get a general sense of the extent of the problem in each of your chosen representative workloads. This is particularly likely to be a concern when there is enough space on a heap page to fit one more heap tuple that's smaller than most other tuples. The FSM will "helpfully" make sure of it. This problem isn't rare at all, unfortunately.

> > The choice to freeze or not freeze pretty much always relies on
> > guesswork about what'll happen to the page in the future, no?
> > Obviously we wouldn't even apply the FPI trigger criteria if we could
> > somehow easily determine that it won't work out (to some degree that's
> > what conditioning it on being able to set the all-frozen VM bit
> > actually does).
>
> I suppose you are thinking of "opportunistic" as freezing whenever we
> aren't certain it is the right thing to do simply because we have the
> opportunity to do it?

I have heard the term "opportunistic freezing" used to refer to freezing that takes place outside of VACUUM before now. You know, something perfectly analogous to pruning in VACUUM versus opportunistic pruning. (I knew that you can't have meant that -- my point is that the terminology in this area has problems.)

> I want a way to express "freeze when freeze min age doesn't require it"

That makes sense when you consider where we are right now, but it'll sound odd in a world where freezing via min_freeze_age is the exception rather than the rule.
If anything, it would make more sense if the traditional min_freeze_age trigger criteria was the type of freezing that needed its own adjective.

--
Peter Geoghegan

[Attachment 3 of 3]

Andres Freund <andres@anarazel.de> writes:
> On 2023-09-27 16:52:44 -0400, Tom Lane wrote:
>> I think it doesn't, as long as all the relevant build targets
>> write their dependencies with "frontend_code" before "libpq".

> Hm, that's not great. I don't think that should be required. I'll try to take
> a look at why that's needed.

Well, it's only important on platforms where we can't restrict libpq.so from exporting all symbols. I don't know how close we are to deciding that such cases are no longer interesting to worry about. Makefile.shlib seems to know how to do it everywhere except Windows, and I imagine we know how to do it over in the MSVC scripts.

>> However, it's hard to test this, because the meson build
>> seems completely broken on current macOS:

> Looks like you need 1.2 for the new clang / ld output... Apparently apple's
> linker changed the format of its version output :/.

Ah, yeah, updating MacPorts again brought in meson 1.2.1, which seems to work. I now see a bunch of

    ld: warning: ignoring -e, not used for output type
    ld: warning: -undefined error is deprecated

which are unrelated. There's still one duplicate warning from the backend link:

    ld: warning: ignoring duplicate libraries: '-lpam'

I'm a bit baffled why that's showing up; there's no obvious double reference to pam.

regards, tom lane
On Thu, Sep 28, 2023 at 3:48 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I'm not sure if the issue is on my side (mail servers are gmail's) or
> on the mailing list server - all traces I've checked indicate that the
> delay is somewhere in the delivery from postgres' last mail server to
> the first gmail mail server.
>
> I've only really noticed this sometime in the past few weeks. After
> sampling my mails, I found other examples of significant delays (>1h)
> for mails from well-respected hackers dating back to at least
> 2023-08-28.
I have noticed the same thing happening for the Gmail account that I use.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thu, Sep 28, 2023 at 3:48 PM Matthias van de Meent <
> boekewurm+postgres@gmail.com> wrote:
>> I'm not sure if the issue is on my side (mail servers are gmail's) or
>> on the mailing list server - all traces I've checked indicate that the
>> delay is somewhere in the delivery from postgres' last mail server to
>> the first gmail mail server.
>>
>> I've only really noticed this sometime in the past few weeks. After
>> sampling my mails, I found other examples of significant delays (>1h)
>> for mails from well-respected hackers dating back to at least
>> 2023-08-28.

> I have noticed the same thing happening for the Gmail account that I use.

I have been seeing the same thing for a few days now, on my definitely-not-gmail personal server. Something's flaky in the PG mail infrastructure. It's gotten better since yesterday's outage, though I'm not convinced it's totally fixed.

regards, tom lane
On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > [...]
>
> > I have noticed the same thing happening for the Gmail account that I use.
>
> I have been seeing the same thing for a few days now, on my
> definitely-not-gmail personal server. Something's flaky in the
> PG mail infrastructure. It's gotten better since yesterday's
> outage, though I'm not convinced it's totally fixed.

There have been some pretty bad issues with gmail recently. Some changes have been deployed that will hopefully help mitigate those and make things better, but it takes time to recover.

The massive backlogs caused by gmail have been enough to spill over and affect other destinations as well, simply due to the load created, since we have such a huge number of gmail subscribers. But we're slowly seeing the backlogs shrink now and the load come down, so hopefully the changes made will continue to have effect and let us be back to normal soon.

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 29/09/2023 08:13, Magnus Hagander wrote:
> There have been some pretty bad issues with gmail recently. Some

Just curious - what sort of issues? I don't use gmail myself.

Ray.

-- 
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I have been seeing the same thing for a few days now, on my
>> definitely-not-gmail personal server. Something's flaky in the
>> PG mail infrastructure. It's gotten better since yesterday's
>> outage, though I'm not convinced it's totally fixed.

> There have been some pretty bad issues with gmail recently. Some
> changes have been deployed that will hopefully help mitigate those and
> make things better, but it takes time to recover.

> The massive backlogs caused by gmail have been enough to spill over
> and affect other destinations as well simply due to the load created
> since we have such a huge number of gmail subscribers. But we're
> slowly seeing the backlogs shrink now and the load come down so
> hopefully the changes made will continue to have effect and let us be
> back to normal soon.

I'm still seeing multi-hour delivery delays on a subset of traffic, like maybe half a dozen instances today.
Looking at the Received: timestamps shows pretty conclusively that the delays are within PG infra, for example this recent message from Heikki got hung up at two separate jumps:

Return-Path: <pgsql-hackers-owner+M15-507066@lists.postgresql.org>
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
    (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
    for <tgl@sss.pgh.pa.us>; Mon, 2 Oct 2023 13:53:57 -0400
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
    by malur.postgresql.org with esmtp (Exim 4.94.2)
    (envelope-from <pgsql-hackers-owner+M15-507066@lists.postgresql.org>)
    id 1qnN7D-00GbGd-FB
    for tgl@sss.pgh.pa.us; Mon, 02 Oct 2023 17:53:55 +0000
Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
    by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from <hlinnaka@iki.fi>)
    id 1qnGcb-00AqOg-Ti
    for pgsql-hackers@lists.postgresql.org; Mon, 02 Oct 2023 10:57:53 +0000
Received: from meesny.iki.fi ([195.140.195.201])
    by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from <hlinnaka@iki.fi>)
    id 1qnF5S-007kvc-AQ
    for pgsql-hackers@postgresql.org; Mon, 02 Oct 2023 09:19:35 +0000
Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
    (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
     key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    (Authenticated sender: hlinnaka)
    by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
    Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00@iki.fi>
Date: Mon, 2 Oct 2023 12:19:29 +0300
...

Also, my own message <2154347.1696278028@sss.pgh.pa.us> went out to -hackers about 25 minutes ago and hasn't come back, so based on other recent examples I'm betting I won't see it for hours.
Plenty of other traffic *is* coming through in normal-ish time, so I'm not sure I buy that there's still a massive logjam.

regards, tom lane
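[Editor's note] The per-hop latencies Tom reads off here can be computed mechanically from the Received: headers of a saved message. Below is a minimal sketch using only the Python standard library; `hop_delays` is a name invented for this example, and real-world Received: headers are sometimes malformed, so treat this as a debugging aid rather than a robust parser:

```python
# Sketch: compute per-hop delays from the Received: headers of an .eml file.
# Each relay prepends its own Received: header, so they appear newest-first;
# the hop's timestamp follows the final ';' in each header (RFC 5322 trace field).
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime

def hop_delays(eml_path):
    """Return a list of timedeltas, one per hop, oldest hop first."""
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    stamps = []
    # Reverse so timestamps run in actual delivery order (oldest first).
    for hdr in reversed(msg.get_all("Received", [])):
        try:
            stamps.append(parsedate_to_datetime(str(hdr).rsplit(";", 1)[1].strip()))
        except (IndexError, TypeError, ValueError):
            continue  # header without a parseable date; skip it
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Run against the chain quoted above, the hops work out to roughly 6 seconds (meesny.iki.fi to makus), 1h38m (makus to malur), 6h56m (within malur, where the list delivery happened), and 2 seconds (malur to sss.pgh.pa.us), consistent with Tom's conclusion that the stalls were inside PG infrastructure.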
On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > There have been some pretty bad issues with gmail recently. Some
> > changes have been deployed that will hopefully help mitigate those and
> > make things better, but it takes time to recover.
> > [...]
>
> I'm still seeing multi-hour delivery delays on a subset of traffic,
> like maybe half a dozen instances today.
>
> Looking at the Received: timestamps shows pretty conclusively that
> the delays are within PG infra, for example this recent message from
> Heikki got hung up at two separate jumps:
>
> [full Received: chain quoted upthread]
>
> Also, my own message <2154347.1696278028@sss.pgh.pa.us> went
> out to -hackers about 25 minutes ago and hasn't come back,
> so based on other recent examples I'm betting I won't see it
> for hours.
>
> Plenty of other traffic *is* coming through in normal-ish time,
> so I'm not sure I buy that there's still a massive logjam.

There is still definitely a problem, but it is slowly recovering. It is *mostly* hitting gmail at this point, but there can be spillover to others in some cases (for example, there's a general throttling when the load on the server gets too high). In this particular case, it coincides timing-wise with our old friend the oom-killer nuking postgres on the machine, thereby stopping all incoming email for a while before it got moving again. That particular problem should have been taken care of completely by now; the general backlog/queueing problem is still ongoing but has been improving.

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Tue, Oct 3, 2023 at 2:31 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm still seeing multi-hour delivery delays on a subset of traffic,
> > like maybe half a dozen instances today.
> >
> > [...]
> >
> > Plenty of other traffic *is* coming through in normal-ish time,
> > so I'm not sure I buy that there's still a massive logjam.
>
> There is still definitely a problem, but it is slowly recovering. It
> is *mostly* hitting gmail at this point, but there can be spillover
> to others in some cases (for example, there's a general throttling
> when the load on the server gets too high). In this particular case,
> it coincides timing-wise with our old friend the oom-killer nuking
> postgres on the machine, thereby stopping all incoming email for a
> while before it got moving again. That particular problem should have
> been taken care of completely by now, but the general backlog/queueing
> problem is still ongoing but has been improving.

We *think* this issue has now been mostly resolved. We are still seeing some extra delays in deliveries to gmail right now, but that's due to *us* slowing down the deliveries so as not to trigger things. But we are now talking delays of minutes or tens of minutes, not hours or tens of hours. Non-gmail recipients should now be back to being mostly unaffected.

We're continuing to monitor the situation, of course, and to make careful modifications to bring us back to the quicker delivery times.

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/