Thread: ML archives caching 404 results

ML archives caching 404 results

From
Marti Raudsepp
Date:
Hi www,

When adding my messages to CommitFest, I noticed that if I'm fast enough to click on the resulting link, I get a 404 page from the mailing list archives, presumably because the archives loader hasn't processed my message yet. This 404 result gets cached for a long time, so my message is not viewable even after links to it appear in the archives.

It's not obvious to me why this happens... The view raises an exception, and the cache() decorator should never get around to setting HTTP cache headers because it lets the exception pass through.

@cache(hours=4)
def message(request, msgid):
    try:
        m = Message.objects.get(messageid=msgid)
    except Message.DoesNotExist:
        # Raised before the view returns, so cache() never sees a response.
        raise Http404('Message does not exist')
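
As a rough illustration of the reasoning above, here is a minimal sketch that runs without Django. The `Http404` and `Response` classes, the `handle()` function, the `KNOWN_MESSAGES` set, and the body of `cache()` are all hypothetical stand-ins, not the real pgarchives code; in particular, the `s-maxage` header is an assumption about what such a decorator would set. It shows that when the view raises, the decorator adds no cache header, so how long the 404 is cached is left entirely to the cache's own default.

```python
import functools


class Http404(Exception):
    """Stand-in for django.http.Http404 so the sketch runs without Django."""


class Response:
    """Minimal stand-in for an HTTP response: a status code plus headers."""

    def __init__(self, status=200):
        self.status = status
        self.headers = {}


def cache(hours=0, minutes=0):
    """Hypothetical decorator: add Cache-Control only to returned responses."""
    max_age = hours * 3600 + minutes * 60

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            # If the view raises, the next line never completes and the
            # header is never set; the exception propagates to the caller.
            response = view(request, *args, **kwargs)
            response.headers["Cache-Control"] = f"s-maxage={max_age}"
            return response
        return wrapper
    return decorator


KNOWN_MESSAGES = {"known@example.org"}  # hypothetical archive contents


@cache(hours=4)
def message(request, msgid):
    if msgid not in KNOWN_MESSAGES:
        raise Http404("Message does not exist")
    return Response(200)


def handle(request, msgid):
    """Stand-in for the framework turning Http404 into a 404 response."""
    try:
        return message(request, msgid)
    except Http404:
        return Response(404)
```

Calling `handle(None, "missing@example.org")` gives a 404 response with no `Cache-Control` header at all, while a known message-id gets the 4-hour header.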

Is there a default expiration time set in Varnish somewhere? Maybe the solution is as easy as setting a lower TTL for 404 results in Varnish:
http://www.garron.me/en/bits/avoid-varnish-cache-404-error-page.html

Another solution would be to somehow shoehorn a cache key into 404 result pages, but that smells of hack, since we'd need a different identifier from the usual "X-pgthreadid".

Regards,
Marti

Re: ML archives caching 404 results

From
Magnus Hagander
Date:
On Tue, Oct 7, 2014 at 3:52 PM, Marti Raudsepp <marti@juffo.org> wrote:
> Hi www,
>
> When adding my messages to CommitFest, I noticed that when I'm fast enough
> to click on the resulting link, I get a 404 page from mailing list archives,
> I guess if archives loader hasn't processed my message yet. This 404 result
> gets cached for a long time, so my message is not viewable even after links
> to it appear in archives.
>
> It's not obvious to me why this happens... The view raises an exception, and
> the cache() decorator should never get around to setting HTTP cache headers
> because it lets the exception pass through.
>
> @cache(hours=4)
> def message(request, msgid):
>         try:
>                 m = Message.objects.get(messageid=msgid)
>         except Message.DoesNotExist:
>                 raise Http404('Message does not exist')
>
> Is there a default expiration time set in Varnish somewhere? Maybe the
> solution is as easy as setting a lower TTL for 404 results in Varnish:
> http://www.garron.me/en/bits/avoid-varnish-cache-404-error-page.html

Yes we cache the 404 pages, and that's definitely intentional. Setting
it lower might be a good idea, yes, but we definitely don't want to
drop it. The default is 4 hours though, which might be a bit of an
overkill. But how short would it have to be to deal with the scenario
you're outlining?


> Another solution would be to somehow shoehorn a cache key into 404 result
> pages, but that smells of hack, since we'd need a different identifier from
> the usual "X-pgthreadid".

Yeah, that's definitely ugly.  We could do something like "purge all
404's whenever a new email arrives", but that seems quite ugly...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/



Re: ML archives caching 404 results

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Oct 7, 2014 at 3:52 PM, Marti Raudsepp <marti@juffo.org> wrote:
>> When adding my messages to CommitFest, I noticed that when I'm fast enough
>> to click on the resulting link, I get a 404 page from mailing list archives,
>> I guess if archives loader hasn't processed my message yet. This 404 result
>> gets cached for a long time, so my message is not viewable even after links
>> to it appear in archives.

FWIW, I've been annoyed by that too ...

>> Is there a default expiration time set in Varnish somewhere? Maybe the
>> solution is as easy as setting a lower TTL for 404 results in Varnish:
>> http://www.garron.me/en/bits/avoid-varnish-cache-404-error-page.html

> Yes we cache the 404 pages, and that's definitely intentional.

It might be intentional, but is it really useful?  How much traffic do
we get to nonexistent pages?
        regards, tom lane



Re: ML archives caching 404 results

From
Magnus Hagander
Date:
On Tue, Oct 7, 2014 at 4:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Tue, Oct 7, 2014 at 3:52 PM, Marti Raudsepp <marti@juffo.org> wrote:
>>> When adding my messages to CommitFest, I noticed that when I'm fast enough
>>> to click on the resulting link, I get a 404 page from mailing list archives,
>>> I guess if archives loader hasn't processed my message yet. This 404 result
>>> gets cached for a long time, so my message is not viewable even after links
>>> to it appear in archives.
>
> FWIW, I've been annoyed by that too ...
>
>>> Is there a default expiration time set in Varnish somewhere? Maybe the
>>> solution is as easy as setting a lower TTL for 404 results in Varnish:
>>> http://www.garron.me/en/bits/avoid-varnish-cache-404-error-page.html
>
>> Yes we cache the 404 pages, and that's definitely intentional.
>
> It might be intentional, but is it really useful?  How much traffic do
> we get to nonexistent pages?

From broken search engines running amok every now and then - quite a
bit. During normal operations, not much.

But we can probably easily drop it to say 5 minutes or so for 404's -
would that be enough?
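
To make the effect of a shorter 404 expiry concrete, here is a toy model of a cache that picks its TTL by status code. It is only a sketch: `TinyCache`, `ttl_for`, and the two constants are hypothetical names, and a real Varnish setup would express this policy in its own configuration and also take backend headers into account. The 4-hour and 5-minute values are the ones mentioned in this thread.

```python
DEFAULT_TTL = 4 * 3600   # the 4-hour default mentioned in the thread
NOT_FOUND_TTL = 5 * 60   # the 5 minutes proposed for 404s


def ttl_for(status):
    """Return how long (in seconds) a response with this status is cached."""
    return NOT_FOUND_TTL if status == 404 else DEFAULT_TTL


class TinyCache:
    """Toy cache keyed by URL; the current time is passed in explicitly."""

    def __init__(self):
        self._entries = {}  # url -> (status, expires_at)

    def store(self, url, status, now):
        self._entries[url] = (status, now + ttl_for(status))

    def lookup(self, url, now):
        entry = self._entries.get(url)
        if entry is None or now >= entry[1]:
            return None  # miss: the request would go to the backend
        return entry[0]
```

With this policy, a 404 stored at time 0 is still served from the cache at 299 seconds but is a miss at 300 seconds, so a message that has been loaded in the meantime becomes visible on the next request.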

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/



Re: ML archives caching 404 results

From
Marti Raudsepp
Date:
On Tue, Oct 7, 2014 at 4:57 PM, Magnus Hagander <magnus@hagander.net> wrote:
> The default is 4 hours though, which might be a bit of an
> overkill. But how short would it have to be to deal with the scenario
> you're outlining?

Ideally it would appear immediately, but something like 10 minutes
would be a lot less annoying than 4 hours.

>> Another solution would be to somehow shoehorn a cache key into 404 result
>> pages, but that smells of hack
> Yeah, that's definitely ugly.  We could do something like "purge all
> 404's whenever a new email arrives", but that seems quite ugly...

Can we purge by URL from pgarchives like we do in pgweb? We could go
"simple and stupid" and just purge /message-id/.../ on the receipt of
any message.

Regards,
Marti



Re: ML archives caching 404 results

From
Magnus Hagander
Date:
On Tue, Oct 7, 2014 at 4:06 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Tue, Oct 7, 2014 at 4:57 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> The default is 4 hours though, which might be a bit of an
>> overkill. But how short would it have to be to deal with the scenario
>> you're outlining?
>
> Ideally it would appear immediately, but something like 10 minutes
> would be a lot less annoying than 4 hours.
>
>>> Another solution would be to somehow shoehorn a cache key into 404 result
>>> pages, but that smells of hack
>> Yeah, that's definitely ugly.  We could do something like "purge all
>> 404's whenever a new email arrives", but that seems quite ugly...
>
> Can we purge by URL from pgarchives like we do in pgweb? We could go
> "simple and stupid" and just purge /message-id/.../ on the receipt of
> any message.

We definitely could. Hmm. I guess that would actually work as well -
though that would generate a lot more unnecessary invalidations.
There are also related views that would have to be expired, like /flat/
etc. Going with the shorter expiry is probably easier.
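
For comparison, the "purge by URL" alternative could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the exact URL patterns (the thread only names /message-id/.../ and /flat/), and the idea that the cache accepts HTTP PURGE requests, which depends on how it is configured. The requests are only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed URL patterns; the thread only names /message-id/.../ and /flat/.
PURGE_PATTERNS = ("/message-id/{msgid}", "/message-id/flat/{msgid}")


def purge_paths(msgid):
    """Return the paths to invalidate when the message `msgid` is loaded."""
    safe_id = quote(msgid, safe="@")  # escape anything unusual in the id
    return [pattern.format(msgid=safe_id) for pattern in PURGE_PATTERNS]


def purge_requests(cache_host, msgid):
    """Build (but do not send) one PURGE request per path."""
    return [
        Request(f"http://{cache_host}{path}", method="PURGE")
        for path in purge_paths(msgid)
    ]
```

Every related view needs its own entry in the pattern list, and every incoming message triggers one request per pattern, which is the extra invalidation traffic that makes the shorter expiry the simpler option.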

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/