On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>
> for most accesses to the archives the string for the basic auth reply
> quotes the "archives" and "password" strings with ' - see
Fixed.
> we have a number of current issues where data in the archives gets
> mangled/corrupted we are looking into. We are currently working on some
> infrastructure to "test" parsing fixes across all the messages in the
> archives to get a better understanding of what kind of effect a change has.
> For this specific message I'm curious how you found it, though?
>
I made a prototype before looking at the repo, using
Python's 'mailbox' parser module, and some asserts failed
when some messages parsed out as lacking Message-ID. I had
also read the mbox spec in order to write the patch, and
put the two together.
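For reference, the kind of check I mean looks roughly like the sketch
below. It is not the prototype itself: the function names and the sample
data are made up for illustration, and the unescaped "From " body line is
only one way a message without a Message-ID can show up, not necessarily
what happened in the archives.

```python
import mailbox
import os
import tempfile

# Invented sample: one real message whose body contains an unescaped
# line starting with "From ", which mbox readers treat as a separator.
SAMPLE = (
    b"From alice@example.org Tue Sep 29 21:34:00 2015\n"
    b"Message-ID: <1@example.org>\n"
    b"Subject: sample\n"
    b"\n"
    b"First line of the body.\n"
    b"\n"
    b"From here on the parser sees a new message.\n"
    b"Still the same body.\n"
)


def write_sample_mbox(path):
    """Write the invented sample mbox to `path`."""
    with open(path, "wb") as f:
        f.write(SAMPLE)


def messages_missing_id(mbox_path):
    """Return the mbox keys of messages that parse without a Message-ID."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [key for key, msg in box.iteritems()
                if not msg.get("Message-ID")]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        write_sample_mbox(path)
        print(messages_missing_id(path))
```

Running this over a monthly mbox and looking at the returned keys is a
cheap way to find candidates for closer inspection.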
>>> <...>
>>> Have you done any (approximate) measurements on what the additional
>>> in-memory overhead in both pg (to build the response) and in django is
>>> compared to the resulting mbox?
>>>
>>>> Amir Wrote:
>>>> <some napkins and mitigations>
> My concern mostly stems from operational
> experience (on the sysadmin team) that some operations on the archives
> currently are fairly computational and memory intensive causing issues
> with availability and we would want to not add more vectors that can
> cause that.
>
You're right to be concerned, I raised the issue myself to begin with.
We can solve any particular problem, but how to optimize depends too
much on particulars I don't have.
If you are short on both CPU and memory, we could trade storage for them.
You already serve monthly mboxes; having per-thread mboxes which are
updated in batch (say hourly) could be manageable, and that code
is practically written already. Serving static files is as cheap as it
gets on both CPU and memory.
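A rough sketch of what the batch side could look like, assuming the
thread contents have already been rendered to mbox bytes. The function
names, the "thread-<id>.mbox" file naming and the directory layout are
all hypothetical; none of this is in the attached patch.

```python
import os
import tempfile


def write_thread_mbox(static_dir, thread_id, mbox_bytes):
    """Write one thread's mbox via a temp file and an atomic rename,
    so the web server never serves a half-written file."""
    os.makedirs(static_dir, exist_ok=True)
    final_path = os.path.join(static_dir, "thread-%s.mbox" % thread_id)
    fd, tmp_path = tempfile.mkstemp(dir=static_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(mbox_bytes)
        os.replace(tmp_path, final_path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


def refresh_threads(static_dir, threads):
    """Rewrite every thread in `threads` (a dict: thread id -> mbox bytes).
    Meant to be called from a periodic job, e.g. hourly."""
    return [write_thread_mbox(static_dir, tid, data)
            for tid, data in sorted(threads.items())]
```

The batch job would only need to pass in threads that received new
messages since the last run, which keeps the hourly cost small.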
But for now, see attached patch, which adds a tweakable for setting a
cap on the max size of the response. It still gets everything
from the database at once, so it may not be of much help except
perhaps as a metric for you to easily monitor.
There's also an EJECT button that turns all thread mbox requests into
403, so you can just throw this in production and flip the switch
if a problem appears. Also fixes the quoting in the message.
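In outline, the two knobs behave something like the sketch below. This
is plain Python rather than the actual Django view, and the names
(THREAD_MBOX_ENABLED, THREAD_MBOX_MAX_BYTES, thread_mbox_response) are
placeholders, not the ones in the patch. Refusing an oversized thread
with a 403 is also an assumption made for the sketch.

```python
from dataclasses import dataclass

# Placeholder setting names and defaults; the patch may differ.
THREAD_MBOX_ENABLED = True
THREAD_MBOX_MAX_BYTES = 10 * 1024 * 1024


@dataclass
class Response:
    status: int
    body: bytes = b""


def thread_mbox_response(raw_messages,
                         enabled=THREAD_MBOX_ENABLED,
                         max_bytes=THREAD_MBOX_MAX_BYTES):
    """Build the thread mbox response, honoring the switch and the cap.

    `raw_messages` is a list of bytes already fetched from the database,
    so the cap bounds the response size, not the query itself.
    """
    if not enabled:
        # The EJECT switch: every thread mbox request is refused.
        return Response(403, b"thread mbox downloads are disabled")

    total = 0
    for raw in raw_messages:
        total += len(raw)
        if total > max_bytes:
            # Assumed behavior for this sketch: refuse, do not truncate.
            return Response(403, b"thread mbox exceeds the size cap")

    return Response(200, b"".join(raw_messages))
```

Because the size is summed before anything is joined, the cap at least
avoids building a second full copy of an oversized thread in memory.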
You