Re: No easy way to join discussion in existing thread when not subscribed - Mailing list pgsql-www
From | Stefan Kaltenbrunner
Subject | Re: No easy way to join discussion in existing thread when not subscribed
Date |
Msg-id | 560B86E8.4020600@kaltenbrunner.cc
In response to | Re: No easy way to join discussion in existing thread when not subscribed ("Amir Rohan" <amir.rohan@mail.com>)
List | pgsql-www
On 09/30/2015 03:27 AM, Amir Rohan wrote:
> On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
>> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>>
>> for most accesses to the archives the string for the basic auth reply
>> quotes the "archives" and "password" strings with ' - see
>
> Fixed.

I think you missed at least one spot in the code you added, and also at
least one occurrence in existing code.

>> we have a number of current issues where data in the archives gets
>> mangled/corrupted that we are looking into. We are currently working
>> on some infrastructure to "test" parsing fixes across all the messages
>> in the archives to get a better understanding of what kind of effect a
>> change has. For this specific message, I'm curious how you found it
>> though?
>
> I made a prototype before looking at the repo, using python's 'mailbox'
> parser module, and some asserts failed when some messages parsed out as
> lacking a Message-ID. I had also read the mbox spec in order to write
> the patch, and put the two together.

ah - nice effort!

>>>> <...>
>>>> Have you done any (approximate) measurements on what the additional
>>>> in-memory overhead in both pg (to build the response) and in django
>>>> is compared to the resulting mbox?
>>>>
>>>>> Amir Wrote:
>>>>> <some napkins and mitigations>
>>
>> My concern mostly stems from operational experience (on the sysadmin
>> team) that some operations on the archives are currently fairly
>> computation- and memory-intensive, causing issues with availability,
>> and we don't want to add more vectors that can cause that.
>
> You're right to be concerned; I raised the issue myself to begin with.
> We can solve any particular problem, but how to optimize depends too
> much on particulars I don't have.
>
> If you have both cpu and memory shortage, we could trade storage.
> You already serve monthly mboxes; having per-thread mboxes which are
> updated in batch (say hourly) could be manageable, and that code is
> practically written already. Serving static files is as cheap as it
> gets on both cpu and memory.

yeah, that is what I was thinking - though I don't think we want hourly.
We went a long way to actually get the current system to be "almost
instant" in terms of having the archives in sync with the lists (at
least for the basic stuff). What I was thinking is doing the mbox
creation during the import - we already serialize the process (on the
MTA/LDA side) there to have only one message imported concurrently, so
there is way less risk of overwhelming the box.

> But for now, see attached patch, which adds a tweakable for setting a
> cap on the max size of the response. It still gets everything from the
> database at once, so it may not be of much help, except perhaps as a
> metric for you to easily monitor.
>
> There's also an EJECT button that turns all thread mbox requests into
> 403, so you can just throw this in production and flip the switch if a
> problem appears. Also fixes the quoting in the message.

thanks for the updated patch - will take a look and see whether I can
find out what the worst case is in the archives later today.


Stefan
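[Editor's note, not part of the thread] The kind of check Amir describes - walking an archive mbox with Python's 'mailbox' module and flagging messages that parse out without a Message-ID - could be sketched roughly like this (the function name is an assumption, not code from the actual prototype):

```python
import mailbox

def find_missing_message_ids(path):
    """Return the keys of messages in the mbox at `path` that lack a
    Message-ID header after parsing, as Amir's asserts detected."""
    box = mailbox.mbox(path)
    try:
        # msg["Message-ID"] is None when the header is absent
        return [key for key, msg in box.items() if msg["Message-ID"] is None]
    finally:
        box.close()
```

Run against a monthly archive mbox, a non-empty result would point at exactly the kind of mangled/corrupted messages the archives team is investigating.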
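[Editor's note, not part of the thread] The two safeguards Amir's patch adds - a tweakable cap on response size and an "EJECT button" that refuses all thread-mbox requests - might look roughly like the following framework-neutral sketch. All names, the cap value, and the status codes for the refusal cases are illustrative assumptions; the actual patch is a Django change not shown here:

```python
# Illustrative defaults; the real patch exposes these as settings.
MAX_MBOX_BYTES = 5 * 1024 * 1024   # tweakable cap on response size
THREAD_MBOX_DISABLED = False       # the "EJECT button"

def build_thread_mbox(raw_messages):
    """Concatenate raw message bytes into a thread mbox response.

    Returns (status, payload): 403 with an empty body when the kill
    switch is on, 413 when the cap would be exceeded, else 200 with
    the assembled mbox.
    """
    if THREAD_MBOX_DISABLED:
        return 403, b""
    chunks, total = [], 0
    for raw in raw_messages:
        total += len(raw)
        if total > MAX_MBOX_BYTES:
            # Refuse rather than build an unbounded response in memory.
            return 413, b""
        chunks.append(raw)
    return 200, b"".join(chunks)
```

Note the difference from the patch as described: the patch still fetches everything from the database before checking, whereas checking while accumulating (as above) lets the server bail out before the full worst-case thread is in memory.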