Thread: Re: No easy way to join discussion in existing thread when not subscribed
On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>
> for most accesses to the archives the string for the basic auth reply
> quotes the "archives" and "password" strings with ' - see

Fixed.

> we have a number of current issues where data in the archives gets
> mangled/corrupted we are looking into. We are currently working on some
> infrastructure to "test" parsing fixes across all the messages in the
> archives to get a better understanding of what kind of effect a change has.
> For this specific message I'm curious of how you found it though?
>

I made a prototype before looking at the repo, using
Python's 'mailbox' parser module, and some asserts failed
when some messages parsed out as lacking Message-ID. I had
also read the mbox spec in order to write the patch, and
put the two together.

>>> <...>
>>> Have you done any (approximate) measurements on what the additional
>>> in-memory overhead in both pg (to build the response) and in django is
>>> compared to the resulting mbox?
>>>
>>>> Amir Wrote:
>>>> <some napkins and mitigations>
> My concern mostly stems from operational
> experience (on the sysadmin team) that some operations on the archives
> currently are fairly computationally and memory intensive causing issues
> with availability and we would want to not add more vectors that can
> cause that.
>

You're right to be concerned; I raised the issue myself to begin with.
We can solve any particular problem, but how to optimize depends too
much on particulars I don't have.

If you are short on both CPU and memory, we could trade storage.
You already serve monthly mboxes; having per-thread mboxes which are
updated in batch (say hourly) could be manageable, and that code
is practically written already. Serving static files is as cheap as it
gets on both CPU and memory.

But for now, see the attached patch, which adds a tweakable for setting a
cap on the max size of the response. It still gets everything
from the database at once, so it may not be of much help except
perhaps as a metric for you to easily monitor.

There's also an EJECT button that turns all thread mbox requests into
403, so you can just throw this in production and flip the switch
if a problem appears. It also fixes the quoting in the message.

You
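For readers without the attachment, the two safeguards described above (a size cap and a kill switch) can be sketched roughly as follows. This is not the attached patch: the setting names, the framework-free `(status, body)` return value, and the choice to answer an over-cap request with 403 are all assumptions made for illustration. As in the message, every message is already in memory before the cap is checked, so the cap bounds only the response and doubles as a size metric.

```python
import logging

log = logging.getLogger("thread_mbox")

# Hypothetical settings; the names are illustrative, not taken from the patch.
THREAD_MBOX_ENABLED = True                 # the "EJECT button": False means always 403
THREAD_MBOX_MAX_BYTES = 50 * 1024 * 1024   # cap on the size of one response


def thread_mbox_response(raw_messages,
                         enabled=THREAD_MBOX_ENABLED,
                         max_bytes=THREAD_MBOX_MAX_BYTES):
    """Return (status, body) for a thread mbox request.

    raw_messages is a list of bytes objects that were already fetched
    from the database, each assumed to be a complete mbox entry.
    """
    if not enabled:
        # Kill switch: refuse every thread mbox request.
        return 403, b"thread mbox downloads are disabled\n"

    total = sum(len(m) for m in raw_messages)
    # Logging the size gives operators something to monitor.
    log.info("thread mbox size: %d bytes", total)

    if total > max_bytes:
        # Assumption: over-cap requests are refused rather than truncated.
        return 403, b"thread is too large to download as mbox\n"

    return 200, b"".join(raw_messages)
```

In a Django view the tuple would be turned into an HTTP response object; that wiring is left out so the sketch runs on its own.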
Attachment
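The prototype mentioned in the message above is not included in the thread. A minimal sketch of the kind of check it describes, using the standard library `mailbox` module to find messages that parse without a Message-ID header, might look like this. The function names and the sample data are made up for the example.

```python
import mailbox
import os
import tempfile

# Two messages in mbox format; the second one has no Message-ID header.
SAMPLE = (
    "From alice@example.org Tue Sep 29 21:34:00 2015\n"
    "From: alice@example.org\n"
    "Message-ID: <1@example.org>\n"
    "Subject: first\n"
    "\n"
    "body one\n"
    "\n"
    "From bob@example.org Tue Sep 29 22:51:00 2015\n"
    "From: bob@example.org\n"
    "Subject: second, header missing\n"
    "\n"
    "body two\n"
)


def find_messages_without_id(mbox_path):
    """Return the 0-based positions of messages lacking a Message-ID."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [i for i, msg in enumerate(box) if not msg.get("Message-ID")]
    finally:
        box.close()


def demo():
    """Write SAMPLE to a temporary mbox file and run the check on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "wb") as f:
            f.write(SAMPLE.encode("ascii"))
        return find_messages_without_id(path)
```

Running `demo()` reports position 1, the message whose headers lack a Message-ID. A message that the archive mangled during import would show up the same way.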
Re: No easy way to join discussion in existing thread when not subscribed
From: Stefan Kaltenbrunner
Date:
On 09/30/2015 03:27 AM, Amir Rohan wrote:
> On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
>> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>>
>> for most accesses to the archives the string for the basic auth reply
>> quotes the "archives" and "password" strings with ' - see
>
> Fixed.

I think you missed at least one spot in the code you added and also at
least one occurrence in existing code.

>
>> we have a number of current issues where data in the archives gets
>> mangled/corrupted we are looking into. We are currently working on some
>> infrastructure to "test" parsing fixes across all the messages in the
>> archives to get a better understanding of what kind of effect a change has.
>> For this specific message I'm curious of how you found it though?
>>
>
> I made a prototype before looking at the repo, using
> Python's 'mailbox' parser module, and some asserts failed
> when some messages parsed out as lacking Message-ID. I had
> also read the mbox spec in order to write the patch, and
> put the two together.

ah - nice effort!

>
>>>> <...>
>>>> Have you done any (approximate) measurements on what the additional
>>>> in-memory overhead in both pg (to build the response) and in django is
>>>> compared to the resulting mbox?
>>>>
>>>>> Amir Wrote:
>>>>> <some napkins and mitigations>
>> My concern mostly stems from operational
>> experience (on the sysadmin team) that some operations on the archives
>> currently are fairly computationally and memory intensive causing issues
>> with availability and we would want to not add more vectors that can
>> cause that.
>>
>
> You're right to be concerned; I raised the issue myself to begin with.
> We can solve any particular problem, but how to optimize depends too
> much on particulars I don't have.
>
> If you are short on both CPU and memory, we could trade storage.
> You already serve monthly mboxes; having per-thread mboxes which are
> updated in batch (say hourly) could be manageable, and that code
> is practically written already. Serving static files is as cheap as it
> gets on both CPU and memory.

yeah that is what I was thinking - though I don't think we want hourly.
We went a long way to actually get the current system to be "almost
instant" in terms of having the archives in sync with the lists (at least
for the basic stuff).
What I was thinking is doing the mbox creation during the import - we
already serialize the process (on the MTA/LDA side) there to have only
one message imported at a time, so there is way less risk of
overwhelming the box.

>
> But for now, see the attached patch, which adds a tweakable for setting a
> cap on the max size of the response. It still gets everything
> from the database at once, so it may not be of much help except
> perhaps as a metric for you to easily monitor.
>
> There's also an EJECT button that turns all thread mbox requests into
> 403, so you can just throw this in production and flip the switch
> if a problem appears. It also fixes the quoting in the message.

thanks for the updated patch - will take a look and see whether I can
find out what the worst case is in the archives later today.


Stefan
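The idea of building per-thread mboxes at import time, rather than per request or in an hourly batch, could be sketched as below. This is only an illustration under assumptions: the directory layout, the `thread-<id>.mbox` file naming, and the function names are invented here, and the real import pipeline is not shown in the thread. The standard library `mailbox` module does the appending. The explicit lock is a precaution, since the import is described as already serialized.

```python
import mailbox
import os
import tempfile


def append_to_thread_mbox(mbox_dir, thread_id, raw_message):
    """Append one raw message (bytes) to the static mbox file of its thread.

    Meant to be called once per imported message, so the cost of building
    the mbox is spread over imports instead of being paid per download.
    """
    path = os.path.join(mbox_dir, f"thread-{thread_id}.mbox")
    box = mailbox.mbox(path)      # creates the file if it does not exist yet
    box.lock()
    try:
        box.add(raw_message)      # the module writes the "From " separator line
        box.flush()
    finally:
        box.unlock()
        box.close()
    return path


def demo():
    """Import two messages into one thread and read their IDs back."""
    with tempfile.TemporaryDirectory() as tmp:
        for n in (1, 2):
            raw = (
                "From: someone@example.org\n"
                f"Message-ID: <{n}@example.org>\n"
                f"Subject: msg {n}\n"
                "\n"
                f"body {n}\n"
            ).encode("ascii")
            path = append_to_thread_mbox(tmp, "42", raw)
        box = mailbox.mbox(path, create=False)
        try:
            return [msg["Message-ID"] for msg in box]
        finally:
            box.close()
```

The resulting files can then be served as static content, which is the cheap path both messages agree on, while staying in sync with the lists as each message arrives.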