Re: No easy way to join discussion in existing thread when not subscribed - Mailing list pgsql-www
From | Stefan Kaltenbrunner
Subject | Re: No easy way to join discussion in existing thread when not subscribed
Date |
Msg-id | 560B86E8.4020600@kaltenbrunner.cc
In response to | Re: No easy way to join discussion in existing thread when not subscribed ("Amir Rohan" <amir.rohan@mail.com>)
List | pgsql-www
On 09/30/2015 03:27 AM, Amir Rohan wrote:
> On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
>> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>>
>> for most accesses to the archives the string for the basic auth reply
>> quotes the "archives" and "password" strings with ' - see
>
> Fixed.

I think you missed at least one spot in the code you added, and also at
least one occurrence in existing code.

>> we have a number of current issues where data in the archives gets
>> mangled/corrupted that we are looking into. We are currently working
>> on some infrastructure to "test" parsing fixes across all the messages
>> in the archives to get a better understanding of what kind of effect a
>> change has. For this specific message, I'm curious how you found it
>> though?
>
> I made a prototype before looking at the repo, using python's 'mailbox'
> parser module, and some asserts failed when some messages parsed out as
> lacking a Message-ID. I had also read the mbox spec in order to write
> the patch, and put the two together.

ah - nice effort!

>>>> <...>
>>>> Have you done any (approximate) measurements on what the additional
>>>> in-memory overhead in both pg (to build the response) and in django
>>>> is compared to the resulting mbox?
>>>>
>>>>> Amir Wrote:
>>>>> <some napkins and mitigations>
>>
>> My concern mostly stems from operational experience (on the sysadmin
>> team) that some operations on the archives are currently fairly
>> computation- and memory-intensive, causing issues with availability,
>> and we don't want to add more vectors that can cause that.
>
> You're right to be concerned; I raised the issue myself to begin with.
> We can solve any particular problem, but how to optimize depends too
> much on particulars I don't have.
>
> If you have both cpu and memory shortage, we could trade storage.
> You already serve monthly mboxes; having per-thread mboxes which are
> updated in batch (say hourly) could be manageable, and that code is
> practically written already. Serving static files is as cheap as it
> gets on both cpu and memory.

yeah, that is what I was thinking - though I don't think we want hourly.
We went a long way to actually get the current system to be "almost
instant" in terms of having the archives in sync with the lists (at
least for the basic stuff). What I was thinking is doing the mbox
creation during the import - we already serialize the process (on the
MTA/LDA side) there to have only one message imported concurrently, so
there is way less risk of overwhelming the box.

> But for now, see attached patch, which adds a tweakable for setting a
> cap on the max size of the response. It still gets everything from the
> database at once, so it may not be of much help, except perhaps as a
> metric for you to easily monitor.
>
> There's also an EJECT button that turns all thread mbox requests into
> 403, so you can just throw this in production and flip the switch if a
> problem appears. Also fixes the quoting in the message.

thanks for the updated patch - will take a look and see whether I can
find out what the worst case is in the archives later today.


Stefan
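[Editor's note, not part of the thread] The kind of check Amir describes - walking an archive mbox with Python's 'mailbox' module and flagging messages that parse out without a Message-ID - could be sketched roughly like this (the function name is an assumption, not code from the actual prototype):

```python
import mailbox

def find_missing_message_ids(path):
    """Return the keys of messages in the mbox at `path` that lack a
    Message-ID header after parsing, as Amir's asserts detected."""
    box = mailbox.mbox(path)
    try:
        # msg["Message-ID"] is None when the header is absent
        return [key for key, msg in box.items() if msg["Message-ID"] is None]
    finally:
        box.close()
```

Run against a monthly archive mbox, a non-empty result would point at exactly the kind of mangled/corrupted messages the archives team is investigating.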
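[Editor's note, not part of the thread] The two safeguards Amir's patch adds - a tweakable cap on response size and an "EJECT button" that refuses all thread-mbox requests - might look roughly like the following framework-neutral sketch. All names, the cap value, and the status codes for the refusal cases are illustrative assumptions; the actual patch is a Django change not shown here:

```python
# Illustrative defaults; the real patch exposes these as settings.
MAX_MBOX_BYTES = 5 * 1024 * 1024   # tweakable cap on response size
THREAD_MBOX_DISABLED = False       # the "EJECT button"

def build_thread_mbox(raw_messages):
    """Concatenate raw message bytes into a thread mbox response.

    Returns (status, payload): 403 with an empty body when the kill
    switch is on, 413 when the cap would be exceeded, else 200 with
    the assembled mbox.
    """
    if THREAD_MBOX_DISABLED:
        return 403, b""
    chunks, total = [], 0
    for raw in raw_messages:
        total += len(raw)
        if total > MAX_MBOX_BYTES:
            # Refuse rather than build an unbounded response in memory.
            return 413, b""
        chunks.append(raw)
    return 200, b"".join(chunks)
```

Note the difference from the patch as described: the patch still fetches everything from the database before checking, whereas checking while accumulating (as above) lets the server bail out before the full worst-case thread is in memory.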