Re: Question -- Session Operations - Feasibility Of Proposed Synchronization Method? - Mailing list pgsql-general
From: Steve Petrie, P.Eng.
Subject: Re: Question -- Session Operations - Feasibility Of Proposed Synchronization Method?
Date:
Msg-id: 7B7BF1FB1B7E47B58D8365DC1C2ADD3A@Dell
In response to: Re: Question -- Session Operations - Feasibility Of Proposed Synchronization Method? ("Steve Petrie, P.Eng." <apetrie@aspetrie.net>)
List: pgsql-general
Andy,

Thanks very much for your response. No worries about raining on my parade. Your feedback is exactly what I'm looking for -- praise is nice, but I really do prefer to have the experts throwing rocks at my naive ideas :)

Please see my comments embedded below.

Steve

----- Original Message -----
From: "Andy Colson" <andy@squeakycode.net>
To: "Steve Petrie, P.Eng." <apetrie@aspetrie.net>; <pgsql-general@postgresql.org>
Sent: Thursday, January 07, 2016 10:17 PM
Subject: Re: [GENERAL] Question -- Session Operations - Feasibility Of Proposed Synchronization Method?

> On 01/07/2016 06:30 PM, Steve Petrie, P.Eng. wrote:
>> Thanks to forum members for the four helpful replies to my earlier
>> message that initiated this thread.
>>
>> The replies expressed concerns with the feasibility of my proposal to
>> use postgres tables to store short-lived context data, for dialog
>> continuity during website app transient sessions, with visitor
>> browsers over modeless HTTP connections.
>>
>> Hope the four emails I sent in response (5 January 2016) went some way
>> to satisfying the concerns expressed.
>>
>> Here is a list of the issues discussed in the dialog mentioned above:
>>
>> 1. "Session" defined;
>> 2. Avoid row DELETEs;
>> 3. Periodically TRUNCATE each table in a pool of session context tables;
>> 4. Embed a session ID key parameter in an HTML "hidden" field (optional);
>> 5. Use sequence generators as rapid global iterators controlling access
>> to session context tables;
>>
> <SNIP>
>>
>> Thanks to forum members for taking the time to read my email.
>>
>
> This feels hugely overcomplicated.

I agree. It is complicated. But I believe it's the minimum functionality required to both:

1. avoid using the <row DELETE ... / table AUTOVACUUM / table VACUUM> approach to recycling "dead" session context row image storage space back to the filesystem, and

2. enable use of the much faster TRUNCATE command on an entire "dead" session context table.

> I also didn't read most of the last thread, so forgive me if you've
> answered this already: How many website requests a second (that
> actually need to touch session data) are you expecting? How much space
> is the session data going to take? (like, 5 Gig a day?)
>

Every incoming request to the website for non-static content needs to touch (INSERT or SELECT + UPDATE) the corresponding session context row. That row is where the transient continuity context for the app session dialog gets stored, between request <i> and request <i+1> coming in from the browser driving that app session.

So session data will be touched by every request that launches an app php function, to service the next step in the session dialog with that visitor.

But an individual session isn't going to live all that long, from the time that its context row gets INSERTed until the time that the session "dies" and its context row gets UPDATEd as "dead" in its "status" column (the row is never explicitly DELETEd; the entire table in which it resides gets TRUNCATEd). There is a minimal sketch of this row lifecycle just below.

If the website manages to register e.g. 100,000 subscribers in its first year, it will be a runaway success. I'm not expecting more than a few percent of subscribers to visit on any given day. So if the website proves to be a massive winner, there will be maybe 1000 to 5000 subscriber sessions / day, each session being initiated, conducted and then terminated over the time span of a few minutes (rarely more than 30 minutes).
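To make that row lifecycle concrete, here is a minimal sketch of what one table in the pool might look like. The table and column names, and the UNLOGGED choice (since the session data is expendable and never needs crash recovery), are just illustrative assumptions, not a settled design:

-- one table out of the pool of session context tables (names are illustrative)
CREATE UNLOGGED TABLE session_context_01 (
    session_id    bigint PRIMARY KEY,                  -- key carried in the HTML "hidden" field
    status        text NOT NULL DEFAULT 'active',      -- set to 'dead' instead of DELETEing the row
    request_count integer NOT NULL DEFAULT 0,          -- counts toward the e.g. 25-request quota
    last_request  timestamptz NOT NULL DEFAULT now(),  -- used for the e.g. 10-minute timeout
    context       text                                 -- serialized dialog state between requests
);

-- session start: one INSERT
INSERT INTO session_context_01 (session_id, context) VALUES (42, '...');

-- each later request: SELECT the row, then UPDATE it (never DELETE)
UPDATE session_context_01
   SET request_count = request_count + 1, last_request = now(), context = '...'
 WHERE session_id = 42;

-- session "death": mark the row dead; its storage is reclaimed later by TRUNCATE
UPDATE session_context_01 SET status = 'dead' WHERE session_id = 42;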
But I do fear "success disaster" if suddenly the website (which will promote a politically controversial technology concept for preventing freeway traffic congestion) gets media coverage in its initial market area (the greater Toronto area in the province of Ontario, Canada), and among a million+ people watching the 6-o'clock Toronto news, a few thousand jump on their smart-phone browsers to hit the website, looking to subscribe or send a contact message via web page form.

So I'm looking to build in capacity to handle brief intense bursts of session traffic workload. Not anywhere near Google-scale, obviously. But maybe to handle a 10-minute burst driving a maximum rate of e.g. 1000 requests / second to the database server (being mostly a combination of an INSERT for each new session row, followed by a few <SELECT + UPDATE>s to that row, as the session proceeds through its brief "life" towards its inevitable "death").

Actual access activity to the longer-lived data tables (1. subscriber membership table; 2. contact message table) will be orders of magnitude lower than activity in the session context tables.

Each visitor session is allowed a "quota" of requests (e.g. 25), so the visitor gets 25 chances to e.g. click a "submit" button. There is also a session timeout "quota" (e.g. 10 minutes) that will kill the session if the visitor waits too long between requests. So the session context tables in aggregate do not keep growing and growing.

And session context data is short-term expendable data. No need to log it for recovery. No need for checkpoints, or any other backup provisions. If all active session context data gets lost in a crash, no big deal. Maybe a reputational hit for the website, but the visitors who had their sessions brutally murdered by the crash will not wait around for a recovery of their sessions anyway. They will wander off and (hopefully) come back to retry later. Most likely after the website administrator sends out a post-crash apologetic mass email message to the entire subscriber base ("Hey subscribers, guess what? We're so successful we crashed!!" :)

Consider a worst-case severe workload. There might be, say, an intense burst of activity for 10 seconds that initiates 10000 active sessions (10% of all website subscribers), spread over e.g. 10 tables. And during the entire lifetime of each session, the table storage consumed by that session would be, say, 10 session context row images (1 INSERT image + 9 UPDATE images).

So in total there would be space for 100,000 row images (10 images / session X 10000 sessions) allocated over the 10 tables, at a peak of 10000 sessions online concurrently. At e.g. 1000 bytes / row image, that would mean a peak total of 100 MB (100,000 row images X 1000 bytes / image) of storage allocated to the 10 session context tables altogether.

Those 10000 suddenly-initiated concurrent sessions are then 10000 people, each poking at an HTML page in their browser maybe once every 30 to 60 seconds, to trigger an HTTP request. So the database server will receive from 167 to 333 <SELECT + UPDATE> requests per second (10000 sessions / 60 seconds; 10000 sessions / 30 seconds).

But when the 10000 sessions created by that 10-second burst of session initiations gradually "die" over the next half hour or so, the entire 100 MB of storage in the 10 tables will get recycled back to the filesystem by only 10 very swift TRUNCATE commands. And there will be absolutely no <row DELETE / table AUTOVACUUM / table VACUUM> workload overhead to hamper performance during all that session activity.
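To sketch how cheap that reclaim step could be (re-using the illustrative session_context_01 table above, and leaving out the app-side bookkeeping that decides when a table has gone all-"dead"), one hypothetical maintenance step per table in the pool might look like:

-- truncate a pool member only once no live sessions remain in it;
-- repeat for session_context_02 ... session_context_10
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM session_context_01 WHERE status <> 'dead') THEN
        TRUNCATE TABLE session_context_01;
    END IF;
END
$$;

Ten of those, one per table, would hand the whole 100 MB back to the filesystem in one quick pass.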
> If it's a huge number, you should put effort into growing horizontally,
> not all of this stuff.
> If it's a small number, you'll spend more time fixing all the broken
> things than it's worth.
> Have you benchmarked this? In my mind, complicated == slow.
>

All valid points.

Wouldn't using multiple session context tables enable the possibility of horizontal growth? Could separate pools of session context tables go in separate tablespaces, each tablespace on a different SSHD for one pool?

I'm not underestimating the work required to get the idea to a stable functional state. Yes -- complicated does consume more cycles. That's why I'm proposing to use sequence generators (as rapid-access global "iterators"), instead of using some bottleneck of a tiny table, to keep order among the app sessions as they look for permission to access the session context tables. (A rough sketch of the general idea is in the P.S. at the end of this message.)

No I haven't benchmarked the idea. It's still at the feasibility consultation / investigation stage. But certainly benchmarking is going to be mandatory. If I decide to proceed beyond just brainstorming the idea, I will build and benchmark a prototype to stress-test the key design ideas.

> Sorry if I'm raining on your parade, it looks like you have really put
> a lot of work into this.
>

I'm grateful to you for the downpour of advice :) Yes I have invested quite a few hours. But all pleasant ones. And I have the hours to invest, and the luxury of no boss' butt to kiss. So it won't be a problem if I decide to ditch the idea of using postgres as a session store, and go with e.g. your web proxy suggestion instead. But I really would like, if possible, to develop some innovation using postgres, of significant but not overwhelming challenge. Something that might help postgres invade a new market.

> Have you considered saving session data to disk is faster than saving
> to db? A good reverse web proxy can stick a session to the same
> backend. 1 web proxy up front, 5 web servers behind it. I'd bet it's
> way faster.
>

Yes I have considered using a plain disk file, but then that's another complication, no? I'm already climbing a steep learning curve with postgres. So if I'm going to complicate my life with some fancy session operations scheme, I would prefer to lodge that complexity firmly in the world of postgres.

For sure, before I do more work on the idea I'm proposing, I will investigate your idea of using a web proxy instead. But then, adding a web proxy to the mix would be a different kind of complication in itself ...

> -Andy
>
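P.S. In case it helps to see what I mean by a sequence generator as a rapid global "iterator", here is the simplest form of the idea -- a shared counter mapped onto a pool of e.g. 10 tables. This is only an illustrative sketch, not the full access-permission scheme described above:

-- one global sequence acts as the fast shared counter
CREATE SEQUENCE session_iterator;

-- each new session draws the next value and maps it onto the 10-table pool,
-- e.g. to pick which session context table its row gets INSERTed into
SELECT (nextval('session_iterator') % 10) + 1 AS pool_table_index;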