Re: Trigger on Postgres for tables syncronization - Mailing list pgsql-general
From | Jeff Davis |
---|---|
Subject | Re: Trigger on Postgres for tables syncronization |
Date | |
Msg-id | 1091041598.4431.72.camel@jeff Whole thread Raw |
In response to | Re: Trigger on Postgres for tables syncronization (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>) |
Responses |
Re: Trigger on Postgres for tables syncronization
|
List | pgsql-general |
On Wed, 2004-07-28 at 03:57, Pierre-Frédéric Caillaud wrote: > I'm a postgresql newcomer so correct me if I'm wrong... I also want to > ask another question. > > I would have done this with a view, too, because it's very simple to do > in Postgresql. You can also add some rules (or triggers ?) so that an > insert attempt in appointment0 or appointment1 (which would normally fail) > would be rewritten as an insert into appointment with the 'done' value set > accordingly. > That's correct. A rule is what you're looking for in order to insert into a view. > Now, I've been facing a related problem with tracking user sessions for a > web app. I want to use a table to store user sessions, both active > sessions and expired sessions for archiving. I also wanted it to look like > two different tables. I could have created one table with two views > (online and archived), or two tables. > > In the end I went with two tables because the online session table is > read and updated very often, so it better be small and fit in the cache, > while the archive table will probably be huge and not used often. So to > keep better locality of reference I used two tables, and I created > functions to create sessions, update a session to push its timeout value a > bit in the future, and close a session. These functions detect timed-out > sessions in the "online" table and move them to the "archive" table. I > also have a cleanup function which moves expired sessions to the archive > table and which will be called by a cron. > Advantages of this approach : > - There can be only one session for a given user in the "online" table, > which makes finding the session fast (userid = primary key). > - The online table has only one index for faster updating, this is the > primary key on userid. > Drawbacks : > - Much more complex than a view based approach. > > Question : how huge is huge, ie. how much records do I need to have in > the archive to make the two tables approach worth it ? It is much more > complex. First off, you're on the right track. It's logically one table, but it may be better to have it stored seperately. Note that you will still have locality of reference even if it's one big table, since recently added records will be close to the end of the table, and old records will be at the beginning (this isn't guaranteed, but it's true in general). Two tables will help with caching, however, since you should be able to keep the small table in the cache. Caching is the main benefit, since over time, most of the records in the big archive table can be safely ignored under normal operation (until you want to look at the archive) and will never pollute the cache. Here's what I'd use to determine whether the session table is big enough to warrant two tables: (1) Create an index on the session id field that you select on. I assume here that you do a simple select from the session table like "SELECT * FROM session WHERE session_id=1234567890". (2) "VACUUM ANALYZE session" the table to make sure the planner has up-to-date and accurate information on which it can base it's plan. (3) Explain your query like "EXPLAIN ANALYZE SELECT * FROM session WHERE session_id=1234567890". (4) If it does a sequential scan, that means the table is small enough to get the entire table in few disk reads. If it does an index scan, that means there are enough records to warrant several reads from different parts of the disk to avoid reading the entire table: one or more reads for the index and then one read for the page containing the record. So, in short, seq scan means you have a small table and nothing much to gain by seperating the tables. An index scan means the table is big, and you may have something to gain by seperating it into two tables. Here you basically used the query planner to tell you whether it's too big or not. Now, there are a couple other considerations that you might have to answer for yourself. (1) how often do you move expired sessions to the session_archive table? (2) how often do you vacuum the session table? (3) do you want an all_sessions view like "CREATE VIEW all_sessions AS SELECT * FROM session UNION SELECT * FROM session_archive"? Keep in mind the extra cpu and disk activity from constantly moving the records to archive, and constantly vacuuming. How often you do those things probably requires some real-world testing. Regards, Jeff Davis
pgsql-general by date: