Thread: using a postgres table as a multi-writer multi-updater queue
I wondered if any of you could recommend best practices for using a postgres table as a queue. Roughly speaking, 100-200 workers will vomit rows into the table at rates of a few hundred per second, leaving the status as new, and then as many workers as needed to keep up with the load will plough through the queue changing the status to something other than new.
My naive implementation would be something along the lines of:
CREATE TABLE event (
    ts timestamp,
    event char(40),
    status char(10),
    CONSTRAINT pkey PRIMARY KEY(ts, event)
);
...with writers doing INSERT or COPY to get data into the table and readers doing something like:
SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;

...so, grabbing batches of 1,000, working on them and then setting their status.
But, am I correct in thinking that SELECT FOR UPDATE will not prevent multiple workers selecting the same rows?
Anyway, is this approach reasonable? If so, what tweaks/optimisations should I be looking to make?
If it's totally wrong, how should I be looking to approach the problem?
cheers,
Chris
Hello.

On 23.11.2015 11:41, Chris Withers wrote:
> Hi All,
>
> I wondered if any of you could recommend best practices for using a postgres table as a queue. [...]

I suggest an excellent read on this topic:

http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/

Highly recommended if you haven't read it yet.

Also, if you are aiming at 9.5 (not released yet), it will introduce:

SELECT ... FOR UPDATE SKIP LOCKED -- this is new

which supports exactly this use-case (i.e. to implement a job queue).

HTH,

Ladislav Lenart
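Against the event table from the original post, the 9.5 form Ladislav mentions would look roughly like this (a minimal, untested sketch):

SELECT *
FROM event
WHERE status = 'new'
ORDER BY ts
LIMIT 1000
FOR UPDATE SKIP LOCKED;
-- Rows already locked by another worker are skipped rather than waited on;
-- the locks taken here are held until this worker's transaction commits or rolls back.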
On 11/23/15 6:12 AM, Ladislav Lenart wrote:
> I suggest an excellent read on this topic:
>
> http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/
>
> Highly recommended if you haven't read it yet.

One thing it doesn't mention that you need to be aware of is the vacuum workload on a queue table. In a busy queue, it will be difficult or even impossible for vacuum to keep the amount of dead rows down to something manageable. That's why PgQ and Slony don't even attempt it; instead, they rotate through a fixed set of tables. Once all the entries in a table have been processed, the table is truncated.

If you go the delete route, make sure you don't index any fields in the queue that get updated (otherwise you won't get HOT updates), and run a very aggressive manual vacuum so the table stays small.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
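For the delete route Jim describes, here is a sketch of what that implies for Chris's table definition (the fillfactor value is an illustrative choice, not something from Jim's message):

CREATE TABLE event (
    ts     timestamp,
    event  char(40),
    status char(10),
    CONSTRAINT pkey PRIMARY KEY (ts, event)
) WITH (fillfactor = 70);   -- leave free space on each page so an updated row version
                            -- can stay on the same page, a prerequisite for HOT updates
-- Note: HOT updates also require that no indexed column changes, so any index that
-- mentions status (even a partial one) is defeated each time status is updated.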
On 11/23/2015 4:41 AM, Chris Withers wrote:
> Hi All,
>
> I wondered if any of you could recommend best practices for using a postgres table as a queue. [...]
>
> If it's totally wrong, how should I be looking to approach the problem?

Have you tried Redis? It's really good at that sort of thing.

-Andy
On 11/23/2015 2:41 AM, Chris Withers wrote:
>
> If it's totally wrong, how should I be looking to approach the problem?

depending on where these queue entries are coming from, I'd consider using a message queueing system like AMS, MQseries, etc, rather than trying to use a relational database table as a queue. your external data source(s) would write messages to this queue, and you'd have 'subscriber' processes that listen to the queue and process the messages, inserting persistent data into the database as needed.

--
john r pierce, recycling bits in santa cruz
On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 11/23/2015 2:41 AM, Chris Withers wrote:
>>
>> If it's totally wrong, how should I be looking to approach the problem?
>
> depending on where these queue entries are coming from, I'd consider using a message queueing system like AMS, MQseries, etc, rather than trying to use a relational database table as a queue. your external data source(s) would write messages to this queue, and you'd have 'subscriber' processes that listen to the queue and process the messages, inserting persistent data into the database as needed.

I just don't agree with this generalization. Keeping the state of the queue in the database has a lot of advantages and is a lot easier to deal with from a programming perspective, especially if SQL is your core competency. Being able to produce and consume in SQL based on other relational datasources is...elegant.

Specialized queue systems are a very heavy dependency and adding a new server to your platform to manage queues is not something to take lightly. This advice also applies to scheduling systems like Quartz, specialized search like Solr and Elasticsearch, and distributed data platforms like Hadoop. I've used all of these things and have tended to wish I had just used the database instead in just about every case.

Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).

Personally, I tend to roll my own queues. It's not difficult.

merlin
On 11/23/2015 2:51 PM, Merlin Moncure wrote:
> On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>> depending on where these queue entries are coming from, I'd consider using a message queueing system like AMS, MQseries, etc, rather than trying to use a relational database table as a queue. [...]
>
> I just don't agree with this generalization. Keeping the state of the queue in the database has a lot of advantages and is a lot easier to deal with from a programming perspective especially if SQL is your core competency. Being able to produce and consume in SQL based on other relational datasources is...elegant.

our whole system at $job is message based as it's a distributed system. clients send messages to middleware servers that talk to the database servers. no one talks directly to the database, instead they use messaging. also the several databases in our core cluster talk to each other with messaging, where the front end database publishes events that the other reporting database servers subscribe to.

it's a very powerful model for building complex distributed systems and maintaining quite a lot of implementation flexibility, as the exact nature of the schema only needs to be known by a few publisher and subscriber modules.

--
john r pierce, recycling bits in santa cruz
On Mon, Nov 23, 2015 at 2:41 AM, Chris Withers <chris@simplistix.co.uk> wrote:
> Hi All,
>
> I wondered if any of you could recommend best practices for using a postgres table as a queue. Roughly speaking, 100-200 workers will vomit rows into the table at rates of a few hundred per second, leaving the status as new, and then as many workers as needed to keep up with the load will plough through the queue changing the status to something other than new.

Is that several hundred per second per worker, or just several hundred per second in total?

What happens if the worker finds the database has crashed when it goes to insert the records? That would determine what kind of transactional system you need. On the consumer side, what would happen if a record were processed twice?

> My naive implementation would be something along the lines of:
>
> CREATE TABLE event (
>     ts timestamp,
>     event char(40),
>     status char(10),
>     CONSTRAINT pkey PRIMARY KEY(ts, event)
> );

How long are you going to keep these records around for once processed? Unless you delete them immediately, you will probably want a partial index on (ts) where status='new'.

> ...with writers doing INSERT or COPY to get data into the table and readers doing something like:
>
> SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
>
> ...so, grabbing batches of 1,000, working on them and then setting their status.

Why 1000 at a time? Also, you probably need an ORDER BY.

> But, am I correct in thinking that SELECT FOR UPDATE will not prevent multiple workers selecting the same rows?

You are correct, but the way it will prevent multiple workers from selecting them at the same time is that the next worker will block until the first one commits. You would either need to use SKIP LOCKED in the 9.5 release, or you would need to use pg_try_advisory_xact_lock on lower versions, to avoid that.

And, how do you unlock the rows? There are two general approaches. One is to lock the row using PostgreSQL's FOR UPDATE type locks, and hold the transaction open while processing, then updating the row to mark it done and committing to release the lock. This cleans up after itself in the case a worker crashes, but there is no visibility into what is going on. The other is to claim the row for the worker by updating a status field (for example, to have a hostname and pid), and committing that. And then doing the processing, then updating it again to set it as done, and committing that. An abnormally terminated worker will need to have someone or something clean up after it, but it gives you much better visibility into what is happening.

> Anyway, is this approach reasonable? If so, what tweaks/optimisations should I be looking to make?

The best optimization would be to not do it at all. Why can't the 200 inserting workers just do the work themselves immediately (either client side or server side), instead of queuing it? If there is any possibility of making the processing fast enough to do it that way, I'd surely spend my time optimizing the actual work, rather than optimizing a queuing system. And depending on what failure modes you can tolerate, consider a best-effort dedicated queuing system rather than a perfectly ACID one built on PostgreSQL.

Cheers,

Jeff
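A sketch of the pre-9.5 alternative Jeff mentions, combined with the partial index he suggests (the index name and the choice to key the advisory lock on a hash of the primary key are illustrative assumptions; see the depesz article linked earlier for the subtleties this glosses over):

CREATE INDEX event_new_ts_idx ON event (ts) WHERE status = 'new';

-- Each worker takes transaction-scoped advisory locks on the rows it claims;
-- rows already locked by another worker simply drop out of the result.
SELECT ts, event
FROM event
WHERE status = 'new'
  AND pg_try_advisory_xact_lock(hashtext(ts::text || event))
ORDER BY ts
LIMIT 1000;
-- The locks vanish at commit/rollback, so the worker should UPDATE the claimed
-- rows' status and commit before moving on to the next batch.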
From: Tim Uckun
To: Merlin Moncure
Sent: Wednesday, November 25, 2015 3:50 PM
Subject: Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

I'll add my two cents.

I set up something similar a while ago. Here are my suggestions, for what they are worth.

You don't ever want to delete from such a table, so you need to set up something which allows you to truncate the tables when you no longer need them.

One way to accomplish this is with rules (not triggers; rules are blazingly fast compared to triggers). Set up a table inheritance scheme based on whatever search criteria you have (date, sequence etc). Set up a cron job to create the child tables well ahead and to set up the rule redirecting the insert. For example, let's say you have a date-partitioned table and you want to keep a table for every day. Your cron job would run once a day, would create the next seven days' worth of tables (just in case the cron job fails to run for some reason), and would rewrite the rule to insert into the right table with if-then-else type logic. This method is preferable to dynamic creation of the table name with string concatenation because, again, it's significantly faster. (A sketch of such a redirect rule appears below, after the quoted message.)

Another method I tried was to have one "primary" child table and "archival" child tables, and insert directly into the primary child table. For example, say you have a table called "Q". You set up a table called Q_in which inherits from Q. Your code inserts into the Q_in table, you select from the Q table. On a periodic basis you do this:

BEGIN TRANSACTION;
LOCK TABLE Q_IN IN EXCLUSIVE MODE;
ALTER TABLE Q_IN RENAME TO Q_SOME_DATETIME;
CREATE TABLE Q_IN
(LIKE Q_SOME_DATETIME INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
ALTER SEQUENCE q_in_id_seq OWNED BY q_in.id;
-- Set some constraints so the query optimizer knows what to do
END TRANSACTION;

There is one other method, which is the Kafka approach (you can use this in addition to the above methods).

Create N tables for the incoming queue, each one with a sequence for its ID number. N should be determined by how many clients you expect to run. Create a rule which uses some hash function, round robin, or random choice to insert data into one of the tables. Create a different table which keeps track of client connections; the clients use this table to keep track of the last ID fetched. For example, say I have three types of processes that run on the incoming data: p1, p2, p3. The table logs the highest ID fetched from each table for each process. When a client connects, it connects to the table with the lowest used ID for that process, and it locks it for that client (not process, because you can have multiple clients running each process), it processes the records, it updates the ID, it unlocks the table and it backs off for a few seconds. The next client which woke up goes through the same process, and so on. Both Apache Kafka and Amazon Kinesis use this approach. One nice thing about this approach is that you can put each table in its own tablespace on its own disk for higher performance.

One other thing. This is dangerous, but you can turn off logging of the tables; this will make the inserts much faster.

Finally: there is no need to do any of this. Kinesis is cheap, Kafka is pretty awesome, Rabbit is crazy useful.

Hope this helps, ping me offline if you want more details.

Cheers.

On Tue, Nov 24, 2015 at 11:51 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 11/23/2015 2:41 AM, Chris Withers wrote:
>>
>>
>> If it's totally wrong, how should I be looking to approach the problem?
>
> depending on where these queue entries are coming from, I'd considering
> using a message queueing system like AMS, MQseries, etc, rather than trying
> to use a relational database table as a queue. your external data source(s)
> would write messages to this queue, and you'd have 'subscriber' processes
> that listen to the queue and process the messages, inserting persistent data
> into the database as needed.
I just don't agree with this generalization. Keeping the state of the
queue in the database has a lot of advantages and is a lot easier to
deal with from a programming perspective especially if SQL is your
core competency. Being able to produce and consume in SQL based on
other relational datasources is...elegant.
Specialized queue systems are a very heavy dependency and adding a new
server to your platform to mange queues is not something to take
lightly. This advice also applies to scheduling systems like quartz,
specialized search like solr and elastisearch, and distributed data
platforms like hadoop. I've used all of these things and have tended
to wish I had just used the database instead in just about every case.
Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).
Personally, I tend to roll my own queues. It's not difficult.
merlin
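A minimal sketch of the insert-redirect rule Tim describes (table names, dates and the column list are illustrative; a real cron job would regenerate the children and rules each day):

CREATE TABLE q (id bigserial, payload text, created timestamp NOT NULL DEFAULT now());
CREATE TABLE q_2015_11_25 () INHERITS (q);
CREATE TABLE q_2015_11_26 () INHERITS (q);

-- Conditional rewrite rule: inserts against q land in the matching daily child.
CREATE OR REPLACE RULE q_insert_2015_11_25 AS
    ON INSERT TO q
    WHERE new.created >= '2015-11-25' AND new.created < '2015-11-26'
    DO INSTEAD
    INSERT INTO q_2015_11_25 (id, payload, created) VALUES (new.id, new.payload, new.created);

-- Expired days are then retired with TRUNCATE or DROP TABLE instead of row DELETEs.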
On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng. <apetrie@aspetrie.net> wrote:
>
>> You don't ever want to delete from such a table so you need to set up something which allows you to truncate the tables when you no longer need them.
>
> I am migrating a web PHP application (called ITS-ETO) from mysql to postgres. The app INSERTs a row into a postgres table <eto_sql_tb_session_www> to manage each session with a web browser. Here is the DDL for the session table:
>
> CREATE TABLE its_eto.eto_sql_tb_session_www
> (
>   session_www_code char(32) NOT NULL UNIQUE PRIMARY KEY,
>
>   session_www_type int NOT NULL,
>   session_www_state int NOT NULL,
>   session_verify_code char(7) NOT NULL,
>
>   session_www_serno SERIAL NOT NULL UNIQUE,
>
>   session_target_serno int NULL,
>   session_target_data varchar(1000) NULL,
>
>   session_www_init_utc timestamp NOT NULL,
>   session_www_last_utc timestamp NOT NULL,
>   session_www_expiry_utc timestamp NOT NULL,
>   session_www_delete_utc timestamp NOT NULL,
>   session_www_hit_count int NOT NULL,
>   session_www_act_seqno int NULL
> );
>
> CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno);
>
> Using a "fuzzy" probability mechanism, some randomly-selected fraction of the HTTP requests that initiate a new session also SELECT and DELETE expired rows from the session table. I naively assumed that the database server would automatically recycle the storage space dynamically released in this way.
>
> Now, I'm reading in this forum that in fact, postgres does not efficiently automatically recycle storage space released by row DELETion.
>
> My application is quite simple and will be supporting a modest workload, using a small amount of storage space, compared to the massive transaction rates and gigantic space usages I'm reading about in this forum.

Truncation is far more efficient than deletion + vacuuming. If you are running on the edge of your hardware's capabilities, this efficiency is important. But if you are not on the edge, then it is not worth worrying about. Just make sure your autovacuum settings are at least as aggressive as the default settings.

> I do have the luxury of being able to shut down the application for a few minutes periodically, e.g. every 24 hours.
>
> My plan was always, to avoid eventual exhaustion of the SERIAL sequence number integer value series, by swapping in during the periodic app shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.

I'd make the serial column a bigserial, and then forget about it.

Cheers,

Jeff
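The change Jeff suggests is a one-line column type change (the sequence behind a SERIAL column already produces 64-bit values); note that it rewrites the table, so it fits naturally into one of the planned maintenance windows:

ALTER TABLE its_eto.eto_sql_tb_session_www
    ALTER COLUMN session_www_serno TYPE bigint;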
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:

> Now, I'm reading in this forum that in fact, postgres does not efficiently automatically recycle storage space released by row DELETion.

Yes and no. Space resulting from deleted rows is not given back to the OS unless you perform a full vacuum. Autovacuum compacts the table, squeezing out empty rows and leaving free space at the end for new insertions. If the table is heavily used, you will reuse the free space quickly anyway. The problem you may run into is needing to autovacuum too often to control the size of the table.

Space from truncated or dropped tables is immediately given back to the OS.

The issue with heavy deletion usually is performance. Deleting scans the rows and fires any relevant triggers ... truncating or dropping the table does not.

> My plan was always, to avoid eventual exhaustion of the SERIAL sequence number integer value series, by swapping in during the periodic app shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.

Is there a logical problem with letting the sequence wrap around?

> So my question to this postgres forum is -- should I just remove from the online app the "fuzzy" probability mechanism, that DELETEs expired rows from the session table -- because the postgres server is not going to dynamically recycle the released storage space anyway?

I'm not sure I understand the reason for "fuzzy" deletion. There are a number of timestamps in your data ... is it not possible to delete deterministically based on one of them?

Hope this helps,
George
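If letting the sequence wrap is acceptable, the sequence behind the SERIAL column can be told to cycle instead of erroring at its maximum; a sketch (the sequence name assumes PostgreSQL's default <table>_<column>_seq naming):

ALTER SEQUENCE its_eto.eto_sql_tb_session_www_session_www_serno_seq
    MAXVALUE 2147483647   -- stay within the int4 column
    CYCLE;                -- restart at 1 instead of raising an error
-- Caveat: after wrapping, newly generated values can collide with old rows that are
-- still present, since session_www_serno is declared UNIQUE.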
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:

> My stupid email client software (Microsoft Outlook Express on Win XP) refuses to respect its own "Reply" option settings for inline reply text. I've been looking for a replacement email client but so far without success.

Without further comment about a 15 year old, unsupported OS ...

My vote for an email client would be Thunderbird. It runs on XP or higher and you can import Outlook's PST files so as to keep your mail archives. Importing PST files directly requires Outlook be available on the same system [there is also a less friendly way to do it via EML files exported from Outlook where Outlook is not on the same system].

Thunderbird directly supports net news, so you don't have to get news mixed with your mail (unless you want to).

Hope this helps,
George
Thanks to Jeff for the helpful response. My remarks are below.

----- Original Message -----
From: "Jeff Janes" <jeff.janes@gmail.com>
To: "Steve Petrie, P.Eng." <apetrie@aspetrie.net>
Cc: "Tim Uckun" <timuckun@gmail.com>; "Merlin Moncure" <mmoncure@gmail.com>; "John R Pierce" <pierce@hogranch.com>; "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: Thursday, November 26, 2015 2:07 AM
Subject: Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

> On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng. <apetrie@aspetrie.net> wrote:
>>
>>> You don't ever want to delete from such a table so you need to set up something which allows you to truncate the tables when you no longer need them.
> ...
> Truncation is far more efficient than deletion + vacuuming. If you are running on the edge of your hardware's capabilities, this efficiency is important. But if you are not on the edge, then it is not worth worrying about.

This is what I was hoping to learn -- because I doubt my app workload will ever approach the edge of hardware capability.

> Just make sure your autovacuum settings are at least as aggressive as the default settings.

I'll keep that in mind. And by a happy coincidence, in another recent forum thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to commit/vacuum a batch of delete statements in a postgresql function") there is advice from Adrian Klaver about the need to execute VACUUM outside of a transaction block.

>> My plan was always, to avoid eventual exhaustion of the SERIAL sequence number integer value series, by swapping in during the periodic app shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.
>
> I'd make the serial column a bigserial, and then forget about it.

I considered using bigint, but decided against doing so for three reasons: 1. int rep is already going to be way more precision than is needed, 2. avoid the extra resource consumption incurred by bigint as compared to int, and 3. keep strictly 32-bit compatible with my 32-bit Win XP / PHP development system. In fact the PHP app is programmed to terminate abnormally on detecting a postgres SERIAL sequence number that exceeds the maximum positive value of a strictly 32-bit signed integer.

Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apetrie@aspetrie.net
Thanks to George for the helpful comments. My remarks are below.

"George Neuner" <gneuner2@comcast.net> wrote in message news:gvad5bllba9slstdhkn6ql2jbplgd78p98@4ax.com...
> On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:
>
>> My plan was always, to avoid eventual exhaustion of the SERIAL sequence number integer value series, by swapping in during the periodic app shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.
>
> Is there a logical problem with letting the sequence wrap around?

I can't think of any reason why letting the sequence wrap would be a logical problem, but for operational simplicity I prefer to reset the sequence number back to one during each brief daily postgres server shutdown window. I don't relish the challenge of mentally grappling with gigantic session row sequence numbers when I'm investigating some future operational anomaly.

> I'm not sure I understand the reason for "fuzzy" deletion. There are a number of timestamps in your data ... is it not possible to delete deterministically based on one of them?

You are correct. The timestamps are there in the session row, and the PHP app's session row deletion process does use them to select rows for deletion. There is also a decrementing integer "hit count" limit column (presently initialized == 25) that kills a session when its "budget" of HTTP requests is used up.

I should have clarified that I use the term "fuzzy" to refer to the probability mechanism that hooks a small burst of session row deletion activity to each one of a randomly-selected portion (presently 1/5) of the HTTP requests that cause a new session row to be INSERTed into the session table.

This means that on average, only every 5th HTTP request that creates a new session row will also incur the session deletion workload. When the session row deletion process occurs, its (aggressive) limit for deletion workload is 2X as many expired rows as needed on average to keep up with the rate of session row creation (so the 2X DELETE limit presently == 10 rows).

The idea is to make the process of DELETing expired session rows automatically scale its activity to closely and aggressively match the rate of session row creation. A heavy burst of new sessions being created will work proportionately more aggressively to DELETE expired session rows. This way, the (HTTP request-driven) PHP app will be self-tuning its own session table space recycling. And there is no process (e.g. cron-driven), external to the PHP app itself, that is doing session row deletion.

Based on what I've learned from this forum (but before I have studied AUTOVACUUM in any detail), my thinking is to include an AUTOVACUUM command (outside of any SQL transaction block) in the HTTP request-driven PHP app, immediately following any time the PHP app completes a session row DELETE command.

Or maybe the AUTOVACUUM request should occur less frequently?

Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apetrie@aspetrie.net
Thanks to George for the Thunderbird email client recommendation. My remarks are below.

"George Neuner" <gneuner2@comcast.net> wrote in message news:kaed5btl92qr4v8ndevlgtv0f28qaaeju7@4ax.com...
> On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:
>
>> My stupid email client software (Microsoft Outlook Express on Win XP) refuses to respect its own "Reply" option settings for inline reply text. I've been looking for a replacement email client but so far without success.
>
> Without further comment about a 15 year old, unsupported OS ...

(Suddenly, my MS Outlook Express is letting me do inline comments.)

George, your self-restraint is admirable :) And my plan is to move from Win XP to Linux in the (somewhat near) future.

Before I lose all credibility with this excellent forum -- be assured that the PHP website app (not yet online) I'm migrating to postgres from mysql will ABSOLUTELY NOT be running in production under any kind of Microsoft server software. Right now I'm planning to use DragonFlyBSD, with its robust HAMMER filesystem, as the server o/s.

> My vote for an email client would be Thunderbird. It runs on XP or higher and you can import Outlook's PST files so as to keep your mail archives. Importing PST files directly requires Outlook be available on the same system [there is also a less friendly way to do it via EML files exported from Outlook where Outlook is not on the same system].

It's a common misconception that MS Outlook Express is compatible with MS Outlook. But in fact the two products are architecturally unrelated. Much the same way that Java and Javascript are unrelated.

MS OE does not use .PST files, but there are open source utilities that will extract the contents of MS OE mail folders for migration to alternate email clients.

I am considering Thunderbird as an MS OE replacement, but my understanding is that Mozilla has abandoned all but security-related support for Thunderbird. I have been kicking the (email client functionality) tires of SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a Mozilla code base.

--

Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
(905) 847-3253
apetrie@aspetrie.net
On 11/29/2015 02:04 AM, Steve Petrie, P.Eng. wrote:
> I am considering Thunderbird as an MS OE replacement, but my understanding is that Mozilla has abandoned all but security-related support for Thunderbird. I have been kicking the (email client functionality) tires of SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a Mozilla code base.

My take is SeaMonkey is Thunderbird with a bunch of unneeded features. I use Thunderbird on a variety of machines and it works just fine. To me an email client is a solved problem and security fixes are all that are required, still Mozilla is making changes. For the latest version 38:

https://support.mozilla.org/en-US/kb/new-thunderbird-38

--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/29/2015 01:59 AM, Steve Petrie, P.Eng. wrote:
>> Just make sure your autovacuum settings are at least as aggressive as the default settings.
>
> I'll keep that in mind. And by a happy coincidence, in another recent forum thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to commit/vacuum a batch of delete statements in a postgresql function") there is advice from Adrian Klaver about the need to execute VACUUM outside of a transaction block.

Just to be clear my reference was to someone running VACUUM manually. Jeff's reference was to the autovacuum daemon, which runs VACUUM and ANALYZE as background processes:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM

--
Adrian Klaver
adrian.klaver@aklaver.com
On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:

> "George Neuner" <gneuner2@comcast.net> wrote in message news:kaed5btl92qr4v8ndevlgtv0f28qaaeju7@4ax.com...
>
>> My vote for an email client would be Thunderbird. It runs on XP or higher and you can import Outlook's PST files so as to keep your mail archives. Importing PST files directly requires Outlook be available on the same system [there is also a less friendly way to do it via EML files exported from Outlook where Outlook is not on the same system].
>
> It's a common misconception that MS Outlook Express is compatible with MS Outlook. But in fact the two products are architecturally unrelated.

My understanding was that OE was based on the old (Win9x) Outlook. I know it isn't the same as the "enterprise" version.

I wasn't aware that OE used a different file format. But, AFAIK, it does still export EML files, so you can move your mailboxes into Thunderbird (or whatever).

> I am considering Thunderbird as an MS OE replacement, but my understanding is that Mozilla has abandoned all but security-related support for Thunderbird. I have been kicking the (email client functionality) tires of SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a Mozilla code base.

Yes and no.

Mozilla has Thunderbird on a slow development track. It does occasionally get new features, but mostly now by having some very popular extension become built in.

Seamonkey was a fork from a discontinued Mozilla application suite. It is not a Mozilla project, although it does incorporate Mozilla code from Firefox and Thunderbird.

The problem I have with Seamonkey is that it tries to be all things to all web users. "Jack of all trades, master of none" is a truism. YMMV, but I would rather have very reliable purpose-specific tools than an integrated suite which may do more but be less reliable overall.

I'm not knocking Seamonkey per se - it seems to be quite well done - I'm just making a general observation re: integrated application suites. Netscape failed in part because it bit off too much, trying to do mail and news on top of the browser [and not doing them well - I loved the Netscape browser, but its mail and news interface was just bad]. Mozilla discontinued its web application suite because too few people wanted it.

George
On Nov 29, 2015 18:34, "George Neuner" <gneuner2@comcast.net> wrote:
>
> On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
> <apetrie@aspetrie.net> wrote:
>
> >"George Neuner" <gneuner2@comcast.net> wrote in message
> >news:kaed5btl92qr4v8ndevlgtv0f28qaaeju7@4ax.com...
> >
> >> My vote for an email client would be Thunderbird. It runs on XP or
> >> higher and you can import Outlook's PST files so as to keep your mail
> >> archives. Importing PST files directly requires Outlook be available
> >> on the same system [there is also a less friendly way to do it via EML
> >> files exported from Outlook where Outlook is not on the same system].
> >>
> >
> >It's a common misconception that MS Outlook Express is compatible with MS
> >Outlook. But in fact the two products are architecturally unrelated.
>
> My understanding was that OE was based on the old (Win9x) Outlook. I
> know it isn't the same as the "enterprise" version.
This is fantastically off topic but no, it was not. OE was based on the old "Internet Mail and News". The actual Outlook product has always been separate.
/Magnus
On Sun, 29 Nov 2015 05:02:58 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:

> I should have clarified that I use the term "fuzzy" to refer to the probability mechanism that hooks a small burst of session row deletion activity to each one of a randomly-selected portion (presently 1/5) of the HTTP requests that cause a new session row to be INSERTed into the session table.
>
> This means that on average, only every 5th HTTP request that creates a new session row will also incur the session deletion workload. When the session row deletion process occurs, its (aggressive) limit for deletion workload is 2X as many expired rows as needed on average to keep up with the rate of session row creation (so the 2X DELETE limit presently == 10 rows).
>
> The idea is to make the process of DELETing expired session rows automatically scale its activity to closely and aggressively match the rate of session row creation.

There's nothing really wrong with that, but I wouldn't do it that way ... I would bulk delete old records from a separate scheduled task.

Another way to do it would be to have each new session delete exactly one old session. 1:1 scales perfectly and spreads the delete load evenly across all users.

Not that deleting a handful of records is a lengthy process, but it seems unfair to burden some users with it but not others. I would burden (or not) everyone equally.

> A heavy burst of new sessions being created will work proportionately more aggressively to DELETE expired session rows. This way, the (HTTP request-driven) PHP app will be self-tuning its own session table space recycling. And there is no process (e.g. cron-driven), external to the PHP app itself, that is doing session row deletion.
>
> Based on what I've learned from this forum (but before I have studied AUTOVACUUM in any detail), my thinking is to include an AUTOVACUUM command (outside of any SQL transaction block) in the HTTP request-driven PHP app, immediately following any time the PHP app completes a session row DELETE command.
>
> Or maybe the AUTOVACUUM request should occur less frequently?

Reducing the frequency will result in a larger table space on disk. Insertions are made at the end of the table so the table keeps growing in size regardless of deletions until (some kind of) vacuum is run.

Autovacuum doesn't shrink the table space on disk, it merely compacts the table's live data so that any free space is at the end.

If you want to tightly control the growth of the table space, you need to run autovacuum _more_ often, not less.

George
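Rather than issuing VACUUM from the PHP request path, the table's own autovacuum thresholds can be tightened so the background daemon visits just this table more often; a sketch with illustrative values:

ALTER TABLE its_eto.eto_sql_tb_session_www SET (
    autovacuum_vacuum_threshold     = 100,   -- vacuum after this many dead rows...
    autovacuum_vacuum_scale_factor  = 0.01,  -- ...plus 1% of the table
    autovacuum_analyze_scale_factor = 0.02
);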
Thanks to George and Adrian for their latest responses regarding a DELETE / AUTOVACUUM / VACUUM approach to recycling disk space allocated for rows in a postgres table that is used to manage sessions (e.g. HTTP sessions with visitor web browsers).

My comments are below. In them, I propose an alternative strategy, using a partial index, instead of a DELETE / AUTOVACUUM / VACUUM approach. The comments are a bit lengthy, but I hope forum members will consider them, and maybe help lead to a breakthrough in using a postgres table for high-performance session management.

"George Neuner" <gneuner2@comcast.net> wrote in message news:ukdm5b1ni7lv393coa71vf8d0i1mi0phuh@4ax.com...
> On Sun, 29 Nov 2015 05:02:58 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:
> [...]
>
> Autovacuum doesn't shrink the table space on disk, it merely compacts the table's live data so that any free space is at the end.
>
> If you want to tightly control the growth of the table space, you need to run autovacuum _more_ often, not less.
Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach to recycling disk space used for a session management table, I would like to propose, for consideration by this forum, an idea for a different approach.

A row in a session management table represents a significant "sunk cost" in both computing time used to create the row, and in disk space allocated. Postgres has to use a lot of resources to create that row in the first place.

When the session that originally caused that row to be allocated eventually expires -- why delete the associated session management row ??

Instead of using a DELETE command to destroy the row (and a resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage space), why not instead simply mark that session management row as "free" (with an UPDATE command) ??

In brief -- why not just leave the expired session management row allocated in place, already "hot" and ready to be reused for a new session ??

But -- when the app needs to initiate a new session, it must be able quickly to:

1. know if there are any "free" rows available for re-use, and if so
2. select and allocate a specific "free" row for re-use,
(3. and if no "free" row is available, then the app will INSERT a new session row)

* * * * * *

Would the postgres Partial Index facility be useful here? Here are quotes from the excellent postgres documentation (9.3.5):

"11.8. Partial Indexes

A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. ... This reduces the size of the index, which will speed up those queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases."

"11.2. Index Types

PostgreSQL provides several index types: B-tree ... By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations. B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < <= = >= > ... Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index."

* * * * * *

In the case of my particular app, I propose to add to the session management table eto_sql_tb_session_www:

1. a new int column session_www_free, and
2. a partial (B-tree) index -- session_www_free_ndx -- on the new column.
CREATE DATABASE eto_sql_db WITH ENCODING 'UTF8';

CREATE SCHEMA its_eto;

CREATE TABLE its_eto.eto_sql_tb_session_www
(
  session_www_code char(32) NOT NULL UNIQUE PRIMARY KEY,

  session_www_type int NOT NULL,
  session_www_state int NOT NULL,
  session_verify_code char(7) NOT NULL,

  session_www_serno SERIAL NOT NULL UNIQUE,

  session_www_free int NULL,

  session_target_serno int NULL,
  session_target_data varchar(1000) NULL,

  session_www_init_utc timestamp NOT NULL,
  session_www_last_utc timestamp NOT NULL,
  session_www_expiry_utc timestamp NOT NULL,
  session_www_delete_utc timestamp NOT NULL,
  session_www_hit_count int NOT NULL,
  session_www_act_seqno int NULL
);

CREATE INDEX session_www_serno_ndx ON its_eto.eto_sql_tb_session_www (session_www_serno);

CREATE INDEX session_www_free_ndx ON its_eto.eto_sql_tb_session_www (session_www_free) WHERE session_www_free IS NOT NULL;

* * * * * *

The app starts with an empty table eto_sql_tb_session_www that has empty indexes.

When the app goes to initiate a new session, it checks for availability of a "free" session row, using a command like:

SELECT session_www_code, session_www_serno, ... FROM eto_sql_tb_session_www WHERE session_www_free IS NOT NULL LIMIT 1 FOR UPDATE;

(I am presuming that the postgres query planner will decide to use the session_www_free_ndx partial index, to very quickly satisfy this query.)

If a "free" session row is found (e.g. with session_www_code == 'F032...40D9'), the app will allocate that row to the new session, with a command like:

UPDATE eto_sql_tb_session_www SET ..., session_www_free = NULL, ... WHERE session_www_code = 'F032...40D9';

When performing the UPDATE command, postgres will (MUST !!) remove the entry for that row from the session_www_free_ndx partial index, because that partial index excludes rows WHERE column session_www_free IS NULL.

However, when the app goes to initiate a new session, if there are no entries in the session_www_free_ndx partial index, the app then creates a new session row with a unique (MD5) session_www_code value (e.g. '26AD...50C9'):

INSERT INTO eto_sql_tb_session_www (session_www_code, ..., session_www_free, ...) VALUES ('26AD...50C9', ..., NULL, ...);

In that INSERT command, the value of the session_www_free column is set to NULL, so postgres will not (MUST NOT !!) create an entry for that row in the session_www_free_ndx partial index.

When a session (e.g. with session_www_code == '26AD...50C9') expires, the app UPDATEs the corresponding session row:

UPDATE eto_sql_tb_session_www SET ..., session_www_free = session_www_serno, ... WHERE session_www_code = '26AD...50C9';

Since the UPDATEd value of column session_www_free IS NOT NULL, postgres will (MUST !!) create an entry for that row in the session_www_free_ndx partial index.

Every row in table eto_sql_tb_session_www always has a UNIQUE value for column session_www_serno, so there can never be an attempt to create a duplicate key value in the session_www_free_ndx partial index.

* * * * * *

The app uses the 32-character (MD5) key value in column session_www_code as the session ID parameter in dynamically-generated HTML pages, so an HTTP request, generated for that session by a web browser from that page, can be associated by the app with the correct unique row in table eto_sql_tb_session_www.
To protect itself from processing an HTTP request coming from an outdated HTML page, the app uses an additional 6-character hash value, generated from two columns in table eto_sql_tb_session_www:

  session_www_init_utc timestamp
  session_www_act_seqno int

The session_www_init_utc column gets a new timestamp value every time the session row is allocated to a new session. Then the timestamp value stays the same in that column for the duration of that session.

The session_www_act_seqno column int value starts == 0, and simply increments by 1 with every UPDATE to that session row. A session row gets updated every time the app receives an HTTP request for that row. The app will wrap the value in the row back to zero if it ever reaches its maximum possible value.

This 6-character hash value is a second parameter, provided in every dynamically-generated HTML page. So there should be no logical problem with reusing the same session row, with the same 32-character session_www_code value, for more than one session (but only ever for one session at a time).

* * * * * *

Given the above proposed approach, the number of rows in table eto_sql_tb_session_www will grow as large as necessary to service the maximum number of concurrent sessions required. But the app can reuse rows of expired sessions, again and again, instead of DELETing every row after it is used for only one session (and also periodically AUTOVACUUMing or VACUUMing table eto_sql_tb_session_www to reclaim the storage space).

Suppose there is a large burst of session activity, and table eto_sql_tb_session_www grows in size to e.g. 100,000 "active" session rows. Eventually, when system activity slows down (or ceases) again, most (all) of the "active" rows will become "free" and available for reuse during the next burst of activity. And as the system gets busier again, the session_www_free_ndx partial index will get smaller (because NULL values are being excluded), so postgres should need to use less resources to search this partial index to find a "free" session row for reuse. Use less resources, at the most advantageous time, when the system is busy.

* * * * * *

In the case of my app, I do still plan to take the postgres server offline once per day, for a maintenance window of a few minutes. But instead of just forcing the postgres server offline at an arbitrary time every day, the app could analyze counts of entries in the two indexes:

  session_www_serno_ndx
  session_www_free_ndx

to detect when there are not too many active sessions, so postgres can be taken offline with minimum inconvenience to website visitors.

During the brief offline maintenance window, the app can allocate a fresh empty eto_sql_tb_session_www table, and copy over to it from the old eto_sql_tb_session_www table only the rows of active sessions.

As a result of this table copy operation, column session_www_serno SERIAL numbers will start again at 1, but this is no problem, because this SERIAL number is only ever used as a key by the app for the duration of a single HTTP request, once the (MD5) key value in the HTTP request has been used to find the row (by the session_www_code PRIMARY KEY). The value of the session_www_code column won't get changed when a row is copied across to the fresh eto_sql_tb_session_www table.
* * * * * *

For this approach to succeed, it will be important to ensure that, once storage space for a row in table eto_sql_tb_session_www has been allocated, postgres is in fact going to be able to reuse the same storage space for that row, even as values in that row get changed.

I would expect this proviso to be more likely correct for columns containing fixed-length values (e.g. char(32), int, char(7), SERIAL, timestamp). But what about the session_target_data varchar(1000) NULL column in table eto_sql_tb_session_www ??

Here are some snippets from the postgres doc (9.3.5):

8.3. Character Types

"... Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way."

"... The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less."

"... There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead."

Perhaps someone with postgres internals knowledge could advise about how much storage space reuse the proposed partial index approach might actually accomplish with table eto_sql_tb_session_www ... ??

* * * * * *

If forum members think this approach of using a partial index might work, to implement an efficient session row reuse strategy, I will proceed to try it.

Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apetrie@aspetrie.net
> Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach to recycling disk space used for a session management table, I would like to propose, for consideration by this forum, an idea for a different approach.
>
> A row in a session management table represents a significant "sunk cost" in both computing time used to create the row, and in disk space allocated. Postgres has to use a lot of resources to create that row in the first place.
>
> When the session that originally caused that row to be allocated eventually expires -- why delete the associated session management row ??
>
> Instead of using a DELETE command to destroy the row (and a resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage space), why not instead simply mark that session management row as "free" (with an UPDATE command) ??

An UPDATE is a combination of an INSERT and a DELETE command. However, rows marked as deleted will be reused at some point after autovacuum (or manual VACUUM) has made sure they are no longer in use by any DB session. So your approach can still work, as long as you vacuum that table frequently enough.

The actual solution isn't the UPDATE instead of the DELETE though, but rather the partial index and an increased autovacuum frequency.

Alban.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
On Mon, 30 Nov 2015 23:07:36 -0500, "Steve Petrie, P.Eng." <apetrie@aspetrie.net> wrote:

> Instead of using a DELETE command to destroy the row (and a resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage space), why not instead simply mark that session management row as "free" (with an UPDATE command) ??

Alban beat me to the answer 8-)

But to expand a little: your plan won't work because Postgresql does not update in place - it inserts a new changed row and marks the old as deleted. It does not physically overwrite the old row until the table is vacuumed. [If even then - a row which lies beyond the logical end-of-table when vacuum is finished won't be overwritten until its space is recycled.]

This behavior, known as MVCC (multiple version concurrency control), is integral to transaction isolation: selects which are running concurrently with the update may already have seen the old row and must continue to see it until they complete, even if the update completes first. Postgresql doesn't support "dirty read" isolation. A row can't be physically dropped or its space overwritten while any transaction that can "see" it is still running.

For more:

https://devcenter.heroku.com/articles/postgresql-concurrency
https://momjian.us/main/writings/pgsql/mvcc.pdf

Actually lots of great stuff in the presentation section on Bruce Momjian's site: https://momjian.us

George
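One way to see the behaviour George describes is to watch a row's physical location (ctid) change across an UPDATE; a small illustrative example (the table is hypothetical, exact ctid values will vary):

CREATE TABLE mvcc_demo (id int PRIMARY KEY, note text);
INSERT INTO mvcc_demo VALUES (1, 'original');

SELECT ctid, xmin, note FROM mvcc_demo;   -- e.g. ctid (0,1): the original row version
UPDATE mvcc_demo SET note = 'changed' WHERE id = 1;
SELECT ctid, xmin, note FROM mvcc_demo;   -- e.g. ctid (0,2): a new version was inserted;
                                          -- the old one remains dead on the page until vacuum reclaims it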