Thread: Can Postgres Not Do This Safely ?!?
Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it. All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

Thank you very much.

--
Karl Pickett
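A minimal sketch of the race being described; the table definition and
the id values are assumed here purely for illustration, not taken from
the thread:

    -- Assumed schema.
    CREATE TABLE events (
        id      serial PRIMARY KEY,
        payload text NOT NULL
    );

    -- Session A:
    --   BEGIN;
    --   INSERT INTO events (payload) VALUES ('a');  -- nextval() hands out 4999
    --   ... commit delayed ...
    -- Session B:
    --   INSERT INTO events (payload) VALUES ('b');  -- gets 5000, commits at once

    -- Poller, whose last copied id was 4998:
    SELECT * FROM events WHERE id > 4998 ORDER BY id;
    -- Sees only 5000 and records 5000 as its high-water mark; when session A
    -- finally commits, 4999 sits below the mark and is never copied.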
On 29 October 2010 03:04, Karl Pickett <karl.pickett@gmail.com> wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients). It has an integer primary key. We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database. The problem is, this simple approach has a race
> that will forever skip uncommitted events. I.e., if 5000 was
> committed sooner than 4999, and we get 5000, we will never go back and
> get 4999 when it finally commits. How can we solve this? Basically
> it's a phantom row problem but it spans transactions.
>
> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value. I don't get it. All I want is to make sure I skip over any
> rows that are newer than the oldest currently running transaction.
> Has nobody else run into this before?

If I understand your question correctly, you want a "gapless" PK:

http://www.varlena.com/GeneralBits/130.php

--
Regards,
Peter Geoghegan
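For readers who don't follow the link: the general shape of a gapless
PK is a single-row counter table that serializes id assignment. The
names below are assumed, not taken from the linked article, and this is
only a sketch of the idea:

    -- One-row counter table; every inserter locks this row.
    CREATE TABLE event_id_counter (last_id bigint NOT NULL);
    INSERT INTO event_id_counter VALUES (0);

    -- Inside each inserting transaction:
    UPDATE event_id_counter SET last_id = last_id + 1 RETURNING last_id;
    -- Use the returned value as the event id. The row lock is held until
    -- commit, so ids become visible strictly in order -- which is exactly
    -- the writer serialization penalty Karl rejects in his reply below.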
On 10/29/2010 10:04 AM, Karl Pickett wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients). It has an integer primary key. We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database.

Essentially, in a table populated by concurrent inserts from many
transactions which may commit out of order, you want a way to say "get
me all tuples inserted since I last asked". Or, really, "get me all
tuples that became visible since I last looked".

I've never found a good answer for this. If there is one, it'd be
wonderful for trigger-free, efficient replication of individual tables
using batches.

The problem is that - because of commit ordering - there doesn't seem
to be any way to match a *range* of transactions; you have to match a
*list* of individual transaction IDs that committed since you last ran.
And you need a way to generate and maintain that list, preferably only
including transactions that touched the table of interest.

> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value. I don't get it. All I want is to make sure I skip over any
> rows that are newer than the oldest currently running transaction.

Oh, so you don't care if you get the same tuple multiple times if
there's some old, long-running transaction? You're just trying to
avoid repeatedly grabbing the REALLY old stuff?

In that case xmin is what you want. You may have to be aware of xid
wraparound issues, but I don't know much more about dealing with them
than the term.

--
Craig Ringer
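One way to approximate "tuples that became visible since I last
looked", sketched here under the assumption that a column can be added
(the column name and the saved snapshot text are made up, and adding a
column with a default rewrites the table on releases of that era):
store the inserting transaction's 64-bit txid on every row, remember
the snapshot at each poll, and fetch rows whose inserting transaction
was not yet visible in the previously saved snapshot.

    -- Assumed column; txid_current() and txid_visible_in_snapshot()
    -- exist from 8.3 onward.
    ALTER TABLE events ADD COLUMN inserted_txid bigint DEFAULT txid_current();

    -- At each poll, remember the current snapshot for next time:
    SELECT txid_current_snapshot();   -- e.g. '5098:5098:'

    -- Then fetch everything whose inserting transaction was not visible
    -- in the snapshot saved at the previous poll (placeholder literal):
    SELECT e.*
    FROM events e
    WHERE NOT txid_visible_in_snapshot(e.inserted_txid,
                                       '5098:5098:'::txid_snapshot);

If the snapshot capture and the fetch run in one serializable
(snapshot-isolation) transaction, a row can't be fetched twice; under
plain READ COMMITTED a row committed between the two statements can
show up in consecutive polls, so the copy into the reporting database
should tolerate duplicates.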
On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients). It has an integer primary key. We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database. The problem is, this simple approach has a race
> that will forever skip uncommitted events. I.e., if 5000 was
> committed sooner than 4999, and we get 5000, we will never go back and
> get 4999 when it finally commits. How can we solve this? Basically
> it's a phantom row problem but it spans transactions.
>
> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value. I don't get it.

http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS

"The internal transaction ID type (xid) is 32 bits wide and wraps
around every 4 billion transactions. However, these functions export a
64-bit format that is extended with an "epoch" counter so it will not
wrap around during the life of an installation. The data type used by
these functions, txid_snapshot, stores information about transaction
ID visibility at a particular moment in time. Its components are
described in Table 9-53."

So:

Current snapshot:

test=> SELECT txid_current_snapshot();
 txid_current_snapshot
-----------------------
 5098:5098:

xmin of snapshot:

test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
 txid_snapshot_xmin
--------------------
               5098
(1 row)

> All I want is to make sure I skip over any
> rows that are newer than the oldest currently running transaction.
> Has nobody else run into this before?
>
> Thank you very much.
>
> --
> Karl Pickett

--
Adrian Klaver
adrian.klaver@gmail.com
On Fri, Oct 29, 2010 at 2:53 AM, Peter Geoghegan
<peter.geoghegan86@gmail.com> wrote:
> On 29 October 2010 03:04, Karl Pickett <karl.pickett@gmail.com> wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients). It has an integer primary key. We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database. The problem is, this simple approach has a race
>> that will forever skip uncommitted events. I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back and
>> get 4999 when it finally commits. How can we solve this? Basically
>> it's a phantom row problem but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column but the docs say that
>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>> value. I don't get it. All I want is to make sure I skip over any
>> rows that are newer than the oldest currently running transaction.
>> Has nobody else run into this before?
>
> If I understand your question correctly, you want a "gapless" PK:
>
> http://www.varlena.com/GeneralBits/130.php
>
> --
> Regards,
> Peter Geoghegan

That's interesting, but we're fine with having gaps in the range that
never appear. We also don't want to add a performance penalty for
concurrent writers. We just don't want any ids to appear (commit)
after we have already seen a later id.

To clarify, we are using a plain serial primary key and we already
have plenty of holes - that's fine. We just want to do an incremental
'tail -f' of this giant table (along with some joins) to feed into a
reporting server every few minutes. So we're treating it like a queue,
but we're not deleting anything, and absolute real-time data is not
required.

It appears that the theoretical options are:

1. Start a serializable transaction and wait until all earlier
   transactions are gone (query pg_stat_activity or something?)
2. Ignore rows that were created later than any other in-progress
   transactions

Both of these options assume that serials can never go backward as
they're handed out to connections / xids. I think that's safe to
assume?

Either would be fine, I just don't know if they're officially
supported by postgres.

--
Karl Pickett
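Neither option gets a worked example later in the thread, so here is a
rough, untested sketch of option 1 using the txid_* functions Adrian
pointed at. The snapshot text, the ids, and the client-side polling
loop are placeholders, and it leans on the same assumption Karl states:
sequence values are handed out in increasing order (true for a plain
serial with the default CACHE of 1).

    -- Step 1: in one statement, note the current snapshot and the
    -- highest id visible right now.
    SELECT txid_current_snapshot() AS snap, max(id) AS max_id FROM events;
        -- suppose this returns snap = '5100:5103:5100,5101', max_id = 5000

    -- Step 2: wait until every transaction that was running at step 1
    -- has finished, i.e. until the oldest still-running xid has reached
    -- the old snapshot's xmax (5103 here). Loop/sleep in the client
    -- until this returns true.
    SELECT txid_snapshot_xmin(txid_current_snapshot()) >= 5103 AS safe;

    -- Step 3: every id <= max_id was handed out before step 1, and its
    -- transaction has now either committed or aborted, so this range is
    -- settled. Copy it and remember 5000 as the new LAST_ID_I_GOT.
    SELECT * FROM events WHERE id > 4998 AND id <= 5000 ORDER BY id;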
On Fri, Oct 29, 2010 at 8:58 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients). It has an integer primary key. We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database. The problem is, this simple approach has a race
>> that will forever skip uncommitted events. I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back and
>> get 4999 when it finally commits. How can we solve this? Basically
>> it's a phantom row problem but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column but the docs say that
>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>> value. I don't get it.
>
> http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
>
> "The internal transaction ID type (xid) is 32 bits wide and wraps
> around every 4 billion transactions. However, these functions export a
> 64-bit format that is extended with an "epoch" counter so it will not
> wrap around during the life of an installation. The data type used by
> these functions, txid_snapshot, stores information about transaction
> ID visibility at a particular moment in time. Its components are
> described in Table 9-53."
>
> So:
> Current snapshot:
>
> test=> SELECT txid_current_snapshot();
>  txid_current_snapshot
> -----------------------
>  5098:5098:
>
> xmin of snapshot:
>
> test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
>  txid_snapshot_xmin
> --------------------
>                5098
> (1 row)

So what happens when txid_snapshot_xmin() goes over 4 billion and the
table's xmin doesn't? You can't compare a 32-bit value that rolls over
to a 64-bit value that doesn't.

>> All I want is to make sure I skip over any
>> rows that are newer than the oldest currently running transaction.
>> Has nobody else run into this before?
>>
>> Thank you very much.
>>
>> --
>> Karl Pickett
>
> --
> Adrian Klaver
> adrian.klaver@gmail.com

--
Karl Pickett
On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@gmail.com> wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients). It has an integer primary key. We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database. The problem is, this simple approach has a race
> that will forever skip uncommitted events. I.e., if 5000 was
> committed sooner than 4999, and we get 5000, we will never go back and
> get 4999 when it finally commits. How can we solve this? Basically
> it's a phantom row problem but it spans transactions.
>
> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value. I don't get it. All I want is to make sure I skip over any
> rows that are newer than the oldest currently running transaction.
> Has nobody else run into this before?

You don't have a sequence problem so much as a wrong implementation
problem. Sequences are always *grabbed* in order, but they can hit the
table out of order, and there is a time lag between when the sequence
value is generated and when the transaction commits. If I issue
'begin', insert a log record, and hold the commit for 5 minutes, you
are going to skip the record because you are only looking at the last
processed record. Your algorithm is going to fail if you use a
sequence, timestamp, or gapless sequence to manage your queue
position. You need to divide your log records into two logical sets,
processed and unprocessed, and look at the set as a whole.

I would suggest staging your unprocessed records to a queue table and
having your writer consume them and move them to a processed table.
You can also look at already built queuing implementations like PGQ,
written by our spectacularly skilled friends at Skype (haven't used it
myself, but I've heard it's good!).

merlin
On 10/29/2010 9:49 AM, Merlin Moncure wrote:
> On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@gmail.com> wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients). It has an integer primary key. We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database. The problem is, this simple approach has a race
>> that will forever skip uncommitted events. I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back and
>> get 4999 when it finally commits. How can we solve this? Basically
>> it's a phantom row problem but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column but the docs say that
>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>> value. I don't get it. All I want is to make sure I skip over any
>> rows that are newer than the oldest currently running transaction.
>> Has nobody else run into this before?
>
> You don't have a sequence problem so much as a wrong implementation
> problem. Sequences are always *grabbed* in order, but they can hit the
> table out of order, and there is a time lag between when the sequence
> value is generated and when the transaction commits. If I issue
> 'begin', insert a log record, and hold the commit for 5 minutes, you
> are going to skip the record because you are only looking at the last
> processed record. Your algorithm is going to fail if you use a
> sequence, timestamp, or gapless sequence to manage your queue
> position. You need to divide your log records into two logical sets,
> processed and unprocessed, and look at the set as a whole.
>
> I would suggest staging your unprocessed records to a queue table and
> having your writer consume them and move them to a processed table.
> You can also look at already built queuing implementations like PGQ,
> written by our spectacularly skilled friends at Skype (haven't used it
> myself, but I've heard it's good!).
>
> merlin

Yep, you don't want a sequence. You want a flag.

Add a boolean "processed" flag, default it to false, then every 5
minutes run this:

begin;
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

Or, if you want to select them and do something to them:

begin;
select * from events where processed = false;
... do your processing on each, which would include inserting it ...
update events set processed = true where processed = false;
commit;

Just make sure you do it all in the same transaction, so the update
sees the exact same set as the select.

You could also create a partial index on processed to keep track of
just those rows that are false.

-Andy
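For reference, the partial index Andy mentions could look like this
(the index name is assumed); it keeps the "processed = false" scans
cheap once most of the table has already been flagged:

    -- Matches the WHERE clause used by the batch above.
    CREATE INDEX events_unprocessed_idx ON events (id)
        WHERE processed = false;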
On 10/29/2010 07:32 AM, Karl Pickett wrote:
> On Fri, Oct 29, 2010 at 8:58 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:
>>> Hello Postgres Hackers,
>>>
>>> We have a simple 'event log' table that is insert only (by multiple
>>> concurrent clients). It has an integer primary key. We want to do
>>> incremental queries of this table every 5 minutes or so, i.e. "select
>>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>>> reporting database. The problem is, this simple approach has a race
>>> that will forever skip uncommitted events. I.e., if 5000 was
>>> committed sooner than 4999, and we get 5000, we will never go back and
>>> get 4999 when it finally commits. How can we solve this? Basically
>>> it's a phantom row problem but it spans transactions.
>>>
>>> I looked at checking the internal 'xmin' column but the docs say that
>>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>>> value. I don't get it.
>>
>> http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
>>
>> "The internal transaction ID type (xid) is 32 bits wide and wraps
>> around every 4 billion transactions. However, these functions export a
>> 64-bit format that is extended with an "epoch" counter so it will not
>> wrap around during the life of an installation. The data type used by
>> these functions, txid_snapshot, stores information about transaction
>> ID visibility at a particular moment in time. Its components are
>> described in Table 9-53."
>>
>> So:
>> Current snapshot:
>>
>> test=> SELECT txid_current_snapshot();
>>  txid_current_snapshot
>> -----------------------
>>  5098:5098:
>>
>> xmin of snapshot:
>>
>> test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
>>  txid_snapshot_xmin
>> --------------------
>>                5098
>> (1 row)
>
> So what happens when txid_snapshot_xmin() goes over 4 billion and the
> table's xmin doesn't? You can't compare a 32-bit value that rolls over
> to a 64-bit value that doesn't.

The long explanation is here:

http://www.postgresql.org/docs/9.0/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

The short version, as I understand it, is that if everything is
working correctly the XID (hence xmin) values exist in a continuous
loop where 2 billion are in the past and 2 billion are in the future
(assuming default settings). At some point the old values are frozen,
i.e. replaced with a special FrozenXID. This would mean that the
*_snapshot functions should only return currently valid xmins. Since I
have never rolled over a database I can only speak to the theory as I
understand it.

>>> All I want is to make sure I skip over any
>>> rows that are newer than the oldest currently running transaction.
>>> Has nobody else run into this before?
>>>
>>> Thank you very much.
>>>
>>> --
>>> Karl Pickett
>>
>> --
>> Adrian Klaver
>> adrian.klaver@gmail.com

--
Adrian Klaver
adrian.klaver@gmail.com
On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:
> begin;
> insert into logged select * from events where processed = false;
> update events set processed = true where processed = false;
> commit;

There's a race condition there. The SELECT in the INSERT statement may
read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
you do an update on all 6 tuples.

> begin;
> select * from events where processed = false;
> ... do your processing on each, which would include inserting it ...
> update events set processed = true where processed = false;
> commit;

Same problem here.

> Just make sure you do it all in the same transaction, so the update
> sees the exact same set as the select.

You need to use SERIALIZABLE isolation level for this to work. The
default is READ COMMITTED.

Or better yet, use Merlin's suggestion of PgQ. They've already worked
this out in a safe, efficient way. It's the basis for Londiste, a
replication system.

Regards,
Jeff Davis
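A minimal sketch of Jeff's point, applied to Andy's batch from earlier
in the thread. In the 8.4/9.0 releases current at the time,
SERIALIZABLE meant snapshot isolation, which is enough here: both
statements run against one snapshot, so the UPDATE cannot flag a row
that the INSERT ... SELECT did not copy.

    begin isolation level serializable;
    insert into logged select * from events where processed = false;
    update events set processed = true where processed = false;
    commit;

Rows committed by concurrent writers after the snapshot was taken stay
processed = false and are picked up by the next run.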
On Fri, Oct 29, 2010 at 4:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:
>> begin;
>> insert into logged select * from events where processed = false;
>> update events set processed = true where processed = false;
>> commit;
>
> There's a race condition there. The SELECT in the INSERT statement may
> read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
> you do an update on all 6 tuples.
>
>> begin;
>> select * from events where processed = false;
>> ... do your processing on each, which would include inserting it ...
>> update events set processed = true where processed = false;
>> commit;
>
> Same problem here.
>
>> Just make sure you do it all in the same transaction, so the update
>> sees the exact same set as the select.
>
> Regards,
> Jeff Davis

As stated earlier in the thread, there is a race condition within that
transaction, but you could also use a temp table to capture the ids
that you are about to process. Maybe something like this (untested):

begin;

create temp table _idlist on commit drop as
    select id from eventlog where processed is false;

insert into othertable
    select e.*
    from eventlog as e
    inner join _idlist as i on (i.id = e.id);

update eventlog
    set processed = true
    from _idlist as i
    where eventlog.id = i.id;

commit;
On Fri, Oct 29, 2010 at 7:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:
>> begin;
>> insert into logged select * from events where processed = false;
>> update events set processed = true where processed = false;
>> commit;
>
> There's a race condition there. The SELECT in the INSERT statement may
> read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
> you do an update on all 6 tuples.
>
>> begin;
>> select * from events where processed = false;
>> ... do your processing on each, which would include inserting it ...
>> update events set processed = true where processed = false;
>> commit;
>
> Same problem here.
>
>> Just make sure you do it all in the same transaction, so the update
>> sees the exact same set as the select.
>
> You need to use SERIALIZABLE isolation level for this to work. The
> default is READ COMMITTED.
>
> Or better yet, use Merlin's suggestion of PgQ. They've already worked
> this out in a safe, efficient way. It's the basis for Londiste, a
> replication system.

With multiple writers and a single reader, it's usually ok to go the
ad hoc route. Just read a bunch of records, do something with them,
and delete/move them. This will become even easier when wCTE becomes
available, and you can do something like:

with foo as (delete from stuff_to_do returning * limit something),
     bar as (insert into stuff_done select * from foo)
select do_stuff(...) from foo;

Heavy artillery like PGQ becomes necessary IMO when you have multiple
readers pulling things off the queue and doing things. Doing this in
ad hoc fashion would require separating the 'getting stuff from the
queue' and the 'doing stuff'. This is harder than it looks, and the
Skytools people have apparently got it completely worked out (adding a
3rd party dependency is never something to be taken lightly, though).

I'm a little old school in the way I do things -- I try to work the
problem down to where the simpler solutions work. I do a *lot* of
batch processing, though, and maybe I should put learning PGQ on my
radar.

merlin