Re: Can Postgres Not Do This Safely ?!? - Mailing list pgsql-general

From: Merlin Moncure
Subject: Re: Can Postgres Not Do This Safely ?!?
Msg-id: AANLkTinNbThWs_jVRXS3Fcvjk0qU9Non_71E8MHn2Bpq@mail.gmail.com
In response to: Can Postgres Not Do This Safely ?!? (Karl Pickett <karl.pickett@gmail.com>)
Responses: Re: Can Postgres Not Do This Safely ?!?
List: pgsql-general
On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@gmail.com> wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients).  It has an integer primary key.  We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database.  The problem is, this simple approach has a race
> that will forever skip uncommitted events.  I.e., if 5000 was
> committed sooner than 4999, and we get 5000, we will never go back and
> get 4999 when it finally commits.  How can we solve this?  Basically
> it's a phantom row problem but it spans transactions.
>
> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value.  I don't get it.  All I want to do is make sure I skip over any
> rows that are newer than the oldest currently running transaction.
> Has nobody else run into this before?

You don't have a sequence problem so much as an implementation
problem.  Sequences are always *grabbed* in order, but they can hit the
table out of order, and there is a lag between when a sequence value is
generated and when the transaction holding it commits.  If I issue
'begin', insert a log record, and hold the commit for 5 minutes, you
are going to skip my record, because you are only looking past the last
id you processed.  Your algorithm is going to fail whether you use a
sequence, a timestamp, or even a gapless sequence to manage your queue
position.  You need to divide your log records into two logical sets,
processed and unprocessed, and look at the unprocessed set as a whole.
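
To make the race concrete (using the 4999/5000 ids from your example;
the table name and query are just illustrative):

    -- session A                        -- session B
    BEGIN;
    INSERT INTO events ...;  -- id 4999
                                        BEGIN;
                                        INSERT INTO events ...;  -- id 5000
                                        COMMIT;
    -- reader: SELECT * FROM events WHERE id > LAST_ID_I_GOT;
    -- it sees only 5000 and remembers LAST_ID_I_GOT = 5000
    COMMIT;  -- 4999 is finally visible, but the reader never looks back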

I would suggest staging your unprocessed records in a queue table and
having your writer consume them and move them to a processed table (a
rough sketch follows).  You can also look at already-built queuing
implementations like PGQ, written by our spectacularly skilled friends
at Skype (haven't used it myself, but I've heard it's good!).
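
A minimal hand-rolled mover could look like this (sketch only:
events_queue and events_processed are hypothetical table names, and it
assumes a single consumer process).  Running both statements in one
serializable transaction means they share a snapshot, so a row that
commits mid-move is neither copied nor deleted; it is simply picked up
on the next run:

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- copy every row visible in this transaction's snapshot...
    INSERT INTO events_processed
        SELECT * FROM events_queue;
    -- ...then delete exactly those rows (same snapshot, same set)
    DELETE FROM events_queue;
    COMMIT;

The key property is that the DELETE cannot touch a row the INSERT did
not see, which is exactly the whole-set view you can't get from "id >
LAST_ID_I_GOT".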

merlin
