Re: tracking commit timestamps - Mailing list pgsql-hackers

From Steve Singer
Subject Re: tracking commit timestamps
Date
Msg-id BLU436-SMTP92E792FBEECCC26B120BECDC8D0@phx.gbl
Whole thread Raw
In response to Re: tracking commit timestamps  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: tracking commit timestamps  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 11/14/2014 08:21 PM, Simon Riggs wrote:
> The requested information is already available, as discussed. Logical
> decoding adds commit ordering for *exactly* the purpose of using it
> for replication, available to all solutions. This often requested
> feature has now been added and doesn't need to be added twice.
>
> So what we are discussing is adding a completely superfluous piece of
> information.
>
> Not including the LSN info does nothing to trigger based replication,
> which will no doubt live on happily for many years. But adding LSN
> will slow down logical replication, for no purpose at all.
>

Simon,
The use cases I'm talking about aren't really replication related. Often 
I have come across systems that want to do something such as 'select * 
from orders where X > the_last_row_I_saw order by X' and then do further 
processing on the order.

This is kind of awkard to do today because you don't have a good 
candidate for 'X' to order on.   Using either a sequence or insert-row 
timestamp doesn't work well because a transaction with a lower value for 
X might end up committing after the higher value in in a query result.

Yes you could setup a logical wal slot and listen on the stream of 
inserts into your order table but thats a lot of administration overhead 
compared to just issuing an SQL query for what really is a query type 
operation.

Using the commit timestamp for my X sounded very tempting but could 
allow duplicates.

One could argue that this patch is about replication features, and 
providing commit ordering for query purposes should be a separate patch 
to add that on top of this infrastructure. I see merit to smaller more 
focused patches but that requires leaving the door open to easily 
extending things later.

It could also be that I'm the only one who wants to order and filter 
queries in this manner (but that would surprise me).  If the commit lsn 
has limited appeal and we decide we don't want it at all  then we 
shouldn't add it.  I've seen this type of requirement in a number of 
different systems at a number of different companies.  I've generally 
seen it dealt with by either selecting rows behind the last now() 
timestamp seen and then filtering out already processed rows or by 
tracking the 'processed' state of each row individually (ie performing 
an update on each row once its been processed) which performs poorly.

Steve








pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: New storage parameter pages_per_range not mentioned in CREATE INDEX doc
Next
From: Michael Paquier
Date:
Subject: Re: New storage parameter pages_per_range not mentioned in CREATE INDEX doc