Home > mailing lists

Batch insert order and current_timestamp - Mailing list pgsql-jdbc

From	Agustin CS
Subject	Batch insert order and current_timestamp
Date	March 6, 2009 07:59:25
Msg-id	9fb413bd0903060359j6659f02bmab90b2eb6b1160f6@mail.gmail.com Whole thread Raw
List	pgsql-jdbc

Tree view

Hi all

I'm getting a strange issue with row inserts using PreparedStatement/Statement batches.

I'm trying to use a table as a queue, so lower levels receive data from devices (changes in their attributes)
and add those changes into the table (one row per change) using batches.If some of them are available it reads a certain number (for example 100) and after that those 100 are removed,
so next time upper level will try to read other 100, and so on.

The table looks like this:
ID sequence self-increment (inc = +1)
timestamp
otherFields...

The insert query used for the batch at the PreparedStatement looks like this:
insert into table (timestamp,otherFields) values(current_timestamp,...) so the default value (nextval(sequence))
is used for the ID column.

I was expecting to have a global order only by using this ID as order key, but according to an answer from
people of #postgresql@freenode.org it seems that's not enough. It seems two consecutive inserts can get
disordered, that is, the first one could have a lower ID but a newer timestamp (assuming that have been
inserted from different batches, for example if lower level includes multiple threads inserting rows using their
own batches).

Considering that I can see all rows inserted in the same batch having the same timestamp, the right way
to select rows from the upper level should be ORDER BY timestamp,ID.

Anyway, if this "disorder" can really happen then the global ordering is lost. For example:
- INSERT 50 rows from lower levels from low-thread 1
- SELECT FROM UPPER LEVELS
- INSERT 50 rows from lower levels from low-thread 2

Because of this problem insert1-row50 may get a lower ID, but a newer timestamp than insert2-row1.

Then if the upper level is extracting 50 rows each time and launches the query right in the middle of those
inserts it will get a wrong "global order", even if it's using the ORDER BY timestamp,ID. That is, each
select is locally ordered (it's guaranteed by the ORDER clause) but there's a disorder between 2 consecutive
selects.

I'm asking about this because i've found exactly this problem, two rows in the table where the first one has lower
id but newer timestamp. Right now assuming that each one would be part of different selects is a situation
forced by me teorically, because i think there is a possible problem there.

To summarize, the questions are:
- if i launch a batch having row1, row2 and row3 (inserted in this order into the batch) in a table with a sequence PK
and another timestamp column (with current_timestamp as value) can i be sure that lower id means "not newer" timestamp?
- in the table will i allways get the rows ordered in this way related to the ID? i mean, row1 will allways have lower ID
than row2 and row3, and so on?
- what about locking the table for inserts? it may be the last way to do this, but depending on the concrete scenario
maybe it would be a good solution.

Thx in advance and regards

pgsql-jdbc by date:

From: Dave Cramer
Date: 06 March 2009, 07:40:58
Subject: Re: Forcing postgres NOT to use sequential scan, trough JDBC

From: Александър Шопов
Date: 06 March 2009, 08:25:49
Subject: Re: Documentation for working with PG interval types via JDBC

Batch insert order and current_timestamp - Mailing list pgsql-jdbc

Previous

Next