Thread: Predicate locking

Predicate locking

From
Vlad Arkhipov
Date:
I currently need predicate locking in a project, so there are two ways
to get it for now: implement it by creating special database records to
lock with SELECT FOR UPDATE, or wait until it is implemented in the
Postgres core. Is there something like predicate locking on the TODO
list currently?


Re: Predicate locking

From
Nicolas Barbier
Date:
2011/4/27 Vlad Arkhipov <arhipov@dc.baikal.ru>:

> I currently need predicate locking in a project, so there are two ways
> to get it for now: implement it by creating special database records to
> lock with SELECT FOR UPDATE, or wait until it is implemented in the
> Postgres core. Is there something like predicate locking on the TODO
> list currently?

I assume you want ("real", as opposed to what is in < 9.1 now)
SERIALIZABLE transactions, in which case you could check:

<URL:http://wiki.postgresql.org/wiki/Serializable>

Nicolas


Re: Predicate locking

From
Vlad Arkhipov
Date:
27.04.2011 17:45, Nicolas Barbier:
> I assume you want ("real", as opposed to what is in < 9.1 now)
> SERIALIZABLE transactions, in which case you could check:
>
> <URL:http://wiki.postgresql.org/wiki/Serializable>
>
> Nicolas

Not sure about the whole transaction; I think it degrades performance
too much, as transactions access many tables. I just wanted SELECT FOR
UPDATE to prevent inserting records that match a specified condition
into a table. It seems to be a very typical situation when you have a
table like

CREATE TABLE timetable (start_ts TIMESTAMP, end_ts TIMESTAMP)

and before insertion into this table you want to guarantee that there
are no overlapping time intervals in it. So, first you need to lock the
range in the table, then check whether there are any records in that
range. In my case this is the only table for which I need this kind of
locking.
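
A minimal sketch of the lock-then-check workaround described above (the
one-row lock table and the literal timestamps are hypothetical, not
taken from the thread):

-- One-time setup: a single row for all writers to serialize on.
create table timetable_lock (id int primary key);
insert into timetable_lock values (1);

begin;
-- Serialize all writers on the lock row.
select 1 from timetable_lock where id = 1 for update;
-- No concurrent writer can pass the lock until we commit, so the
-- overlap check is race-free: [s, e) overlaps [new_s, new_e)
-- iff s < new_e and e > new_s.
select count(*) from timetable
 where start_ts < '2011-04-27 13:00' and end_ts > '2011-04-27 12:00';
-- If the count is 0, the new interval can be inserted safely.
insert into timetable values ('2011-04-27 12:00', '2011-04-27 13:00');
commit;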


Re: Predicate locking

From
Heikki Linnakangas
Date:
On 27.04.2011 12:24, Vlad Arkhipov wrote:
> Not sure about the whole transaction; I think it degrades performance
> too much, as transactions access many tables. I just wanted SELECT FOR
> UPDATE to prevent inserting records that match a specified condition
> into a table. It seems to be a very typical situation when you have a
> table like
>
> CREATE TABLE timetable (start_ts TIMESTAMP, end_ts TIMESTAMP)
>
> and before insertion into this table you want to guarantee that there
> are no overlapping time intervals in it. So, first you need to lock the
> range in the table, then check whether there are any records in that
> range. In my case this is the only table for which I need this kind of
> locking.

You can do that with exclusion constraints:

http://www.postgresql.org/docs/9.0/static/ddl-constraints.html#DDL-CONSTRAINTS-EXCLUSION

See also Depesz's blog post for a specific example on how to use it for 
time ranges:

http://www.depesz.com/index.php/2010/01/03/waiting-for-8-5-exclusion-constraints/

And Jeff Davis's blog post that uses the period data type instead of the 
hack to represent time ranges as boxes:

http://thoughts.j-davis.com/2009/11/08/temporal-keys-part-2/
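
A sketch of what that might look like on 9.0 for plain time ranges,
using the box representation from Depesz's post (the exact table and
expression here are illustrative, not taken from the thread):

CREATE TABLE timetable (
    start_ts TIMESTAMP NOT NULL,
    end_ts   TIMESTAMP NOT NULL,
    CHECK (start_ts < end_ts),
    -- Map each interval to a degenerate box on the x axis; && then
    -- rejects any pair of rows whose intervals overlap.
    EXCLUDE USING gist (
        box(point(extract(epoch FROM start_ts), 0),
            point(extract(epoch FROM end_ts),   0)) WITH &&
    )
);

Note that box overlap treats a shared edge as an overlap, so under this
sketch two intervals that merely touch at an endpoint are also rejected.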

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Predicate locking

From
Vlad Arkhipov
Date:
27.04.2011 18:38, Heikki Linnakangas writes:
> You can do that with exclusion constraints:
>
> http://www.postgresql.org/docs/9.0/static/ddl-constraints.html#DDL-CONSTRAINTS-EXCLUSION
>
> See also Depesz's blog post for a specific example on how to use it
> for time ranges:
>
> http://www.depesz.com/index.php/2010/01/03/waiting-for-8-5-exclusion-constraints/
>
> And Jeff Davis's blog post that uses the period data type instead of
> the hack to represent time ranges as boxes:
>
> http://thoughts.j-davis.com/2009/11/08/temporal-keys-part-2/

Exclusion constraints work only in simple cases. I need to check a
great number of business rules to ensure that the insertion is
possible. For example, for a table with columns (start_ts TIMESTAMP,
end_ts TIMESTAMP, room BIGINT, visitor BIGINT, service BIGINT) it is
not possible to have overlapping intervals for the same time and room,
but different visitors. So, in terms of exclusion constraints I need
something like:

room WITH =,
visitor WITH <>,
(start_ts, end_ts) WITH &&

which seems to be impossible. Predicate locking provides a more
flexible way to solve this problem.
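
Spelled out, the constraint being sketched here might look roughly like
this (a hypothetical form only: the btree_gist extension is assumed to
supply = on bigint, and the box trick above stands in for && on the
pair of timestamps; as reported below, the <> clause fails on 9.0):

CREATE TABLE timetable (
    start_ts TIMESTAMP NOT NULL,
    end_ts   TIMESTAMP NOT NULL,
    room     BIGINT,
    visitor  BIGINT,
    service  BIGINT,
    EXCLUDE USING gist (
        room WITH =,
        visitor WITH <>,
        box(point(extract(epoch FROM start_ts), 0),
            point(extract(epoch FROM end_ts),   0)) WITH &&
    )
);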


Re: Predicate locking

From
David Fetter
Date:
On Thu, Apr 28, 2011 at 12:07:34PM +0900, Vlad Arkhipov wrote:
> Exclusion constraints work only in simple cases. I need to check a
> great number of business rules to ensure that the insertion is
> possible. For example, for a table with columns (start_ts TIMESTAMP,
> end_ts TIMESTAMP, room BIGINT, visitor BIGINT, service BIGINT) it is
> not possible to have overlapping intervals for the same time and
> room, but different visitors. So, in terms of exclusion constraints
> I need something like:
>
> room WITH =,
> visitor WITH <>,
> (start_ts, end_ts) WITH &&
>
> which seems to be impossible. Predicate locking provides a more
> flexible way to solve this problem.

Did you actually try it?  It works just fine with a timestamp range.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Predicate locking

From
Vlad Arkhipov
Date:
28.04.2011 21:36, David Fetter writes:
> Did you actually try it?  It works just fine with a timestamp range.

Yes. It does not work on 9.0 when I add 'visitor WITH <>':

ERROR:  failed to re-find tuple within index "overlapping"
HINT:  This may be because of a non-immutable index expression.

But even if it worked, it would not help me anyway. Because my
constraint is much more complex and depends on other tables, I cannot
express it in terms of exclusion constraints.
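
One shape such cross-table checks often take, consistent with the
lock-then-check idea from the start of the thread, is a trigger that
serializes writers on a related row before validating (a hypothetical
sketch; the rooms table and all names here are illustrative, and it
assumes READ COMMITTED, where the check sees rows committed while the
lock was awaited):

create or replace function check_booking() returns trigger as $$
begin
  -- Serialize writers per room by locking the room's parent row.
  perform 1 from rooms where id = new.room for update;
  -- Race-free now: reject overlapping bookings by other visitors.
  if exists (select 1 from timetable t
              where t.room = new.room
                and t.visitor <> new.visitor
                and t.start_ts < new.end_ts
                and t.end_ts > new.start_ts) then
    raise exception 'overlapping booking for room %', new.room;
  end if;
  return new;
end;
$$ language plpgsql;

create trigger check_booking before insert on timetable
  for each row execute procedure check_booking();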


Re: Predicate locking

From
"Kevin Grittner"
Date:
Vlad Arkhipov wrote:

> But even if it worked, it would not help me anyway. Because my
> constraint is much more complex and depends on other tables, I
> cannot express it in terms of exclusion constraints.

Are you aware of the changes to the SERIALIZABLE transaction
isolation level in the upcoming 9.1 release?

http://wiki.postgresql.org/wiki/Serializable
http://wiki.postgresql.org/wiki/SSI

If you can wait for that, it might be just what you're looking for.

-Kevin


Re: Predicate locking

From
Vlad Arkhipov
Date:
29.04.2011 21:18, Kevin Grittner wrote:
> Are you aware of the changes to the SERIALIZABLE transaction
> isolation level in the upcoming 9.1 release?
>
> http://wiki.postgresql.org/wiki/Serializable
> http://wiki.postgresql.org/wiki/SSI
>
> If you can wait for that, it might be just what you're looking for.

I would not like to make the whole transaction serializable, for
performance and concurrency reasons.


Re: Predicate locking

From
"Kevin Grittner"
Date:
Vlad Arkhipov wrote:
> I would not like to make the whole transaction serializable, for
> performance and concurrency reasons.

I'm curious -- what do you expect the performance and concurrency
impact to be?  You do realize that unlike SELECT FOR UPDATE,
SERIALIZABLE in PostgreSQL 9.1 will not cause any blocking beyond
what is there in READ COMMITTED, right?

This is not like SERIALIZABLE in any other database.  It is the first
production implementation of an innovative technique first published
in 2008.  The paper in which it was introduced won a Best Paper award
from ACM SIGMOD.  An ACM committee independently confirmed benchmarks
showing that performance was much better than blocking-based
SERIALIZABLE techniques, and very close to snapshot isolation for
many workloads.

Now, it might turn out that there's some reason it's not a good fit
for you, but don't assume that based on anything you know about any
*other* database's SERIALIZABLE isolation level; this is completely
different.

-Kevin


Re: Predicate locking

From
Vlad Arkhipov
Date:
30.04.2011 22:18, Kevin Grittner wrote:
> I'm curious -- what do you expect the performance and concurrency
> impact to be?  You do realize that unlike SELECT FOR UPDATE,
> SERIALIZABLE in PostgreSQL 9.1 will not cause any blocking beyond
> what is there in READ COMMITTED, right?

Does 9.1beta contain the new SERIALIZABLE isolation level? If so, I can
show you some concurrency issues.

First I created a table:

create table t (id bigint, value bigint);
insert into t values (1, 1);
insert into t values (2, 1);
create index t_idx on t(id);

Then I started two transactions.

1.
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2; -- and do some logic depending on this result
insert into t (id, value) values (-2, 1);

2.
begin transaction;
set transaction isolation level serializable;
select * from t where id = 3; -- and do some logic depending on this result
insert into t (id, value) values (-3, 0);

Then I committed both, and the second one raised an exception:

ERROR: could not serialize access due to read/write dependencies among
transactions
SQL state: 40001

However, the second transaction does not access the records that the
first one does. If I had predicate locks, I could avoid this situation
by locking the records with the specified id.


Re: Predicate locking

From
Dan Ports
Date:
On Tue, May 03, 2011 at 01:36:36PM +0900, Vlad Arkhipov wrote:
> Then I committed both, and the second one raised an exception:
>
> ERROR: could not serialize access due to read/write dependencies among
> transactions
> SQL state: 40001
>
> However, the second transaction does not access the records that the
> first one does. If I had predicate locks, I could avoid this situation
> by locking the records with the specified id.

Yes, you're right -- the current implementation of SSI only locks
indexes at the granularity of index pages. So although those
transactions don't actually access the same records, they're detected
as a conflict because they're on the same index page. Of course, on a
larger table this might be less likely to happen.

Getting this down to index-key and index-gap lock granularity is on
the todo list. Our focus in the initial SSI development has been to get
something that's functionally correct and stable before optimizing it.
I'm hoping to get some time to work on index-key locking for 9.2, as I
expect it will make a significant performance difference.
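
A quick way to observe this granularity (the same inspection Greg Smith
uses later in this thread) is to look at the SIRead locks a serializable
transaction is holding:

select locktype, relation::regclass, page, tuple
  from pg_locks
 where mode = 'SIReadLock';

A page-granularity lock shows up there as locktype 'page' on the index.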

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/


Re: Predicate locking

From
"Kevin Grittner"
Date:
Dan Ports  wrote:
> Yes, you're right -- the current implementation of SSI only locks
> indexes at the granularity of index pages. So although those
> transactions don't actually access the same records, they're
> detected as a conflict because they're on the same index page. Of
> course, on a larger table this might be less likely to happen.
> 
> Getting this down to index-key and index-gap lock granularity is on
> the todo list. Our focus in the initial SSI development has been to
> get something that's functionally correct and stable before
> optimizing it. I'm hoping to get some time to work on index-key
> locking for 9.2, as I expect it will make a significant performance
> difference.

Well, if Vlad had the feature he's requesting, he almost certainly
would have had blocking and then a deadlock (after the deadlock
checking delay expired) on this particular test.  The reason is that
in a table with just a few very narrow rows, any cost-based optimizer
will choose a table scan, to bypass the overhead of going through the
index to get to the single page.  As you suggested, on a table with
enough rows to make index use beneficial, it would have a decent
chance of avoiding serialization failures.  Index-gap locking would
not have helped in this example for the same reason.

The fact is, to get protections such as Vlad seeks using either
blocking-style locks or SSI, you have a chance of serialization
failure (if you count deadlocks as a form of serialization failure,
which they are).  And I'm not aware of any production-quality
database product which doesn't use locks on accessed objects to do
this, so really Vlad's complaint here breaks down to the fact that
we're using a cost-based optimizer which will choose a table scan
rather than a rules-based optimizer which will go through an index to
get to a single page.  Going through the extra read for the indexed
access here would probably slow things down more, in a real load,
than the transaction retries.

If you look at the paper which won an ACM SIGMOD Best Paper award,
and the benchmarks in it which were independently confirmed by an ACM
committee, you'll see that the SSI techniques used in the PostgreSQL
9.1 SERIALIZABLE transaction level benchmarked as far faster (in far
more realistic tests than the one on this thread) and with better
concurrency than the blocking sort of locking Vlad is requesting.

In a way this argument reminds me of the recurring "I need hints!"
argument -- someone is insisting on doing things using the technique
to which they've become accustomed in other products, and complaining
that PostgreSQL deals with the issue in a different way which is
arguably better.

But getting back to Dan's point -- I know he's eager to get the btree
locking down to "next key" granularity, which will reduce the false
positive rate, and I have my eye on a few other optimizations.  I
expect that in its current form serializable mode will perform better
than blocking-based techniques for many workloads, but I also expect
it will get even better over the next few releases.

-Kevin


Re: Predicate locking

From
Greg Smith
Date:
Dan Ports wrote:
> Yes, you're right -- the current implementation of SSI only locks
> indexes at the granularity of index pages. So although those
> transactions don't actually access the same records, they're detected
> as a conflict because they're on the same index page.

Let's try to demonstrate that with an update to Vlad's example.  Run
this on the first client to generate the same type of table, but with
an easily varied number of rows in it:

drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,348) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2;
insert into t (id, value) values (-2, 1);

Execute this on the second client:

begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

And then return to the first one to run COMMIT.  I'm getting a 
serialization failure in that case.  However, if I increase the 
generate_series to create 349 rows (or more) instead, it works.  I don't 
see where it crosses a page boundary in table or index size going from 
348 to 349, so I'm not sure why I'm seeing a change happening there.  In 
both cases, there's only one non-header index block involved.

I don't think Vlad is being unreasonable here; he's provided a test case
demonstrating the behavior he'd like to see, and shown it doesn't work
as expected.  If we can prove that the test does work on tables of
non-trivial size, and that it only suffers from broader-than-necessary
locks that are still to be optimized, that would make a more compelling
argument against the need for proper predicate locks.  I don't fully
understand why my attempt at demonstrating that works the way it does,
though.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: Predicate locking

From
"Kevin Grittner"
Date:
Greg Smith <greg@2ndquadrant.com> wrote:
> However, if I increase the generate_series to create 349 rows (or
> more) instead, it works.

> I don't fully understand why this attempt I tried to do that is
> working the way it does though.
Check where the plan goes from a table scan to an indexed access.
Also look at what is showing for SIRead locks in pg_locks as you go.
Between those two bits of information, it should become apparent.

> I don't think Vlad is being unreasonable here; he's provided a
> test case demonstrating the behavior he'd like to see, and shown
> it doesn't work as expected.

... on a toy table with contrived values.  How different is this
from the often-asked question about why a query against a four-line
table is not using the index they expect, and how can we expect it
to scale if it doesn't?  I agree that it's not unreasonable for
someone to ask either question.  If my response falls short, I'm
game to try again.

-Kevin


Re: Predicate locking

From
Greg Smith
Date:
Kevin Grittner wrote:
>> I don't think Vlad is being unreasonable here; he's provided a
>> test case demonstrating the behavior he'd like to see, and shown
>> it doesn't work as expected.
>
> ... on a toy table with contrived values.  How different is this
> from the often-asked question about why a query against a four-line
> table is not using the index they expect, and how can we expect it
> to scale if it doesn't?

It's not, but in that case I've been known to show someone that the 
behavior they're seeing doesn't happen on a larger table.  My point was 
just that no one has really done that here yet:  provided an example 
showing SSI serialization working as a substitute for predicate locking 
in this sort of use case.  I trust that the theory is sound here, and 
yet I'd still like to see that demonstrated.  This is why I launched 
into making a less trivial test case.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: Predicate locking

From
Greg Smith
Date:
Kevin Grittner wrote:
> Check where the plan goes from a table scan to an indexed access.
> Also look at what is showing for SIRead locks in pg_locks as you go.
> Between those two bits of information, it should become apparent.

OK, so this doesn't look to be an index-lock-related thing at all here.
The updated test case does this to create the table and show some
additional state:

drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,348) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
explain analyze select * from t where id = 2;
select pid,locktype,relation::regclass,page,tuple from pg_locks where 
mode='SIReadLock';
insert into t (id, value) values (-2, 1);
select pid,locktype,relation::regclass,page,tuple from pg_locks where 
mode='SIReadLock';

Do the same thing as before on the second process:

begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

Then return to the first client to commit.  When I execute that with 348 
records, the case that fails, it looks like this:

gsmith=# explain analyze select * from t where id = 2;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..6.35 rows=2 width=16) (actual time=0.106..0.286 rows=1 loops=1)
   Filter: (id = 2)
 Total runtime: 0.345 ms
(3 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks
         where mode='SIReadLock';
 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1495 | relation | t        |      |
(1 row)

So it's actually grabbing a lock on the entire table in that situation.  
The other client does the same thing, and they collide with the 
described serialization failure.

The minute I try that with a table that is 349 rows instead, it
switches plans:

gsmith=# explain analyze select * from t where id = 2;
                                      QUERY PLAN
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=4.27..6.29 rows=2 width=16) (actual time=0.169..0.171 rows=1 loops=1)
   Recheck Cond: (id = 2)
   ->  Bitmap Index Scan on t_idx  (cost=0.00..4.27 rows=2 width=0) (actual time=0.144..0.144 rows=1 loops=1)
         Index Cond: (id = 2)
 Total runtime: 0.270 ms
(5 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks
         where mode='SIReadLock';
 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1874 | page     | t_idx    |    1 |
 1874 | tuple    | t        |    0 |     2
(2 rows)

Grabbing a lock on the index page and the row, as Dan explained it 
would.  This actually eliminates this particular serialization failure 
altogether here though, even with these still on the same table and 
index page.

So the root problem with Vlad's test isn't the index lock at all; it's 
heavy locking from the sequential scan that's executing on the trivial 
cases.  If he expands his tests to use a larger amount of data, such 
that the plan switches to a realistic one, his results with the new 
serialization mode may very well be more satisfying.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: Predicate locking

From
"Kevin Grittner"
Date:
Greg Smith wrote:
> My point was just that no one has really done that here yet:
> provided an example showing SSI serialization working as a
> substitute for predicate locking in this sort of use case. I trust
> that the theory is sound here, and yet I'd still like to see that
> demonstrated.

Fair enough.  You can find examples where there are no false
positives or false negatives in the src/test/isolation directory in
any checkout of the source code.  Dan will be presenting the results
of a set of DBT-2 runs at PGCon, which might help.  I've been
gradually building up a set of examples at:

http://wiki.postgresql.org/wiki/SSI

That set is incomplete so far due to a scarcity of round tuits, but
if you want to suggest any particular tests, or add any (it's a
Wiki), I welcome the input.

With all that going on, and having mentioned that Wiki page on this
thread, I didn't think posting examples to this list was useful, but
could be persuaded otherwise.

-Kevin


Re: Predicate locking

From
Robert Haas
Date:
On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> ... on a toy table with contrived values.  How different is this
> from the often-asked question about why a query against a four-line
> table is not using the index they expect, and how can we expect it
> to scale if it doesn't?  I agree that it's not unreasonable for
> someone to ask either question.  If my response falls short, I'm
> game to try again.

I guess what surprises me about this a bit is that we have to
predicate-lock the whole table even if we're not actually looking at
all the rows.  I can sort of see why that's necessary, but I'm a bit
fuzzy on the details, and it does seem a little unfortunate in this
instance...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Predicate locking

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> ... on a toy table with contrived values.  How different is this
>> from the often-asked question about why a query against a
>> four-line table is not using the index they expect, and how can
>> we expect it to scale if it doesn't?  I agree that it's not
>> unreasonable for someone to ask either question.  If my response
>> falls short, I'm game to try again.
> 
> I guess what surprises me about this a bit is that we have to
> predicate-lock the whole table even if we're not actually looking
> at all the rows.  I can sort of see why that's necessary, but I'm
> a bit fuzzy on the details, and it does seem a little unfortunate
> in this instance...
Well, as far as I can tell, every production-quality database with
predicate locking models the predicates based on the rows actually
accessed.  Until now, that has been every popular SQL database
except PostgreSQL and Oracle.  That makes predicate locking
sensitive to the plan chosen.  It was because of this that I thought
it might be wise to include a bump to the seq_page_cost and/or
cpu_tuple_cost for plans inside a serializable transaction.  This
would encourage indexed access rather than a table scan at an
earlier threshold, thereby reducing false positive serialization
failures.  At the time the suggestion got a rather cool reception.
Is it time to reconsider that?

On the other hand, as a shop where we're probably going to set
default_transaction_isolation = serializable in our postgresql.conf
files and include trigger checks that we're running at that level,
we can just boost those globally.  That may also work for others.

Once I wrap up these changes to our replication system I'm in the
middle of coding, I'll see about getting all our development
machines onto 9.1beta with default serialization and see how much
trouble our apps have.  Even on our development machines we run with
a copy of real data from a circuit court county database.

-Kevin


Re: Predicate locking

From
"Kevin Grittner"
Date:
I wrote:
> On the other hand, as a shop where we're probably going to set
> default_transaction_isolation = serializable in our
> postgresql.conf files and include trigger checks that we're
> running at that level, we can just boost those globally.  That may
> also work for others.
Just as a quick experiment, I took Greg's example and tried it with
different costs, and thereby eliminated the false positives for this
particular example, all the way down to a 5-row table:

set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,5) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2;
insert into t (id, value) values (-2, 1);

Execute this on the second client:

set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

Then go back to the first client and commit -- no problem.

I make no representation that these are great numbers for any
particular workload; it's just meant as a quick illustration that
these behaviors are tunable.  With serializable transactions, it
probably is reasonable to figure that the cost of a sequential scan
or of reading a tuple includes the cost of some percentage of
transactions being rolled back and restarted.
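
As a final sketch (the database name is hypothetical, and the values
are simply copied from the experiment above rather than recommended),
such settings can also be applied globally instead of per session:

-- Hypothetical: apply the tuned costs and the serializable default
-- database-wide; 'app' stands in for a real database name.
ALTER DATABASE app SET random_page_cost = 0.2;
ALTER DATABASE app SET cpu_tuple_cost = 0.05;
ALTER DATABASE app SET default_transaction_isolation = 'serializable';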
-Kevin