Thread: Common case not at all clear

Common case not at all clear

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/13/transaction-iso.html
Description:

For all this documentation, it is completely unclear how to handle the most
common, simple case.  I.e.

Select balance into :bal  ...where key =123;
Update set balance = :bal+100 where key = 100

The discussion of read committed for Updates is misleading, I am pretty sure
it will fail if the select is in a different statement, a common case.

For Oracle, I think that by default a Select will return values at the
beginning of a transaction, Select For Update will return the read committed
value, and Select For Update will wait until conflicting transactions
complete.  So the answer is that the first Select would be a Select For
Update, which should be the normal pattern to be safe (with primary key
access) and minimize deadlocks.

Is that how PostgreSql works?  Is that the generally recommended pattern?
Impossible to tell from the docs as written.  MVCC really relies on Select
For Update to work for transactions, I think.

Re: Common case not at all clear

From
Bruce Momjian
Date:
On Wed, Jul 28, 2021 at 03:48:04AM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/13/transaction-iso.html
> Description:
> 
> For all this documentation, it is completely unclear how to handle the most
> common, simple case.  I.e.
> 
> Select balance into :bal  ...where key =123;
> Update set balance = :bal+100 where key = 100
> 
> The discussion of read committed for Updates is misleading, I am pretty sure
> it will fail if the select is in a different statement, a common case.
> 
> For Oracle, I think that by default a Select will return values at the
> beginning of a transaction, Select For Update will return the read committed
> value, and Select For Update will wait until conflicting transactions
> complete.  So the answer is that the first Select would be a Select For
> Update, which should be the normal pattern to be safe (with primary key
> access) and minimize deadlocks.
> 
> Is that how PostgreSql works?  Is that the generally recommended pattern? 
> Impossible to tell from the docs as written.  MVCC really relies on Select
> For Update to work for transactions, I think.

What documentation text is unclear?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Common case not at all clear

From
"David G. Johnston"
Date:
On Thu, Jul 29, 2021 at 8:04 AM PG Doc comments form <noreply@postgresql.org> wrote:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/13/transaction-iso.html
Description:

For all this documentation, it is completely unclear how to handle the most
common, simple case.  I.e.

Select balance into :bal  ...where key =123;
Update set balance = :bal+100 where key = 100

That isn't SQL, or any syntax that PostgreSQL supports that I know of.
 
The discussion of read committed for Updates is misleading, I am pretty sure
it will fail if the select is in a different statement, a common case.

I don't believe it is possible for it to fail - or serializable is going to actually result in errors.

Is that how PostgreSql works?  Is that the generally recommended pattern?
Impossible to tell from the docs as written.

That part of the issue with the documentation, they tend to simply say how things work and let the user decide.  Recommendations are uncommon.
 
MVCC really relies on Select
For Update to work for transactions, I think.

IIUC it is basically the difference between optimistic and pessimistic concurrency.  You get to choose which cost/benefit package you want.

My impression is that if you are getting that deep into the bowels of concurrency you should learn and use the serializable isolation level to ensure a consistent linear flow without having to really deal with manual locking directly.

David J.

Re: Common case not at all clear

From
Anthony Berglas
Date:
My point is that while I can follow the academic style discussion, most of my colleagues could not.  They just need to have a clear idea of how to handle the common case, which is to use a database using some programming language. 

Early on, for Read Committed, it should discuss and ideally provide an example of Select For Update, followed by an Update, together with a discussion of why the "For Update" is important.  Indeed essential.  That is totally unclear unless you already understand it.

Yes, it does mention an Update statement, but that is a less common approach.  It needs to mention how and why to use Select For Update clearly and early.  Without being tangled up in all the other more esoteric considerations.


Select balance into :bal  ...where key =123; ... 
Update set balance = :bal+100 where key = 100

That isn't SQL, or any syntax that PostgreSQL supports that I know of.

OK, so I have omitted the table name which is not important.  And the :bal is a traditional notation used in APIs, no ":" for postgresql triggers etc.  But the meaning should be clear.  Retrieve a value and update it in a seperate statement.

 
The discussion of read committed for Updates is misleading, I am pretty sure
it will fail if the select is in a different statement, a common case.

I don't believe it is possible for it to fail - or serializable is going to actually result in errors.

"Fail" meaning roll back.  
 

Is that how PostgreSql works?  Is that the generally recommended pattern?
Impossible to tell from the docs as written.

That part of the issue with the documentation, they tend to simply say how things work and let the user decide.  Recommendations are uncommon.

Well, in this case they are rather important.
 
MVCC really relies on Select
For Update to work for transactions, I think.

IIUC it is basically the difference between optimistic and pessimistic concurrency.  You get to choose which cost/benefit package you want.

My impression is that if you are getting that deep into the bowels of concurrency you should learn and use the serializable isolation level to ensure a consistent linear flow without having to really deal with manual locking directly.

I am more interested in people that do not go into the depths getting the really simple things right.  Not really possible from these docs.

For me, the only way to understand these docs is to do lots of little experiments.

Incidentally, I think that in the default mode (Read Committed?), Oracle gives you the pre transaction snapshot values for a Select but the currently committed values for Select For Update.  Those semantics seem to work pretty well in practice.  

From my understanding and experiments those semantics cannot be achieved with Postgresql.

Anthony
 

David J.

Re: Common case not at all clear

From
"David G. Johnston"
Date:
On Thu, Jul 29, 2021 at 8:44 PM Anthony Berglas <anthony@berglas.org> wrote:
My point is that while I can follow the academic style discussion, most of my colleagues could not.  They just need to have a clear idea of how to handle the common case, which is to use a database using some programming language.

On the whole I believe that we are both mostly correct in our observations.  I would be happy to review a change to this section of the documentation - whether done surgically or in a larger patch to make this resource much more accessible to users.  I don't plan to take on the activity of putting together an initial patch for consideration.  I will observe that there haven't been many questions or comments pertaining to this material hitting the mailing lists; though why that may be is difficult to guess.  Maybe it's covered better in books so people just don't use this as a resource for the topic?  In any case, the ability for someone that knows this material well, but is a coder and not a teacher, to write up something better and more accessible, and judging this is be a better use of their time than some other activity, is fairly low.  It does seem, however, like an excellent project for someone who benefits from the open source nature of the project and is looking for a way to both give back to the community and learn a topic more fully at the same time.

David J.



Re: Common case not at all clear

From
Anthony Berglas
Date:
OK, I'll put something for you to review.  Most programmers simply ignore locking and wonder why it sometimes goes wrong.

On Fri, Jul 30, 2021 at 2:22 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Jul 29, 2021 at 8:44 PM Anthony Berglas <anthony@berglas.org> wrote:
My point is that while I can follow the academic style discussion, most of my colleagues could not.  They just need to have a clear idea of how to handle the common case, which is to use a database using some programming language.

On the whole I believe that we are both mostly correct in our observations.  I would be happy to review a change to this section of the documentation - whether done surgically or in a larger patch to make this resource much more accessible to users.  I don't plan to take on the activity of putting together an initial patch for consideration.  I will observe that there haven't been many questions or comments pertaining to this material hitting the mailing lists; though why that may be is difficult to guess.  Maybe it's covered better in books so people just don't use this as a resource for the topic?  In any case, the ability for someone that knows this material well, but is a coder and not a teacher, to write up something better and more accessible, and judging this is be a better use of their time than some other activity, is fairly low.  It does seem, however, like an excellent project for someone who benefits from the open source nature of the project and is looking for a way to both give back to the community and learn a topic more fully at the same time.

David J.



Re: Common case not at all clear

From
Anthony Berglas
Date:
Hello David,

I have attached a proposed doc update that makes the problem clearer.  I think that this is important because if people do not understand it they will write buggy code and then blame Postgresql for losing updates, which is totally unacceptable.  So please do action this.  I have tested and confirm that the behaviour is as I specify.

There is more that could be said, e.g. why to avoid SHARE locks.   But I think that that is enough.

(I personally think that the default semantics are very dubious.  Either Repeatable Read should be the default mode, or updating a row already read but changed should produce an error for Read Committed.  The goal is not to satisfy academic rules but produce something that works safely in practice.  But I am sure there has been much discussion about that elsewhere.)

On Fri, Jul 30, 2021 at 6:03 PM Anthony Berglas <anthony@berglas.org> wrote:
OK, I'll put something for you to review.  Most programmers simply ignore locking and wonder why it sometimes goes wrong.

On Fri, Jul 30, 2021 at 2:22 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Jul 29, 2021 at 8:44 PM Anthony Berglas <anthony@berglas.org> wrote:
My point is that while I can follow the academic style discussion, most of my colleagues could not.  They just need to have a clear idea of how to handle the common case, which is to use a database using some programming language.

On the whole I believe that we are both mostly correct in our observations.  I would be happy to review a change to this section of the documentation - whether done surgically or in a larger patch to make this resource much more accessible to users.  I don't plan to take on the activity of putting together an initial patch for consideration.  I will observe that there haven't been many questions or comments pertaining to this material hitting the mailing lists; though why that may be is difficult to guess.  Maybe it's covered better in books so people just don't use this as a resource for the topic?  In any case, the ability for someone that knows this material well, but is a coder and not a teacher, to write up something better and more accessible, and judging this is be a better use of their time than some other activity, is fairly low.  It does seem, however, like an excellent project for someone who benefits from the open source nature of the project and is looking for a way to both give back to the community and learn a topic more fully at the same time.

David J.



Attachment

Re: Common case not at all clear

From
"David G. Johnston"
Date:
On Sun, Aug 1, 2021 at 11:35 PM Anthony Berglas <anthony@berglas.org> wrote:
I have attached a proposed doc update that makes the problem clearer.  I think that this is important because if people do not understand it they will write buggy code and then blame Postgresql for losing updates, which is totally unacceptable.  So please do action this.  I have tested and confirm that the behaviour is as I specify.

That really isn't a good solution though...a better one is to modify the update command to be:

UPDATE products SET quantity_on_hand = qoh - 50 WHERE quantity_on_hand = qoh;

Or, even better:

SELECT ... last_updated INTO lastupdatetimestamp, ...;

UPDATE products SET quantity_on_hand = qoh - 50 AND last_updated = lastupdatetimestamp;

Then the application needs to simply check for a zero record update and, if that happens, decide how it wants to deal with the fact that the data has changed out from under it.

This is superior to waiting an indeterminate amount of time holding a FOR UPDATE lock in an open transaction.

I would still expand on the FOR UPDATE option as you suggest.

This is still just discussion though, someone will need to convert this into a proper doc patch that can be built, added to the commitfest, reviewed, and ultimately committed.  IMO it is not worth going to the trouble of making this all HTML-friendly as your patch did.  Just stick to plain text discussion in the email body if you aren't going to write a patch in the sgml source language and present it as a diff.

David J.

Re: Common case not at all clear

From
Anthony Berglas
Date:
Hello David,

You are talking about optimistic locking, commonly used for web applications where there is no transaction kept open during user think time.  A COMMIT between the SELECT and the UPDATE.

This is also what was needed for traditional MySql running only in AutoCommit mode.  It requires no locking at all other than atomic statements.  And the end user has to re-enter the transaction if it fails due to a conflict.

While processing something on the server however, it is nice to be able to use a proper database like Postgresql that does have locking during a transaction.  Failures are more difficult to deal with on a server where you cannot just throw failures back to a user, transactions are more complex, and a short wait for a lock is generally not an issue (there should always be a generous timeout).

(If you use SET quantity_on_hand = quantity_on_hand - 50 then you do not even need the optimistic lock.  But that is rarely done in practice using an Object Relational Mapping library.)

Once upon a time, tools like Oracle Forms kept database locks during user think time so locking strategies were very important.  But client/server web apps cannot work that way, so databases like MySql were useable.  But there is still some server processing to be done, so while locking is less important, it is still worth doing properly.  And more importantly it is very important that people do not use a SELECT without a FOR UPDATE and introduce subtle, unreproducible threading errors.

So please do have the update (or similar) inserted.  If you wanted to also talk about optimistic locking that would be fine, but probably not necessary.

Thanks,

Anthony

P.S.  Do you know if Postgresql Guarantees that all timestamps are distinct, even if they occur within the same clock tick?  (i.e. does it run the clock forward).  I have another reason to know that.  Using clocks is iffy for synchronization.


On Tue, Aug 3, 2021 at 1:26 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sun, Aug 1, 2021 at 11:35 PM Anthony Berglas <anthony@berglas.org> wrote:
I have attached a proposed doc update that makes the problem clearer.  I think that this is important because if people do not understand it they will write buggy code and then blame Postgresql for losing updates, which is totally unacceptable.  So please do action this.  I have tested and confirm that the behaviour is as I specify.

That really isn't a good solution though...a better one is to modify the update command to be:

UPDATE products SET quantity_on_hand = qoh - 50 WHERE quantity_on_hand = qoh;

Or, even better:

SELECT ... last_updated INTO lastupdatetimestamp, ...;

UPDATE products SET quantity_on_hand = qoh - 50 AND last_updated = lastupdatetimestamp;

Then the application needs to simply check for a zero record update and, if that happens, decide how it wants to deal with the fact that the data has changed out from under it.

This is superior to waiting an indeterminate amount of time holding a FOR UPDATE lock in an open transaction.

I would still expand on the FOR UPDATE option as you suggest.

This is still just discussion though, someone will need to convert this into a proper doc patch that can be built, added to the commitfest, reviewed, and ultimately committed.  IMO it is not worth going to the trouble of making this all HTML-friendly as your patch did.  Just stick to plain text discussion in the email body if you aren't going to write a patch in the sgml source language and present it as a diff.

David J.

Re: Common case not at all clear

From
"David G. Johnston"
Date:
On Mon, Aug 2, 2021 at 4:30 PM Anthony Berglas <anthony@berglas.org> wrote:
You are talking about optimistic locking, commonly used for web applications where there is no transaction kept open during user think time.

Yes, I said as much a couple of emails ago.
 
And more importantly it is very important that people do not use a SELECT without a FOR UPDATE and introduce subtle, unreproducible threading errors.

Ok.  This does get covered, though I agreed earlier that there seems to be room for improvement.

So please do have the update (or similar) inserted.  If you wanted to also talk about optimistic locking that would be fine, but probably not necessary.

Just to be clear - this isn't going to be up to me (at least, not anytime soon).  First a correctly written patch needs to be produced.  If/when someone decides to do that we can move onto getting it applied to the source code (which is done by a committer, which also is not me).
P.S.  Do you know if Postgresql Guarantees that all timestamps are distinct, even if they occur within the same clock tick?  (i.e. does it run the clock forward).  I have another reason to know that.  Using clocks is iffy for synchronization.

I've never seen such a guarantee documented...but the details involved are beyond my experience with the code.

David J.

Re: Common case not at all clear

From
Anthony Berglas
Date:
Hello David,

Thanks for that, I had thought that you were a committer.  Sounds like it might all be a bit too difficult.  

Anthony

On Tue, Aug 3, 2021 at 9:39 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Aug 2, 2021 at 4:30 PM Anthony Berglas <anthony@berglas.org> wrote:
You are talking about optimistic locking, commonly used for web applications where there is no transaction kept open during user think time.

Yes, I said as much a couple of emails ago.
 
And more importantly it is very important that people do not use a SELECT without a FOR UPDATE and introduce subtle, unreproducible threading errors.

Ok.  This does get covered, though I agreed earlier that there seems to be room for improvement.

So please do have the update (or similar) inserted.  If you wanted to also talk about optimistic locking that would be fine, but probably not necessary.

Just to be clear - this isn't going to be up to me (at least, not anytime soon).  First a correctly written patch needs to be produced.  If/when someone decides to do that we can move onto getting it applied to the source code (which is done by a committer, which also is not me).
P.S.  Do you know if Postgresql Guarantees that all timestamps are distinct, even if they occur within the same clock tick?  (i.e. does it run the clock forward).  I have another reason to know that.  Using clocks is iffy for synchronization.

I've never seen such a guarantee documented...but the details involved are beyond my experience with the code.

David J.

Re: Common case not at all clear

From
Anthony Berglas
Date:
Hello David,

I note that nothing has happened.

In future I would suggest that you simply tell people that document updates are not really welcome.  Otherwise you waste people's time.

The locking issue is actually quite serious.

Cheers,

Anthony

On Tue, Aug 3, 2021 at 10:01 AM Anthony Berglas <anthony@berglas.org> wrote:
Hello David,

Thanks for that, I had thought that you were a committer.  Sounds like it might all be a bit too difficult.  

Anthony

On Tue, Aug 3, 2021 at 9:39 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Aug 2, 2021 at 4:30 PM Anthony Berglas <anthony@berglas.org> wrote:
You are talking about optimistic locking, commonly used for web applications where there is no transaction kept open during user think time.

Yes, I said as much a couple of emails ago.
 
And more importantly it is very important that people do not use a SELECT without a FOR UPDATE and introduce subtle, unreproducible threading errors.

Ok.  This does get covered, though I agreed earlier that there seems to be room for improvement.

So please do have the update (or similar) inserted.  If you wanted to also talk about optimistic locking that would be fine, but probably not necessary.

Just to be clear - this isn't going to be up to me (at least, not anytime soon).  First a correctly written patch needs to be produced.  If/when someone decides to do that we can move onto getting it applied to the source code (which is done by a committer, which also is not me).
P.S.  Do you know if Postgresql Guarantees that all timestamps are distinct, even if they occur within the same clock tick?  (i.e. does it run the clock forward).  I have another reason to know that.  Using clocks is iffy for synchronization.

I've never seen such a guarantee documented...but the details involved are beyond my experience with the code.

David J.



--
Anthony@Berglas.org
+61 4 4838 8874
Just because it is possible to push twigs along the ground with ones nose
does not necessarily mean that that is the best way to collect firewood.

Re: Common case not at all clear

From
Peter Geoghegan
Date:
On Thu, Jul 29, 2021 at 8:04 AM PG Doc comments form
<noreply@postgresql.org> wrote:
> For all this documentation, it is completely unclear how to handle the most
> common, simple case.  I.e.
>
> Select balance into :bal  ...where key =123;
> Update set balance = :bal+100 where key = 100

I don't think that that's the most common or simple case.

> The discussion of read committed for Updates is misleading, I am pretty sure
> it will fail if the select is in a different statement, a common case.

That's true.

> For Oracle, I think that by default a Select will return values at the
> beginning of a transaction, Select For Update will return the read committed
> value, and Select For Update will wait until conflicting transactions
> complete.

I don't think that's true. I believe that the main difference between
READ COMMITTED in Oracle is conflict handling: If an UPDATE needs to
wait for another UPDATE, the entire statement will be rolled back
before it is retried. While Postgres does something...more
complicated.

Both systems use a snapshot per statement in READ COMMITTED. And so
any differences between the two systems here don't seem relevant.

> So the answer is that the first Select would be a Select For
> Update, which should be the normal pattern to be safe (with primary key
> access) and minimize deadlocks.
>
> Is that how PostgreSql works?  Is that the generally recommended pattern?
> Impossible to tell from the docs as written.  MVCC really relies on Select
> For Update to work for transactions, I think.

I suggest using a higher isolation level. Ideally SERIALIZABLE.

-- 
Peter Geoghegan



Re: Common case not at all clear

From
"David G. Johnston"
Date:
On Sunday, September 19, 2021, Anthony Berglas <anthony@berglas.org> wrote:
I note that nothing has happened.

In future I would suggest that you simply tell people that document updates are not really welcome.  Otherwise you waste people's time.

That isn’t generally true, and as I am not an Oracle my ability to predict which patches are or are not likely to be committed is poor.  Better to try and fail, the effort is usually beneficial and often not that time consuming.

David J.