Thread: Reasons to reorder results *within* a transaction?
I *know* that without an ORDER BY clause, the database is free to reorder results in any way it likes. However, I recentlyran into a case where the *SAME* query was returning results in a different order *within* a single transaction,which surprised me (we fixed it by adding the missing ORDER BY). I would assume that once a transaction obtainsa snapshot, all its read operations would return the same results. Could concurrent updates in other transactions "move" tuples in the underlying heap files? Could the query optimizer decideto execute a query two different ways for some reason (e.g. statistics collected after the first query?). Clearly theway Postgres works internally is a bit different from what I assumed. Any references to docs I should read would be appreciated. Roughly speaking, the schema is something like: create table group_record (id integer primary key, group_id integer, data text); (plus other tables) The transaction is something like: begin; select * from group_record where group_id = x; … reads and writes to/from other tables … select * from group_record where group_id = x; -- returns results in a different order from first time commit; Needless to say, this is one of those fun rare bugs that appeared occasionally in the logs in our production server, andwe couldn't seem to reproduce it in development. Thanks! Evan Jones -- Work: https://www.mitro.co/ Personal: http://evanjones.ca/
Evan Jones <ej@evanjones.ca> wrote: > I *know* that without an ORDER BY clause, the database is free to reorder > results in any way it likes. However, I recently ran into a case where the > *SAME* query was returning results in a different order *within* a single > transaction, which surprised me (we fixed it by adding the missing ORDER BY). I > would assume that once a transaction obtains a snapshot, all its read operations > would return the same results. That is not a valid assumption. For one thing, the default transaction isolation level is read committed, and at that isolation level you are not guaranteed to even get the same *rows* running the same query twice within the same transaction, much less in the same order. At any isolation level statistics could change, resulting in a different plan on two successive executions. Even running the same plan using the same snapshot you could get a different order if you have not specified one with ORDER BY. As one example, a sequential scan of a table won't necessarily start at the beginning of the heap -- if there is already a sequential scan in progress for another process, the new one will start at the point the other one is at, and "wrap around". This can save a lot of physical disk access, resulting in better performance. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Oct 4, 2013, at 13:03 , Kevin Grittner <kgrittn@ymail.com> wrote: > That is not a valid assumption. For one thing, the default > transaction isolation level is read committed, and at that > isolation level you are not guaranteed to even get the same *rows* > running the same query twice within the same transaction, much less > in the same order. I guess I should have mentioned that we are using serializable snapshot isolation (thanks for that, BTW!) > if there is already a sequential > scan in progress for another process, the new one will start at the > point the other one is at, and "wrap around". This can save a lot > of physical disk access, resulting in better performance. OH! This totally, totally makes sense. This is *exactly* the kind of thing I was looking for, and I'll bet that is exactlywhat was happening in our case. The table is pretty small, so Postgres explain says it is doing a full table scanfor this query. Thanks for the speedy insightful answer! This is yet another example of something that when tracking down the bug, we knew immediately it was incorrect and probablywrong, but sometimes you don't notice these things the first time. The joys of software. Evan -- Work: https://www.mitro.co/ Personal: http://evanjones.ca/
Evan Jones wrote: > I *know* that without an ORDER BY clause, the database is free to > reorder results in any way it likes. However, I recently ran into a > case where the *SAME* query was returning results in a different order > *within* a single transaction, which surprised me (we fixed it by > adding the missing ORDER BY). I would assume that once a transaction > obtains a snapshot, all its read operations would return the same > results. Yes, the same results -- not necessarily in the same order. For instance, a synchronized scan might start at a different point of the table. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services