Thread: READ COMMITTED vs. index-only scans

READ COMMITTED vs. index-only scans

From
Jacek Kołodziej
Date:
Hello everyone,

this is my first post here and I'm starting by asking a question about data consistency between two consecutive SELECTs, using PostgreSQL 9.6.

I'm sorry if this is something that has already been discussed - I couldn't find it either in the archives or on the general internet. If it has been, I would appreciate a pointer to it.

I have an "append-only" events table - only INSERT and SELECT queries are issued to it. It has an integer (sequence) ID as a primary key.

I'm issuing the following two queries (within the same READ COMMITTED transaction) against that table; a rough sketch of the pattern follows the list:
- first query (A): SELECT id FROM events ORDER BY events.id DESC LIMIT 1 - I'm saving the result as a "max ID" for the second query
- second query (B): SELECT id, ... FROM events WHERE id > "min ID" AND id <= "max ID" AND ...
  - I won't dig into what "min ID" is, but you can assume it's at most 100 less than "max ID"
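
Roughly, the pattern looks like this (a sketch only - the ":"-prefixed values are bound by the application, and "..." stands for the columns and extra conditions I'm not showing):

-- query A: get the current "max ID" (satisfied by an index-only scan on the PK)
SELECT id FROM events ORDER BY events.id DESC LIMIT 1;  -- remembered as :max_id

-- query B: fetch the window bounded by the remembered IDs
SELECT id, ...
  FROM events
 WHERE id > :min_id
   AND id <= :max_id
   AND ...;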

Concurrently, rows are being added to that table.

Please note that there's enough data in the index for the first query (A) to perform an index-only scan. I'm not sure if that's relevant, but query "B" does a plain index scan because of other conditions I haven't mentioned, so it still needs to fetch data from the table's heap.

Here's what is happening to me: the "A" query occasionally (in my case, on the order of tens of times per day) returns an ID _higher_ than any ID present in the second query's result (the other conditions I haven't specified do _not_ filter out any rows beyond "id <= max ID") - as if some entries were visible to the first query but not to the second one. This inconsistency is very problematic for me.

All I can think of is that it might be caused by the index-only scan and the READ COMMITTED transaction isolation level, but from what I gather from the documentation, that should not be possible thanks to the use of the visibility map[0][1][2].

And yet it happens, likely for some other reason, but I can't think of any. I've tried switching the transaction isolation level to REPEATABLE READ (on the basis that it's a faulty phenomenon occurring due to some bug) but that didn't help.

Where am I wrong? What am I missing? What information may I provide to help with investigating this?


--
Kind regards,
Jacek Kołodziej
http://kolodziejj.info

Re: READ COMMITTED vs. index-only scans

From
Melvin Davidson
Date:


On Wed, Jan 17, 2018 at 8:08 AM, Jacek Kołodziej <kolodziejj@gmail.com> wrote:
>- first query (A): SELECT id FROM events ORDER BY events.id DESC LIMIT 1 - I'm saving the result as a "max ID" for the second query

Just a suggestion. The first query is not really needed.
You can simply do:
second query (B):

SELECT id
, ...
  FROM events
 WHERE id > MIN(ID)
   AND id <= MAX(ID)
   AND ...

See https://www.postgresql.org/docs/9.6/static/functions-aggregate.html
MAX and MIN functions



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: READ COMMITTED vs. index-only scans

From
Laurenz Albe
Date:
Melvin Davidson wrote:
> You can simply do:
> second query (B): 
> 
> SELECT id
> , ... 
>   FROM events 
>  WHERE id > MIN(ID) 
>    AND id <= MAX(ID)
>    AND ...

You probably meant to use a window function there:

  max(id) OVER ()
  min(id) OVER ()
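
Since those can't appear directly in WHERE either, it would have to be written along these lines (just a sketch: the window aggregates are computed in a derived table and filtered in the outer query):

  SELECT e.id
    FROM (SELECT id,
                 min(id) OVER () AS min_id,
                 max(id) OVER () AS max_id
            FROM events) AS e
   WHERE e.id > e.min_id
     AND e.id <= e.max_id;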

Yours,
Laurenz Albe


Re: READ COMMITTED vs. index-only scans

From
Francisco Olarte
Date:
On Wed, Jan 17, 2018 at 3:30 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
....
> Just a suggestion. The first query is not really needed.
> You can simply do:
> second query (B):
>
> SELECT id
> , ...
>   FROM events
>  WHERE id > MIN(ID)
>    AND id <= MAX(ID)
>    AND ...
>
> See https://www.postgresql.org/docs/9.6/static/functions-aggregate.html
> MAX and MIN functions

Are you sure? ( http://sqlfiddle.com/#!17/7805a/3 )

In fact your link, in the first paragraph, points to
https://www.postgresql.org/docs/9.6/static/tutorial-agg.html which,
near the end ( 2nd paragraph from the end, I think it's called next to
last in English, but not sure if penultimate is the correct word, like
in Spanish ), states:

"Thus, the WHERE clause must not contain aggregate functions; it makes
no sense to try to use an aggregate to determine which rows will be
inputs to the aggregates."
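
A quick way to see it (just an illustration):

-- fails: PostgreSQL rejects aggregates in WHERE
-- ("aggregate functions are not allowed in WHERE")
SELECT id FROM events WHERE id <= max(id);

-- an aggregate inside a scalar subquery is fine, though:
SELECT id FROM events WHERE id <= (SELECT max(id) FROM events);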

Francisco Olarte.


Re: READ COMMITTED vs. index-only scans

From
Jacek Kołodziej
Date:


17.01.2018 3:30 PM "Melvin Davidson" <melvin6925@gmail.com> wrote:


>- first query (A): SELECT id FROM events ORDER BY events.id DESC LIMIT 1 - I'm saving the result as a "max ID" for the second query

Just a suggestion. The first query is not really needed.
You can simply do:
second query (B):

SELECT id
, ...
  FROM events
 WHERE id > MIN(ID)
   AND id <= MAX(ID)
   AND ...

MAX and MIN functions

Hi Melvin, thank you for the suggestion. Unfortunately it won't do in my case. Sorry for not providing enough context in the first place.

After running the first query (A), I'm iterating over that table (with LIMIT 100 and an increasing OFFSET) - using query "B" - until another condition is met; overall, the code is supposed to gather a number of rows from the table. I'm also using the "max ID" for another purpose.
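
Schematically, each iteration runs something like this (a sketch - the ":"-prefixed values are bound by the application, "..." hides the remaining columns and conditions, and I'm assuming the iteration is ordered by id):

SELECT id, ...
  FROM events
 WHERE id > :min_id
   AND id <= :max_id
   AND ...
 ORDER BY id
 LIMIT 100 OFFSET :offset;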





Re: READ COMMITTED vs. index-only scans

From
Melvin Davidson
Date:

On Wed, Jan 17, 2018 at 10:45 AM, Jacek Kołodziej <kolodziejj@gmail.com> wrote:


Hi Melvin, thank you for the suggestion. Unfortunately it won't do in my case. Sorry for not providing enough context in the first place.

After making the first query (A), I'm iterating over that table (with LIMIT 100 and increasing OFFSET) - using a query "B" - until another condition is met; overall, code is supposed to gather a number of rows from the table. I'm also using the "max ID" for another purpose. 



Oops, my bad. I didn't have my morning coffee yet.
Try this instead:

WITH minmax AS
  (SELECT MIN(id) AS min_id,
          MAX(id) AS max_id
     FROM events
  )
SELECT id
, ...
  FROM events e,
       minmax m
 WHERE e.id > m.min_id
   AND e.id <= m.max_id
   AND ...



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: READ COMMITTED vs. index-only scans

From
Tom Lane
Date:
Jacek Kołodziej <kolodziejj@gmail.com> writes:
> Here's what happening to me: the "A" query occasionally (in my case: on the
> order of tenths per day) returns an ID _higher_ than any ID present in
> second query's result (other conditions I haven't specified do _not_ filter
> any more rows than "id <= max ID") - as if some entries were visible for
> the first query, but not for the second one. This is an inconsistency that
> is very problematic for me.

That sounds problematic to me too, but how certain are you that the "other
conditions you haven't specified" aren't suppressing the last row?  That'd
certainly be the least surprising explanation.  If it isn't that, though,
this surely seems like a bug.

Can you determine whether the row(s) missing in the second query are
freshly committed?  Or have they been there awhile?

> Where am I wrong? What am I missing? What information may I provide to help
> with investigating this?

Probably the best thing to spend time on would be to try to extract a
publishable test case.  It would be really hard to get to the bottom
of an issue like this without having a reproducer.  It's okay if it
takes awhile to reproduce the fault ...

Also, before spending a whole lot of time on this: are you on 9.6.6?
If not, update, just in case this is an already-fixed issue.  The
symptoms don't sound familiar, but I don't want to waste a lot of
time only to find out it's some manifestation of a known bug.

            regards, tom lane


RE: READ COMMITTED vs. index-only scans

From
Karen Stone
Date:
Please remove me from this list.  Thanks.

Karen Stone| Technical Services| Eldorado |a Division of MphasiS
5353 North 16th Street, Suite 400, Phoenix, Arizona 85016-3228
Tel (928) 892 5735 | www.eldoinc.com | www.mphasis.com |kstone@eldocomp.com




Re: READ COMMITTED vs. index-only scans

From
Jacek Kołodziej
Date:
Hi Tom,

On Wed, Jan 17, 2018 at 7:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jacek Kołodziej <kolodziejj@gmail.com> writes:
> Here's what happening to me: the "A" query occasionally (in my case: on the
> order of tenths per day) returns an ID _higher_ than any ID present in
> second query's result (other conditions I haven't specified do _not_ filter
> any more rows than "id <= max ID") - as if some entries were visible for
> the first query, but not for the second one. This is an inconsistency that
> is very problematic for me.

That sounds problematic to me too, but how certain are you that the "other
conditions you haven't specified" aren't suppressing the last row?  That'd
certainly be the least surprising explanation.  If it isn't that, though,
this surely seems like a bug.

Yes, I'm fairly sure of that. When I execute that same "B" query again some time afterwards, it returns all the expected rows - I mean, also those that were "included" in the original "A" query's range but "missing" from the "B" query the first time around.
 
Can you determine whether the row(s) missing in the second query are
freshly committed?  Or have they been there awhile?

It depends on what would be considered "fresh"; usually it's on the order of milliseconds or seconds.
 
> Where am I wrong? What am I missing? What information may I provide to help
> with investigating this?

Probably the best thing to spend time on would be to try to extract a
publishable test case.  It would be really hard to get to the bottom
of an issue like this without having a reproducer.  It's okay if it
takes awhile to reproduce the fault ...

I'd certainly love to have a working repro. I won't be able to do it for the next few days but I'll work on this right after the weekend.
 
Also, before spending a whole lot of time on this: are you on 9.6.6?
If not, update, just in case this is an already-fixed issue.  The
symptoms don't sound familiar, but I don't want to waste a lot of
time only to find out it's some manifestation of a known bug.

                        regards, tom lane

I'm using 9.6.5; I'm not the one administering it, so it might take some time to update, but once it's done I'll report back on whether that fixed the situation. In the meantime, when trying to reproduce it locally, I'll use both 9.6.5 and 9.6.6 to see whether it makes any difference.

Thank you very much for the suggestions.


--
Kind regards,
Jacek Kołodziej
http://kolodziejj.info

Re: READ COMMITTED vs. index-only scans

From
Jacek Kołodziej
Date:


Hello again,

unsurprisingly, the fault was on my side - it was discovered by my colleague. The source of the problem was disregarding how sequences work: they hand out monotonically increasing numbers at query time (in this case, IDs for the primary key), but the inserting transactions - and, in turn, the numbers they drew from the sequence - may commit in a different order. With T1, T2 and T3 as transactions (T1 and T2 appending to the events table, T3 reading from it with the "A" and "B" queries), it went like this (see the sketch after the list):
- T1 was inserting an event (it got ID 6) - but it had not committed yet
- T2 was inserting an event (it got ID 7) and committed
- T3 ran query "A" (which returned a "max event ID" of 7 at that moment) and then selected events with query "B" (which returned the event with ID 7, but not the one with ID 6, because that one had not been committed yet)
- T1 committed, but by then the event with ID 6 had already been skipped and will never be picked up again
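
A schematic of that interleaving (the INSERTs stand for whatever the writers actually do):

-- T1:  BEGIN;  INSERT INTO events ... ;            -- nextval() hands out id 6; not committed yet
-- T2:  BEGIN;  INSERT INTO events ... ;  COMMIT;   -- nextval() hands out id 7; committed
-- T3:  SELECT id FROM events ORDER BY id DESC LIMIT 1;        -- query "A" returns 7
--      SELECT id, ... FROM events WHERE id > 5 AND id <= 7;   -- query "B" returns only 7
-- T1:  COMMIT;   -- id 6 becomes visible only now, after query "B" has already passed it by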

REPEATABLE READ could not help us in such a situation.

We ended up - at least for now - forcing serialization of the event inserts (so that their IDs always match the order in which they are inserted _and committed_ into the database); while that may be suboptimal, it seems like a good-enough solution for our use case for the time being, and we'll work on a more scalable solution in the future.
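
For illustration only (this is a sketch, not necessarily exactly what we implemented): one way to force that serialization is to take a transaction-level advisory lock before each insert, so no insert can draw a sequence number until the previous insert has committed:

BEGIN;
SELECT pg_advisory_xact_lock(42);        -- 42 is an arbitrary, application-chosen lock key
INSERT INTO events (...) VALUES (...);   -- the id comes from the sequence while the lock is held
COMMIT;                                  -- the lock is released here, at commit time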

Thank you for looking into this and for the inspiration for further investigation.

--
Kind regards,
Jacek Kołodziej
http://kolodziejj.info