Thread: BUG #17925: Incorrect select query result

BUG #17925: Incorrect select query result

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17925
Logged by:          Joakim Goldkuhl
Email address:      joakim@cogmed.com
PostgreSQL version: 13.10
Operating system:   Linux
Description:

Hello, this query returns different result depending on chosen execution
plan.

The query run in our database is 

  with ae_trainings as (
    select * from trainings.activity_events
    where training_id in (709, 891))
  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not
null) suspicions_column_result
  from trainings.training_blocks tb
  left join ae_trainings ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (709, 891)
    -- query returns expected result with tb.block < 22 
    -- but with tb.block < 23 ae.trial_time_stimuli_start is incorrectly
null
    and tb.block < 22
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

everything is fine when tb.block < 22 and the result is:

709    21    68
709    20    71
709    19    68
709    18    76
709    17    73
709    16    69
709    15    70
709    14    68
709    13    71
709    12    69

However, when run with tb.block < 23, i.e,

  with ae_trainings as (
    select * from trainings.activity_events
    where training_id in (709, 891))
  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not
null) suspicions_column_result
  from trainings.training_blocks tb
  left join ae_trainings ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (709, 891)
    -- query returns expected result with tb.block < 22 
    -- but with tb.block < 23 ae.trial_time_stimuli_start is incorrectly
null
    and tb.block < 22
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

the result changes to:

709    22    5
709    21    5
709    20    5
709    19    5
709    18    5
709    17    5
709    16    5
709    15    5
709    14    5
709    13    5

I run explain analyze on the two queries, first query plan is

Limit  (cost=16.45..171.35 rows=10 width=14) (actual time=35.517..35.616
rows=10 loops=1)
  ->  GroupAggregate  (cost=16.45..3222.72 rows=207 width=14) (actual
time=35.516..35.614 rows=10 loops=1)
        Group Key: tb.training_id, tb.block
        ->  Incremental Sort  (cost=16.45..3219.10 rows=207 width=14)
(actual time=35.497..35.534 rows=704 loops=1)
              Sort Key: tb.training_id, tb.block DESC
              Presorted Key: tb.training_id
              Full-sort Groups: 1  Sort Method: quicksort  Average Memory:
28kB  Peak Memory: 28kB
              Pre-sorted Groups: 1  Sort Method: quicksort  Average Memory:
122kB  Peak Memory: 122kB
              ->  Nested Loop Left Join  (cost=0.87..3209.82 rows=207
width=14) (actual time=0.040..35.217 rows=1574 loops=1)
                    ->  Index Only Scan using pk_training_blocks on
training_blocks tb  (cost=0.43..546.86 rows=207 width=10) (actual
time=0.010..0.100 rows=106 loops=1)
                          Index Cond: ((training_id = ANY
('{709,891}'::integer[])) AND (block < 22))
                          Heap Fetches: 106
                    ->  Index Scan using i_activity_events_training_id on
activity_events  (cost=0.44..12.85 rows=1 width=18) (actual
time=0.119..0.328 rows=15 loops=106)
                          Index Cond: ((training_id = tb.training_id) AND
(training_id = ANY ('{709,891}'::integer[])))
                          Filter: ((event = 'trial'::trainings.event) AND
(block = tb.block) AND (activity = tb.activity))
                          Rows Removed by Filter: 2165
Planning Time: 0.329 ms
Execution Time: 35.653 ms

while the second (which gives incorrect result) is

Limit  (cost=15.91..165.45 rows=10 width=14) (actual time=28.916..28.928
rows=10 loops=1)
  ->  GroupAggregate  (cost=15.91..3260.98 rows=217 width=14) (actual
time=28.914..28.925 rows=10 loops=1)
        Group Key: tb.training_id, tb.block
        ->  Incremental Sort  (cost=15.91..3257.18 rows=217 width=14)
(actual time=28.905..28.909 rows=51 loops=1)
              Sort Key: tb.training_id, tb.block DESC
              Presorted Key: tb.training_id
              Full-sort Groups: 1  Sort Method: quicksort  Average Memory:
28kB  Peak Memory: 28kB
              Pre-sorted Groups: 1  Sort Method: quicksort  Average Memory:
59kB  Peak Memory: 59kB
              ->  Merge Left Join  (cost=0.87..3247.45 rows=217 width=14)
(actual time=0.047..28.762 rows=727 loops=1)
                    Merge Cond: (tb.training_id =
activity_events.training_id)
                    Join Filter: ((activity_events.block = tb.block) AND
(activity_events.activity = tb.activity))
                    Rows Removed by Join Filter: 73481
                    ->  Index Only Scan using pk_training_blocks on
training_blocks tb  (cost=0.43..572.73 rows=217 width=10) (actual
time=0.010..0.118 rows=111 loops=1)
                          Index Cond: ((training_id = ANY
('{709,891}'::integer[])) AND (block < 23))
                          Heap Fetches: 111
                    ->  Index Scan using i_activity_events_training_id on
activity_events  (cost=0.44..2664.75 rows=3584 width=18) (actual
time=0.009..18.290 rows=74032 loops=1)
                          Index Cond: (training_id = ANY
('{709,891}'::integer[]))
                          Filter: (event = 'trial'::trainings.event)
Planning Time: 0.570 ms
Execution Time: 28.974 ms

I'm no postgresql expert and I don't think I understand the query planner,
but to me the it seems like a bug. I'm sorry that I couldn't come up with a
more minimal query example but if I make small tweaks to the query it starts
to produce expected results again.

Thanks for awesome software anyway!

/Joakim


Re: BUG #17925: Incorrect select query result

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> Hello, this query returns different result depending on chosen execution
> plan.

This does seem fishy, but it's impossible to investigate with
this amount of detail.  Can you provide table declarations and
sample data to go with the query?

            regards, tom lane



Re: BUG #17925: Incorrect select query result

From
Joakim Goldkuhl
Date:
Since this is corporation data I cannot provide much data in this public mailing list unfortunately. And I have not yet been able to reproduce this behaviour on dummy data. 
I notice the following behaviour from the query, even though it probably isn't sufficient to find what's going on, 

specifying query to training_id 709,

  with ae_trainings as (
    select * from trainings.activity_events
    where training_id in (709))
  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
  from trainings.training_blocks tb
  left join ae_trainings ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (709)
    and tb.block < 23
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

returns expected result

709 22 70
709 21 68
709 20 71
709 19 68
709 18 76
709 17 73
709 16 69
709 15 70
709 14 68
709 13 71

also querying for 891 as in

  with ae_trainings as (
    select * from trainings.activity_events
    where training_id in (891))
  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
  from trainings.training_blocks tb
  left join ae_trainings ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (891)
    and tb.block < 23
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

yields correct result

891 22 3
891 21 3
891 20 3
891 19 3
891 18 3
891 17 3
891 16 3
891 15 3
891 14 3
891 13 3

However, querying for both 709 and 891

  with ae_trainings as (
    select * from trainings.activity_events
    where training_id in (709,891))
  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
  from trainings.training_blocks tb
  left join ae_trainings ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (709,891)
    and tb.block < 23
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

returns incorrect result for 709

709 22 5
709 21 5
709 20 5
709 19 5
709 18 5
709 17 5
709 16 5
709 15 5
709 14 5
709 13 5

If I remove the CTE by using 

  select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
  from trainings.training_blocks tb
  left join trainings.activity_events ae
    on ae.training_id = tb.training_id
      and ae.block = tb.block
      and ae.activity = tb.activity
      and ae.event = 'trial'
  where tb.training_id in (709,891)
    and tb.block < 23
  group by tb.training_id, tb.block
  order by tb.training_id, tb.block desc
  limit 10

the result is correct again.

Also, I should mention that the database is running on RDS, managed by AWS, so maybe you are not interested in looking into it in this case?

Thanks, Joakim


On Mon, May 8, 2023 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Hello, this query returns different result depending on chosen execution
> plan.

This does seem fishy, but it's impossible to investigate with
this amount of detail.  Can you provide table declarations and
sample data to go with the query?

                        regards, tom lane

Re: BUG #17925: Incorrect select query result

From
Tom Lane
Date:
Joakim Goldkuhl <joakim.goldkuhl@cogmed.com> writes:
> Since this is corporation data I cannot provide much data in this public
> mailing list unfortunately. And I have not yet been able to reproduce this
> behaviour on dummy data.

Hm, well, please keep trying.

> Also, I should mention that the database is running on RDS, managed by AWS,
> so maybe you are not interested in looking into it in this case?

That does complicate matters, since it's hard to guess whether it's
a native Postgres bug or something Amazon introduced.  (I believe
RDS Postgres is mostly Postgres, unlike say Aurora, but there is some
secret sauce at the storage level.)  All the more reason to try to
devise a publishable test case.

            regards, tom lane