Thread: BUG #17925: Incorrect select query result
The following bug has been logged on the website: Bug reference: 17925 Logged by: Joakim Goldkuhl Email address: joakim@cogmed.com PostgreSQL version: 13.10 Operating system: Linux Description: Hello, this query returns different result depending on chosen execution plan. The query run in our database is with ae_trainings as ( select * from trainings.activity_events where training_id in (709, 891)) select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result from trainings.training_blocks tb left join ae_trainings ae on ae.training_id = tb.training_id and ae.block = tb.block and ae.activity = tb.activity and ae.event = 'trial' where tb.training_id in (709, 891) -- query returns expected result with tb.block < 22 -- but with tb.block < 23 ae.trial_time_stimuli_start is incorrectly null and tb.block < 22 group by tb.training_id, tb.block order by tb.training_id, tb.block desc limit 10 everything is fine when tb.block < 22 and the result is: 709 21 68 709 20 71 709 19 68 709 18 76 709 17 73 709 16 69 709 15 70 709 14 68 709 13 71 709 12 69 However, when run with tb.block < 23, i.e, with ae_trainings as ( select * from trainings.activity_events where training_id in (709, 891)) select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result from trainings.training_blocks tb left join ae_trainings ae on ae.training_id = tb.training_id and ae.block = tb.block and ae.activity = tb.activity and ae.event = 'trial' where tb.training_id in (709, 891) -- query returns expected result with tb.block < 22 -- but with tb.block < 23 ae.trial_time_stimuli_start is incorrectly null and tb.block < 22 group by tb.training_id, tb.block order by tb.training_id, tb.block desc limit 10 the result changes to: 709 22 5 709 21 5 709 20 5 709 19 5 709 18 5 709 17 5 709 16 5 709 15 5 709 14 5 709 13 5 I run explain analyze on the two queries, first query plan is Limit (cost=16.45..171.35 rows=10 width=14) (actual time=35.517..35.616 rows=10 loops=1) -> GroupAggregate (cost=16.45..3222.72 rows=207 width=14) (actual time=35.516..35.614 rows=10 loops=1) Group Key: tb.training_id, tb.block -> Incremental Sort (cost=16.45..3219.10 rows=207 width=14) (actual time=35.497..35.534 rows=704 loops=1) Sort Key: tb.training_id, tb.block DESC Presorted Key: tb.training_id Full-sort Groups: 1 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB Pre-sorted Groups: 1 Sort Method: quicksort Average Memory: 122kB Peak Memory: 122kB -> Nested Loop Left Join (cost=0.87..3209.82 rows=207 width=14) (actual time=0.040..35.217 rows=1574 loops=1) -> Index Only Scan using pk_training_blocks on training_blocks tb (cost=0.43..546.86 rows=207 width=10) (actual time=0.010..0.100 rows=106 loops=1) Index Cond: ((training_id = ANY ('{709,891}'::integer[])) AND (block < 22)) Heap Fetches: 106 -> Index Scan using i_activity_events_training_id on activity_events (cost=0.44..12.85 rows=1 width=18) (actual time=0.119..0.328 rows=15 loops=106) Index Cond: ((training_id = tb.training_id) AND (training_id = ANY ('{709,891}'::integer[]))) Filter: ((event = 'trial'::trainings.event) AND (block = tb.block) AND (activity = tb.activity)) Rows Removed by Filter: 2165 Planning Time: 0.329 ms Execution Time: 35.653 ms while the second (which gives incorrect result) is Limit (cost=15.91..165.45 rows=10 width=14) (actual time=28.916..28.928 rows=10 loops=1) -> GroupAggregate (cost=15.91..3260.98 rows=217 width=14) (actual time=28.914..28.925 rows=10 loops=1) Group Key: tb.training_id, tb.block -> Incremental Sort (cost=15.91..3257.18 rows=217 width=14) (actual time=28.905..28.909 rows=51 loops=1) Sort Key: tb.training_id, tb.block DESC Presorted Key: tb.training_id Full-sort Groups: 1 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB Pre-sorted Groups: 1 Sort Method: quicksort Average Memory: 59kB Peak Memory: 59kB -> Merge Left Join (cost=0.87..3247.45 rows=217 width=14) (actual time=0.047..28.762 rows=727 loops=1) Merge Cond: (tb.training_id = activity_events.training_id) Join Filter: ((activity_events.block = tb.block) AND (activity_events.activity = tb.activity)) Rows Removed by Join Filter: 73481 -> Index Only Scan using pk_training_blocks on training_blocks tb (cost=0.43..572.73 rows=217 width=10) (actual time=0.010..0.118 rows=111 loops=1) Index Cond: ((training_id = ANY ('{709,891}'::integer[])) AND (block < 23)) Heap Fetches: 111 -> Index Scan using i_activity_events_training_id on activity_events (cost=0.44..2664.75 rows=3584 width=18) (actual time=0.009..18.290 rows=74032 loops=1) Index Cond: (training_id = ANY ('{709,891}'::integer[])) Filter: (event = 'trial'::trainings.event) Planning Time: 0.570 ms Execution Time: 28.974 ms I'm no postgresql expert and I don't think I understand the query planner, but to me the it seems like a bug. I'm sorry that I couldn't come up with a more minimal query example but if I make small tweaks to the query it starts to produce expected results again. Thanks for awesome software anyway! /Joakim
PG Bug reporting form <noreply@postgresql.org> writes: > Hello, this query returns different result depending on chosen execution > plan. This does seem fishy, but it's impossible to investigate with this amount of detail. Can you provide table declarations and sample data to go with the query? regards, tom lane
Since this is corporation data I cannot provide much data in this public mailing list unfortunately. And I have not yet been able to reproduce this behaviour on dummy data.
with ae_trainings as (
select * from trainings.activity_events
where training_id in (709))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
with ae_trainings as (
select * from trainings.activity_events
where training_id in (891))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join trainings.activity_events ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709,891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
I notice the following behaviour from the query, even though it probably isn't sufficient to find what's going on,
specifying query to training_id 709,
with ae_trainings as (
select * from trainings.activity_events
where training_id in (709))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
returns expected result
709 22 70
709 21 68
709 20 71
709 19 68
709 18 76
709 17 73
709 16 69
709 15 70
709 14 68
709 13 71
709 21 68
709 20 71
709 19 68
709 18 76
709 17 73
709 16 69
709 15 70
709 14 68
709 13 71
also querying for 891 as in
with ae_trainings as (
select * from trainings.activity_events
where training_id in (891))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
yields correct result
891 22 3
891 21 3
891 20 3
891 19 3
891 18 3
891 17 3
891 16 3
891 15 3
891 14 3
891 13 3
891 21 3
891 20 3
891 19 3
891 18 3
891 17 3
891 16 3
891 15 3
891 14 3
891 13 3
However, querying for both 709 and 891
with ae_trainings as (
select * from trainings.activity_events
where training_id in (709,891))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709,891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
select * from trainings.activity_events
where training_id in (709,891))
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join ae_trainings ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709,891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
returns incorrect result for 709
709 22 5
709 21 5
709 20 5
709 19 5
709 18 5
709 17 5
709 16 5
709 15 5
709 14 5
709 13 5
709 21 5
709 20 5
709 19 5
709 18 5
709 17 5
709 16 5
709 15 5
709 14 5
709 13 5
If I remove the CTE by using
select tb.training_id, tb.block, count(ae.trial_time_stimuli_start is not null) suspicions_column_result
from trainings.training_blocks tb
left join trainings.activity_events ae
on ae.training_id = tb.training_id
and ae.block = tb.block
and ae.activity = tb.activity
and ae.event = 'trial'
where tb.training_id in (709,891)
and tb.block < 23
group by tb.training_id, tb.block
order by tb.training_id, tb.block desc
limit 10
the result is correct again.
Also, I should mention that the database is running on RDS, managed by AWS, so maybe you are not interested in looking into it in this case?
Thanks, Joakim
On Mon, May 8, 2023 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Hello, this query returns different result depending on chosen execution
> plan.
This does seem fishy, but it's impossible to investigate with
this amount of detail. Can you provide table declarations and
sample data to go with the query?
regards, tom lane
Joakim Goldkuhl <joakim.goldkuhl@cogmed.com> writes: > Since this is corporation data I cannot provide much data in this public > mailing list unfortunately. And I have not yet been able to reproduce this > behaviour on dummy data. Hm, well, please keep trying. > Also, I should mention that the database is running on RDS, managed by AWS, > so maybe you are not interested in looking into it in this case? That does complicate matters, since it's hard to guess whether it's a native Postgres bug or something Amazon introduced. (I believe RDS Postgres is mostly Postgres, unlike say Aurora, but there is some secret sauce at the storage level.) All the more reason to try to devise a publishable test case. regards, tom lane