Thread: BUG #15007: LIMIT not respected in sub-queries
The following bug has been logged on the website:

Bug reference:      15007
Logged by:          Will Storey
Email address:      will@summercat.com
PostgreSQL version: 10.1
Operating system:   Ubuntu 16.04
Description:

Hello,

I am not sure this is a bug. But it is surprising to me and seems to contradict the documentation in terms of join nesting.

I have a SELECT query with a sub-SELECT in it. The sub-SELECT has a LIMIT clause. I've found that sometimes I receive more rows (at most one extra in my testing) than the LIMIT, where I expected only as many rows as the LIMIT. This depends on the query plan: with some plans it never happens, and with others it happens frequently.

In looking into this behaviour, I came across hints that this is a known quirk. I found bug reports related specifically to UPDATE/DELETE that sound similar to this, but no mention that the behaviour can happen with SELECT:

https://dba.stackexchange.com/questions/69471/postgres-update-limit-1?noredirect=1&lq=1
(note the comments on the accepted answer)

https://www.postgresql.org/message-id/1399649764731-5803406.post%40n5.nabble.com
(and the thread)

https://www.postgresql.org/message-id/1385918761589-5781081.post%40n5.nabble.com

This happens with both PostgreSQL 10.1 on Ubuntu 16.04 (from the PostgreSQL repos: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit) and PostgreSQL 9.6.5 (where I initially encountered the behaviour).

Unfortunately my test case is not very clean and it is somewhat long, so I've put it in a gist on GitHub:

https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0

The SELECT query (run by the Perl program) quickly prints out that it receives 6 rows.

As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop has loops > 1. I believe this is the cause of the behaviour. If I tweak the test to have a plan where that node runs before the Seq Scan, there are never more than 5 rows.
I believe a better way to write this query would be to use a CTE. Thank you for your time!
Hi,

On 01/11/2018 10:16 PM, PG Bug reporting form wrote:

<snip>

I think you're right that those issues are related - a change of plan may affect the results in a somewhat non-deterministic way. In the cases you linked it's UPDATE affecting additional rows; in your case it's more about random() during rescans. I'll get to that in a minute.
> Unfortunately my test case is not very clean and it is somewhat long, so
> I've put it in a gist on GitHub:
>
> https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0
>
> The SELECT query (run by the Perl program) quickly prints out that it
> receives 6 rows.
>
> As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop
> has loops > 1. I believe this is the cause of the behaviour. If I tweak
> the test to have a plan where that node runs before the Seq Scan, there
> are never more than 5 rows.

I went through the test case, and I think I know what's going on. The script did not reproduce the issue for me, but I think I've been able to tweak the query to use a plan that reproduces the issue more readily.

The key is to force a particular join order, and to disable operations that would materialize intermediate results of the t1 scan. That is, we want something like this:

  ->  nestloop
        ->  nestloop t2 + t3
        ->  seq scan t1

Firstly, I've replaced the dynamic random() condition with a static one:

  random() <= 0.5

This is not strictly necessary, but it simplifies the plan. We need the random() call, though, as I'll explain later.
Then I've disabled a bunch of plan nodes that would materialize the results of the t1 seq scan:

  set enable_material = off;
  set enable_sort = off;
  set enable_hashjoin = off;

And finally, I've disabled join reordering by setting:

  set join_collapse_limit = 1;

Now, if you rewrite the query like this (which, combined with join_collapse_limit=1, essentially just forces a particular join order, nothing else):

  explain analyze SELECT * FROM
    (t3 JOIN t2 USING (t2_id)) JOIN
    (SELECT * FROM t1
      WHERE t1_id IS NOT NULL AND
            t1_id < 100 AND
            t1_val LIKE 'h%' AND
            random() <= 0.5
      LIMIT 5
    ) AS t1
    ON t3.t1_id = t1.t1_id
  WHERE t2.t2_val LIKE 'he%';

you will get plans like this:

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
   Join Filter: (t3.t1_id = t1.t1_id)
   Rows Removed by Join Filter: 73
   ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
                    (actual time=0.030..0.917 rows=16 loops=1)
         Join Filter: (t3.t2_id = t2.t2_id)
         Rows Removed by Join Filter: 160
         ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
                             (actual time=0.015..0.041 rows=11 loops=1)
               Filter: ((t2_val)::text ~~ 'he%'::text)
         ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
                             (actual time=0.003..0.037 rows=16 loops=11)
   ->  Limit  (cost=0.00..0.84 rows=5 width=7)
              (actual time=0.008..0.037 rows=5 loops=16)
         ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
                             (actual time=0.004..0.016 rows=5 loops=16)
               Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
                        ((t1_val)::text ~~ 'h%'::text) AND
                        (random() <= '0.5'::double precision))
               Rows Removed by Filter: 5
 Planning time: 0.625 ms
 Execution time: 1.911 ms
(15 rows)

This should be equivalent to the original query, and should produce the same results (modulo random(), of course).

But notice it actually does produce 7 rows!
 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
                                             ^

The problem is that it ends up executing the sequential scan on t1 repeatedly (because it's the inner relation in a nested loop), and because random() is volatile, the results of the scan are likely different each time.

Each rescan individually still respects the LIMIT, but the combined result may be larger - there may be more unique IDs, matching additional rows from the other tables.

> I believe a better way to write this query would be to use a CTE.

Yes, that will stabilize the output of the random() function, eliminating the nondeterminism during rescans.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
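[The rescan effect described here can be mimicked with a toy model. The sketch below is plain Python, not the PostgreSQL executor, and it simulates the volatile predicate with a deterministic rescan counter rather than random() so the effect is repeatable; it shows each rescan honouring the LIMIT while the union of rescans grows past it.]

```python
# Toy model of a nested-loop rescan over a LIMITed subquery with a
# volatile filter. Illustration only - not the actual PostgreSQL
# executor. The "volatile" predicate is faked with a rescan counter
# so the demonstration is deterministic.

T1_IDS = list(range(1, 11))  # the inner relation's ids
LIMIT = 5

def scan_t1(rescan_no):
    """One rescan of the sub-SELECT: volatile filter, then LIMIT."""
    # Stand-in for "random() <= 0.5": which rows pass depends on
    # which rescan this is, just as random() differs per evaluation.
    kept = [i for i in T1_IDS if (i + rescan_no) % 2 == 0]
    return kept[:LIMIT]  # LIMIT 5

# The outer side of the nested loop drives one rescan per outer row.
matched = set()
for rescan_no in range(2):      # two outer rows -> two rescans
    rows = scan_t1(rescan_no)
    assert len(rows) <= LIMIT   # each rescan honours the LIMIT...
    matched.update(rows)

print(sorted(matched))          # ...but their union does not:
print(len(matched))             # all 10 ids appear across rescans
```

Rescan 0 returns the even ids, rescan 1 the odd ids; each batch respects LIMIT 5, yet the combined set of matching ids is 10.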
On Thu 2018-01-18 00:08:31 +0100, Tomas Vondra wrote:

<snip>

> Each rescan individually still respects the LIMIT, but the combined
> result may be larger - there may be more unique IDs, matching additional
> rows from the other tables.
>
>> I believe a better way to write this query would be to use a CTE.
>
> Yes, that will stabilize the output of the random() function,
> eliminating the nondeterminism during rescans.

Wow, thank you for the great explanation of what is going on!

It sounds like this is not really a bug then, and is just something to be expected when using LIMIT with such volatile joins.

I suppose I expected that if there is a LIMIT then that would be the maximum number of rows the subquery would ever provide. The planner would have to force materializing/stabilizing in such cases, it sounds like. Maybe that is not possible, or not a good idea. It does seem like a pretty edge case.

Thanks again!
On 01/19/2018 02:55 AM, Will Storey wrote:

<snip>

> It sounds like this is not really a bug then, and is just something to
> be expected when using LIMIT with such volatile joins.

The volatile part here is not the join, but the random() function during rescans of a relation. But yeah, I don't think this qualifies as a bug.
> I suppose I expected that if there is a LIMIT then that would be the
> maximum number of rows the subquery would ever provide. The planner
> would have to force materializing/stabilizing in such cases, it sounds
> like. Maybe that is not possible, or not a good idea. It does seem
> like a pretty edge case.

Yes, something like that would be necessary. I don't know how difficult that would be, though.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
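[In the versions discussed in this thread, a CTE acts as an optimization fence: the subquery is evaluated once into a tuplestore, and every rescan rereads that store instead of re-running the volatile filter, which is why it stabilizes the result. A toy Python sketch of that evaluate-once behaviour - an illustration only, not the actual executor:]

```python
# Toy model of how materializing a volatile, LIMITed subquery (as a
# CTE or a Materialize node does) stabilizes it across rescans.
# Illustration only - not the PostgreSQL executor.
import random

T1_IDS = list(range(1, 11))
LIMIT = 5

class MaterializedSubquery:
    """Evaluate the volatile scan once; rescans reread the store."""
    def __init__(self):
        self._store = None  # stand-in for the tuplestore

    def scan(self):
        if self._store is None:
            # First read: run the volatile filter and apply the LIMIT.
            kept = [i for i in T1_IDS if random.random() <= 0.5]
            self._store = kept[:LIMIT]
        return self._store  # every subsequent rescan sees the same rows

sub = MaterializedSubquery()
matched = set()
for _ in range(16):          # many outer rows -> many rescans
    matched.update(sub.scan())

# The LIMIT now bounds the combined result, no matter how many rescans.
print(len(matched) <= LIMIT)
```

Because random() is evaluated only on the first read, every rescan returns identical rows and the join can never see more than LIMIT distinct ids from the subquery.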