BUG #15007: LIMIT not respected in sub-queries

From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      15007
Logged by:          Will Storey
Email address:      will@summercat.com
PostgreSQL version: 10.1
Operating system:   Ubuntu 16.04
Description:

Hello,

I am not sure this is a bug, but it is surprising to me and seems to
contradict what the documentation says about join nesting.

I have a SELECT query with a sub-SELECT in it. The sub-SELECT has a LIMIT
clause. I've found that I sometimes receive more rows than the LIMIT
(at most one extra in my testing), when I expected never to receive more
than the LIMIT. This depends on the query plan: with some plans it never
happens, and with others it happens frequently.
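
Schematically, the query has roughly this shape (a simplified sketch
with made-up names; the full test case is in the gist linked below):

    SELECT ...
    FROM a
    JOIN (SELECT ... FROM b WHERE ... LIMIT 5) AS sub
      ON a.id = sub.id;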

In looking into this behaviour, I came across hints that this is a known
quirk. I found bug reports related specifically to UPDATE/DELETE that sound
similar to this, but no mention that the behaviour can happen with SELECT:

https://dba.stackexchange.com/questions/69471/postgres-update-limit-1?noredirect=1&lq=1
(note the comments on the accepted answer)
https://www.postgresql.org/message-id/1399649764731-5803406.post%40n5.nabble.com
(and the thread)
https://www.postgresql.org/message-id/1385918761589-5781081.post%40n5.nabble.com

This happens with both PostgreSQL 10.1 on Ubuntu 16.04 (from the PostgreSQL
repos: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit) as well as on PostgreSQL
9.6.5 (where I initially encountered the behaviour).

Unfortunately my test case is not very clean and it is somewhat long, so
I've put it in a gist on GitHub:

https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0

The SELECT query (run by the Perl program) quickly prints out that it
receives 6 rows.

As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop has
loops > 1. I believe this is the cause of the behaviour. If I tweak the test
to have a plan where that node runs before the Seq Scan, there are never
more than 5 rows.

I believe a better way to write this query would be to use a CTE.

Thank you for your time!


Re: BUG #15007: LIMIT not respected in sub-queries

From: Tomas Vondra

Hi,

On 01/11/2018 10:16 PM, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15007
> Logged by:          Will Storey
> Email address:      will@summercat.com
> PostgreSQL version: 10.1
> Operating system:   Ubuntu 16.04
> Description:        
> 
> Hello,
> 
> I am not sure this is a bug, but it is surprising to me and seems to
> contradict what the documentation says about join nesting.
> 
> I have a SELECT query with a sub-SELECT in it. The sub-SELECT has a LIMIT
> clause. I've found that I sometimes receive more rows than the LIMIT
> (at most one extra in my testing), when I expected never to receive more
> than the LIMIT. This depends on the query plan: with some plans it never
> happens, and with others it happens frequently.
> 
> In looking into this behaviour, I came across hints that this is a known
> quirk. I found bug reports related specifically to UPDATE/DELETE that sound
> similar to this, but no mention that the behaviour can happen with SELECT:
> 
> https://dba.stackexchange.com/questions/69471/postgres-update-limit-1?noredirect=1&lq=1
> (note the comments on the accepted answer)
> https://www.postgresql.org/message-id/1399649764731-5803406.post%40n5.nabble.com
> (and the thread)
> https://www.postgresql.org/message-id/1385918761589-5781081.post%40n5.nabble.com
> 
> This happens with both PostgreSQL 10.1 on Ubuntu 16.04 (from the PostgreSQL
> repos: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit) as well as on PostgreSQL
> 9.6.5 (where I initially encountered the behaviour).
> 

I think you're right that those issues are related - a change of plan may
affect the results in a somewhat non-deterministic way. In the cases you
linked it's an UPDATE affecting additional rows; in your case it's random()
being re-evaluated during rescans. I'll get to that in a minute.

> Unfortunately my test case is not very clean and it is somewhat long, so
> I've put it in a gist on GitHub:
> 
> https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0
> 
> The SELECT query (run by the Perl program) quickly prints out that it
> receives 6 rows.
> 
> As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop has
> loops > 1. I believe this is the cause of the behaviour. If I tweak the test
> to have a plan where that node runs before the Seq Scan, there are never
> more than 5 rows.
>

I went through the test case, and I think I know what's going on. The
script did not reproduce the issue for me, but I think I've been able to
tweak the query to use a plan that reproduces the issue more reliably.

The key is to force a particular join order, and disable operations that
would materialize intermediate results of the t1 scan. That is, we want
something like this:

   -> nestloop
       -> nestloop t2 + t3
       -> seq scan t1

Firstly, I've replaced the dynamic random() condition with a static one:

    random() <= 0.5

this is not strictly necessary, but it simplifies the plan. We need the
random() call though, as I'll explain later.

Then I've disabled a bunch of plan nodes that would materialize results
of the t1 seq scan:

    set enable_material = off;
    set enable_sort = off;
    set enable_hashjoin = off;

And finally, I've disabled join reordering by setting

    set join_collapse_limit = 1;
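
For reference, a self-contained setup in the spirit of the test case
(hypothetical schema and data, inferred from the plan below rather than
taken from the gist) might look like this:

    CREATE TABLE t1 (t1_id int, t1_val varchar);
    CREATE TABLE t2 (t2_id int, t2_val varchar);
    CREATE TABLE t3 (t2_id int, t1_id int);

    -- values chosen so that t1_val matches LIKE 'h%' and t2_val
    -- matches LIKE 'he%' in the query below
    INSERT INTO t1 SELECT i, 'h' || i FROM generate_series(1, 10) i;
    INSERT INTO t2 SELECT i, 'he' || i FROM generate_series(1, 11) i;
    INSERT INTO t3 SELECT (i % 11) + 1, (i % 10) + 1
    FROM generate_series(1, 16) i;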

Now, if you rewrite the query like this (which, combined with
join_collapse_limit=1, essentially just forces a particular join order
and nothing else):

explain analyze SELECT * FROM
(t3 JOIN t2 USING (t2_id)) JOIN
(SELECT * FROM t1
 WHERE t1_id IS NOT NULL AND
 t1_id < 100 AND
 t1_val LIKE 'h%' AND
 random() <= 0.5
 LIMIT 5
) AS t1
ON t3.t1_id = t1.t1_id
WHERE t2.t2_val LIKE 'he%';

you will get plans like this:


                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
   Join Filter: (t3.t1_id = t1.t1_id)
   Rows Removed by Join Filter: 73
   ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
                    (actual time=0.030..0.917 rows=16 loops=1)
         Join Filter: (t3.t2_id = t2.t2_id)
         Rows Removed by Join Filter: 160
         ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
                             (actual time=0.015..0.041 rows=11 loops=1)
               Filter: ((t2_val)::text ~~ 'he%'::text)
         ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
                             (actual time=0.003..0.037 rows=16 loops=11)
   ->  Limit  (cost=0.00..0.84 rows=5 width=7)
              (actual time=0.008..0.037 rows=5 loops=16)
         ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
                             (actual time=0.004..0.016 rows=5 loops=16)
               Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
                        ((t1_val)::text ~~ 'h%'::text) AND
                         (random() <= '0.5'::double precision))
               Rows Removed by Filter: 5
 Planning time: 0.625 ms
 Execution time: 1.911 ms
(15 rows)


This should be equivalent to the original query, and should produce the
same results (modulo random() of course).

But notice it actually does produce 7 rows!

 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
                                             ^

The problem is that it ends up executing the sequential scan on t1
repeatedly (because it's the inner relation in a nested loop), and
because random() is volatile, the results of the scan are likely different.

Each rescan individually still respects the LIMIT, but the combined
result may be larger - there may be more unique IDs, matching additional
rows from the other tables.
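
To see the volatility in isolation, here is a minimal sketch (using a
hypothetical table, not part of the test case):

    -- hypothetical demo table, not from the gist
    CREATE TABLE demo (id int);
    INSERT INTO demo SELECT generate_series(1, 20);

    -- each execution may return a different set of five ids, because
    -- random() is volatile and re-evaluated for every row
    SELECT id FROM demo WHERE random() <= 0.5 LIMIT 5;

Run the SELECT twice and you will usually get two different sets of
five ids. A nested loop that rescans the subquery does effectively the
same thing, so the union of ids seen across rescans can exceed five.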

> 
> I believe a better way to write this query would be to use a CTE.
> 

Yes, that will stabilize the output of the random() function,
eliminating the nondeterminism during rescans.
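
For example (a sketch based on the rewritten query above), moving the
limited subquery into a CTE makes it evaluate only once; every rescan
then sees the same materialized rows:

    WITH t1_limited AS (
        SELECT * FROM t1
        WHERE t1_id IS NOT NULL AND
              t1_id < 100 AND
              t1_val LIKE 'h%' AND
              random() <= 0.5
        LIMIT 5
    )
    SELECT * FROM
    (t3 JOIN t2 USING (t2_id)) JOIN
    t1_limited ON t3.t1_id = t1_limited.t1_id
    WHERE t2.t2_val LIKE 'he%';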


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #15007: LIMIT not respected in sub-queries

From: Will Storey

On Thu 2018-01-18 00:08:31 +0100, Tomas Vondra wrote:

<snip>

> I went through the test case, and I think I know what's going on. The
> script did not reproduce the issue for me, but I think I've been able to
> tweak the query to use a plan that reproduces the issue more reliably.
> 
> The key is to force a particular join order, and disable operations that
> would materialize intermediate results of the t1 scan. That is, we want
> something like this:
> 
>    -> nestloop
>        -> nestloop t2 + t3
>        -> seq scan t1
> 
> Firstly, I've replaced the dynamic random() condition with a static one:
> 
>     random() <= 0.5
> 
> this is not strictly necessary, but it simplifies the plan. We need the
> random() call though, as I'll explain later.
> 
> Then I've disabled a bunch of plan nodes that would materialize results
> of the t1 seq scan:
> 
>     set enable_material = off;
>     set enable_sort = off;
>     set enable_hashjoin = off;
> 
> And finally, I've disabled join reordering by setting
> 
>     set join_collapse_limit = 1;
> 
> Now, if you rewrite the query like this (which, combined with
> join_collapse_limit=1, essentially just forces a particular join order
> and nothing else):
> 
> explain analyze SELECT * FROM
> (t3 JOIN t2 USING (t2_id)) JOIN
> (SELECT * FROM t1
>  WHERE t1_id IS NOT NULL AND
>  t1_id < 100 AND
>  t1_val LIKE 'h%' AND
>  random() <= 0.5
>  LIMIT 5
> ) AS t1
> ON t3.t1_id = t1.t1_id
> WHERE t2.t2_val LIKE 'he%';
> 
> you will get plans like this:
> 
> 
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>               (actual time=0.052..1.825 rows=7 loops=1)
>    Join Filter: (t3.t1_id = t1.t1_id)
>    Rows Removed by Join Filter: 73
>    ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
>                     (actual time=0.030..0.917 rows=16 loops=1)
>          Join Filter: (t3.t2_id = t2.t2_id)
>          Rows Removed by Join Filter: 160
>          ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
>                              (actual time=0.015..0.041 rows=11 loops=1)
>                Filter: ((t2_val)::text ~~ 'he%'::text)
>          ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
>                              (actual time=0.003..0.037 rows=16 loops=11)
>    ->  Limit  (cost=0.00..0.84 rows=5 width=7)
>               (actual time=0.008..0.037 rows=5 loops=16)
>          ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
>                              (actual time=0.004..0.016 rows=5 loops=16)
>                Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
>                         ((t1_val)::text ~~ 'h%'::text) AND
>                          (random() <= '0.5'::double precision))
>                Rows Removed by Filter: 5
>  Planning time: 0.625 ms
>  Execution time: 1.911 ms
> (15 rows)
> 
> 
> This should be equivalent to the original query, and should produce the
> same results (modulo random() of course).
> 
> But notice it actually does produce 7 rows!
> 
>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>               (actual time=0.052..1.825 rows=7 loops=1)
>                                              ^
> 
> The problem is that it ends up executing the sequential scan on t1
> repeatedly (because it's the inner relation in a nested loop), and
> because random() is volatile, the results of the scan are likely different.
> 
> Each rescan individually still respects the LIMIT, but the combined
> result may be larger - there may be more unique IDs, matching additional
> rows from the other tables.
> 
> > 
> > I believe a better way to write this query would be to use a CTE.
> > 
> 
> Yes, that will stabilize the output of the random() function,
> eliminating the nondeterminism during rescans.

Wow, thank you for the great explanation of what is going on!

It sounds like this is not really a bug then and is just something to be
expected when using LIMIT with such volatile joins.

I suppose I expected that if there is a LIMIT then that would be the
maximum number of rows the subquery would ever provide. It sounds like
the planner would have to force materializing/stabilizing the subquery
in such cases. Maybe that is not possible, or not a good idea. It does
seem like a pretty narrow edge case.

Thanks again!


Re: BUG #15007: LIMIT not respected in sub-queries

From: Tomas Vondra

On 01/19/2018 02:55 AM, Will Storey wrote:
> On Thu 2018-01-18 00:08:31 +0100, Tomas Vondra wrote:
> 
> <snip>
> 
>> I went through the test case, and I think I know what's going on. The
>> script did not reproduce the issue for me, but I think I've been able to
>> tweak the query to use a plan that reproduces the issue more reliably.
>>
>> The key is to force a particular join order, and disable operations that
>> would materialize intermediate results of the t1 scan. That is, we want
>> something like this:
>>
>>    -> nestloop
>>        -> nestloop t2 + t3
>>        -> seq scan t1
>>
>> Firstly, I've replaced the dynamic random() condition with a static one:
>>
>>     random() <= 0.5
>>
>> this is not strictly necessary, but it simplifies the plan. We need the
>> random() call though, as I'll explain later.
>>
>> Then I've disabled a bunch of plan nodes that would materialize results
>> of the t1 seq scan:
>>
>>     set enable_material = off;
>>     set enable_sort = off;
>>     set enable_hashjoin = off;
>>
>> And finally, I've disabled join reordering by setting
>>
>>     set join_collapse_limit = 1;
>>
>> Now, if you rewrite the query like this (which, combined with
>> join_collapse_limit=1, essentially just forces a particular join order
>> and nothing else):
>>
>> explain analyze SELECT * FROM
>> (t3 JOIN t2 USING (t2_id)) JOIN
>> (SELECT * FROM t1
>>  WHERE t1_id IS NOT NULL AND
>>  t1_id < 100 AND
>>  t1_val LIKE 'h%' AND
>>  random() <= 0.5
>>  LIMIT 5
>> ) AS t1
>> ON t3.t1_id = t1.t1_id
>> WHERE t2.t2_val LIKE 'he%';
>>
>> you will get plans like this:
>>
>>
>>                               QUERY PLAN
>> ------------------------------------------------------------------------
>>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>>               (actual time=0.052..1.825 rows=7 loops=1)
>>    Join Filter: (t3.t1_id = t1.t1_id)
>>    Rows Removed by Join Filter: 73
>>    ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
>>                     (actual time=0.030..0.917 rows=16 loops=1)
>>          Join Filter: (t3.t2_id = t2.t2_id)
>>          Rows Removed by Join Filter: 160
>>          ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
>>                              (actual time=0.015..0.041 rows=11 loops=1)
>>                Filter: ((t2_val)::text ~~ 'he%'::text)
>>          ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
>>                              (actual time=0.003..0.037 rows=16 loops=11)
>>    ->  Limit  (cost=0.00..0.84 rows=5 width=7)
>>               (actual time=0.008..0.037 rows=5 loops=16)
>>          ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
>>                              (actual time=0.004..0.016 rows=5 loops=16)
>>                Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
>>                         ((t1_val)::text ~~ 'h%'::text) AND
>>                          (random() <= '0.5'::double precision))
>>                Rows Removed by Filter: 5
>>  Planning time: 0.625 ms
>>  Execution time: 1.911 ms
>> (15 rows)
>>
>>
>> This should be equivalent to the original query, and should produce the
>> same results (modulo random() of course).
>>
>> But notice it actually does produce 7 rows!
>>
>>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>>               (actual time=0.052..1.825 rows=7 loops=1)
>>                                              ^
>>
>> The problem is that it ends up executing the sequential scan on t1
>> repeatedly (because it's the inner relation in a nested loop), and
>> because random() is volatile, the results of the scan are likely different.
>>
>> Each rescan individually still respects the LIMIT, but the combined
>> result may be larger - there may be more unique IDs, matching additional
>> rows from the other tables.
>>
>>>
>>> I believe a better way to write this query would be to use a CTE.
>>>
>>
>> Yes, that will stabilize the output of the random() function,
>> eliminating the nondeterminism during rescans.
> 
> Wow, thank you for the great explanation of what is going on!
> 
> It sounds like this is not really a bug then and is just something to
> be expected when using LIMIT with such volatile joins.
> 

The volatile part here is not the join, but the random() function being
re-evaluated during rescans of a relation. But yeah, I don't think this
qualifies as a bug.

> 
> I suppose I expected that if there is a LIMIT then that would be the
> maximum number of rows the subquery would ever provide. It sounds like
> the planner would have to force materializing/stabilizing the subquery
> in such cases. Maybe that is not possible, or not a good idea. It does
> seem like a pretty narrow edge case.
> 

Yes, something like that would be necessary. I don't know how difficult
that would be, though.
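
In plan-shape terms (using the same notation as before), that would
mean guaranteeing something like a Materialize node above the Limit, so
rescans replay the cached rows instead of re-running the volatile scan:

   -> nestloop
       -> nestloop t2 + t3
       -> materialize
           -> limit 5
               -> seq scan t1

This is only a sketch of the idea, of course.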

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services