Thread: BUG #8629: Strange resultset when using CTE or a subselect

BUG #8629: Strange resultset when using CTE or a subselect

From
jonathan.camile@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      8629
Logged by:          Jonathan Camile
Email address:      jonathan.camile@gmail.com
PostgreSQL version: 9.2.4
Operating system:   Ubuntu 10.04.4 LTS
Description:

Hey folks!


I have a bit of an issue with a query and I don't understand why.
It might be not very elegant but here it is, when I use the following query
the last result will always be the same whatsoever the values of LIMIT and
OFFSET.




```
WITH filtred_table AS (
    SELECT c.id
    FROM my_table t
    WHERE t.enabled = true
    AND (t.hdata->'field')::integer = ANY ('{16788}')
)
SELECT my_table.id
FROM  my_table
WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table)
ORDER BY my_table.hdata->'field' DESC
LIMIT 5
OFFSET 0


```


Returns


```
392200574
367046368
375126026
407430954
385242163
```


```
WITH filtred_table AS (
    SELECT c.id
    FROM my_table t
    WHERE t.enabled = true
    AND (t.hdata->'field')::integer = ANY ('{16788}')
)
SELECT my_table.id
FROM  my_table
WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table)
ORDER BY my_table.hdata->'field' DESC
LIMIT 5
OFFSET 5


```


Returns


```
186939712
350754246
408507328
381550486
385242163
```


If I don't `ORDER BY` the field used to filter the issue doesn't happen.


Thanks.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
bricklen
Date:
On Mon, Nov 25, 2013 at 2:27 AM, <jonathan.camile@gmail.com> wrote:
>
> WITH filtred_table AS (
>     SELECT c.id
>     FROM my_table t
>     WHERE t.enabled = true
>     AND (t.hdata->'field')::integer = ANY ('{16788}')
> )
> SELECT my_table.id
> FROM  my_table
> WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table)
> ORDER BY my_table.hdata->'field' DESC
> LIMIT 5
> OFFSET 5
>


Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the LIMIT.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
Jonathan Camile
Date:
Hi,

I tried with OFFSET before LIMIT but I have the same problem.
It's always the same record that I have last.


--

Jonathan Camile


On 25 November 2013 18:32, bricklen <bricklen@gmail.com> wrote:

>
> On Mon, Nov 25, 2013 at 2:27 AM, <jonathan.camile@gmail.com> wrote:
>>
>> WITH filtred_table AS (
>>     SELECT c.id
>>     FROM my_table t
>>     WHERE t.enabled = true
>>     AND (t.hdata->'field')::integer = ANY ('{16788}')
>> )
>> SELECT my_table.id
>> FROM  my_table
>> WHERE mycontract.id IN (SELECT filtred_table.id FROM filtred_table)
>> ORDER BY my_table.hdata->'field' DESC
>> LIMIT 5
>> OFFSET 5
>>
>
>
> Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the LIMIT.
>

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
David Johnston
Date:
jonathan.camile wrote
> The following bug has been logged on the website:
>
> Bug reference:      8629
> Logged by:          Jonathan Camile
> Email address:

> jonathan.camile@

> PostgreSQL version: 9.2.4
> Operating system:   Ubuntu 10.04.4 LTS
> Description:
>
> Hey folks!
>
>
> I have a bit of an issue with a query and I don't understand why.
> It might be not very elegant but here it is, when I use the following
> query
> the last result will always be the same whatsoever the values of LIMIT and
> OFFSET.

When debugging ORDER BY/LIMIT it is good practice to output the columns
being ordered, and others if applicable, and omit the LIMIT (or makes it
considerably larger than needed) to see what raw table data the LIMIT clause
is seeing.  At minimum you need; "SELECT my_table.id,
my_table.hdata->'field' FROM ..." for the output and either remove the limit
or show at least 30 records initially.


The small subset of the data you have provided is insufficient to determine
whether you are making bad assumptions about your data or whether there is
actually a problem.  The fact that it is not self-contained makes debugging
difficult as well.

Note that LIMIT and OFFSET can appear in either order.  While there are some
parts of a select statement for which the syntax dictates an order these do
not appear to be in the group.  If they could not then LIMIT 5 OFFSET 100
would make no sense...






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780199.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
bricklen
Date:
>
>> Try with OFFSET 5 LIMIT 5. The offset should be applied *before* the
>> LIMIT.
>>
>
David Johnston just proved my suggestion about the order of ORDER BY and
OFFSET was full of it; I should have tested before suggesting that.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
David Johnston
Date:
jonathan.camile wrote
> CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as
> id, trunc(random() * 9 + 1) as status;
>
> Then if you play with the following query, you will reproduce it.
>
> WITH filtred_test AS (
>     SELECT c.id
>     FROM public.testme c
>     WHERE c.status = ANY ('{5}')
> )
> SELECT mytest.id, mytest.status
> FROM  public.testme mytest
> WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test)
> ORDER BY mytest.status DESC
> OFFSET 35
> LIMIT 10

Strange but not a bug - though I haven't tested it myself to prove out
anything.

You are ordering by a single field that, by definition, has the same value
for every record that the LIMIT/OFFSET sees.  The ORDER BY is effectively a
no-op in this situation and the order of your output is going to be random.
Why it just happens that the same record is always in your random output I
have no clue but it is your query that is lacking here - not PostgreSQL.
You need a secondary order by field, like ID, if you want to guarantee that
different ranges provide different rows.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780342.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
"jonathan.camile"
Date:
Hi David,

Thanks for your reply.

Here a query to create table with which I can reproduce this behavior :

CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as id,
trunc(random() * 9 + 1) as status;

Then if you play with the following query, you will reproduce it.

WITH filtred_test AS (
    SELECT c.id
    FROM public.testme c
    WHERE c.status = ANY ('{5}')
)
SELECT mytest.id, mytest.status
FROM  public.testme mytest
WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test)
ORDER BY mytest.status DESC
OFFSET 35
LIMIT 10

I always have the same last result regardless of the offset or the limit I
use.
If you have any clue about what's wrong I'll be very glad.

By advance, thanks.


David Johnston wrote
>
> jonathan.camile wrote
>> The following bug has been logged on the website:
>>
>> Bug reference:      8629
>> Logged by:          Jonathan Camile
>> Email address:

>> jonathan.camile@

>> PostgreSQL version: 9.2.4
>> Operating system:   Ubuntu 10.04.4 LTS
>> Description:
>>
>> Hey folks!
>>
>>
>> I have a bit of an issue with a query and I don't understand why.
>> It might be not very elegant but here it is, when I use the following
>> query
>> the last result will always be the same whatsoever the values of LIMIT
>> and
>> OFFSET.
> When debugging ORDER BY/LIMIT it is good practice to output the columns
> being ordered, and others if applicable, and omit the LIMIT (or makes it
> considerably larger than needed) to see what raw table data the LIMIT
> clause is seeing.  At minimum you need; "SELECT my_table.id,
> my_table.hdata->'field' FROM ..." for the output and either remove the
> limit or show at least 30 records initially.
>
>
> The small subset of the data you have provided is insufficient to
> determine whether you are making bad assumptions about your data or
> whether there is actually a problem.  The fact that it is not
> self-contained makes debugging difficult as well.
>
> Note that LIMIT and OFFSET can appear in either order.  While there are
> some parts of a select statement for which the syntax dictates an order
> these do not appear to be in the group.  If they could not then LIMIT 5
> OFFSET 100 would make no sense...


David Johnston wrote
>
> jonathan.camile wrote
>> The following bug has been logged on the website:
>>
>> Bug reference:      8629
>> Logged by:          Jonathan Camile
>> Email address:

>> jonathan.camile@

>> PostgreSQL version: 9.2.4
>> Operating system:   Ubuntu 10.04.4 LTS
>> Description:
>>
>> Hey folks!
>>
>>
>> I have a bit of an issue with a query and I don't understand why.
>> It might be not very elegant but here it is, when I use the following
>> query
>> the last result will always be the same whatsoever the values of LIMIT
>> and
>> OFFSET.
> When debugging ORDER BY/LIMIT it is good practice to output the columns
> being ordered, and others if applicable, and omit the LIMIT (or makes it
> considerably larger than needed) to see what raw table data the LIMIT
> clause is seeing.  At minimum you need; "SELECT my_table.id,
> my_table.hdata->'field' FROM ..." for the output and either remove the
> limit or show at least 30 records initially.
>
>
> The small subset of the data you have provided is insufficient to
> determine whether you are making bad assumptions about your data or
> whether there is actually a problem.  The fact that it is not
> self-contained makes debugging difficult as well.
>
> Note that LIMIT and OFFSET can appear in either order.  While there are
> some parts of a select statement for which the syntax dictates an order
> these do not appear to be in the group.  If they could not then LIMIT 5
> OFFSET 100 would make no sense...


David Johnston wrote
>
> jonathan.camile wrote
>> The following bug has been logged on the website:
>>
>> Bug reference:      8629
>> Logged by:          Jonathan Camile
>> Email address:

>> jonathan.camile@

>> PostgreSQL version: 9.2.4
>> Operating system:   Ubuntu 10.04.4 LTS
>> Description:
>>
>> Hey folks!
>>
>>
>> I have a bit of an issue with a query and I don't understand why.
>> It might be not very elegant but here it is, when I use the following
>> query
>> the last result will always be the same whatsoever the values of LIMIT
>> and
>> OFFSET.
> When debugging ORDER BY/LIMIT it is good practice to output the columns
> being ordered, and others if applicable, and omit the LIMIT (or makes it
> considerably larger than needed) to see what raw table data the LIMIT
> clause is seeing.  At minimum you need; "SELECT my_table.id,
> my_table.hdata->'field' FROM ..." for the output and either remove the
> limit or show at least 30 records initially.
>
>
> The small subset of the data you have provided is insufficient to
> determine whether you are making bad assumptions about your data or
> whether there is actually a problem.  The fact that it is not
> self-contained makes debugging difficult as well.
>
> Note that LIMIT and OFFSET can appear in either order.  While there are
> some parts of a select statement for which the syntax dictates an order
> these do not appear to be in the group.  If they could not then LIMIT 5
> OFFSET 100 would make no sense...





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780294.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
bricklen
Date:
On Tue, Nov 26, 2013 at 1:50 AM, jonathan.camile
<jonathan.camile@gmail.com>wrote:

>
> Here a query to create table with which I can reproduce this behavior :
>
> CREATE TABLE public.testme AS SELECT generate_series(134800, 348008) as id,
> trunc(random() * 9 + 1) as status;
>
> Then if you play with the following query, you will reproduce it.
>
> WITH filtred_test AS (
>     SELECT c.id
>     FROM public.testme c
>     WHERE c.status = ANY ('{5}')
> )
> SELECT mytest.id, mytest.status
> FROM  public.testme mytest
> WHERE mytest.id IN (SELECT filtred_test.id FROM filtred_test)
> ORDER BY mytest.status DESC
> OFFSET 35
> LIMIT 10
>
> I always have the same last result regardless of the offset or the limit I
> use.
>


I cannot reproduce your problem in 9.2.5 nor 9.3.1. Rerunning the query
with different OFFSET values gives me different results each time.

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
Tom Lane
Date:
bricklen <bricklen@gmail.com> writes:
> On Tue, Nov 26, 2013 at 1:50 AM, jonathan.camile
> <jonathan.camile@gmail.com>wrote:
>> I always have the same last result regardless of the offset or the limit I
>> use.

> I cannot reproduce your problem in 9.2.5 nor 9.3.1. Rerunning the query
> with different OFFSET values gives me different results each time.

I do reproduce the behavior, at least when offset+limit is small enough
that the executor will use a top-N heapsort.  As already stated, it's
not a bug; the ordering is underdetermined so any set of the correct
number of rows with status = 5 is a valid output.  As for what's actually
happening: the row that always shows up seems to be the physically first
one with status = 5.  I think that's starting out as the top of the top-N
heap, and it just stays there because there's nothing that can displace
it.

            regards, tom lane

Re: BUG #8629: Strange resultset when using CTE or a subselect

From
"jonathan.camile"
Date:
Hi Tom,

Thanks for the answer.
It's clearer now, I might avoid this behavior ordering by id at last.
I still have the impression that is a bit buggy or misleading.
Even with simple query such as :

SELECT c.id
FROM public.testme c
WHERE c.status = ANY ('{5}')
ORDER BY c.status DESC
OFFSET 150
LIMIT 200

the last is always the same but not the penultimate. Even if the ordering is
underdetermined it shouldn't change then LIMIT/OFFSET shouldn't return the
same value for 2 non overlapping ranges.
I find it difficult to understand how it works completely, but my knowledge
are quite limited.

Thanks anyway. :)



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-8629-Strange-resultset-when-using-CTE-or-a-subselect-tp5780187p5780519.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.