Thread: BUG #4286: ORDER BY returns inconsistent results when using LIMIT on a integer column set to default values

The following bug has been logged online:

Bug reference:      4286
Logged by:          Thibauld Favre
Email address:      tfavre@gmail.com
PostgreSQL version: 8.3.3
Operating system:   Ubuntu Hardy Heron (8.04.1)
Description:        ORDER BY returns inconsistent results when using LIMIT
on a integer column set to default values
Details:

Hi,

Here's a little SQL script that recreate the bug I encounter on my app.
Basically, on certain circonstances, the first value of the table (here 'a')
is constantly returned at the end of the result set, thus creating
inconsistency between queries. I'm not sure I'm clear so here's the little
script:

DROP TABLE IF EXISTS a;
CREATE TABLE a (
    id serial PRIMARY KEY,
    name text NOT NULL,
    popularity integer NOT NULL default 0
);

INSERT INTO a (name) VALUES
('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'),
('i'), ('j'), ('k'), ('l'), ('m'), ('n'), ('o');

SELECT name FROM a ORDER BY popularity LIMIT 1; -- OK
SELECT name FROM a ORDER BY popularity LIMIT 2; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 3; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 4; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 5; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 6; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 7; -- INCONSISTENT RESULT
SELECT name FROM a ORDER BY popularity LIMIT 8; -- OK
SELECT name FROM a ORDER BY popularity LIMIT 9; -- OK

Here's what I get as a result on my server. See how 'a' is systematically
put at the end of the result set until the LIMIT clause reaches the value 8.
Above 8, the results get consistent again. Note that the value of 8 is table
specific: if the test was too be performed on another table, the value
changes.

 name
------
 a
(1 ligne)

 name
------
 b
 a
(2 lignes)

 name
------
 b
 c
 a
(3 lignes)

 name
------
 b
 c
 d
 a
(4 lignes)

 name
------
 b
 c
 d
 e
 a
(5 lignes)

 name
------
 b
 c
 d
 e
 f
 a
(6 lignes)

 name
------
 b
 c
 d
 e
 f
 g
 a
(7 lignes)

 name
------
 a
 b
 c
 d
 e
 f
 g
 h
(8 lignes)


Please do not hesitate to contact me if you have any questions regarding
this bug.
Besides this little anoyance, thanks for making postresql such a great
tool!

Thibauld Favre
Thibauld Favre wrote:
> Here's a little SQL script that recreate the bug I encounter on my app.
> Basically, on certain circonstances, the first value of the table (here 'a')
> is constantly returned at the end of the result set, thus creating
> inconsistency between queries. I'm not sure I'm clear so here's the little
> script:
>
> DROP TABLE IF EXISTS a;
> CREATE TABLE a (
>     id serial PRIMARY KEY,
>     name text NOT NULL,
>     popularity integer NOT NULL default 0
> );
>
> INSERT INTO a (name) VALUES
> ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'),
> ('i'), ('j'), ('k'), ('l'), ('m'), ('n'), ('o');
>
> SELECT name FROM a ORDER BY popularity LIMIT 1; -- OK
> SELECT name FROM a ORDER BY popularity LIMIT 2; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 3; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 4; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 5; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 6; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 7; -- INCONSISTENT RESULT
> SELECT name FROM a ORDER BY popularity LIMIT 8; -- OK
> SELECT name FROM a ORDER BY popularity LIMIT 9; -- OK
>
> Here's what I get as a result on my server. See how 'a' is systematically
> put at the end of the result set until the LIMIT clause reaches the value 8.
> Above 8, the results get consistent again. Note that the value of 8 is table
> specific: if the test was too be performed on another table, the value
> changes.

Doesn't look like a bug to me. All the rows have the same value in
popularity, so the "ORDER BY popularity" doesn't force any particular
order. This is effectively the same as if there was no ORDER BY at all;
the database is free to return the rows in any random order it wishes.

You can use ORDER BY popularity, name DESC for the order you were
expecting..

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Thibauld Favre escribió:

> Here's a little SQL script that recreate the bug I encounter on my app.
> Basically, on certain circonstances, the first value of the table (here 'a')
> is constantly returned at the end of the result set, thus creating
> inconsistency between queries. I'm not sure I'm clear so here's the little
> script:
>
> DROP TABLE IF EXISTS a;
> CREATE TABLE a (
>     id serial PRIMARY KEY,
>     name text NOT NULL,
>     popularity integer NOT NULL default 0
> );
>
> INSERT INTO a (name) VALUES
> ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'),
> ('i'), ('j'), ('k'), ('l'), ('m'), ('n'), ('o');

But they all have the same popularity of 0, so what's the problem?
Maybe what you need is to change the ORDER BY clause:

SELECT name FROM a ORDER BY popularity, name LIMIT 1;

Or perhaps you need to rephrase your query to say "give me all the
packages that have a popularity greater or equal than the lowest
popularity among the N less-popular packages"

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Thibauld Favre wrote:
>> Here's what I get as a result on my server. See how 'a' is systematically
>> put at the end of the result set until the LIMIT clause reaches the value 8.
>> Above 8, the results get consistent again.

> Doesn't look like a bug to me.

It isn't; in fact this behavior is specifically disclaimed in the docs
(http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-LIMIT):

    Thus, using different LIMIT/OFFSET values to select different
    subsets of a query result *will give inconsistent results*
    unless you enforce a predictable result ordering with ORDER BY.

Since the query's ORDER BY isn't sufficient to constrain the row
ordering, the observed behavior is covered by this statement.

(FWIW, it's unlikely there's any plan change involved here.  What I
think is happening is that the first row is being selected as the
quicksort pivot item.)

            regards, tom lane
Hi,

Thank you for your kind answers and for not having killed me despite
the fact that I report a bu^H^H documented feature. It had no impact
on me since there's an easy workaround to this behaviour (like you
reported), I just really thought it was a bug...
Sorry guys,

Thibauld

2008/7/8 Tom Lane <tgl@sss.pgh.pa.us>:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Thibauld Favre wrote:
>>> Here's what I get as a result on my server. See how 'a' is systematically
>>> put at the end of the result set until the LIMIT clause reaches the value 8.
>>> Above 8, the results get consistent again.
>
>> Doesn't look like a bug to me.
>
> It isn't; in fact this behavior is specifically disclaimed in the docs
> (http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-LIMIT):
>
>        Thus, using different LIMIT/OFFSET values to select different
>        subsets of a query result *will give inconsistent results*
>        unless you enforce a predictable result ordering with ORDER BY.
>
> Since the query's ORDER BY isn't sufficient to constrain the row
> ordering, the observed behavior is covered by this statement.
>
> (FWIW, it's unlikely there's any plan change involved here.  What I
> think is happening is that the first row is being selected as the
> quicksort pivot item.)
>
>                        regards, tom lane
>