Thread: BUG #13790: last row of limit/offset result produces duplicates

BUG #13790: last row of limit/offset result produces duplicates

From
pbelbin@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      13790
Logged by:          Peter Belbin
Email address:      pbelbin@gmail.com
PostgreSQL version: 9.4.5
Operating system:   Mac
Description:

the following table:

CREATE TABLE demo1
(
  id serial NOT NULL,
  code character varying(20),
  match character varying(20),
  CONSTRAINT demo1_pk PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

populated with rows where many of the 'code' rows contain the same value,
and then queried with something like:

select * from proxy_homing order by code limit 10 offset 40

is returning the exact same result in the last row, regardless of the
offset.

here is some sample data:

insert into demo1 (code, match) values ('AAAA', '203' );
insert into demo1 (code, match) values ('BBBB', '204' );
insert into demo1 (code, match) values ('CCCC', '206' );
insert into demo1 (code, match) values ('DDDD', '207' );
insert into demo1 (code, match) values ('EEEE', '208' );
insert into demo1 (code, match) values ('FFFF', '212' );
insert into demo1 (code, match) values ('CHCG', '215' );
insert into demo1 (code, match) values ('CHCG', '216' );
insert into demo1 (code, match) values ('CHCG', '217' );
insert into demo1 (code, match) values ('CHCG', '218' );
insert into demo1 (code, match) values ('CHCG', '219' );
insert into demo1 (code, match) values ('CHCG', '224' );
insert into demo1 (code, match) values ('CHCG', '226' );
insert into demo1 (code, match) values ('CHCG', '231' );
insert into demo1 (code, match) values ('CHCG', '234' );
insert into demo1 (code, match) values ('CHCG', '240' );
insert into demo1 (code, match) values ('CHCG', '248' );
insert into demo1 (code, match) values ('CHCG', '250' );
insert into demo1 (code, match) values ('CHCG', '253' );
insert into demo1 (code, match) values ('CHCG', '260' );
insert into demo1 (code, match) values ('CHCG', '262' );
insert into demo1 (code, match) values ('CHCG', '267' );
insert into demo1 (code, match) values ('CHCG', '269' );
insert into demo1 (code, match) values ('CHCG', '270' );
insert into demo1 (code, match) values ('CHCG', '276' );
insert into demo1 (code, match) values ('CHCG', '289' );
insert into demo1 (code, match) values ('CHCG', '301' );
insert into demo1 (code, match) values ('CHCG', '302' );
insert into demo1 (code, match) values ('CHCG', '304' );
insert into demo1 (code, match) values ('CHCG', '306' );
insert into demo1 (code, match) values ('CHCG', '307' );
insert into demo1 (code, match) values ('CHCG', '308' );
insert into demo1 (code, match) values ('CHCG', '309' );
insert into demo1 (code, match) values ('CHCG', '312' );
insert into demo1 (code, match) values ('CHCG', '313' );
insert into demo1 (code, match) values ('CHCG', '314' );
insert into demo1 (code, match) values ('CHCG', '315' );
insert into demo1 (code, match) values ('CHCG', '317' );
insert into demo1 (code, match) values ('CHCG', '319' );
insert into demo1 (code, match) values ('CHCG', '320' );
insert into demo1 (code, match) values ('CHCG', '330' );
insert into demo1 (code, match) values ('CHCG', '331' );
insert into demo1 (code, match) values ('CHCG', '339' );
insert into demo1 (code, match) values ('CHCG', '347' );
insert into demo1 (code, match) values ('CHCG', '351' );
insert into demo1 (code, match) values ('CHCG', '360' );
insert into demo1 (code, match) values ('CHCG', '401' );

the last result row should not appear more than once in the output!  but it
does!

eg:

select * from demo1 order by code limit 5 offset 10

18;CHCG;240
19;CHCG;248
20;CHCG;250
14;CHCG;224
10;CHCG;216

and then:

select * from demo1 order by code limit 5 offset 20

28;CHCG;289
29;CHCG;301
30;CHCG;302
14;CHCG;224
10;CHCG;216


the last row of the two result sets above is the same row!

changing the sorting so that it includes the match column appears to avoid
the issue, but, this is a bug.  each row should only appear once if the
limit/offset values are looking at different portions of the result that
would be produced without the limit/offset options.

regards,
peter

Re: BUG #13790: last row of limit/offset result produces duplicates

From
"David G. Johnston"
Date:
On Wed, Dec 2, 2015 at 2:02 PM, <pbelbin@gmail.com> wrote:

> select * from demo1 order by code limit 5 offset 10
>
> 18;CHCG;240
> 19;CHCG;248
> 20;CHCG;250
> 14;CHCG;224
> 10;CHCG;216
>
> and then:
>
> select * from demo1 order by code limit 5 offset 20
>
> 28;CHCG;289
> 29;CHCG;301
> 30;CHCG;302
> 14;CHCG;224
> 10;CHCG;216
>
>
> the last row of the two result sets above is the same row!
>
> changing the sorting so that it includes the match column appears to avoi=
d
> the issue, but, this is a bug.  each row should only appear once if the
> limit/offset values are looking at different portions of the result that
> would be produced without the limit/offset options
> =E2=80=8B.
>

I'm sorry but no, that is not how this works.  The two queries are
independent and if you fail to adequately specify the ORDER BY the fault is
yours, not the system's.

David J.

Re: BUG #13790: last row of limit/offset result produces duplicates

From
Tom Lane
Date:
pbelbin@gmail.com writes:
> ... populated with rows where many of the 'code' rows contain the same value,
> and then queried with something like:
> select * from proxy_homing order by code limit 10 offset 40
> is returning the exact same result in the last row, regardless of the
> offset.

Sorry, this is not a bug.  If you have an underspecified ORDER BY
ordering, the sorting code is entitled to return equal-keyed rows
in any order whatsoever, and there is no guarantee that changing
the limit/offset parameters won't affect that.

> changing the sorting so that it includes the match column appears to avoid
> the issue, but, this is a bug.  each row should only appear once if the
> limit/offset values are looking at different portions of the result that
> would be produced without the limit/offset options.

We do not make any guarantee that that is how limit/offset works.  You
will only get consistent results across queries if the underlying result
ordering is guaranteed consistent across queries, but that's not the case
in this example.  Postgres would be within its rights to return different
rowsets on different executions even without any change in limit/offset,
and indeed it can do so in some cases.

(The technical reason why this happens in this particular case is that
when using a bounded heap to do a top-N sort, which of the equal-keyed
tuples survive to the end of the sort depends heavily on the exact heap
size, ie the LIMIT+OFFSET sum; and the order in which the survivors end
up getting output is also dependent on the exact heap size.  But there
are other mechanisms that could cause the results to be unstable.)

            regards, tom lane