Thread: limit-offset different result sets with same query

limit-offset different result sets with same query

From

Emanuel Calvo Franco

Date:

08 May 2009, 21:10:29

Hi all.

I'll make this faster.

I hace this table and this function:


CREATE FUNCTION pg_round_random_range(integer, integer) RETURNS integer
    LANGUAGE plperl IMMUTABLE STRICT
    AS $_X$
my($imin, $imax) = @_;
if ($_[0] == $_[1]){
return $_[0];}
if($imin > $imax){
$imin = $_[1];
$imax = $_[0];}
$_number_ = ( (rand) * ($imax + 1));
while (($_number_ < $imin) && ( $_number_ > $imax)){
        $_number_ = ( (rand) * ($imax + 1));}
return sprintf "%d",$_number_;

$_X$;

CREATE TABLE datos (
    texto text DEFAULT md5((random())::text),
    entero2 smallint DEFAULT (rpad((hashtext((random())::text))::text,
4))::smallint,
    entero4 integer DEFAULT (lpad(((hashtext((random())::text))::text
|| replace((hashtext((random())::text))::text, '-'::text, ''::text)),
9))::integer,
    entero8 bigint DEFAULT (rpad(((hashtext((random())::text))::text
|| replace((hashtext((random())::text))::text, '-'::text, ''::text)),
19))::bigint,
    "float" double precision DEFAULT ((random() * (1000)::double
precision) + random()),
    fecha date DEFAULT (now())::date,
    tiempo timestamp without time zone DEFAULT now(),
    ztiempo timestamp with time zone DEFAULT now(),
    ip cidr DEFAULT ((((((((pg_round_random_range(0, 255))::text ||
'.'::text) || (pg_round_random_range(0, 255))::text) || '.'::text) ||
(pg_round_random_range(0, 255))::text) || '.'::text) ||
(pg_round_random_range(0, 255))::text))::cidr
);

I insert several tuples to make a test with the sentence 'insert into
datos default values'.
This table don't have indexes. There are no users connected exept me
(is a local and
test database).
When i have ~160000 regs i start to make some querys.

Executing 'select * from datos limit 1 offset 150000' two times i have different
result sets.
When  i execute 'explain analyze verbose <query>' i see that (as
expected) the seq scan
is occurring.

Examples:

parapruebas=# select entero8 from datos limit 1 offset 2;
      entero8
--------------------
 477808241937806077
(1 row)

parapruebas=# select entero8 from datos limit 1 offset 2;
      entero8
--------------------
 477808241937806077
(1 row)
                                                QUERY PLAN

----------------------------------------------------------------------------------------------------
 Limit  (cost=0.05..0.07 rows=1 width=8) (actual time=0.033..0.036
rows=1 loops=1)
   ->  Seq Scan on datos  (cost=0.00..4128.00 rows=172800 width=8)
(actual time=0.014..0.020 rows=3
loops=1)
 Total runtime: 0.107 ms
(3 rows)


In this case, on a small offset the result set returns the same. But
in higher offsets:

parapruebas=# select entero8 from datos limit 1 offset 100000;
      entero8
--------------------
 -82136193203177195
(1 row)

parapruebas=# select entero8 from datos limit 1 offset 100000;
       entero8
---------------------
 1201794554456297856
(1 row)
                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------
 Limit  (cost=2388.89..2388.91 rows=1 width=8) (actual
time=622.198..622.201 rows=1 loops=1)
   ->  Seq Scan on datos  (cost=0.00..4128.00 rows=172800 width=8)
(actual time=0.014..356.800 rows=
100001 loops=1)
 Total runtime: 622.247 ms
(3 rows)



That's correct? Is logical that if the scan is sequential in the
physical table returns differents
data?

I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results.

Thanks in advance.

--
      Emanuel Calvo Franco
        Sumate al ARPUG !
        ( www.arpug.com.ar)
    ArPUG / AOSUG Member

Re: limit-offset different result sets with same query

From

Tom Lane

Date:

08 May 2009, 21:24:51

Emanuel Calvo Franco <postgres.arg@gmail.com> writes:
> Executing 'select * from datos limit 1 offset 150000' two times i have different
> result sets.

The "synchronous scan" logic is probably responsible.  Turn off
synchronize_seqscans if this behavior bothers you.

            regards, tom lane

Re: limit-offset different result sets with same query

From

David Fetter

Date:

08 May 2009, 21:34:20

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:
> Hi all.
>
> I'll make this faster.
>
> I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that actually
specifies the order well enough :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: limit-offset different result sets with same query

From

Emanuel Calvo Franco

Date:

08 May 2009, 21:34:53

2009/5/8 Tom Lane <tgl@sss.pgh.pa.us>:
> Emanuel Calvo Franco <postgres.arg@gmail.com> writes:
>> Executing 'select * from datos limit 1 offset 150000' two times i have different
>> result sets.
>
> The "synchronous scan" logic is probably responsible.  Turn off
> synchronize_seqscans if this behavior bothers you.
>
>                        regards, tom lane
>

It works Tom,
Thanks!

--
      Emanuel Calvo Franco
        Sumate al ARPUG !
        ( www.arpug.com.ar)
    ArPUG / AOSUG Member

Re: limit-offset different result sets with same query

From

Emanuel Calvo Franco

Date:

08 May 2009, 21:42:00

2009/5/8 David Fetter <david@fetter.org>:
> On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:
>> Hi all.
>>
>> I'll make this faster.
>>
>> I hace this table and this function:
>
> You should only ever assume that your SELECT's output will have a
> particular ordering when you include an ORDER BY clause that actually
> specifies the order well enough :)
>

I test it in the first time :)

With the 'order by' it works well, but in 'theory' if you
run sequentially and physically a table, you expect obtain the same
results with a same query.
There is no indexes that can intersect the results or inherits
tables.

But with the option synchronize_seqscans in off, it works like
i expected :)

i will study a bit more this option on monday (like always).

--
      Emanuel Calvo Franco
        Sumate al ARPUG !
        ( www.arpug.com.ar)
    ArPUG / AOSUG Member

Re: limit-offset different result sets with same query

From

Alvaro Herrera

Date:

08 May 2009, 21:42:22

David Fetter escribió:
> On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:
> > Hi all.
> >
> > I'll make this faster.
> >
> > I hace this table and this function:
>
> You should only ever assume that your SELECT's output will have a
> particular ordering when you include an ORDER BY clause that actually
> specifies the order well enough :)

Yeah, we went over this on the spanish list, turned out that I couldn't
remember about syncscan :-)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: limit-offset different result sets with same query

From

Adam Rich

Date:

08 May 2009, 21:43:45

Emanuel Calvo Franco wrote:
 >
 > Executing 'select * from datos limit 1 offset 150000' two times i
have different
 > result sets.
 > When  i execute 'explain analyze verbose <query>' i see that (as
 > expected) the seq scan
 > is occurring.
 >
 >
 > That's correct? Is logical that if the scan is sequential in the
 > physical table returns differents
 > data?
 >
 > I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results.
 >
 > Thanks in advance.
 >

Emanuel,
LIMIT and OFFSET are stable only when you have "ORDER BY" on unique
values.  Without that, the database is free to return the rows in
whatever order it deems best, which gives unpredictable results when
combined with LIMIT/OFFSET.

Adam

Re: limit-offset different result sets with same query

From

David Fetter

Date:

09 May 2009, 02:05:46

On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote:
> 2009/5/8 David Fetter <david@fetter.org>:
> > On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:
> >> Hi all.
> >>
> >> I'll make this faster.
> >>
> >> I hace this table and this function:
> >
> > You should only ever assume that your SELECT's output will have a
> > particular ordering when you include an ORDER BY clause that
> > actually specifies the order well enough :)
> >
>
> I test it in the first time :)
>
> With the 'order by' it works well, but in 'theory'

The theory under which you should operate is that the underlying
implementation only gives you the orderings you ask for.  This way,
when other beneficial implementation changes happen, they will not
surprise you. :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: limit-offset different result sets with same query

From

Emanuel Calvo Franco

Date:

09 May 2009, 15:36:00

2009/5/8 David Fetter <david@fetter.org>:
> On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote:
>>
>> I test it in the first time :)
>>
>> With the 'order by' it works well, but in 'theory'
>
> The theory under which you should operate is that the underlying
> implementation only gives you the orderings you ask for.  This way,
> when other beneficial implementation changes happen, they will not
> surprise you. :)

Yeap. now it have more sense.
Is a question of performance and to take notice of that.

Cool.
Thanks to all!,


--
      Emanuel Calvo Franco
        Sumate al ARPUG !
        ( www.arpug.com.ar)
    ArPUG / AOSUG Member

Re: limit-offset different result sets with same query

From

Merlin Moncure

Date:

09 May 2009, 17:24:23

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> David Fetter escribió:
>> On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:
>> > Hi all.
>> >
>> > I'll make this faster.
>> >
>> > I hace this table and this function:
>>
>> You should only ever assume that your SELECT's output will have a
>> particular ordering when you include an ORDER BY clause that actually
>> specifies the order well enough :)
>
> Yeah, we went over this on the spanish list, turned out that I couldn't
> remember about syncscan :-)

I like the new behavior.  It really encourages proper use of order by,
because the natural ordering results are effectively randomized.  A
class of subtle bugs has been made obvious.  :)

merlin

Re: limit-offset different result sets with same query

From

Tom Lane

Date:

09 May 2009, 17:28:12

Merlin Moncure <mmoncure@gmail.com> writes:
> On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> Yeah, we went over this on the spanish list, turned out that I couldn't
>> remember about syncscan :-)

> I like the new behavior.  It really encourages proper use of order by,
> because the natural ordering results are effectively randomized.  A
> class of subtle bugs has been made obvious.  :)

Not really, because the syncscan behavior only kicks in when your table
gets large ... you'll never see it during devel testing on toy tables ...

            regards, tom lane

Re: limit-offset different result sets with same query

From

David Fetter

Date:

10 May 2009, 08:04:17

On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
> > On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
> > <alvherre@commandprompt.com> wrote:
> >> Yeah, we went over this on the spanish list, turned out that I
> >> couldn't remember about syncscan :-)
>
> > I like the new behavior.  It really encourages proper use of order
> > by, because the natural ordering results are effectively
> > randomized.  A class of subtle bugs has been made obvious.  :)
>
> Not really, because the syncscan behavior only kicks in when your
> table gets large ... you'll never see it during devel testing on toy
> tables ...

Good point.  It's important not to test only on toy-sized tables for
lots and lots of good reasons, scale-dependence of sync scans being a
small one.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: limit-offset different result sets with same query

From

Scott Marlowe

Date:

11 May 2009, 05:05:12

On Sun, May 10, 2009 at 2:03 AM, David Fetter <david@fetter.org> wrote:
> On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote:
>> Merlin Moncure <mmoncure@gmail.com> writes:
>> > On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
>> > <alvherre@commandprompt.com> wrote:
>> >> Yeah, we went over this on the spanish list, turned out that I
>> >> couldn't remember about syncscan :-)
>>
>> > I like the new behavior.  It really encourages proper use of order
>> > by, because the natural ordering results are effectively
>> > randomized.  A class of subtle bugs has been made obvious.  :)
>>
>> Not really, because the syncscan behavior only kicks in when your
>> table gets large ... you'll never see it during devel testing on toy
>> tables ...
>
> Good point.  It's important not to test only on toy-sized tables for
> lots and lots of good reasons, scale-dependence of sync scans being a
> small one.

Last job I was at I was the lone pgsql guy who worked with three
Oracle DBAs, and quite a few of them were caught off guard by this
type of behaviour (it was with hash_agg and reporting queries with
group by).

Re: limit-offset different result sets with same query

From

Emanuel Calvo Franco

Date:

11 May 2009, 12:40:46

2009/5/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> Yeah, we went over this on the spanish list, turned out that I couldn't
>>> remember about syncscan :-)
>
>> I like the new behavior.  It really encourages proper use of order by,
>> because the natural ordering results are effectively randomized.  A
>> class of subtle bugs has been made obvious.  :)
>
> Not really, because the syncscan behavior only kicks in when your table
> gets large ... you'll never see it during devel testing on toy tables ...
>
>                        regards, tom lane
>

Yeap. If you see one of the test i made, you'll see this switch over the
~100 regs.


--
      Emanuel Calvo Franco
        Sumate al ARPUG !
        ( www.arpug.com.ar)
    ArPUG / AOSUG Member