Thread: limit-offset different result sets with same query
Hi all. I'll make this faster. I hace this table and this function: CREATE FUNCTION pg_round_random_range(integer, integer) RETURNS integer LANGUAGE plperl IMMUTABLE STRICT AS $_X$ my($imin, $imax) = @_; if ($_[0] == $_[1]){ return $_[0];} if($imin > $imax){ $imin = $_[1]; $imax = $_[0];} $_number_ = ( (rand) * ($imax + 1)); while (($_number_ < $imin) && ( $_number_ > $imax)){ $_number_ = ( (rand) * ($imax + 1));} return sprintf "%d",$_number_; $_X$; CREATE TABLE datos ( texto text DEFAULT md5((random())::text), entero2 smallint DEFAULT (rpad((hashtext((random())::text))::text, 4))::smallint, entero4 integer DEFAULT (lpad(((hashtext((random())::text))::text || replace((hashtext((random())::text))::text, '-'::text, ''::text)), 9))::integer, entero8 bigint DEFAULT (rpad(((hashtext((random())::text))::text || replace((hashtext((random())::text))::text, '-'::text, ''::text)), 19))::bigint, "float" double precision DEFAULT ((random() * (1000)::double precision) + random()), fecha date DEFAULT (now())::date, tiempo timestamp without time zone DEFAULT now(), ztiempo timestamp with time zone DEFAULT now(), ip cidr DEFAULT ((((((((pg_round_random_range(0, 255))::text || '.'::text) || (pg_round_random_range(0, 255))::text) || '.'::text) || (pg_round_random_range(0, 255))::text) || '.'::text) || (pg_round_random_range(0, 255))::text))::cidr ); I insert several tuples to make a test with the sentence 'insert into datos default values'. This table don't have indexes. There are no users connected exept me (is a local and test database). When i have ~160000 regs i start to make some querys. Executing 'select * from datos limit 1 offset 150000' two times i have different result sets. When i execute 'explain analyze verbose <query>' i see that (as expected) the seq scan is occurring. Examples: parapruebas=# select entero8 from datos limit 1 offset 2; entero8 -------------------- 477808241937806077 (1 row) parapruebas=# select entero8 from datos limit 1 offset 2; entero8 -------------------- 477808241937806077 (1 row) QUERY PLAN ---------------------------------------------------------------------------------------------------- Limit (cost=0.05..0.07 rows=1 width=8) (actual time=0.033..0.036 rows=1 loops=1) -> Seq Scan on datos (cost=0.00..4128.00 rows=172800 width=8) (actual time=0.014..0.020 rows=3 loops=1) Total runtime: 0.107 ms (3 rows) In this case, on a small offset the result set returns the same. But in higher offsets: parapruebas=# select entero8 from datos limit 1 offset 100000; entero8 -------------------- -82136193203177195 (1 row) parapruebas=# select entero8 from datos limit 1 offset 100000; entero8 --------------------- 1201794554456297856 (1 row) QUERY PLAN ---------------------------------------------------------------------------------------------------- Limit (cost=2388.89..2388.91 rows=1 width=8) (actual time=622.198..622.201 rows=1 loops=1) -> Seq Scan on datos (cost=0.00..4128.00 rows=172800 width=8) (actual time=0.014..356.800 rows= 100001 loops=1) Total runtime: 622.247 ms (3 rows) That's correct? Is logical that if the scan is sequential in the physical table returns differents data? I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results. Thanks in advance. -- Emanuel Calvo Franco Sumate al ARPUG ! ( www.arpug.com.ar) ArPUG / AOSUG Member
Emanuel Calvo Franco <postgres.arg@gmail.com> writes: > Executing 'select * from datos limit 1 offset 150000' two times i have different > result sets. The "synchronous scan" logic is probably responsible. Turn off synchronize_seqscans if this behavior bothers you. regards, tom lane
On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote: > Hi all. > > I'll make this faster. > > I hace this table and this function: You should only ever assume that your SELECT's output will have a particular ordering when you include an ORDER BY clause that actually specifies the order well enough :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2009/5/8 Tom Lane <tgl@sss.pgh.pa.us>: > Emanuel Calvo Franco <postgres.arg@gmail.com> writes: >> Executing 'select * from datos limit 1 offset 150000' two times i have different >> result sets. > > The "synchronous scan" logic is probably responsible. Turn off > synchronize_seqscans if this behavior bothers you. > > regards, tom lane > It works Tom, Thanks! -- Emanuel Calvo Franco Sumate al ARPUG ! ( www.arpug.com.ar) ArPUG / AOSUG Member
2009/5/8 David Fetter <david@fetter.org>: > On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote: >> Hi all. >> >> I'll make this faster. >> >> I hace this table and this function: > > You should only ever assume that your SELECT's output will have a > particular ordering when you include an ORDER BY clause that actually > specifies the order well enough :) > I test it in the first time :) With the 'order by' it works well, but in 'theory' if you run sequentially and physically a table, you expect obtain the same results with a same query. There is no indexes that can intersect the results or inherits tables. But with the option synchronize_seqscans in off, it works like i expected :) i will study a bit more this option on monday (like always). -- Emanuel Calvo Franco Sumate al ARPUG ! ( www.arpug.com.ar) ArPUG / AOSUG Member
David Fetter escribió: > On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote: > > Hi all. > > > > I'll make this faster. > > > > I hace this table and this function: > > You should only ever assume that your SELECT's output will have a > particular ordering when you include an ORDER BY clause that actually > specifies the order well enough :) Yeah, we went over this on the spanish list, turned out that I couldn't remember about syncscan :-) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Emanuel Calvo Franco wrote: > > Executing 'select * from datos limit 1 offset 150000' two times i have different > result sets. > When i execute 'explain analyze verbose <query>' i see that (as > expected) the seq scan > is occurring. > > > That's correct? Is logical that if the scan is sequential in the > physical table returns differents > data? > > I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results. > > Thanks in advance. > Emanuel, LIMIT and OFFSET are stable only when you have "ORDER BY" on unique values. Without that, the database is free to return the rows in whatever order it deems best, which gives unpredictable results when combined with LIMIT/OFFSET. Adam
On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote: > 2009/5/8 David Fetter <david@fetter.org>: > > On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote: > >> Hi all. > >> > >> I'll make this faster. > >> > >> I hace this table and this function: > > > > You should only ever assume that your SELECT's output will have a > > particular ordering when you include an ORDER BY clause that > > actually specifies the order well enough :) > > > > I test it in the first time :) > > With the 'order by' it works well, but in 'theory' The theory under which you should operate is that the underlying implementation only gives you the orderings you ask for. This way, when other beneficial implementation changes happen, they will not surprise you. :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2009/5/8 David Fetter <david@fetter.org>: > On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote: >> >> I test it in the first time :) >> >> With the 'order by' it works well, but in 'theory' > > The theory under which you should operate is that the underlying > implementation only gives you the orderings you ask for. This way, > when other beneficial implementation changes happen, they will not > surprise you. :) Yeap. now it have more sense. Is a question of performance and to take notice of that. Cool. Thanks to all!, -- Emanuel Calvo Franco Sumate al ARPUG ! ( www.arpug.com.ar) ArPUG / AOSUG Member
On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > David Fetter escribió: >> On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote: >> > Hi all. >> > >> > I'll make this faster. >> > >> > I hace this table and this function: >> >> You should only ever assume that your SELECT's output will have a >> particular ordering when you include an ORDER BY clause that actually >> specifies the order well enough :) > > Yeah, we went over this on the spanish list, turned out that I couldn't > remember about syncscan :-) I like the new behavior. It really encourages proper use of order by, because the natural ordering results are effectively randomized. A class of subtle bugs has been made obvious. :) merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> Yeah, we went over this on the spanish list, turned out that I couldn't >> remember about syncscan :-) > I like the new behavior. It really encourages proper use of order by, > because the natural ordering results are effectively randomized. A > class of subtle bugs has been made obvious. :) Not really, because the syncscan behavior only kicks in when your table gets large ... you'll never see it during devel testing on toy tables ... regards, tom lane
On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera > > <alvherre@commandprompt.com> wrote: > >> Yeah, we went over this on the spanish list, turned out that I > >> couldn't remember about syncscan :-) > > > I like the new behavior. It really encourages proper use of order > > by, because the natural ordering results are effectively > > randomized. A class of subtle bugs has been made obvious. :) > > Not really, because the syncscan behavior only kicks in when your > table gets large ... you'll never see it during devel testing on toy > tables ... Good point. It's important not to test only on toy-sized tables for lots and lots of good reasons, scale-dependence of sync scans being a small one. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, May 10, 2009 at 2:03 AM, David Fetter <david@fetter.org> wrote: > On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >> > On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera >> > <alvherre@commandprompt.com> wrote: >> >> Yeah, we went over this on the spanish list, turned out that I >> >> couldn't remember about syncscan :-) >> >> > I like the new behavior. It really encourages proper use of order >> > by, because the natural ordering results are effectively >> > randomized. A class of subtle bugs has been made obvious. :) >> >> Not really, because the syncscan behavior only kicks in when your >> table gets large ... you'll never see it during devel testing on toy >> tables ... > > Good point. It's important not to test only on toy-sized tables for > lots and lots of good reasons, scale-dependence of sync scans being a > small one. Last job I was at I was the lone pgsql guy who worked with three Oracle DBAs, and quite a few of them were caught off guard by this type of behaviour (it was with hash_agg and reporting queries with group by).
2009/5/9 Tom Lane <tgl@sss.pgh.pa.us>: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera >> <alvherre@commandprompt.com> wrote: >>> Yeah, we went over this on the spanish list, turned out that I couldn't >>> remember about syncscan :-) > >> I like the new behavior. It really encourages proper use of order by, >> because the natural ordering results are effectively randomized. A >> class of subtle bugs has been made obvious. :) > > Not really, because the syncscan behavior only kicks in when your table > gets large ... you'll never see it during devel testing on toy tables ... > > regards, tom lane > Yeap. If you see one of the test i made, you'll see this switch over the ~100 regs. -- Emanuel Calvo Franco Sumate al ARPUG ! ( www.arpug.com.ar) ArPUG / AOSUG Member