Thread: Using "LIMIT" is much faster even though, searching with PK.

Using "LIMIT" is much faster even though, searching with PK.

From

장현성

Date:

01 December 2004, 04:10:44

hello~
i'm curious about this situation.

here is my test.
my zipcode table has 47705 rows,
and schema looks like this.

pgsql=# \d zipcode

Table "public.zipcode" Column | Type | Modifiers
---------+-----------------------+----------- zipcode | character(7) |
not null sido | character varying(4) | not null gugun | character
varying(13) | not null dong | character varying(43) | not null bunji |
character varying(17) | not null seq | integer | not null Indexes:
"zipcode_pkey" PRIMARY KEY, btree (seq)

and I need seq scan so,

pgsql=# SET enable_indexscan TO OFF;
SET
Time: 0.534 ms


now test start!
the first row.

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '1';

QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual
time=0.029..88.099 rows=1 loops=1)
    Filter: (seq = 1)
 Total runtime: 88.187 ms
(3 rows)

Time: 89.392 ms pgsql=#

the first row with LIMIT

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '1' LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=207) (actual time=0.033..0.034
rows=1 loops=1) -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1
width=207) (actual time=0.028..0.028 rows=1 loops=1) Filter: (seq = 1)
Total runtime: 0.111 ms (4 rows)

Time: 1.302 ms pgsql=#

the last row,

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '47705'; QUERY
PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual
time=3.248..88.232 rows=1 loops=1) Filter: (seq = 47705) Total runtime:
88.317 ms (3 rows)

Time: 89.521 ms pgsql=#

the last row with LIMIT,

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '47705' LIMIT
1; QUERY PLAN
------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=207) (actual time=3.254..3.254
rows=1 loops=1) -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1
width=207) (actual time=3.248..3.248 rows=1 loops=1) Filter: (seq =
47705) Total runtime: 3.343 ms (4 rows)

Time: 4.583 ms pgsql=#

When I using index scan, the result was almost same, that means, there
was no time difference, so i'll not mention about index scan.

but, sequence scan, as you see above result, there is big time
difference between using LIMIT and without using it. my question is,
when we're searching with PK like SELECT * FROM table WHERE PK = 'xxx',
we already know there is only 1 row or not. so, pgsql should stop
searching when maching row was found, isn't it?

i don't know exactly about mechanism how pgsql searching row its inside,
so might be i'm thinking wrong way, any comments, advices, notes,
anything will be appreciate to me!

Re: Using "LIMIT" is much faster even though, searching with PK.

From

Tom Lane

Date:

01 December 2004, 04:26:43

=?UTF-8?B?7J6l7ZiE7ISx?= <siche@siche.net> writes:
> but, sequence scan, as you see above result, there is big time
> difference between using LIMIT and without using it.

You've got a table full of dead rows.  Try VACUUM FULL ...

            regards, tom lane

Re: Using "LIMIT" is much faster even though, searching

From

"Hyun-Sung, Jang"

Date:

01 December 2004, 04:42:09

before test, I already executed VACUUM FULL.
this result show up after vacuum full.

Tom Lane 쓴 글:

장현성 <siche@siche.net> writes:

but, sequence scan, as you see above result, there is big time 
difference between using LIMIT and without using it.

You've got a table full of dead rows.  Try VACUUM FULL ...
		regards, tom lane

Re: Using "LIMIT" is much faster even though, searching

From

Josh Berkus

Date:

01 December 2004, 05:06:52

Hyun-Sang,

> before test, I already executed VACUUM FULL.
> this result show up after vacuum full.

Really?   Your results really look like a bloated table.   Can you run VACUUM
FULL ANALYZE VERBOSE on the table and post the output?

> When I using index scan, the result was almost same, that means, there
> was no time difference, so i'll not mention about index scan.

Can we see an index scan plan anyway?   EXPLAIN ANALYZE?

Oh, and if this is a zip codes table, why are you using a sequence as the
primary key instead of just using the zip code?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Using "LIMIT" is much faster even though, searching

From

"Hyun-Sung, Jang"

Date:

01 December 2004, 06:03:47

do you need all of verbose information??
VACUUM FULL ANALYZE VERBOSE give me a lot of infomation,
so i just cut zipcode parts.

==start===============================================================================
INFO: vacuuming "public.zipcode"
INFO: "zipcode": found 0 removable, 47705 nonremovable row versions in 572 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 76 to 136 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 27944 bytes.
0 pages are or will become empty, including 0 at the end of the table.
91 pages containing 8924 free bytes are potential move destinations.
CPU 0.03s/0.00u sec elapsed 0.03 sec.
INFO: index "zipcode_pkey" now contains 47705 row versions in 147 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO: "zipcode": moved 0 row versions, truncated 572 to 572 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.zipcode"
INFO: "zipcode": scanned 572 of 572 pages, containing 47705 live rows and 0 dead rows; 3000 rows in sample, 47705 estimated total rows
INFO: free space map: 108 relations, 128 pages stored; 1760 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 182 kB shared memory.
VACUUM
pgsql=#
==end===============================================================================

USING INDEX SCAN

==start===============================================================================
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.058 rows=1 loops=1)
   Index Cond: (seq = 1)
Total runtime: 0.152 ms
(3 rows)

pgsql=#

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.059..0.060 rows=1 loops=1)
   -> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.054 rows=1 loops=1)
         Index Cond: (seq = 1)
Total runtime: 0.158 ms
(4 rows)

pgsql=#

WHEN SELECT LAST ROW -----

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.059 rows=1 loops=1)
   Index Cond: (seq = 47705)
Total runtime: 0.150 ms
(3 rows)

pgsql=#

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.057..0.057 rows=1 loops=1)
   -> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.052..0.052 rows=1 loops=1)
         Index Cond: (seq = 47705)
Total runtime: 0.156 ms
(4 rows)

pgsql=#
==end===============================================================================

USING SEQUENCE SCAN

==start===============================================================================
pgsql=# set enable_indexscan to off;
SET
pgsql=#

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.032..109.934 rows=1 loops=1)
   Filter: (seq = 1)
Total runtime: 110.021 ms
(3 rows)

pgsql=#

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=0.035..0.035 rows=1 loops=1)
   -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.030..0.030 rows=1 loops=1)
         Filter: (seq = 1)
Total runtime: 0.113 ms
(4 rows)

pgsql=#

WHEN SELECT LAST ROW -----

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.048..110.232 rows=1 loops=1)
   Filter: (seq = 47705)
Total runtime: 110.322 ms
(3 rows)

pgsql=#

pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=4.038..4.038 rows=1 loops=1)
   -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.033..4.033 rows=1 loops=1)
         Filter: (seq = 47705)
Total runtime: 4.125 ms
(4 rows)

pgsql=#

==end===============================================================================

I just choose zipcode table for this test.
not only zipcode table but other table also give me same result.

SELECT * FROM table_name WHERE PK = 'xxx'

was always slower than

SELECT * FROM table_name WHERE PK = 'xxx' LIMIT 1

when sequence scan .

i think pgsql tring to find more than 1 row when query executed even if
searching condition is primary key.

ah, why i'm using sequence as PK instead of zip code is
in korea, the small towns doesn't have it's own zipcode
so they share other big city's.
that's why zip code can't be a primary key.
actually, i'm not using sequence to find zipcode.
i made it temporary for this test.

i think there is nobody want to using sequence number to find zipcode,
instead of city name. :-)

Josh Berkus 쓴 글:

Hyun-Sang,

before test, I already executed VACUUM FULL.
this result show up after vacuum full.

Really?   Your results really look like a bloated table.   Can you run VACUUM 
FULL ANALYZE VERBOSE on the table and post the output?

When I using index scan, the result was almost same, that means, there
was no time difference, so i'll not mention about index scan.

Can we see an index scan plan anyway?   EXPLAIN ANALYZE?

Oh, and if this is a zip codes table, why are you using a sequence as the 
primary key instead of just using the zip code?

Re: Using "LIMIT" is much faster even though, searching

From

Andrew McMillan

Date:

01 December 2004, 08:23:46

On Wed, 2004-12-01 at 15:03 +0900, Hyun-Sung, Jang wrote:
>
> < lots of information about seq scan vs index scan >
>

Hi,

Just because it has an ID that is the largest in the set, does not mean
it will be at the last position in the on-disk tables.  And similarly,
the lowest numbered ID does not mean it will be at the beginning in the
on-disk structures.

So when you 'LIMIT 1' the sequential scan stops as soon as it has found
the first row that matches, but in the no LIMIT case with a sequential
scan it will continue the scan to the end of the on-disk data.

Given that this column is unique, PostgreSQL could optimise this case
and imply LIMIT 1 for all sequential scans on such criteria, but in the
real world the optimisation is usually going to come from an index - at
least it will for larger tables - since that's a component of how
PostgreSQL is enforcing the unique constraint.

Regards,
                    Andrew McMillan.

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
                Chicken Little only has to be right once.
-------------------------------------------------------------------------

Attachment

signature.asc

Re: Using "LIMIT" is much faster even though, searching

From

Josh Berkus

Date:

02 December 2004, 02:31:43

Hyun-Sung,

> do you need all of verbose information??
> VACUUM FULL ANALYZE VERBOSE give me a lot of infomation,
> so i just cut zipcode parts.

Oh, sorry.  I meant just "VACUUM FULL ANALYZE VERBOSE zipcode", not the whole
database.   Should have been clearer.

> ==start====================================================================
>=========== INFO:  vacuuming "public.zipcode"
> INFO:  "zipcode": found 0 removable, 47705 nonremovable row versions in
> 572 pages
> DETAIL:  0 dead row versions cannot be removed yet.

OK, looks like you're clean.

> I just choose zipcode table for this test.
> not only zipcode table but other table also give me same result.
>
> SELECT * FROM table_name WHERE PK = 'xxx'
>
> was always slower than
>
> SELECT * FROM table_name WHERE PK = 'xxx' LIMIT 1
>
> when sequence scan .

yeah?  So?  Stop using sequence scan!  You've just demonstrated that, if you
don't force the planner to use sequence scan, things run at the same speed
with or without the LIMIT.  So you're causing a problem by forcing the
planner into a bad plan.

See Andrew's explanation of why it works this way.

> ah, why i'm using sequence as PK instead of zip code is
> in korea, the small towns doesn't have it's own zipcode
> so they share other big city's.
> that's why zip code can't be a primary key.
> actually, i'm not using sequence to find zipcode.
> i made it temporary for this test.

That makes sense.

--Josh

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco