Thread: Using "LIMIT" is much faster even though, searching with PK.
hello~ i'm curious about this situation. here is my test. my zipcode table has 47705 rows, and schema looks like this. pgsql=# \d zipcode Table "public.zipcode" Column | Type | Modifiers ---------+-----------------------+----------- zipcode | character(7) | not null sido | character varying(4) | not null gugun | character varying(13) | not null dong | character varying(43) | not null bunji | character varying(17) | not null seq | integer | not null Indexes: "zipcode_pkey" PRIMARY KEY, btree (seq) and I need seq scan so, pgsql=# SET enable_indexscan TO OFF; SET Time: 0.534 ms now test start! the first row. pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '1'; QUERY PLAN ------------------------------------------------------------------------------------------------------- Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual time=0.029..88.099 rows=1 loops=1) Filter: (seq = 1) Total runtime: 88.187 ms (3 rows) Time: 89.392 ms pgsql=# the first row with LIMIT pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '1' LIMIT 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..1168.31 rows=1 width=207) (actual time=0.033..0.034 rows=1 loops=1) -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual time=0.028..0.028 rows=1 loops=1) Filter: (seq = 1) Total runtime: 0.111 ms (4 rows) Time: 1.302 ms pgsql=# the last row, pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '47705'; QUERY PLAN ------------------------------------------------------------------------------------------------------- Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual time=3.248..88.232 rows=1 loops=1) Filter: (seq = 47705) Total runtime: 88.317 ms (3 rows) Time: 89.521 ms pgsql=# the last row with LIMIT, pgsql=# EXPLAIN ANALYZE select * from zipcode where seq = '47705' LIMIT 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..1168.31 rows=1 width=207) (actual time=3.254..3.254 rows=1 loops=1) -> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=207) (actual time=3.248..3.248 rows=1 loops=1) Filter: (seq = 47705) Total runtime: 3.343 ms (4 rows) Time: 4.583 ms pgsql=# When I using index scan, the result was almost same, that means, there was no time difference, so i'll not mention about index scan. but, sequence scan, as you see above result, there is big time difference between using LIMIT and without using it. my question is, when we're searching with PK like SELECT * FROM table WHERE PK = 'xxx', we already know there is only 1 row or not. so, pgsql should stop searching when maching row was found, isn't it? i don't know exactly about mechanism how pgsql searching row its inside, so might be i'm thinking wrong way, any comments, advices, notes, anything will be appreciate to me!
=?UTF-8?B?7J6l7ZiE7ISx?= <siche@siche.net> writes: > but, sequence scan, as you see above result, there is big time > difference between using LIMIT and without using it. You've got a table full of dead rows. Try VACUUM FULL ... regards, tom lane
before test, I already executed VACUUM FULL.
this result show up after vacuum full.
Tom Lane 쓴 글:
this result show up after vacuum full.
Tom Lane 쓴 글:
장현성 <siche@siche.net> writes:but, sequence scan, as you see above result, there is big time difference between using LIMIT and without using it.You've got a table full of dead rows. Try VACUUM FULL ... regards, tom lane
Hyun-Sang, > before test, I already executed VACUUM FULL. > this result show up after vacuum full. Really? Your results really look like a bloated table. Can you run VACUUM FULL ANALYZE VERBOSE on the table and post the output? > When I using index scan, the result was almost same, that means, there > was no time difference, so i'll not mention about index scan. Can we see an index scan plan anyway? EXPLAIN ANALYZE? Oh, and if this is a zip codes table, why are you using a sequence as the primary key instead of just using the zip code? -- Josh Berkus Aglio Database Solutions San Francisco
do you need all of verbose information??
VACUUM FULL ANALYZE VERBOSE give me a lot of infomation,
so i just cut zipcode parts.
==start===============================================================================
INFO: vacuuming "public.zipcode"
INFO: "zipcode": found 0 removable, 47705 nonremovable row versions in 572 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 76 to 136 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 27944 bytes.
0 pages are or will become empty, including 0 at the end of the table.
91 pages containing 8924 free bytes are potential move destinations.
CPU 0.03s/0.00u sec elapsed 0.03 sec.
INFO: index "zipcode_pkey" now contains 47705 row versions in 147 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO: "zipcode": moved 0 row versions, truncated 572 to 572 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.zipcode"
INFO: "zipcode": scanned 572 of 572 pages, containing 47705 live rows and 0 dead rows; 3000 rows in sample, 47705 estimated total rows
INFO: free space map: 108 relations, 128 pages stored; 1760 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 182 kB shared memory.
VACUUM
pgsql=#
==end===============================================================================
USING INDEX SCAN
==start===============================================================================
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.058 rows=1 loops=1)
Index Cond: (seq = 1)
Total runtime: 0.152 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.059..0.060 rows=1 loops=1)
-> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.054 rows=1 loops=1)
Index Cond: (seq = 1)
Total runtime: 0.158 ms
(4 rows)
pgsql=#
WHEN SELECT LAST ROW -----
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.059 rows=1 loops=1)
Index Cond: (seq = 47705)
Total runtime: 0.150 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.057..0.057 rows=1 loops=1)
-> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.052..0.052 rows=1 loops=1)
Index Cond: (seq = 47705)
Total runtime: 0.156 ms
(4 rows)
pgsql=#
==end===============================================================================
USING SEQUENCE SCAN
==start===============================================================================
pgsql=# set enable_indexscan to off;
SET
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.032..109.934 rows=1 loops=1)
Filter: (seq = 1)
Total runtime: 110.021 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=0.035..0.035 rows=1 loops=1)
-> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.030..0.030 rows=1 loops=1)
Filter: (seq = 1)
Total runtime: 0.113 ms
(4 rows)
pgsql=#
WHEN SELECT LAST ROW -----
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.048..110.232 rows=1 loops=1)
Filter: (seq = 47705)
Total runtime: 110.322 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=4.038..4.038 rows=1 loops=1)
-> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.033..4.033 rows=1 loops=1)
Filter: (seq = 47705)
Total runtime: 4.125 ms
(4 rows)
pgsql=#
==end===============================================================================
I just choose zipcode table for this test.
not only zipcode table but other table also give me same result.
SELECT * FROM table_name WHERE PK = 'xxx'
was always slower than
SELECT * FROM table_name WHERE PK = 'xxx' LIMIT 1
when sequence scan .
i think pgsql tring to find more than 1 row when query executed even if
searching condition is primary key.
ah, why i'm using sequence as PK instead of zip code is
in korea, the small towns doesn't have it's own zipcode
so they share other big city's.
that's why zip code can't be a primary key.
actually, i'm not using sequence to find zipcode.
i made it temporary for this test.
i think there is nobody want to using sequence number to find zipcode,
instead of city name. :-)
Josh Berkus 쓴 글:
VACUUM FULL ANALYZE VERBOSE give me a lot of infomation,
so i just cut zipcode parts.
==start===============================================================================
INFO: vacuuming "public.zipcode"
INFO: "zipcode": found 0 removable, 47705 nonremovable row versions in 572 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 76 to 136 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 27944 bytes.
0 pages are or will become empty, including 0 at the end of the table.
91 pages containing 8924 free bytes are potential move destinations.
CPU 0.03s/0.00u sec elapsed 0.03 sec.
INFO: index "zipcode_pkey" now contains 47705 row versions in 147 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO: "zipcode": moved 0 row versions, truncated 572 to 572 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.zipcode"
INFO: "zipcode": scanned 572 of 572 pages, containing 47705 live rows and 0 dead rows; 3000 rows in sample, 47705 estimated total rows
INFO: free space map: 108 relations, 128 pages stored; 1760 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 182 kB shared memory.
VACUUM
pgsql=#
==end===============================================================================
USING INDEX SCAN
==start===============================================================================
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.058 rows=1 loops=1)
Index Cond: (seq = 1)
Total runtime: 0.152 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.059..0.060 rows=1 loops=1)
-> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.054 rows=1 loops=1)
Index Cond: (seq = 1)
Total runtime: 0.158 ms
(4 rows)
pgsql=#
WHEN SELECT LAST ROW -----
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.054..0.059 rows=1 loops=1)
Index Cond: (seq = 47705)
Total runtime: 0.150 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.02 rows=1 width=55) (actual time=0.057..0.057 rows=1 loops=1)
-> Index Scan using zipcode_pkey on zipcode (cost=0.00..3.02 rows=1 width=55) (actual time=0.052..0.052 rows=1 loops=1)
Index Cond: (seq = 47705)
Total runtime: 0.156 ms
(4 rows)
pgsql=#
==end===============================================================================
USING SEQUENCE SCAN
==start===============================================================================
pgsql=# set enable_indexscan to off;
SET
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.032..109.934 rows=1 loops=1)
Filter: (seq = 1)
Total runtime: 110.021 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='1' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=0.035..0.035 rows=1 loops=1)
-> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=0.030..0.030 rows=1 loops=1)
Filter: (seq = 1)
Total runtime: 0.113 ms
(4 rows)
pgsql=#
WHEN SELECT LAST ROW -----
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.048..110.232 rows=1 loops=1)
Filter: (seq = 47705)
Total runtime: 110.322 ms
(3 rows)
pgsql=#
pgsql=# EXPLAIN ANALYZE select * from zipcode where seq='47705' LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1168.31 rows=1 width=55) (actual time=4.038..4.038 rows=1 loops=1)
-> Seq Scan on zipcode (cost=0.00..1168.31 rows=1 width=55) (actual time=4.033..4.033 rows=1 loops=1)
Filter: (seq = 47705)
Total runtime: 4.125 ms
(4 rows)
pgsql=#
==end===============================================================================
I just choose zipcode table for this test.
not only zipcode table but other table also give me same result.
SELECT * FROM table_name WHERE PK = 'xxx'
was always slower than
SELECT * FROM table_name WHERE PK = 'xxx' LIMIT 1
when sequence scan .
i think pgsql tring to find more than 1 row when query executed even if
searching condition is primary key.
ah, why i'm using sequence as PK instead of zip code is
in korea, the small towns doesn't have it's own zipcode
so they share other big city's.
that's why zip code can't be a primary key.
actually, i'm not using sequence to find zipcode.
i made it temporary for this test.
i think there is nobody want to using sequence number to find zipcode,
instead of city name. :-)
Josh Berkus 쓴 글:
Hyun-Sang,before test, I already executed VACUUM FULL. this result show up after vacuum full.Really? Your results really look like a bloated table. Can you run VACUUM FULL ANALYZE VERBOSE on the table and post the output?When I using index scan, the result was almost same, that means, there was no time difference, so i'll not mention about index scan.Can we see an index scan plan anyway? EXPLAIN ANALYZE? Oh, and if this is a zip codes table, why are you using a sequence as the primary key instead of just using the zip code?
On Wed, 2004-12-01 at 15:03 +0900, Hyun-Sung, Jang wrote: > > < lots of information about seq scan vs index scan > > Hi, Just because it has an ID that is the largest in the set, does not mean it will be at the last position in the on-disk tables. And similarly, the lowest numbered ID does not mean it will be at the beginning in the on-disk structures. So when you 'LIMIT 1' the sequential scan stops as soon as it has found the first row that matches, but in the no LIMIT case with a sequential scan it will continue the scan to the end of the on-disk data. Given that this column is unique, PostgreSQL could optimise this case and imply LIMIT 1 for all sequential scans on such criteria, but in the real world the optimisation is usually going to come from an index - at least it will for larger tables - since that's a component of how PostgreSQL is enforcing the unique constraint. Regards, Andrew McMillan. ------------------------------------------------------------------------- Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Chicken Little only has to be right once. -------------------------------------------------------------------------
Attachment
Hyun-Sung, > do you need all of verbose information?? > VACUUM FULL ANALYZE VERBOSE give me a lot of infomation, > so i just cut zipcode parts. Oh, sorry. I meant just "VACUUM FULL ANALYZE VERBOSE zipcode", not the whole database. Should have been clearer. > ==start==================================================================== >=========== INFO: vacuuming "public.zipcode" > INFO: "zipcode": found 0 removable, 47705 nonremovable row versions in > 572 pages > DETAIL: 0 dead row versions cannot be removed yet. OK, looks like you're clean. > I just choose zipcode table for this test. > not only zipcode table but other table also give me same result. > > SELECT * FROM table_name WHERE PK = 'xxx' > > was always slower than > > SELECT * FROM table_name WHERE PK = 'xxx' LIMIT 1 > > when sequence scan . yeah? So? Stop using sequence scan! You've just demonstrated that, if you don't force the planner to use sequence scan, things run at the same speed with or without the LIMIT. So you're causing a problem by forcing the planner into a bad plan. See Andrew's explanation of why it works this way. > ah, why i'm using sequence as PK instead of zip code is > in korea, the small towns doesn't have it's own zipcode > so they share other big city's. > that's why zip code can't be a primary key. > actually, i'm not using sequence to find zipcode. > i made it temporary for this test. That makes sense. --Josh -- --Josh Josh Berkus Aglio Database Solutions San Francisco