Thread: limit and query planner
All Please see below explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=0 limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1) -> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429 rows=10loops=1) Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0)) Rows Removed by Filter: 115 Planning time: 3.022 ms Execution time: 0.639 ms (6 rows) birstdb=# \d sp_i2birst_reg_staging_test Table "csischema.sp_i2birst_reg_staging_test" Column | Type | Modifiers ---------------+-----------------------------+------------------------------------------------------------------------- action_id | bigint | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass) reg_uid | integer | not null evt_uid | integer | not null evt_id | character varying(10) | not null operation | character varying(6) | not null status | smallint | not null category | character varying(20) | not null default ''::character varying add_date | timestamp with time zone | not null default now() mod_date | timestamp with time zone | not null default now() ingres_data | jsonb | thread_number | bigint | not null default 0 start_time | timestamp without time zone | end_time | timestamp without time zone | Indexes: "sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id) "sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category) Check constraints: "sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3])) Even if add an index on evt_id and status same table scan But select count(*) from sp_i2birst_reg_staging_test; count ------- 6860 select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=0 ; count ------- 4239 So I can see why the planner is choosing a table scan My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? Thanks — Armand
2018-06-05 20:24 GMT+02:00 armand pirvu <armand.pirvu@gmail.com>:
All
Please see below
explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 10;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ -----------
Limit (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1)
-> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429 rows=10 loops=1)
Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
Rows Removed by Filter: 115
Planning time: 3.022 ms
Execution time: 0.639 ms
(6 rows)
birstdb=# \d sp_i2birst_reg_staging_test
Table "csischema.sp_i2birst_reg_staging_test"
Column | Type | Modifiers
---------------+-----------------------------+-------------- ------------------------------ -----------------------------
action_id | bigint | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass)
reg_uid | integer | not null
evt_uid | integer | not null
evt_id | character varying(10) | not null
operation | character varying(6) | not null
status | smallint | not null
category | character varying(20) | not null default ''::character varying
add_date | timestamp with time zone | not null default now()
mod_date | timestamp with time zone | not null default now()
ingres_data | jsonb |
thread_number | bigint | not null default 0
start_time | timestamp without time zone |
end_time | timestamp without time zone |
Indexes:
"sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id)
"sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category)
Check constraints:
"sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3]))
Even if add an index on evt_id and status same table scan
But
select count(*) from sp_i2birst_reg_staging_test;
count
-------
6860
select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
count
-------
4239
So I can see why the planner is choosing a table scan
My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implications whatsoever on the planner, meaning the planner ignores it. Am I right or wrong ?
LIMIT is last clause and it is processed after aggregation.
probably you would select count(*) from (select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' LIMIT 10) s;
more you have not index on evt_id column - there is composite index, but the chance can be low
Regards
Pavel
Thanks
— Armand
On Jun 5, 2018, at 1:32 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2018-06-05 20:24 GMT+02:00 armand pirvu <armand.pirvu@gmail.com>:All
Please see below
explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 10;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ -----------
Limit (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1)
-> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429 rows=10 loops=1)
Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
Rows Removed by Filter: 115
Planning time: 3.022 ms
Execution time: 0.639 ms
(6 rows)
birstdb=# \d sp_i2birst_reg_staging_test
Table "csischema.sp_i2birst_reg_staging_test"
Column | Type | Modifiers
---------------+-----------------------------+-------------- ------------------------------ -----------------------------
action_id | bigint | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass)
reg_uid | integer | not null
evt_uid | integer | not null
evt_id | character varying(10) | not null
operation | character varying(6) | not null
status | smallint | not null
category | character varying(20) | not null default ''::character varying
add_date | timestamp with time zone | not null default now()
mod_date | timestamp with time zone | not null default now()
ingres_data | jsonb |
thread_number | bigint | not null default 0
start_time | timestamp without time zone |
end_time | timestamp without time zone |
Indexes:
"sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id)
"sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category)
Check constraints:
"sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3]))
Even if add an index on evt_id and status same table scan
But
select count(*) from sp_i2birst_reg_staging_test;
count
-------
6860
select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
count
-------
4239
So I can see why the planner is choosing a table scan
My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implications whatsoever on the planner, meaning the planner ignores it. Am I right or wrong ?LIMIT is last clause and it is processed after aggregation.probably you would select count(*) from (select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' LIMIT 10) s;more you have not index on evt_id column - there is composite index, but the chance can be lowRegardsPavel
Thanks
— Armand
Thank you Pavel
I put the counts to show that the number of records retrieved without limit relative to a plain select count(*) is far more than 5% and an index is just from this very reason deemed useless, aka the restriction is really non existent
— Armand
armand pirvu <armand.pirvu@gmail.com> writes: > My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? You're quite wrong. The presence of a LIMIT causes the planner to prefer "fast start" plans, since it will then optimize on the basis of picking the lowest estimated cost to fetch the first N rows. As an example, you're more likely to get an ordered indexscan than a seqscan-and-sort for small N, though there are many cases where seqscan-and-sort wins if the need is to fetch the whole table. regards, tom lane
> On Jun 5, 2018, at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > armand pirvu <armand.pirvu@gmail.com> writes: >> My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? > > You're quite wrong. The presence of a LIMIT causes the planner to prefer > "fast start" plans, since it will then optimize on the basis of picking > the lowest estimated cost to fetch the first N rows. As an example, > you're more likely to get an ordered indexscan than a seqscan-and-sort > for small N, though there are many cases where seqscan-and-sort wins > if the need is to fetch the whole table. > > regards, tom lane Thank you Tom So since select count(*) from sp_i2birst_reg_staging_test; count ------- 6860 and select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=0 ; count ------- 4239 That means to me I fetch almost the whole table and then I fall in the case you described seqscan-and-sort wins over indexscan. My statement was made because in the case of an index it gets used as long as the data returned back falls below 10% (orso) from the total data in the table and in the case of the original query no matter how low I get the N still seq scanbut I guess is again the above sescan-and-sort scenario (see below) create index fooidx on sp_i2birst_reg_staging_test (evt_id, status); vacuum analyze sp_i2birst_reg_staging_test; explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=1; QUERY PLAN Index Scan using fooidx on sp_i2birst_reg_staging_test (cost=0.28..202.91 rows=500 width=519) (actual time=0.097..0.527rows=500 loops=1) Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) Planning time: 1.024 ms Execution time: 0.766 ms this gets 500 rows out of 6860 explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=1 limit 10; QUERY PLAN Limit (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 loops=1) -> Index Scan using fooidx on sp_i2birst_reg_staging_test (cost=0.28..202.91 rows=500 width=519) (actual time=0.072..0.101rows=10 loops=1) Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) Planning time: 0.280 ms Execution time: 0.173 ms Back to the original explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' and status=0 limit 1 ; QUERY PLAN Limit (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 loops=1) -> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.019..0.019 rows=1loops=1) Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0)) Rows Removed by Filter: 1 Planning time: 0.286 ms Execution time: 0.110 ms — Armand
On 6 June 2018 at 07:17, armand pirvu <armand.pirvu@gmail.com> wrote: > So since > select count(*) from sp_i2birst_reg_staging_test; > count > ------- > 6860 > and > select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' > and status=0 ; > count > ------- > 4239 > > That means to me I fetch almost the whole table and then I fall in the case you described seqscan-and-sort wins over indexscan. The planner simply assumes that 1 in (6860.0 / 4239.0) rows matches your WHERE clause. Since you want 10 rows from the LIMIT, it thinks it'll just need to read 17 rows from the heap to answer your query. > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' > and status=1; > QUERY PLAN > Index Scan using fooidx on sp_i2birst_reg_staging_test (cost=0.28..202.91 rows=500 width=519) (actual time=0.097..0.527rows=500 loops=1) > Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) > Planning time: 1.024 ms > Execution time: 0.766 ms > this gets 500 rows out of 6860 > > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' > and status=1 limit 10; > > QUERY PLAN > Limit (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 loops=1) > -> Index Scan using fooidx on sp_i2birst_reg_staging_test (cost=0.28..202.91 rows=500 width=519) (actual time=0.072..0.101rows=10 loops=1) > Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) > Planning time: 0.280 ms > Execution time: 0.173 ms > > Back to the original > > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' > and status=0 limit 1 ; > > QUERY PLAN > Limit (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 loops=1) > -> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.019..0.019 rows=1loops=1) > Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0)) > Rows Removed by Filter: 1 > Planning time: 0.286 ms > Execution time: 0.110 ms For the more restrictive status, the planner thinks more rows will need to be looked at, which increases the cost of the seqscan, and the planner favours the index scan. You can see the planner estimates 500 rows will match the status=1 query. So thinks ceil(1 * (6860 / 500.0) * 10) = 138 rows will need looked at in the seqscan plan. That's obviously more costly than 17 rows. So the index scan begins to look more favourable. The planner always assumes the rows are evenly distributed, which is not always the case. If your ten rows were at the end of the heap, then the seqscan would be a bad plan. In this case "Rows Removed by Filter" would be high. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services