Thread: limit and query planner

limit and query planner

From

armand pirvu

Date:

05 June 2018, 21:24:25

All

Please see below

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 10;
                                                            QUERY PLAN
           

-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1)
   ->  Seq Scan on sp_i2birst_reg_staging_test  (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429
rows=10loops=1) 
         Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
         Rows Removed by Filter: 115
 Planning time: 3.022 ms
 Execution time: 0.639 ms
(6 rows)



birstdb=# \d sp_i2birst_reg_staging_test
                                     Table "csischema.sp_i2birst_reg_staging_test"
    Column     |            Type             |                                Modifiers
---------------+-----------------------------+-------------------------------------------------------------------------
 action_id     | bigint                      | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass)
 reg_uid       | integer                     | not null
 evt_uid       | integer                     | not null
 evt_id        | character varying(10)       | not null
 operation     | character varying(6)        | not null
 status        | smallint                    | not null
 category      | character varying(20)       | not null default ''::character varying
 add_date      | timestamp with time zone    | not null default now()
 mod_date      | timestamp with time zone    | not null default now()
 ingres_data   | jsonb                       |
 thread_number | bigint                      | not null default 0
 start_time    | timestamp without time zone |
 end_time      | timestamp without time zone |
Indexes:
    "sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id)
    "sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category)
Check constraints:
    "sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3]))

Even if add an index on evt_id and status same table scan

But

select count(*) from sp_i2birst_reg_staging_test;
 count
-------
  6860

select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
 count
-------
  4239

So I can see why the planner is choosing a table scan

My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no
implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? 

Thanks
— Armand

Re: limit and query planner

From

Pavel Stehule

Date:

05 June 2018, 21:32:22

2018-06-05 20:24 GMT+02:00 armand pirvu <armand.pirvu@gmail.com>:

All

Please see below

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1)
-> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429 rows=10 loops=1)
Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
Rows Removed by Filter: 115
Planning time: 3.022 ms
Execution time: 0.639 ms
(6 rows)

birstdb=# \d sp_i2birst_reg_staging_test
Table "csischema.sp_i2birst_reg_staging_test"
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------------------------------------------------
action_id | bigint | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass)
reg_uid | integer | not null
evt_uid | integer | not null
evt_id | character varying(10) | not null
operation | character varying(6) | not null
status | smallint | not null
category | character varying(20) | not null default ''::character varying
add_date | timestamp with time zone | not null default now()
mod_date | timestamp with time zone | not null default now()
ingres_data | jsonb |
thread_number | bigint | not null default 0
start_time | timestamp without time zone |
end_time | timestamp without time zone |
Indexes:
"sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id)
"sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category)
Check constraints:
"sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3]))

Even if add an index on evt_id and status same table scan

But

select count(*) from sp_i2birst_reg_staging_test;
count
-------
6860

select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
count
-------
4239

So I can see why the planner is choosing a table scan

My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implications whatsoever on the planner, meaning the planner ignores it. Am I right or wrong ?

LIMIT is last clause and it is processed after aggregation.

probably you would select count(*) from (select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' LIMIT 10) s;

more you have not index on evt_id column - there is composite index, but the chance can be low

Regards

Pavel

Thanks
— Armand

Re: limit and query planner

From

armand pirvu

Date:

05 June 2018, 21:36:16

On Jun 5, 2018, at 1:32 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2018-06-05 20:24 GMT+02:00 armand pirvu <armand.pirvu@gmail.com>:
All

Please see below

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.29 rows=10 width=519) (actual time=0.110..0.439 rows=10 loops=1)
-> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 width=519) (actual time=0.109..0.429 rows=10 loops=1)
Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
Rows Removed by Filter: 115
Planning time: 3.022 ms
Execution time: 0.639 ms
(6 rows)

birstdb=# \d sp_i2birst_reg_staging_test
Table "csischema.sp_i2birst_reg_staging_test"
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------------------------------------------------
action_id | bigint | not null default nextval('i2birst_reg_staging_action_id_seq'::regclass)
reg_uid | integer | not null
evt_uid | integer | not null
evt_id | character varying(10) | not null
operation | character varying(6) | not null
status | smallint | not null
category | character varying(20) | not null default ''::character varying
add_date | timestamp with time zone | not null default now()
mod_date | timestamp with time zone | not null default now()
ingres_data | jsonb |
thread_number | bigint | not null default 0
start_time | timestamp without time zone |
end_time | timestamp without time zone |
Indexes:
"sp_i2birst_reg_staging_test_pkey" PRIMARY KEY, btree (action_id)
"sp_i2birst_reg_staging_test_idx" btree (status, evt_id, category)
Check constraints:
"sp_i2birst_reg_staging_test_status_check" CHECK (status = ANY (ARRAY[0, 1, 2, 3]))

Even if add an index on evt_id and status same table scan

But

select count(*) from sp_i2birst_reg_staging_test;
count
-------
6860

select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
count
-------
4239

So I can see why the planner is choosing a table scan

My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no implications whatsoever on the planner, meaning the planner ignores it. Am I right or wrong ?

LIMIT is last clause and it is processed after aggregation.

probably you would select count(*) from (select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' LIMIT 10) s;

more you have not index on evt_id column - there is composite index, but the chance can be low

Regards

Pavel

Thanks
— Armand

Thank you Pavel

I put the counts to show that the number of records retrieved without limit relative to a plain select count(*) is far more than 5% and an index is just from this very reason deemed useless, aka the restriction is really non existent

— Armand

Re: limit and query planner

From

Tom Lane

Date:

05 June 2018, 22:02:29

armand pirvu <armand.pirvu@gmail.com> writes:
> My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no
implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? 

You're quite wrong.  The presence of a LIMIT causes the planner to prefer
"fast start" plans, since it will then optimize on the basis of picking
the lowest estimated cost to fetch the first N rows.  As an example,
you're more likely to get an ordered indexscan than a seqscan-and-sort
for small N, though there are many cases where seqscan-and-sort wins
if the need is to fetch the whole table.

            regards, tom lane

Re: limit and query planner

From

armand pirvu

Date:

05 June 2018, 22:17:22

> On Jun 5, 2018, at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> armand pirvu <armand.pirvu@gmail.com> writes:
>> My question is: I suspect the limit simply limits the fethching to the first n-records retrieved and has no
implicationswhatsoever on the planner, meaning the planner ignores it. Am I right or wrong ? 
>
> You're quite wrong.  The presence of a LIMIT causes the planner to prefer
> "fast start" plans, since it will then optimize on the basis of picking
> the lowest estimated cost to fetch the first N rows.  As an example,
> you're more likely to get an ordered indexscan than a seqscan-and-sort
> for small N, though there are many cases where seqscan-and-sort wins
> if the need is to fetch the whole table.
>
>             regards, tom lane



Thank you Tom


So since
select count(*) from sp_i2birst_reg_staging_test;
count
-------
 6860
and
select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 ;
count
-------
 4239

That means to me I fetch almost the whole table and then I fall in the case you described seqscan-and-sort wins over
indexscan. 

My statement was made because in the case of an index it gets used as long as the data returned back falls below 10%
(orso) from the total data in the table and in the case of the original query no matter how low I get the N still seq
scanbut I guess is again the above sescan-and-sort scenario (see below) 

create index fooidx on sp_i2birst_reg_staging_test (evt_id, status);
vacuum analyze sp_i2birst_reg_staging_test;

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=1;
                                                                 QUERY PLAN
                    
 Index Scan using fooidx on sp_i2birst_reg_staging_test  (cost=0.28..202.91 rows=500 width=519) (actual
time=0.097..0.527rows=500 loops=1) 
   Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
 Planning time: 1.024 ms
 Execution time: 0.766 ms
this gets 500 rows out of 6860

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=1 limit 10;

                                                                   QUERY PLAN
                        
 Limit  (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 loops=1)
   ->  Index Scan using fooidx on sp_i2birst_reg_staging_test  (cost=0.28..202.91 rows=500 width=519) (actual
time=0.072..0.101rows=10 loops=1) 
         Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
 Planning time: 0.280 ms
 Execution time: 0.173 ms

Back to the original

explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
and status=0 limit 1 ;

                                                            QUERY PLAN
          
 Limit  (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 loops=1)
   ->  Seq Scan on sp_i2birst_reg_staging_test  (cost=0.00..548.40 rows=4239 width=519) (actual time=0.019..0.019
rows=1loops=1) 
         Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
         Rows Removed by Filter: 1
 Planning time: 0.286 ms
 Execution time: 0.110 ms



— Armand

Re: limit and query planner

From

David Rowley

Date:

06 June 2018, 07:45:26

On 6 June 2018 at 07:17, armand pirvu <armand.pirvu@gmail.com> wrote:
> So since
> select count(*) from sp_i2birst_reg_staging_test;
> count
> -------
>  6860
> and
> select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=0 ;
> count
> -------
>  4239
>
> That means to me I fetch almost the whole table and then I fall in the case you described seqscan-and-sort wins over
indexscan.
 

The planner simply assumes that 1 in (6860.0 / 4239.0) rows matches
your WHERE clause. Since you want 10 rows from the LIMIT, it thinks
it'll just need to read 17 rows from the heap to answer your query.

> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=1;
>                                                                  QUERY PLAN
>  Index Scan using fooidx on sp_i2birst_reg_staging_test  (cost=0.28..202.91 rows=500 width=519) (actual
time=0.097..0.527rows=500 loops=1)
 
>    Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 1.024 ms
>  Execution time: 0.766 ms
> this gets 500 rows out of 6860
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=1 limit 10;
>
>                                                                    QUERY PLAN
>  Limit  (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 loops=1)
>    ->  Index Scan using fooidx on sp_i2birst_reg_staging_test  (cost=0.28..202.91 rows=500 width=519) (actual
time=0.072..0.101rows=10 loops=1)
 
>          Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 0.280 ms
>  Execution time: 0.173 ms
>
> Back to the original
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=0 limit 1 ;
>
>                                                             QUERY PLAN
>  Limit  (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 loops=1)
>    ->  Seq Scan on sp_i2birst_reg_staging_test  (cost=0.00..548.40 rows=4239 width=519) (actual time=0.019..0.019
rows=1loops=1)
 
>          Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
>          Rows Removed by Filter: 1
>  Planning time: 0.286 ms
>  Execution time: 0.110 ms

For the more restrictive status, the planner thinks more rows will
need to be looked at, which increases the cost of the seqscan, and the
planner favours the index scan.  You can see the planner estimates 500
rows will match the status=1 query. So thinks ceil(1 *  (6860 / 500.0)
* 10) = 138 rows will need looked at in the seqscan plan. That's
obviously more costly than 17 rows. So the index scan begins to look
more favourable.

The planner always assumes the rows are evenly distributed, which is
not always the case. If your ten rows were at the end of the heap,
then the seqscan would be a bad plan. In this case "Rows Removed by
Filter" would be high.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services