Thread: query using incorrect index

query using incorrect index

From
Russell Keane
Date:

 

Using PG 9.0 and given 2 queries (messageq_current is a view on the messageq_table):

 

select entity_id from messageq_current

where entity_id = 123456;

 

select entity_id from messageq_current

where incoming = true

and inactive = false

and staff_ty = 2

and staff_id = 2

order by entity_id desc

limit 1;

 

and 2 indexes (there are 15 indexes in total but they are left out here for brevity):

 

messageq1:

CREATE INDEX messageq1

  ON messageq_table

  USING btree

  (entity_id);

 

And messageq4:

 

CREATE INDEX messageq4

  ON messageq_table

  USING btree

  (inactive, staff_ty, staff_id, incoming, tran_dt);

 

With the messageq1 index present, query 1 is very quick (0.094ms) and query 2 is very slow (241.515ms).

If I remove messageq1 then query 2 uses messageq4 and is very quick (0.098ms) but then query 1 must use a different index and is therefore slower (> 5ms).

 

So, to the Query plans:

With messageq1:

"Limit  (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared hit=32 read=18870 written=12"

"  ->  Index Scan Backward using messageq1 on prac_live_10112.messageq_table  (cost=0.00..66762.53 rows=25 width=4) (actual time=241.479..241.479 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"        Buffers: shared hit=32 read=18870 written=12"

"Total runtime: 241.515 ms"

 

Without messageq1:

"Limit  (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared read=3"

"  ->  Sort  (cost=12534.45..12534.51 rows=25 width=4) (actual time=0.054..0.054 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Sort Key: messageq_table.entity_id"

"        Sort Method:  quicksort  Memory: 17kB"

"        ->  Bitmap Heap Scan on prac_live_10112.messageq_table  (cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0 loops=1)"

"              Output: messageq_table.entity_id"

"              Recheck Cond: ((messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2))"

"              Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"              Buffers: shared read=3"

"              ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 width=0) (actual time=0.040..0.040 rows=0 loops=1)"

"                    Index Cond: ((messageq_table.inactive = false) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (messageq_table.incoming = true))"

"                    Buffers: shared read=3"

"Total runtime: 0.098 ms"

 

Clearly the statistics are off somehow but I really don’t know where to start.

 

Any help you can give me would be very much appreciated.

 

Regards,

 

Russell Keane

INPS

 

Tel:         +44 (0)20 7501 7277

 

Subscribe to the Vision e-newsletter

Subscribe to the Helpline Support Bulletin

  Subscribe to the Helpline Blog RSS Feed

 



Registered name: In Practice Systems Ltd.
Registered address: The Bread Factory, 1a Broughton Street, London, SW8 3QJ
Registered Number: 1788577
Registered in England
Visit our Internet Web site at www.inps.co.uk
The information in this internet email is confidential and is intended solely for the addressee. Access, copying or re-use of information in it by anyone else is not authorised. Any views or opinions presented are solely those of the author and do not necessarily represent those of INPS or any of its affiliates. If you are not the intended recipient please contact is.helpdesk@inps.co.uk

Attachment

Re: query using incorrect index

From
"Kevin Grittner"
Date:
Russell Keane <Russell.Keane@inps.co.uk> wrote:

> Clearly the statistics are off somehow but I really don't know
> where to start.
>
> Any help you can give me would be very much appreciated.

It would help to know your more about your hardware and PostgreSQL
configuration.  The latter can probably best be communicated by
copy/paste of the results of the query on this page:

http://wiki.postgresql.org/wiki/Server_Configuration

Can you also post the EXPLAIN ANALYZE output for the slow query with
both indexes present but without the LIMIT clause?

-Kevin

Re: query using incorrect index

From
Robert Klemme
Date:


On Thu, Aug 2, 2012 at 4:54 PM, Russell Keane <Russell.Keane@inps.co.uk> wrote:

 

Using PG 9.0 and given 2 queries (messageq_current is a view on the messageq_table):

 

select entity_id from messageq_current

where entity_id = 123456;

 

select entity_id from messageq_current

where incoming = true

and inactive = false

and staff_ty = 2

and staff_id = 2

order by entity_id desc

limit 1;

 

and 2 indexes (there are 15 indexes in total but they are left out here for brevity):

 

messageq1:

CREATE INDEX messageq1

  ON messageq_table

  USING btree

  (entity_id);

 

And messageq4:

 

CREATE INDEX messageq4

  ON messageq_table

  USING btree

  (inactive, staff_ty, staff_id, incoming, tran_dt);


Of course a lot of detail is missing (full schema of table, all the other indexes) but with "inactive" a boolean column I suspect selectivity might not be too good here and so having it as a first column in a covering index is at least questionable.  If query 2 is frequent you might also want to consider creating a partial index only on (staff_ty, staff_id) with filtering criteria on incoming and active as present in query 2.

Btw, why don't you formulate query 2 as max query?

select max(entity_id) as entity_id

from messageq_current

where incoming = true

and inactive = false

and staff_ty = 2

and staff_id = 2; 

 

With the messageq1 index present, query 1 is very quick (0.094ms) and query 2 is very slow (241.515ms).

If I remove messageq1 then query 2 uses messageq4 and is very quick (0.098ms) but then query 1 must use a different index and is therefore slower (> 5ms).

 

So, to the Query plans:


Of which query?  Shouldn't there be four plans in total?  I'd post plans here:
http://explain.depesz.com/
 

With messageq1:

"Limit  (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared hit=32 read=18870 written=12"

"  ->  Index Scan Backward using messageq1 on prac_live_10112.messageq_table  (cost=0.00..66762.53 rows=25 width=4) (actual time=241.479..241.479 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"        Buffers: shared hit=32 read=18870 written=12"

"Total runtime: 241.515 ms"

 

Without messageq1:

"Limit  (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared read=3"

"  ->  Sort  (cost=12534.45..12534.51 rows=25 width=4) (actual time=0.054..0.054 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Sort Key: messageq_table.entity_id"

"        Sort Method:  quicksort  Memory: 17kB"

"        ->  Bitmap Heap Scan on prac_live_10112.messageq_table  (cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0 loops=1)"

"              Output: messageq_table.entity_id"

"              Recheck Cond: ((messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2))"

"              Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"              Buffers: shared read=3"

"              ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 width=0) (actual time=0.040..0.040 rows=0 loops=1)"

"                    Index Cond: ((messageq_table.inactive = false) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (messageq_table.incoming = true))"

"                    Buffers: shared read=3"

"Total runtime: 0.098 ms"

 

Clearly the statistics are off somehow but I really don’t know where to start.

 

Any help you can give me would be very much appreciated.


Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: query using incorrect index

From
Russell Keane
Date:

You’re right, a lot of information is missing but I’m unsure that the other information will make too much difference.

I could drop all the other indexes on the table which aren’t used here and the queries would still use the indexes they are currently using.

 

I appreciate the idea that a boolean column selectivity might not be great. I’ve just tried creating indexes as follows:

CREATE INDEX messageq17

  ON messageq_table

  USING btree

  (staff_ty, staff_id, incoming, inactive, entity_id);

 

CREATE INDEX messageq18

  ON messageq_table

  USING btree

  (staff_ty, staff_id);

 

When running query 2 as it stands the same thing happens, it still uses the messageq1 index.

 

The query is logically the same as using max, you are correct, but it’s generated on the fly so the limit or the queried column may change.

 

The query plans were for the second query as I’m unsure that the first query is really relevant, it was simply there to justify the messageq1 index.

 

Thanks,

 

From: Robert Klemme [mailto:shortcutter@googlemail.com]
Sent: 03 August 2012 10:18
To: Russell Keane; pgsql-performance
Subject: Re: [PERFORM] query using incorrect index

 

 

On Thu, Aug 2, 2012 at 4:54 PM, Russell Keane <Russell.Keane@inps.co.uk> wrote:

 

Using PG 9.0 and given 2 queries (messageq_current is a view on the messageq_table):

 

select entity_id from messageq_current

where entity_id = 123456;

 

select entity_id from messageq_current

where incoming = true

and inactive = false

and staff_ty = 2

and staff_id = 2

order by entity_id desc

limit 1;

 

and 2 indexes (there are 15 indexes in total but they are left out here for brevity):

 

messageq1:

CREATE INDEX messageq1

  ON messageq_table

  USING btree

  (entity_id);

 

And messageq4:

 

CREATE INDEX messageq4

  ON messageq_table

  USING btree

  (inactive, staff_ty, staff_id, incoming, tran_dt);


Of course a lot of detail is missing (full schema of table, all the other indexes) but with "inactive" a boolean column I suspect selectivity might not be too good here and so having it as a first column in a covering index is at least questionable.  If query 2 is frequent you might also want to consider creating a partial index only on (staff_ty, staff_id) with filtering criteria on incoming and active as present in query 2.

Btw, why don't you formulate query 2 as max query?

select max(entity_id) as entity_id

from messageq_current

where incoming = true

and inactive = false

and staff_ty = 2

and staff_id = 2;

 

 

With the messageq1 index present, query 1 is very quick (0.094ms) and query 2 is very slow (241.515ms).

If I remove messageq1 then query 2 uses messageq4 and is very quick (0.098ms) but then query 1 must use a different index and is therefore slower (> 5ms).

 

So, to the Query plans:


Of which query?  Shouldn't there be four plans in total?  I'd post plans here:
http://explain.depesz.com/
 

With messageq1:

"Limit  (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared hit=32 read=18870 written=12"

"  ->  Index Scan Backward using messageq1 on prac_live_10112.messageq_table  (cost=0.00..66762.53 rows=25 width=4) (actual time=241.479..241.479 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"        Buffers: shared hit=32 read=18870 written=12"

"Total runtime: 241.515 ms"

 

Without messageq1:

"Limit  (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)"

"  Output: messageq_table.entity_id"

"  Buffers: shared read=3"

"  ->  Sort  (cost=12534.45..12534.51 rows=25 width=4) (actual time=0.054..0.054 rows=0 loops=1)"

"        Output: messageq_table.entity_id"

"        Sort Key: messageq_table.entity_id"

"        Sort Method:  quicksort  Memory: 17kB"

"        ->  Bitmap Heap Scan on prac_live_10112.messageq_table  (cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0 loops=1)"

"              Output: messageq_table.entity_id"

"              Recheck Cond: ((messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2))"

"              Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"

"              Buffers: shared read=3"

"              ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 width=0) (actual time=0.040..0.040 rows=0 loops=1)"

"                    Index Cond: ((messageq_table.inactive = false) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (messageq_table.incoming = true))"

"                    Buffers: shared read=3"

"Total runtime: 0.098 ms"

 

Clearly the statistics are off somehow but I really don’t know where to start.

 

Any help you can give me would be very much appreciated.


Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: query using incorrect index

From
Russell Keane
Date:
Settings query:
"version";"PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 32-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"lc_collate";"English_United Kingdom.1252"
"lc_ctype";"English_United Kingdom.1252"
"listen_addresses";"*"
"log_destination";"stderr"
"log_duration";"off"
"log_line_prefix";"%t "
"log_min_duration_statement";"1ms"
"log_statement";"none"
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5433"
"search_path";"prac_live_10112, prac_shared_10112, global"
"server_encoding";"UTF8"
"shared_buffers";"32MB"
"TimeZone";"Europe/London"
"work_mem";"1MB"

Hardware:
It's important to note that this is a (purposely) low spec development machine but the performance story is a similar
oneon our test setup which is a lot closer to our live environment. (I'm in the process of getting figures on this). 
E8400 Core 2 Duo (2.99GHz)
4GB ram
xp (latest sp and all updates)
1 300GB SATA2 drive with 170 GB free space

Explain analyse with both indexes present but without the limit (uses the correct index):

"Sort  (cost=12534.90..12534.97 rows=25 width=4) (actual time=0.055..0.055 rows=0 loops=1)"
"  Output: messageq_table.entity_id"
"  Sort Key: messageq_table.entity_id"
"  Sort Method:  quicksort  Memory: 17kB"
"  Buffers: shared read=3"
"  ->  Bitmap Heap Scan on prac_live_10112.messageq_table  (cost=174.09..12534.32 rows=25 width=4) (actual
time=0.040..0.040rows=0 loops=1)" 
"        Output: messageq_table.entity_id"
"        Recheck Cond: ((messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2))"
"        Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND
(aud_status_to_flag(messageq_table.aud_status)= 1))" 
"        Buffers: shared read=3"
"        ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 width=0) (actual time=0.037..0.037 rows=0
loops=1)"
"              Index Cond: ((messageq_table.inactive = false) AND (messageq_table.staff_ty = 2) AND
(messageq_table.staff_id= 2) AND (messageq_table.incoming = true))" 
"              Buffers: shared read=3"
"Total runtime: 0.092 ms"



-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: 02 August 2012 21:13
To: Russell Keane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query using incorrect index

Russell Keane <Russell.Keane@inps.co.uk> wrote:

> Clearly the statistics are off somehow but I really don't know where
> to start.
>
> Any help you can give me would be very much appreciated.

It would help to know your more about your hardware and PostgreSQL configuration.  The latter can probably best be
communicatedby copy/paste of the results of the query on this page: 

http://wiki.postgresql.org/wiki/Server_Configuration

Can you also post the EXPLAIN ANALYZE output for the slow query with both indexes present but without the LIMIT clause?

-Kevin

Re: query using incorrect index

From
"Kevin Grittner"
Date:
Russell Keane <Russell.Keane@inps.co.uk> wrote:

> "log_min_duration_statement";"1ms"

> "shared_buffers";"32MB"
> "work_mem";"1MB"

Those are pretty low values even for a 4GB machine.  I suggest the
following changes and additions, based on the fact that you seem to
have the active portion of the database fully cached.

shared_buffers = '160MB'
work_mem = '8MB'
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03
effective_cache_size = '2GB'

> Explain analyse with both indexes present but without the limit
> (uses the correct index):

> "Total runtime: 0.092 ms"

Part of problem is that it thinks it will find a matching row fairly
quickly, and having done so using the index it chose will mean it is
the *right* row.  The problem is that there are no matching rows, so
it has to scan the entire index.  More fine-grained statistics
*might* help.  If other techniques don't help, you can rewrite the
query slightly to create an optimization fence, but that should be a
last resort.  I agree with Robert that if you have a lot of queries
that select on "incoming" and/or "inactive", a conditional index
(with a WHERE clause in its definition) is likely to be very
helpful.

-Kevin

Re: query using incorrect index

From
Russell Keane
Date:
I tried creating the following index:

CREATE INDEX messageq17
  ON messageq_table
  USING btree
  (staff_ty, staff_id, entity_id)
  WHERE inactive = false;

'inactive = false' (active would be much easy but this is legacy) records should make up a smaller proportion of the
overalldataset (and much more of the queries will specify this clause) and the results are very promising. 

I will also try changing the settings and report back.

Thanks again guys,



-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: 03 August 2012 15:34
To: Russell Keane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query using incorrect index

Russell Keane <Russell.Keane@inps.co.uk> wrote:

> "log_min_duration_statement";"1ms"

> "shared_buffers";"32MB"
> "work_mem";"1MB"

Those are pretty low values even for a 4GB machine.  I suggest the following changes and additions, based on the fact
thatyou seem to have the active portion of the database fully cached. 

shared_buffers = '160MB'
work_mem = '8MB'
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03
effective_cache_size = '2GB'

> Explain analyse with both indexes present but without the limit (uses
> the correct index):

> "Total runtime: 0.092 ms"

Part of problem is that it thinks it will find a matching row fairly quickly, and having done so using the index it
chosewill mean it is the *right* row.  The problem is that there are no matching rows, so it has to scan the entire
index. More fine-grained statistics 
*might* help.  If other techniques don't help, you can rewrite the query slightly to create an optimization fence, but
thatshould be a last resort.  I agree with Robert that if you have a lot of queries that select on "incoming" and/or
"inactive",a conditional index (with a WHERE clause in its definition) is likely to be very helpful. 

-Kevin