Thread: [PERFORM] Performance issue in PostgreSQL server...

[PERFORM] Performance issue in PostgreSQL server...

From

Dinesh Chandra 12108

Date:

03 March 2017, 16:29:49

Dear Experts,

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

The configuration of database server is :

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

CPU’s : 8

Core(s) per socket: 4

Socket(s): 2

Model name: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB

SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720

kernel.shmall = 1073741824

Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB

temp_buffers = 32MB

work_mem = 512MB

maintenance_work_mem = 2048MB

max_files_per_process = 2000

checkpoint_segments = 200

max_wal_senders = 5

wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers

Queries taking lot of time are:

==================================

2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

Top command output:

top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.84, 1.59, 1.57

Tasks: 830 total, 1 running, 828 sleeping, 0 stopped, 1 zombie

Cpu(s): 3.4%us, 0.7%sy, 0.0%ni, 81.7%id, 14.2%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 32830016k total, 32142596k used, 687420k free, 77460k buffers

Swap: 8190972k total, 204196k used, 7986776k free, 27981268k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

30639 postgres 20 0 10.5g 4.7g 4.7g S 13.5 14.9 10:20.95 postgres

18185 postgres 20 0 10.5g 603m 596m S 4.9 1.9 2:51.16 postgres

16543 postgres 20 0 10.5g 2.8g 2.8g S 4.3 8.8 1:34.04 postgres

14710 postgres 20 0 10.5g 2.9g 2.9g S 3.9 9.2 1:20.84 postgres

1214 root 20 0 15668 1848 896 S 1.0 0.0 130:46.43 top

13462 postgres 20 0 10.5g 1.4g 1.3g S 1.0 4.3 0:25.56 postgres

20081 root 20 0 15668 1880 936 R 1.0 0.0 0:00.12 top

13478 postgres 20 0 10.5g 2.1g 2.1g S 0.7 6.9 0:56.43 postgres

41107 root 20 0 416m 10m 4892 S 0.7 0.0 305:25.71 pgadmin3

2680 root 20 0 0 0 0 S 0.3 0.0 103:38.54 nfsiod

3558 root 20 0 13688 1100 992 S 0.3 0.0 45:00.36 gam_server

15576 root 20 0 0 0 0 S 0.3 0.0 0:01.16 flush-253:1

18430 postgres 20 0 10.5g 18m 13m S 0.3 0.1 0:00.64 postgres

20083 postgres 20 0 105m 1852 1416 S 0.3 0.0 0:00.01 bash

24188 postgres 20 0 102m 1856 832 S 0.3 0.0 0:23.39 sshd

28250 postgres 20 0 156m 1292 528 S 0.3 0.0 0:46.86 postgres

1 root 20 0 19356 1188 996 S 0.0 0.0 0:05.00 init

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)|

DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Nur Agus

Date:

03 March 2017, 18:23:36

Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Experts,

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

The configuration of database server is :

Architecture:         x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU’s : 8
Core(s) per socket:    4
Socket(s):             2
Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824

Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers

Queries taking lot of time are:
==================================

2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

Top command output:

top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.84, 1.59, 1.57
Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie
Cpu(s): 3.4%us, 0.7%sy, 0.0%ni, 81.7%id, 14.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32830016k total, 32142596k used,   687420k free,    77460k buffers
Swap: 8190972k total,   204196k used, 7986776k free, 27981268k cached

PID USER      PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
30639 postgres 20   0 10.5g 4.7g 4.7g S 13.5 14.9 10:20.95 postgres
18185 postgres 20   0 10.5g 603m 596m S 4.9 1.9   2:51.16 postgres
16543 postgres 20   0 10.5g 2.8g 2.8g S 4.3 8.8   1:34.04 postgres
14710 postgres 20   0 10.5g 2.9g 2.9g S 3.9 9.2   1:20.84 postgres
1214 root      20   0 15668 1848 896 S 1.0 0.0 130:46.43 top
13462 postgres 20   0 10.5g 1.4g 1.3g S 1.0 4.3   0:25.56 postgres
20081 root      20   0 15668 1880 936 R 1.0 0.0   0:00.12 top
13478 postgres 20   0 10.5g 2.1g 2.1g S 0.7 6.9   0:56.43 postgres
41107 root      20   0 416m 10m 4892 S 0.7 0.0 305:25.71 pgadmin3
2680 root      20   0     0    0    0 S 0.3 0.0 103:38.54 nfsiod
3558 root      20   0 13688 1100 992 S 0.3 0.0 45:00.36 gam_server
15576 root      20   0     0    0    0 S 0.3 0.0   0:01.16 flush-253:1
18430 postgres 20   0 10.5g 18m 13m S 0.3 0.1   0:00.64 postgres
20083 postgres 20   0 105m 1852 1416 S 0.3 0.0   0:00.01 bash
24188 postgres 20   0 102m 1856 832 S 0.3 0.0   0:23.39 sshd
28250 postgres 20   0 156m 1292 528 S 0.3 0.0   0:46.86 postgres
1 root      20   0 19356 1188 996 S 0.0 0.0   0:05.00 init

Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)|

DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Dinesh Chandra 12108

Date:

03 March 2017, 18:44:07

Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------

Unique (cost=1679730.32..1679837.46 rows=21428 width=8) (actual time=154753.528..155657.818 rows=1607489 loops=1)

-> Sort (cost=1679730.32..1679783.89 rows=21428 width=8) (actual time=154753.514..155087.734 rows=4053270 loops=1)

Sort Key: p.feature_id

Sort Method: quicksort Memory: 288302kB

-> Hash Join (cost=1501657.09..1678188.87 rows=21428 width=8) (actual time=144146.620..152050.311 rows=4053270 loops=1)

Hash Cond: (oe.evd_feature_id = p.feature_id)

Join Filter: ((p.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > '2015-05-10 03:36:5

6.056+05:30'::timestamp with time zone))

-> Seq Scan on observation_evidence oe (cost=0.00..121733.18 rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)

-> Hash (cost=1483472.70..1483472.70 rows=1454751 width=16) (actual time=144144.653..144144.653 rows=1607491 loops=1)

Buckets: 262144 Batches: 1 Memory Usage: 75352kB

-> Index Scan using point_domain_class_id_index on point p (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1

59 rows=1607491 loops=1)

Index Cond: (domain_class_id = 11)

Total runtime: 155787.379 ms

(13 rows)

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.

------------------------------------------------------------------

Mobile: +91-9953975849 | Ext 1078 |dinesh.chandra@cyient.com

Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.

From: Nur Agus [mailto:nuragus.linux@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

Hello Dinesh,

You can try the EXPLAIN tool

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Experts,

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

The configuration of database server is :

Architecture:         x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU’s : 8
Core(s) per socket:    4
Socket(s):             2
Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824

Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers

Queries taking lot of time are:
==================================

2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

Top command output:

top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.84, 1.59, 1.57
Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie
Cpu(s): 3.4%us, 0.7%sy, 0.0%ni, 81.7%id, 14.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32830016k total, 32142596k used,   687420k free,    77460k buffers
Swap: 8190972k total,   204196k used, 7986776k free, 27981268k cached

PID USER      PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
30639 postgres 20   0 10.5g 4.7g 4.7g S 13.5 14.9 10:20.95 postgres
18185 postgres 20   0 10.5g 603m 596m S 4.9 1.9   2:51.16 postgres
16543 postgres 20   0 10.5g 2.8g 2.8g S 4.3 8.8   1:34.04 postgres
14710 postgres 20   0 10.5g 2.9g 2.9g S 3.9 9.2   1:20.84 postgres
1214 root      20   0 15668 1848 896 S 1.0 0.0 130:46.43 top
13462 postgres 20   0 10.5g 1.4g 1.3g S 1.0 4.3   0:25.56 postgres
20081 root      20   0 15668 1880 936 R 1.0 0.0   0:00.12 top
13478 postgres 20   0 10.5g 2.1g 2.1g S 0.7 6.9   0:56.43 postgres
41107 root      20   0 416m 10m 4892 S 0.7 0.0 305:25.71 pgadmin3
2680 root      20   0     0    0    0 S 0.3 0.0 103:38.54 nfsiod
3558 root      20   0 13688 1100 992 S 0.3 0.0 45:00.36 gam_server
15576 root      20   0     0    0    0 S 0.3 0.0   0:01.16 flush-253:1
18430 postgres 20   0 10.5g 18m 13m S 0.3 0.1   0:00.64 postgres
20083 postgres 20   0 105m 1852 1416 S 0.3 0.0   0:00.01 bash
24188 postgres 20   0 102m 1856 832 S 0.3 0.0   0:23.39 sshd
28250 postgres 20   0 156m 1292 528 S 0.3 0.0   0:46.86 postgres
1 root      20   0 19356 1188 996 S 0.0 0.0   0:05.00 init

Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)|

DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Tom Lane

Date:

03 March 2017, 21:19:48

Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN
evidence.observation_evidenceoe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
(p.modification_time> '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY
feature_id

I think that's a fundamentally slow query and you're not going to be able
to make it better without rethinking your requirements and/or data
representation.  As written, that requires the server to form the entire
join of p to oe on feature_id, with the only filter before the join being
the evidently-none-too-selective domain_class_id condition.  Only after
joining can it apply the OR condition.  So this is inherently processing a
lot of rows.

If the OR arms were individually pretty selective you could rewrite this
into a UNION of two joins, a la the discussion at
https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873489@BlueTreble.com
but given the dates involved I'm betting that won't help very much.

Or maybe you could try

select feature_id from p where domain_class_id IN (11) AND p.modification_time > '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe where oe.modification_time > '2015-05-10 00:06:56.056 IST'
order by feature_id

although I'm not entirely certain that that has exactly the same
semantics (-ENOCAFFEINE), and it might still be none too quick.

            regards, tom lane

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Jeff Janes

Date:

06 March 2017, 10:23:08

On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

...

-> Index Scan using point_domain_class_id_index on point p (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1
59 rows=1607491 loops=1)
Index Cond: (domain_class_id = 11)

Why wouldn't this be using a bitmap scan rather than a regular index scan? It seems like it should prefer the bitmap scan, unless the table is well clustered on domain_class_id. In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan =off to see what that gives. If it gives a seq scan, then repeat with enable_seqscan also turned off. Or If it gives the bitmap scan, then repeat with enable_bitmapscan turned off.

How many rows is in point, and how big is it?

The best bet for making this better might be to have an index on (domain_class_id, modification_time) and hope for an index only scan. Except that you are on 9.1, so first you would have to upgrade. Which would allow you to use BUFFERS in the explain analyze, as well as track_io_timings, both of which would also be pretty nice to see. Using 9.1 is like having one hand tied behind your back.

Also, any idea why this execution of this query 15 is times faster than the execution you found in the log file? Was the top output you showed in the first email happening at the time the really slow query was running, or was that from a different period?

Cheers,

Jeff

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Justin Pryzby

Date:

06 March 2017, 11:24:02

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >                      ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70
rows=1454751width=16) (actual time=27.265..142101.1 59 rows=1607491 loops=1) 
> >                            Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Dinesh Chandra 12108

Date:

06 March 2017, 18:17:22

Dear Justin,

Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;


schemaname | tablename |     attname     | inherited | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation 


"evidence"|"point"|"domain_class_id"|f|0|8|10|"{7,9,2,11,43,3,1,10,4,17}"|"{0.9322,0.0451333,0.0145,0.00393333,0.00183333,0.00146667,0.0005,0.0003,6.66667e-05,6.66667e-05}"|""|0.889078


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.


-----Original Message-----
From: Justin Pryzby [mailto:pryzby@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com>
Cc: Nur Agus <nuragus.linux@gmail.com>; Jeff Janes <jeff.janes@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >                      ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70
rows=1454751width=16) (actual time=27.265..142101.1 59 rows=1607491 loops=1) 
> >                            Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin

________________________________

DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged
information.Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended
recipient,please contact the sender by reply email and destroy all copies of the original message. Check all
attachmentsfor viruses before opening them. All views or opinions presented in this e-mail are those of the author and
maynot reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From

Justin Pryzby

Date:

28 April 2017, 00:47:47

On Mon, Mar 06, 2017 at 12:17:22PM +0000, Dinesh Chandra 12108 wrote:
> Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;
>
>
> schemaname | tablename |     attname     | inherited | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation 
>
>
"evidence"|"point"|"domain_class_id"|f|0|8|10|"{7,9,2,11,43,3,1,10,4,17}"|"{0.9322,0.0451333,0.0145,0.00393333,0.00183333,0.00146667,0.0005,0.0003,6.66667e-05,6.66667e-05}"|""|0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +0000, Dinesh Chandra 12108 wrote:
>        ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16)
(actualtime=27.265..142101.1 59 rows=1607491 loops=1) 

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded,
but I suspect the issue is that the *table* is highly correlated WRT this
column, but the index may not be, probably due to duplicated index keys.
postgres only stores statistics on expression indices, and falls back to
correlation of table column of a simple indices.

If you're still fighting this, would you send result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;
or,
SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

if there's much repetition in the index keys, then PG's planner thinks an index
scan has low random_page_cost, and effective_cache_size has little effect on
large tables, and it never uses bitmap scan, which blows up if the index is
fragmented and has duplicate keys.  The table reads end up costing something
like 1454751*random_page_cost nonsequential reads and fseek() calls when it
thinks it'll cost only 1454751*16*seq_page_cost.

Is the query much faster if you first reindex point_domain_class_id_index ?

This has come up before, see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520D6610.8040907@emulex.com
>
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@telsasoft.com
>
https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou1kn@4ax.com

Justin