Thread: [PERFORM] Performance issue in PostgreSQL server...

[PERFORM] Performance issue in PostgreSQL server...

From
Dinesh Chandra 12108
Date:

Dear Experts,

 

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

 

The configuration of database server is :

 

Architecture:         x86_64  

CPU op-mode(s):        32-bit, 64-bit

CPU’s : 8

Core(s) per socket:    4

Socket(s):             2

Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

 

RAM : 32 GB

SWAP :8 Gb

 

Kernel parameter:

 

kernel.shmmax = 32212254720

kernel.shmall = 1073741824

 

 

Values of PostgreSQL.conf parameters are :

 

shared_buffers = 10GB

temp_buffers = 32MB

work_mem = 512MB                             

maintenance_work_mem = 2048MB

max_files_per_process = 2000

checkpoint_segments = 200

max_wal_senders = 5   

wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers

 

 

Queries taking lot of time are:

==================================

 

 

2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 

 

2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

 

 

Top command output:

 

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57

Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie

Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:  32830016k total, 32142596k used,   687420k free,    77460k buffers

Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres

18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres

16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres

14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres

1214 root      20   0 15668 1848  896 S  1.0  0.0 130:46.43 top

13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres

20081 root      20   0 15668 1880  936 R  1.0  0.0   0:00.12 top

13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres

41107 root      20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3

2680 root      20   0     0    0    0 S  0.3  0.0 103:38.54 nfsiod

3558 root      20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server

15576 root      20   0     0    0    0 S  0.3  0.0   0:01.16 flush-253:1

18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres

20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash

24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd

28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres

1 root      20   0 19356 1188  996 S  0.0  0.0   0:05.00 init

 

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)|

 




DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From
Nur Agus
Date:
Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Experts,

 

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

 

The configuration of database server is :

 

Architecture:         x86_64  

CPU op-mode(s):        32-bit, 64-bit

CPU’s : 8

Core(s) per socket:    4

Socket(s):             2

Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

 

RAM : 32 GB

SWAP :8 Gb

 

Kernel parameter:

 

kernel.shmmax = 32212254720

kernel.shmall = 1073741824

 

 

Values of PostgreSQL.conf parameters are :

 

shared_buffers = 10GB

temp_buffers = 32MB

work_mem = 512MB                             

maintenance_work_mem = 2048MB

max_files_per_process = 2000

checkpoint_segments = 200

max_wal_senders = 5   

wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers

 

 

Queries taking lot of time are:

==================================

 

 

2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 

 

2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

 

 

Top command output:

 

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57

Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie

Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:  32830016k total, 32142596k used,   687420k free,    77460k buffers

Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres

18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres

16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres

14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres

1214 root      20   0 15668 1848  896 S  1.0  0.0 130:46.43 top

13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres

20081 root      20   0 15668 1880  936 R  1.0  0.0   0:00.12 top

13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres

41107 root      20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3

2680 root      20   0     0    0    0 S  0.3  0.0 103:38.54 nfsiod

3558 root      20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server

15576 root      20   0     0    0    0 S  0.3  0.0   0:01.16 flush-253:1

18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres

20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash

24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd

28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres

1 root      20   0 19356 1188  996 S  0.0  0.0   0:05.00 init

 

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)|

 




DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

Re: [PERFORM] Performance issue in PostgreSQL server...

From
Dinesh Chandra 12108
Date:

Dear Nur,

 

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 

 

                                                                                              QUERY PLAN

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------

Unique  (cost=1679730.32..1679837.46 rows=21428 width=8) (actual time=154753.528..155657.818 rows=1607489 loops=1)

   ->  Sort  (cost=1679730.32..1679783.89 rows=21428 width=8) (actual time=154753.514..155087.734 rows=4053270 loops=1)

         Sort Key: p.feature_id

         Sort Method: quicksort  Memory: 288302kB

         ->  Hash Join  (cost=1501657.09..1678188.87 rows=21428 width=8) (actual time=144146.620..152050.311 rows=4053270 loops=1)

               Hash Cond: (oe.evd_feature_id = p.feature_id)

               Join Filter: ((p.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > '2015-05-10 03:36:5

6.056+05:30'::timestamp with time zone))

               ->  Seq Scan on observation_evidence oe  (cost=0.00..121733.18 rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)

               ->  Hash  (cost=1483472.70..1483472.70 rows=1454751 width=16) (actual time=144144.653..144144.653 rows=1607491 loops=1)

                     Buckets: 262144  Batches: 1  Memory Usage: 75352kB

                     ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1

59 rows=1607491 loops=1)

                           Index Cond: (domain_class_id = 11)

Total runtime: 155787.379 ms

(13 rows)

 

 

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.

------------------------------------------------------------------

Mobile: +91-9953975849 | Ext 1078 |dinesh.chandra@cyient.com

Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.

 

From: Nur Agus [mailto:nuragus.linux@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

 

Hello Dinesh,

 

You can try the EXPLAIN tool

 

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 

Then paste here the result.

 

Thanks

 

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Experts,

 

I need your suggestions to resolve the performance issue reported on our PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, some select queries with order by clause are taking lot of time in execution and forcing applications to give slow response.

 

The configuration of database server is :

 

Architecture:         x86_64  

CPU op-mode(s):        32-bit, 64-bit

CPU’s : 8

Core(s) per socket:    4

Socket(s):             2

Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

 

RAM : 32 GB

SWAP :8 Gb

 

Kernel parameter:

 

kernel.shmmax = 32212254720

kernel.shmall = 1073741824

 

 

Values of PostgreSQL.conf parameters are :

 

shared_buffers = 10GB

temp_buffers = 32MB

work_mem = 512MB                             

maintenance_work_mem = 2048MB

max_files_per_process = 2000

checkpoint_segments = 200

max_wal_senders = 5   

wal_buffers = -1                      # min 32kB, -1 sets based on shared_buffers

 

 

Queries taking lot of time are:

==================================

 

 

2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 

 

2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute <unnamed>: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

 

 

Top command output:

 

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57

Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie

Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:  32830016k total, 32142596k used,   687420k free,    77460k buffers

Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres

18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres

16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres

14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres

1214 root      20   0 15668 1848  896 S  1.0  0.0 130:46.43 top

13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres

20081 root      20   0 15668 1880  936 R  1.0  0.0   0:00.12 top

13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres

41107 root      20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3

2680 root      20   0     0    0    0 S  0.3  0.0 103:38.54 nfsiod

3558 root      20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server

15576 root      20   0     0    0    0 S  0.3  0.0   0:01.16 flush-253:1

18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres

20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash

24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd

28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres

1 root      20   0 19356 1188  996 S  0.0  0.0   0:05.00 init

 

Regards,

Dinesh Chandra

|Database administrator (Oracle/PostgreSQL)|

 

 



DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.

 

Re: [PERFORM] Performance issue in PostgreSQL server...

From
Tom Lane
Date:
Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN
evidence.observation_evidenceoe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
(p.modification_time> '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY
feature_id

I think that's a fundamentally slow query and you're not going to be able
to make it better without rethinking your requirements and/or data
representation.  As written, that requires the server to form the entire
join of p to oe on feature_id, with the only filter before the join being
the evidently-none-too-selective domain_class_id condition.  Only after
joining can it apply the OR condition.  So this is inherently processing a
lot of rows.

If the OR arms were individually pretty selective you could rewrite this
into a UNION of two joins, a la the discussion at
https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873489@BlueTreble.com
but given the dates involved I'm betting that won't help very much.

Or maybe you could try

select feature_id from p where domain_class_id IN (11) AND p.modification_time > '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe where oe.modification_time > '2015-05-10 00:06:56.056 IST'
order by feature_id

although I'm not entirely certain that that has exactly the same
semantics (-ENOCAFFEINE), and it might still be none too quick.

            regards, tom lane


Re: [PERFORM] Performance issue in PostgreSQL server...

From
Jeff Janes
Date:
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:

Dear Nur,

 

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

 


...
 

                     ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1

59 rows=1607491 loops=1)

                           Index Cond: (domain_class_id = 11)


Why wouldn't this be using a bitmap scan rather than a regular index scan?  It seems like it should prefer the bitmap scan, unless the table is well clustered on domain_class_id.  In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan =off to see what that gives.  If it gives a seq scan, then repeat with enable_seqscan also turned off.  Or If it gives the bitmap scan, then repeat with enable_bitmapscan turned off.

How many rows is in point, and how big is it?

The best bet for making this better might be to have an index on (domain_class_id, modification_time) and hope for an index only scan.  Except that you are on 9.1, so first you would have to upgrade.  Which would allow you to use BUFFERS in the explain analyze, as well as track_io_timings, both of which would also be pretty nice to see.  Using 9.1 is like having one hand tied behind your back.  

Also, any idea why this execution of this query 15 is times faster than the execution you found in the log file?  Was the top output you showed in the first email happening at the time the really slow query was running, or was that from a different period?

Cheers,

Jeff

Re: [PERFORM] Performance issue in PostgreSQL server...

From
Justin Pryzby
Date:
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >                      ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70
rows=1454751width=16) (actual time=27.265..142101.1 59 rows=1607491 loops=1) 
> >                            Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin


Re: [PERFORM] Performance issue in PostgreSQL server...

From
Dinesh Chandra 12108
Date:
Dear Justin,

Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;


schemaname | tablename |     attname     | inherited | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation 


"evidence"|"point"|"domain_class_id"|f|0|8|10|"{7,9,2,11,43,3,1,10,4,17}"|"{0.9322,0.0451333,0.0145,0.00393333,0.00183333,0.00146667,0.0005,0.0003,6.66667e-05,6.66667e-05}"|""|0.889078


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.


-----Original Message-----
From: Justin Pryzby [mailto:pryzby@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com>
Cc: Nur Agus <nuragus.linux@gmail.com>; Jeff Janes <jeff.janes@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <Dinesh.Chandra@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >                      ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70
rows=1454751width=16) (actual time=27.265..142101.1 59 rows=1607491 loops=1) 
> >                            Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin

________________________________

DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged
information.Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended
recipient,please contact the sender by reply email and destroy all copies of the original message. Check all
attachmentsfor viruses before opening them. All views or opinions presented in this e-mail are those of the author and
maynot reflect the opinion of Cyient or those of our affiliates. 


Re: [PERFORM] Performance issue in PostgreSQL server...

From
Justin Pryzby
Date:
On Mon, Mar 06, 2017 at 12:17:22PM +0000, Dinesh Chandra 12108 wrote:
> Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;
>
>
> schemaname | tablename |     attname     | inherited | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation 
>
>
"evidence"|"point"|"domain_class_id"|f|0|8|10|"{7,9,2,11,43,3,1,10,4,17}"|"{0.9322,0.0451333,0.0145,0.00393333,0.00183333,0.00146667,0.0005,0.0003,6.66667e-05,6.66667e-05}"|""|0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +0000, Dinesh Chandra 12108 wrote:
>        ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16)
(actualtime=27.265..142101.1 59 rows=1607491 loops=1) 

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded,
but I suspect the issue is that the *table* is highly correlated WRT this
column, but the index may not be, probably due to duplicated index keys.
postgres only stores statistics on expression indices, and falls back to
correlation of table column of a simple indices.

If you're still fighting this, would you send result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;
or,
SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

if there's much repetition in the index keys, then PG's planner thinks an index
scan has low random_page_cost, and effective_cache_size has little effect on
large tables, and it never uses bitmap scan, which blows up if the index is
fragmented and has duplicate keys.  The table reads end up costing something
like 1454751*random_page_cost nonsequential reads and fseek() calls when it
thinks it'll cost only 1454751*16*seq_page_cost.

Is the query much faster if you first reindex point_domain_class_id_index ?

This has come up before, see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520D6610.8040907@emulex.com
>
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@telsasoft.com
>
https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou1kn@4ax.com

Justin