Re: Performance issue - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Performance issue
Date
Msg-id 20200615000516.GP14879@telsasoft.com
Whole thread Raw
In response to Performance issue  (Nagaraj Raj <nagaraj.sf@yahoo.com>)
List pgsql-performance
On Sun, Jun 14, 2020 at 10:45:52PM +0000, Nagaraj Raj wrote:
> My PostgreSQL server 10.11 running on windows which are running very slow. DB has two tables with ~200Mil records in
each.user queries are very slow even explain analyze also taking a longer.
 
> 
> Could you please help me to tune this query and any suggestions to improve system performance?

> CREATE TABLE test1
> (
>     individual_entity_proxy_id bigint NOT NULL,
...
>     CONSTRAINT pk_i_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
> 
> );
> CREATE INDEX indx_prospect_indv_entty_id ON test1 USING btree (individual_entity_proxy_id )

This index is redundant with the primary key, which implicitly creates a unique
index.

The table structure seems strange: you have two tables with the same PK column,
which is how they're being joined.  It seems like that's better expressed as a
single table with all the columns rather than separate tables (but see below).

> explain analyze select COUNT(*) as "DII_1"
>   from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID"
>                                 from test1 table0
>                                 inner join test2 table1

I think this may be better written as something like:

| SELECT COUNT(id) FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.id=t0.id AND ...) AND ...

It's guaranteed to be distinct since it's a PK column, so it doesn't need a
"Unique" node.

I think it might prefer an seq scan on t0, which might be good since it seems
to be returning over 10% of records.

> Explain Analyze :
> 
> "Aggregate  (cost=5345632.91..5345632.92 rows=1 width=8) (actual time=442688.462..442688.462 rows=1 loops=1)"
> "  ->  Unique  (cost=150.13..4943749.39 rows=32150682 width=8) (actual time=0.022..439964.214 rows=32368180
loops=1)"
> "        ->  Merge Join  (cost=150.13..4863372.68 rows=32150682 width=8) (actual time=0.021..435818.276 rows=32368180
loops=1)"
> "              Merge Cond: (table0.individual_entity_proxy_id = table1.individual_entity_proxy_id)"
> "              ->  Index Scan using indx_prospect_indv_entty_id on test1 table0  (cost=0.56..2493461.92 rows=32233405
width=8)(actual time=0.011..63009.551 rows=32368180 loops=1)"
 
> "                    Filter: ((direct_mail_preference IS NULL) AND ((last_contacted_dm_datetime IS NULL) OR
(last_contacted_dm_datetime< '2020-03-15 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind =
'N'::bpchar)AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address =
'Y'::bpchar)AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))"
 
> "                    Rows Removed by Filter: 7709177"
> "              ->  Index Scan using pk_entity_proxy_id on test2 table1  (cost=0.56..1867677.94 rows=40071417 width=8)
(actualtime=0.008..363534.437 rows=40077727 loops=1)"
 
> "                    Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND
((cstmr_prspct_ind)::text= 'Prospect'::text))"
 
> "                    Rows Removed by Filter: 94756"

It might help to show explain(ANALYZE,BUFFERS).

It looks like test2/table1 index scan is a lot slower than table0.
Maybe table1 gets lots of updates, so isn't clustered on its primary key, so
the index scan is highly random.  You could check the "correlation" of its PK
ID column:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

If true, that would be a good reason to have separate tables.

> vCore: 32

Possibly it would be advantageous to use parallel query.
A better query+plan might allow that.

-- 
Justin



pgsql-performance by date:

Previous
From: David Rowley
Date:
Subject: Re: Performance issue
Next
From: Michael Lewis
Date:
Subject: Re: view reading information_schema is slow in PostgreSQL 12