Home > mailing lists

Thread: How to improve the performance of my SQL query？

How to improve the performance of my SQL query？

From

gzh

Date:

20 July 2023, 10:09:22

Hi everyone,

I'm running into some performance issues with my SQL query.

The following SQL query is taking a long time to execute.

explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = '009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = '009'

AND TBL_INF.RY_CD = '000001'

)

----- Execution Plan -----

Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)

-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)

-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)

-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))

Heap Fetches: 1

-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms

--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

When I take the following sql statement, the index works fine and the query is fast.

select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '000001'

The amount of data in the table is as follows.

TBL_SHA 38700325

TBL_INF 35546

Any suggestions for improving the performance of the query would be greatly appreciated.

Thanks in advance!

Re: How to improve the performance of my SQL query？

From

Erik Wienhold

Date:

20 July 2023, 11:07:15

> On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote:
>
> I'm running into some performance issues with my SQL query.
> The following SQL query is taking a long time to execute.
>
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
> (select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = '009'
> AND TBL_INF.RY_CD = '000001'
> )
> ----- Execution Plan -----
> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1
loops=1)
> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
> Heap Fetches: 1
> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000
loops=1)
> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
> Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms
> --------------------------------------------------------------------------------
>
> The index is defined as follows.
>
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>
>
> When I take the following sql statement, the index works fine and the query
> is fast.
>
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '000001'
>
> The amount of data in the table is as follows.
> TBL_SHA 38700325
> TBL_INF 35546
>
> Any suggestions for improving the performance of the query would be greatly
> appreciated.

You can try EXISTS instead of IN to correlate the subquery and the outer query:

    SELECT count(et_cd)
    FROM tbl_sha
    WHERE tbl_sha.ms_cd = '009'
    AND tbl_sha.etrys = '000001'
    AND EXISTS (
      SELECT
      FROM tbl_inf
      WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
      AND tbl_inf.ry_cd = tbl_sha.etrys
    )

--
Erik

Re: How to improve the performance of my SQL query？

From

"Peter J. Holzer"

Date:

20 July 2023, 12:44:51

On 2023-07-20 15:09:22 +0800, gzh wrote:
> Hi everyone,
>
>
> I'm running into some performance issues with my SQL query.
>
> The following SQL query is taking a long time to execute.
>
>
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = '009'
>    AND TBL_INF.RY_CD = '000001'
>    )
>
> ----- Execution Plan -----
[...]
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18)
>                                        (actual time=97264.138..123554.792 rows=3200000 loops=1)
>                     Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>                     Rows Removed by Filter: 32000325
[...]
> --------------------------------------------------------------------------------
>
>
> The index is defined as follows.
>
>
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>
>
>
> When I take the following sql statement, the index works fine and the query is
> fast.
>
>
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '000001'

What's the plan for that query?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re:Re: How to improve the performance of my SQL query？

From

gzh

Date:

20 July 2023, 14:36:04

Thank you very much for taking the time to reply to my question.

Sorry, I provided incorrect information.

The index also does not work in the following query statement.

> select COUNT(ET_CD)

> from TBL_SHA

> WHERE MS_CD = '009'

> AND ETRYS = '000001'

QUERY PLAN

Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)

-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)

-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)

-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 11833442

Planning Time: 0.118 ms

Execution Time: 128668.290 ms

The TBL_SHA table has another index, as shown below.

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

When I take the following query statement, the result is returned quickly.

Why does index_search_01 always not work?

explain analyze

select TO_CHAR(MAX(TBL_SHA.ET_DAY),'YYYYMMDD') AS ET_DAY

from TBL_SHA

WHERE MS_CD = '008'

AND ET_CD = '000003'

QUERY PLAN

Limit (cost=4.11..4.13 rows=1 width=32) (actual time=0.043..0.044 rows=1 loops=1)

InitPlan 1 (returns $0)

-> Limit (cost=0.56..4.11 rows=1 width=8) (actual time=0.032..0.033 rows=1 loops=1)

-> Index Scan Backward using index_search_01 on TBL_SHA (cost=0.56..10836962.40 rows=3054052 width=8) (actual time=0.032..0.032 rows=1 loops=1)

Index Cond: ((MS_CD = '008'::bpchar) AND (ent_day IS NOT NULL))

Filter: (ET_CD = '000003'::bpchar)

-> Result (cost=4.11..4.13 rows=1 width=32) (actual time=0.042..0.042 rows=1 loops=1)

Planning Time: 0.228 ms

Execution Time: 0.070 ms


At 2023-07-20 16:07:15, "Erik Wienhold" <ewie@ewie.name> wrote:
>> On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote:
>>
>> I'm running into some performance issues with my SQL query.
>> The following SQL query is taking a long time to execute.
>>
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = '009'
>> and TBL_SHA.ETRYS in
>> (select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = '009'
>> AND TBL_INF.RY_CD = '000001'
>> )
>> ----- Execution Plan -----
>> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
>> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
>> Heap Fetches: 1
>> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
>> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>> Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>> --------------------------------------------------------------------------------
>>
>> The index is defined as follows.
>>
>> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>>
>>
>> When I take the following sql statement, the index works fine and the query
>> is fast.
>>
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE MS_CD = '009'
>> AND ETRYS = '000001'
>>
>> The amount of data in the table is as follows.
>> TBL_SHA 38700325
>> TBL_INF 35546
>>
>> Any suggestions for improving the performance of the query would be greatly
>> appreciated.
>
>You can try EXISTS instead of IN to correlate the subquery and the outer query:
>
>	SELECT count(et_cd)
>	FROM tbl_sha
>	WHERE tbl_sha.ms_cd = '009'
>	AND tbl_sha.etrys = '000001'
>	AND EXISTS (
>	  SELECT
>	  FROM tbl_inf
>	  WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
>	  AND tbl_inf.ry_cd = tbl_sha.etrys
>	)
>
>--
>Erik

Re: How to improve the performance of my SQL query？

From

Laurenz Albe

Date:

20 July 2023, 14:58:59

On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:
> I'm running into some performance issues with my SQL query.
> The following SQL query is taking a long time to execute.
>
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = '009'
>    AND TBL_INF.RY_CD = '000001'
>    )
> ----- Execution Plan -----
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000
loops=1)
>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual
time=0.025..0.030rows=1 loops=1) 
>                     Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
>                     Heap Fetches: 1
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual
time=97264.138..123554.792rows=3200000 loops=1) 
>                     Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>                     Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms
> --------------------------------------------------------------------------------
>
> The index is defined as follows.
>
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

Actual rows = 3200000, rows removed by filter is ten times as much.
It should use an index.

> When I take the following sql statement, the index works fine and the query is fast.
>
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '000001'
>
> The amount of data in the table is as follows.
> TBL_SHA    38700325
> TBL_INF    35546

This looks very much like it is a problem with the data types.
I see that you are using "character", which you shouldn't do.

What I cannot see is if the columns are defined as "character" or whether you bind
the parameters as "character".  Can you show us the table definition of "TBL_SHA"
and "TBL_INF"?

Yours,
Laurenz Albe

Re: How to improve the performance of my SQL query？

From

gzh

Date:

20 July 2023, 17:14:52

Thank you for your reply.

The information I provided is incorrect, please see my previous reply.

>What I cannot see is if the columns are defined as "character" or whether you bind
>the parameters as "character".  Can you show us the table definition of "TBL_SHA"
>and "TBL_INF"?
For information security reasons, I can't provide the table definition, these columns are defined as "character".


At 2023-07-20 19:58:59, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:
>On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:
>> I'm running into some performance issues with my SQL query.
>> The following SQL query is taking a long time to execute.
>> 
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = '009'
>> and TBL_SHA.ETRYS in
>>    (select TBL_INF.RY_CD
>>     from TBL_INF
>>     WHERE TBL_INF.MS_CD = '009'
>>    AND TBL_INF.RY_CD = '000001'
>>    )
>> ----- Execution Plan -----
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
>>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>>                     Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
>>                     Heap Fetches: 1
>>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
>>                     Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>>                     Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>> --------------------------------------------------------------------------------
>> 
>> The index is defined as follows.
>> 
>> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
>
>Actual rows = 3200000, rows removed by filter is ten times as much.
>It should use an index.
>
>> When I take the following sql statement, the index works fine and the query is fast.
>> 
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE MS_CD = '009'
>> AND ETRYS = '000001'
>> 
>> The amount of data in the table is as follows.
>> TBL_SHA    38700325
>> TBL_INF    35546
>
>This looks very much like it is a problem with the data types.
>I see that you are using "character", which you shouldn't do.
>
>What I cannot see is if the columns are defined as "character" or whether you bind
>the parameters as "character".  Can you show us the table definition of "TBL_SHA"
>and "TBL_INF"?
>
>Yours,
>Laurenz Albe

Re: How to improve the performance of my SQL query？

From

Laurenz Albe

Date:

20 July 2023, 17:48:19

On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:
> The information I provided is incorrect, please see my previous reply.

My question remains: I would like to see the table definitions.

Also, did you ANALYZE the tables?

Yours,
Laurenz Albe

Re: Re: How to improve the performance of my SQL query？

From

jian he

Date:

20 July 2023, 18:20:16

On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote:
>
>
> Thank you very much for taking the time to reply to my question.
>
>
> Sorry, I provided incorrect information.
>
> The index also does not work in the following query statement.
>
>
> > select COUNT(ET_CD)
>
> > from TBL_SHA
>
> > WHERE MS_CD = '009'
>
> > AND ETRYS = '000001'
>
>
> QUERY PLAN
>
> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)
>
>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1
loops=1)
>
>         ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)
>
>               Workers Planned: 2
>
>               Workers Launched: 2
>
>               ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258
rows=1loops=3) 
>
>                     ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 rows=1237762 width=9) (actual
time=75357.455..128531.615rows=1066667 loops=3) 
>
>                           Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>
>                           Rows Removed by Filter: 11833442
>
> Planning Time: 0.118 ms
>
> Execution Time: 128668.290 ms
>
>
> The TBL_SHA table has another index, as shown below.
>
>
> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
>
> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)
>

>                           Rows Removed by Filter: 11833442
select (38700325 - 11833442) /38700325.0;
is 0.69 approx.
So I think it says around 69%  of rows satisfy the query condition.

but I am not sure in the following 2 cases, whether the actual rows
are noisy or not. I can not find the doc explaining it.
> Partial Aggregate  (actual time=128655.256..128655.258 rows=1 loops=3)
> Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)

Re: How to improve the performance of my SQL query？

From

gzh

Date:

21 July 2023, 04:43:52

The definitions of the columns used in SQL are as follows.

TBL_SHA

ms_cd character(6) NOT NULL -- PRIMARY KEY

et_cd character(8)

etrys character(8)

TBL_INF

ms_cd character(6) NOT NULL -- PRIMARY KEY

ry_cd character(8) NOT NULL -- PRIMARY KEY

I made some modifications to the data, and I realized that I should not change the length of the data.

The actual data and its corresponding execution plan are shown below.

explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)

-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)

-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)

-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))

Heap Fetches: 1

-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)

Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms


At 2023-07-20 22:48:19, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:
>On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:
>> The information I provided is incorrect, please see my previous reply.
>
>My question remains: I would like to see the table definitions.
>
>Also, did you ANALYZE the tables?
>
>Yours,
>Laurenz Albe

Re:How to improve the performance of my SQL query？

From

gzh

Date:

21 July 2023, 05:02:51

>select (38700325 - 11833442) /38700325.0;
>is 0.69 approx.
>So I think it says around 69%  of rows satisfy the query condition.

Thank you for your reply. I have learned a lot from it.


At 2023-07-20 23:20:16, "jian he" <jian.universality@gmail.com> wrote:
>On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote:
>>
>>
>> Thank you very much for taking the time to reply to my question.
>>
>>
>> Sorry, I provided incorrect information.
>>
>> The index also does not work in the following query statement.
>>
>>
>> > select COUNT(ET_CD)
>>
>> > from TBL_SHA
>>
>> > WHERE MS_CD = '009'
>>
>> > AND ETRYS = '000001'
>>
>>
>> QUERY PLAN
>>
>> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)
>>
>>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)
>>
>>         ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)
>>
>>               Workers Planned: 2
>>
>>               Workers Launched: 2
>>
>>               ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
>>
>>                     ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)
>>
>>                           Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>>
>>                           Rows Removed by Filter: 11833442
>>
>> Planning Time: 0.118 ms
>>
>> Execution Time: 128668.290 ms
>>
>>
>> The TBL_SHA table has another index, as shown below.
>>
>>
>> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
>>
>> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)
>>
>
>>                           Rows Removed by Filter: 11833442
>select (38700325 - 11833442) /38700325.0;
>is 0.69 approx.
>So I think it says around 69%  of rows satisfy the query condition.
>
>but I am not sure in the following 2 cases, whether the actual rows
>are noisy or not. I can not find the doc explaining it.
>> Partial Aggregate  (actual time=128655.256..128655.258 rows=1 loops=3)
>> Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)

Re: How to improve the performance of my SQL query？

From

Laurenz Albe

Date:

24 July 2023, 08:43:46

On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:
> The definitions of the columns used in SQL are as follows.
>
> TBL_SHA
>
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
>
> TBL_INF
>
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> ry_cd character(8) NOT NULL       -- PRIMARY KEY
>
> I made some modifications to the data, and I realized that I should not change the length of the data. 
> The actual data and its corresponding execution plan are shown below.
>
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = 'MLD009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = 'MLD009'
>    AND TBL_INF.RY_CD = '00000001'
>    )
> ----- Execution Plan -----
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000
loops=1)
>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual
time=0.025..0.030rows=1 loops=1) 
>                     Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
>                     Heap Fetches: 1
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual
time=97264.138..123554.792rows=3200000 loops=1) 
>                     Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
>                     Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms

Thanks.  That should definitely use a b-tree index defined on (ms_cd, etrsys).

Did you change any parameters that have an impact on query planning?
You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

Yours,
Laurenz Albe

Re: How to improve the performance of my SQL query？

From

gzh

Date:

24 July 2023, 12:54:19

>Did you change any parameters that have an impact on query planning?

>You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

I added some parameters and re-executed the Execution Plan.

Except for the index not taking effect, I still don't know the reason why the index is not working.

Is it because there is too much data that meets the conditions?

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)

Output: (count(tbl_sha.et_cd))

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)

Output: count(tbl_sha.et_cd)

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)

Output: tbl_sha.et_cd

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)

Output: tbl_inf.ms_cd, tbl_inf.ry_cd

Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=4

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)

Output: tbl_sha.et_cd, tbl_sha.etrys

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

-> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)

Output: tbl_sha.et_cd, tbl_sha.etrys

Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))

Rows Removed by Filter: 14062278

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1

Buffers: shared hit=17875 read=706862

I/O Timings: read=119193.744

Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1

Buffers: shared hit=19813 read=706359

I/O Timings: read=119386.899

Planning:

Buffers: shared hit=42

Planning Time: 1.024 ms

Execution Time: 128717.731 ms


At 2023-07-24 13:43:46, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:
>On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:
>> The definitions of the columns used in SQL are as follows.
>> 
>> TBL_SHA
>> 
>> ms_cd character(6) NOT NULL       -- PRIMARY KEY
>> et_cd character(8)
>> etrys character(8)
>> 
>> TBL_INF
>> 
>> ms_cd character(6) NOT NULL       -- PRIMARY KEY
>> ry_cd character(8) NOT NULL       -- PRIMARY KEY
>> 
>> I made some modifications to the data, and I realized that I should not change the length of the data. 
>> The actual data and its corresponding execution plan are shown below.
>> 
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>> and TBL_SHA.ETRYS in
>>    (select TBL_INF.RY_CD
>>     from TBL_INF
>>     WHERE TBL_INF.MS_CD = 'MLD009'
>>    AND TBL_INF.RY_CD = '00000001'
>>    )
>> ----- Execution Plan -----
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
>>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>>                     Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
>>                     Heap Fetches: 1
>>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
>>                     Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
>>                     Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>
>Thanks.  That should definitely use a b-tree index defined on (ms_cd, etrsys).
>
>Did you change any parameters that have an impact on query planning?
>You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
>
>Yours,
>Laurenz Albe

Re: How to improve the performance of my SQL query？

From

jian he

Date:

24 July 2023, 17:42:01

On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote:
>
> >Did you change any parameters that have an impact on query planning?
>
> >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
>
> I added some parameters and re-executed the Execution Plan.
>
> Except for the index not taking effect, I still don't know the reason why the index is not working.
>
> Is it because there is too much data that meets the conditions?
>
>
> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)
>
> select COUNT(ET_CD)
>
> from TBL_SHA
>
> WHERE TBL_SHA.MS_CD = 'MLD009'
>
> and TBL_SHA.ETRYS in
>
>    (select TBL_INF.RY_CD
>
>     from TBL_INF
>
>     WHERE TBL_INF.MS_CD = 'MLD009'
>
>    AND TBL_INF.RY_CD = '00000001'
>
>    )
>
>
> ----- Execution Plan -----
>
> Limit  (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)
>
>   Output: (count(tbl_sha.et_cd))
>
>   Buffers: shared hit=58948 read=2112758
>
>   I/O Timings: read=357249.120
>
>   ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)
>
>         Output: count(tbl_sha.et_cd)
>
>         Buffers: shared hit=58948 read=2112758
>
>         I/O Timings: read=357249.120
>
>         ->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500
loops=1)
>
>               Output: tbl_sha.et_cd
>
>               Buffers: shared hit=58948 read=2112758
>
>               I/O Timings: read=357249.120
>
>               ->  Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual
time=0.046..0.051rows=1 loops=1) 
>
>                     Output: tbl_inf.ms_cd, tbl_inf.ry_cd
>
>                     Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))
>
>                     Heap Fetches: 1
>
>                     Buffers: shared hit=4
>
>               ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087
rows=2613500loops=1) 
>
>                     Output: tbl_sha.et_cd, tbl_sha.etrys
>
>                     Workers Planned: 2
>
>                     Workers Launched: 2
>
>                     Buffers: shared hit=58944 read=2112758
>
>                     I/O Timings: read=357249.120
>
>                     ->  Parallel Seq Scan on mtpdb.tbl_sha  (cost=0.00..2454183.88 rows=1012745 width=18) (actual
time=952.728..127583.089rows=871167 loops=3) 
>
>                           Output: tbl_sha.et_cd, tbl_sha.etrys
>
>                           Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))
>
>                           Rows Removed by Filter: 14062278
>
>                           Buffers: shared hit=58944 read=2112758
>
>                           I/O Timings: read=357249.120
>
>                           Worker 0:  actual time=1432.292..127762.181 rows=988036 loops=1
>
>                             Buffers: shared hit=17875 read=706862
>
>                             I/O Timings: read=119193.744
>
>                           Worker 1:  actual time=1425.878..127786.777 rows=992381 loops=1
>
>                             Buffers: shared hit=19813 read=706359
>
>                             I/O Timings: read=119386.899
>
> Planning:
>
>   Buffers: shared hit=42
>
> Planning Time: 1.024 ms
>
> Execution Time: 128717.731 ms
>
>

I think the whole query can just:
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';

> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = 'MLD009'
>    AND TBL_INF.RY_CD = '00000001'
>    )

if subquery after IN clause part, no rows returned then the whole
query would return zero row.
if many duplicates rows returned, then there is no point of evaluate
something like {1 in (1,1,1,1,1,1,)}

Re: How to improve the performance of my SQL query？

From

Laurenz Albe

Date:

24 July 2023, 18:58:50

On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:
> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) 

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
Or do you have a version that is too old for SETTINGS?

One other idea: check if the index is INVALID (this will
be visible if you run "\d tablenane" in "psql").
Invalid indexes won't be used.

Yours,
Laurenz Albe

Re: How to improve the performance of my SQL query？

From

gzh

Date:

25 July 2023, 06:11:13

>I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...

>Or do you have a version that is too old for SETTINGS?

Sorry. Please refer to the following execution plan.

EXPLAIN (ANALYZE, BUFFERS, SETTINGS)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=131595.624..131599.529 rows=1 loops=1)

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.341..131256.445 rows=2513500 loops=1)

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.471..1.476 rows=1 loops=1)

Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=2 read=2

I/O Timings: read=1.412

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=0.866..130696.440 rows=2513500 loops=1)

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=11 read=2171691

I/O Timings: read=365862.464

-> Parallel Seq Scan on tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=0.215..130476.981 rows=837833 loops=3)

Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))

Rows Removed by Filter: 13728945

Buffers: shared hit=11 read=2171691

I/O Timings: read=365862.464

Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb'

Planning:

Buffers: shared hit=167 read=7

I/O Timings: read=2.735

Planning Time: 3.733 ms

Execution Time: 131599.594 ms


At 2023-07-24 23:58:50, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:
>On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:
>> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) 
>
>I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
>Or do you have a version that is too old for SETTINGS?
>
>One other idea: check if the index is INVALID (this will
>be visible if you run "\d tablenane" in "psql").
>Invalid indexes won't be used.
>
>Yours,
>Laurenz Albe

Re: How to improve the performance of my SQL query？

From

gzh

Date:

25 July 2023, 06:58:42

Thank you for your reply.

>I think the whole query can just:
>select COUNT(ET_CD)
>from TBL_SHA
>WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';

This is related to the business logic.

>if many duplicates rows returned, then there is no point of evaluate
>something like {1 in (1,1,1,1,1,1,)}

Because the primary key of the 'tbl_inf' table only consists of 'ms_cd' and 'ry_cd' columns, the subquery will not return duplicate rows.


At 2023-07-24 22:42:01, "jian he" <jian.universality@gmail.com> wrote:
>On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote:
>>
>> >Did you change any parameters that have an impact on query planning?
>>
>> >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
>>
>> I added some parameters and re-executed the Execution Plan.
>>
>> Except for the index not taking effect, I still don't know the reason why the index is not working.
>>
>> Is it because there is too much data that meets the conditions?
>>
>>
>> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)
>>
>> select COUNT(ET_CD)
>>
>> from TBL_SHA
>>
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>>
>> and TBL_SHA.ETRYS in
>>
>>    (select TBL_INF.RY_CD
>>
>>     from TBL_INF
>>
>>     WHERE TBL_INF.MS_CD = 'MLD009'
>>
>>    AND TBL_INF.RY_CD = '00000001'
>>
>>    )
>>
>>
>> ----- Execution Plan -----
>>
>> Limit  (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)
>>
>>   Output: (count(tbl_sha.et_cd))
>>
>>   Buffers: shared hit=58948 read=2112758
>>
>>   I/O Timings: read=357249.120
>>
>>   ->  Aggregate  (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)
>>
>>         Output: count(tbl_sha.et_cd)
>>
>>         Buffers: shared hit=58948 read=2112758
>>
>>         I/O Timings: read=357249.120
>>
>>         ->  Nested Loop  (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)
>>
>>               Output: tbl_sha.et_cd
>>
>>               Buffers: shared hit=58948 read=2112758
>>
>>               I/O Timings: read=357249.120
>>
>>               ->  Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)
>>
>>                     Output: tbl_inf.ms_cd, tbl_inf.ry_cd
>>
>>                     Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))
>>
>>                     Heap Fetches: 1
>>
>>                     Buffers: shared hit=4
>>
>>               ->  Gather  (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)
>>
>>                     Output: tbl_sha.et_cd, tbl_sha.etrys
>>
>>                     Workers Planned: 2
>>
>>                     Workers Launched: 2
>>
>>                     Buffers: shared hit=58944 read=2112758
>>
>>                     I/O Timings: read=357249.120
>>
>>                     ->  Parallel Seq Scan on mtpdb.tbl_sha  (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)
>>
>>                           Output: tbl_sha.et_cd, tbl_sha.etrys
>>
>>                           Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))
>>
>>                           Rows Removed by Filter: 14062278
>>
>>                           Buffers: shared hit=58944 read=2112758
>>
>>                           I/O Timings: read=357249.120
>>
>>                           Worker 0:  actual time=1432.292..127762.181 rows=988036 loops=1
>>
>>                             Buffers: shared hit=17875 read=706862
>>
>>                             I/O Timings: read=119193.744
>>
>>                           Worker 1:  actual time=1425.878..127786.777 rows=992381 loops=1
>>
>>                             Buffers: shared hit=19813 read=706359
>>
>>                             I/O Timings: read=119386.899
>>
>> Planning:
>>
>>   Buffers: shared hit=42
>>
>> Planning Time: 1.024 ms
>>
>> Execution Time: 128717.731 ms
>>
>>
>
>I think the whole query can just:
>select COUNT(ET_CD)
>from TBL_SHA
>WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';
>
>> and TBL_SHA.ETRYS in
>>    (select TBL_INF.RY_CD
>>     from TBL_INF
>>     WHERE TBL_INF.MS_CD = 'MLD009'
>>    AND TBL_INF.RY_CD = '00000001'
>>    )
>
>if subquery after IN clause part, no rows returned then the whole
>query would return zero row.
>if many duplicates rows returned, then there is no point of evaluate
>something like {1 in (1,1,1,1,1,1,)}

Re: How to improve the performance of my SQL query？

From

Laurenz Albe

Date:

25 July 2023, 08:56:25

On Tue, 2023-07-25 at 11:11 +0800, gzh wrote:
> > I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
> > Or do you have a version that is too old for SETTINGS?
> Sorry. Please refer to the following execution plan.
>
> [...]
> Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb'
> [...]

No unusual settings.  So that theory is dead.

> > One other idea: check if the index is INVALID (this will
> > be visible if you run "\d tablenane" in "psql").
> > Invalid indexes won't be used.

Did you check that?

After that, I am out of ideas.

Yours,
Laurenz Albe

Re: Re: How to improve the performance of my SQL query？

From

David Rowley

Date:

25 July 2023, 15:58:02

On Thu, 20 Jul 2023 at 23:36, gzh <gzhcoder@126.com> wrote:
>
>
> Thank you very much for taking the time to reply to my question.
>
>
> Sorry, I provided incorrect information.
>
> The index also does not work in the following query statement.
>
>
> > select COUNT(ET_CD)
> > from TBL_SHA
> > WHERE MS_CD = '009'
> > AND ETRYS = '000001'
>
> QUERY PLAN
> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)
>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1
loops=1)
>         ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)
>               Workers Planned: 2
>               Workers Launched: 2
>               ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258
rows=1loops=3)
 
>                     ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 rows=1237762 width=9) (actual
time=75357.455..128531.615rows=1066667 loops=3)
 
>                           Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
>                           Rows Removed by Filter: 11833442
> Planning Time: 0.118 ms
> Execution Time: 128668.290 ms
>
> The TBL_SHA table has another index, as shown below.
>
> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)
>
> When I take the following query statement, the result is returned quickly.
> Why does index_search_01 always not work?

The method to access the table is decided by the query planner based
on costs.  The costs are driven off the row estimates which are driven
from table statistics.  If the table statistics, for example say that
99% of rows have MS_CD = '009', then scanning an index on MS_CD is
unlikely to be a good idea as that would likely require random access
to the heap.  It's likely better to perform a table scan and then just
filter out the 1% of rows that don't match.

Try executing the query after having done:

SET enable_seqscan TO off;

What plan does it use now?

Is that plan faster or slower than the seq scan plan?

David

Re: How to improve the performance of my SQL query？

From

David Rowley

Date:

25 July 2023, 16:04:16

On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote:
>
> The definitions of the columns used in SQL are as follows.
>
> TBL_SHA
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = 'MLD009'
> and TBL_SHA.ETRYS in
>    (select TBL_INF.RY_CD
>     from TBL_INF
>     WHERE TBL_INF.MS_CD = 'MLD009'
>    AND TBL_INF.RY_CD = '00000001'
>    )
> ----- Execution Plan -----
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000
loops=1)
>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual
time=0.025..0.030rows=1 loops=1)
 
>                     Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
>                     Heap Fetches: 1
>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual
time=97264.138..123554.792rows=3200000 loops=1)
 
>                     Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
>                     Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not
accurately described the table definition, however.

David

Re: How to improve the performance of my SQL query？

From

gzh

Date:

26 July 2023, 10:46:16

Thank you very much for taking the time to reply to my question. 

>You might want to check your description of the table definitions.>Going by the above EXPLAIN ANALYZE output, it very much does not look
>like ms_cd is the primary key of TBL_SHA. If it is then it's very
>weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
>data corruption if that's the case. I suspect you've just not
>accurately described the table definition, however.
The primary key of the SHA table has six fields, and ms_cd is just one of them. 
I'm sorry, I didn't make that clear.

>Try executing the query after having done:
>
>SET enable_seqscan TO off;
>
>What plan does it use now?
>
>Is that plan faster or slower than the seq scan plan?
There's improvement, but it's still quite slow.

QUERY PLAN (enable_seqscan=on)
Limit  (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.822..167183.058 rows=1 loops=1)  ->  Aggregate  (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.820..167183.056 rows=1 loops=1)        ->  Nested Loop  (cost=1000.29..2688558.85 rows=1983209 width=9) (actual time=43544.753..166906.304 rows=2413500 loops=1)              ->  Index Only Scan using tbl_inf_pkc on tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual time=1.034..1.038 rows=1 loops=1)                    Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))                    Heap Fetches: 1              ->  Gather  (cost=1000.00..2668718.45 rows=1983209 width=18) (actual time=43543.714..166447.333 rows=2413500 loops=1)                    Workers Planned: 2                    Workers Launched: 2                    ->  Parallel Seq Scan on tbl_sha  (cost=0.00..2469397.55 rows=826337 width=18) (actual time=43537.056..166225.162 rows=804500 loops=3)                          Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))                          Rows Removed by Filter: 15362328
Planning Time: 2.942 ms
Execution Time: 167183.133 ms

SET enable_seqscan TO off;

QUERY PLAN (enable_seqscan=off)
Limit  (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.419..22320.102 rows=1 loops=1)  ->  Aggregate  (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.418..22320.100 rows=1 loops=1)        ->  Nested Loop  (cost=93112.74..2876169.16 rows=1921561 width=9) (actual time=265.880..22000.432 rows=2413500 loops=1)              ->  Index Only Scan using tbl_inf_pkc on tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual time=0.013..0.020 rows=1 loops=1)                    Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))                    Heap Fetches: 1              ->  Gather  (cost=93112.45..2856945.24 rows=1921561 width=18) (actual time=265.864..21535.325 rows=2413500 loops=1)                    Workers Planned: 2                    Workers Launched: 2                    ->  Parallel Bitmap Heap Scan on tbl_sha  (cost=92112.45..2663789.14 rows=800650 width=18) (actual time=260.540..21442.169 rows=804500 loops=3)                          Recheck Cond: (ms_cd = 'MLD009'::bpchar)                          Rows Removed by Index Recheck: 49                          Filter: (etrys = '00000001'::bpchar)                          Rows Removed by Filter: 295500                          Heap Blocks: exact=13788 lossy=10565                          ->  Bitmap Index Scan on index_search_04_mscd_cdate  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 rows=3300000 loops=1)                                Index Cond: (ms_cd = 'MLD009'::bpchar)
Planning Time: 0.670 ms
Execution Time: 22320.153 ms


At 2023-07-25 21:04:16, "David Rowley" <dgrowleyml@gmail.com> wrote:
>On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote:
>>
>> The definitions of the columns used in SQL are as follows.
>>
>> TBL_SHA
>> ms_cd character(6) NOT NULL       -- PRIMARY KEY
>> et_cd character(8)
>> etrys character(8)
>
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>> and TBL_SHA.ETRYS in
>>    (select TBL_INF.RY_CD
>>     from TBL_INF
>>     WHERE TBL_INF.MS_CD = 'MLD009'
>>    AND TBL_INF.RY_CD = '00000001'
>>    )
>> ----- Execution Plan -----
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
>>         ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
>>               ->  Index Only Scan using TBL_INF_pkc on TBL_INF  (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>>                     Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
>>                     Heap Fetches: 1
>>               ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
>>                     Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
>>                     Rows Removed by Filter: 32000325
>> Planning Time: 0.162 ms
>> Execution Time: 124168.838 ms
>
>You might want to check your description of the table definitions.
>Going by the above EXPLAIN ANALYZE output, it very much does not look
>like ms_cd is the primary key of TBL_SHA. If it is then it's very
>weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
>data corruption if that's the case. I suspect you've just not
>accurately described the table definition, however.
>
>David

Re: How to improve the performance of my SQL query？

From

David Rowley

Date:

27 July 2023, 06:36:20

On Wed, 26 Jul 2023 at 19:46, gzh <gzhcoder@126.com> wrote:
> QUERY PLAN (enable_seqscan=on)

> Execution Time: 167183.133 ms

> QUERY PLAN (enable_seqscan=off)

> Execution Time: 22320.153 ms

effective_cache_size and random_page_cost are the settings you should
be adjusting to coax the planner into using the index.

A rule of thumb for effective_cache_size would be to set it to about
75% of RAM. There are certainly cases where lower would make more
sense, certainly, 75% will make more sense than the default 4GB value
in the majority of cases.

For random_page_cost, the default of 4.0 has been the default since
HDDs were common. SSDs are common now and, comparatively to sequential
I/O, their random I/O is faster than that of an HDD, so you may get
better results by lowering random_page_cost.

David

Re: How to improve the performance of my SQL query？

From

gzh

Date:

27 July 2023, 11:09:27

Thank you for your suggestion.

The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.

Could the low performance be due to the low configuration of AWS RDS?

We are considering trying a higher configuration instance.


At 2023-07-27 11:36:20, "David Rowley" <dgrowleyml@gmail.com> wrote:
>On Wed, 26 Jul 2023 at 19:46, gzh <gzhcoder@126.com> wrote:
>> QUERY PLAN (enable_seqscan=on)
>
>> Execution Time: 167183.133 ms
>
>> QUERY PLAN (enable_seqscan=off)
>
>> Execution Time: 22320.153 ms
>
>effective_cache_size and random_page_cost are the settings you should
>be adjusting to coax the planner into using the index.
>
>A rule of thumb for effective_cache_size would be to set it to about
>75% of RAM. There are certainly cases where lower would make more
>sense, certainly, 75% will make more sense than the default 4GB value
>in the majority of cases.
>
>For random_page_cost, the default of 4.0 has been the default since
>HDDs were common. SSDs are common now and, comparatively to sequential
>I/O, their random I/O is faster than that of an HDD, so you may get
>better results by lowering random_page_cost.
>
>David

Re: How to improve the performance of my SQL query？

From

Charly

Date:

27 July 2023, 23:38:39

Hi "gzh",

Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you have:

> TBL_SHA
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

Assuming a composed index here by the 3 columns.

> TBL_INF
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> ry_cd character(8) NOT NULL -- PRIMARY KEY

Here it's more clear that there is a composed index based on those 2 columns.

The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:

- One is that the index may not be exactly the same as described above;

- Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;

There may be other problems happening there, but those are the ones I see more evident from the description of the problem.

Giving a closer look to your query I really didn't understand the reasoning to have that subselect as it seems only to validate the two relations have common items or to validate a parent relationship, in this case you can probably use a join to get the same result:

SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001';

You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?

WITH tbi (ry_cd) AS (
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs .etrys
WHERE tbs .ms_cd = 'MLD009';

You can alternatively try the CTE using the both columns in the JOIN clause.

On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:

Thank you for your suggestion.
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.
Could the low performance be due to the low configuration of AWS RDS?
We are considering trying a higher configuration instance.

Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.

I hope it helps.

Regards,

Charly Batista

Shanghai, China

carlbsb@gmail.com

https://www.linkedin.com/in/charlybatista/

Linux user #391083

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
George Bernard Shaw (1856-1950)

Re: How to improve the performance of my SQL query？

From

gzh

Date:

28 July 2023, 14:50:29

Thank you very much for taking the time to reply to my question.

> The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:

> - One is that the index may not be exactly the same as described above;

The primary key of the tbl_sha table consists of several fields, and ms_cd is just one of them. I just explained the definitions of the fields used in the query SQL.

> - Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;

The index of TBL_SHA table is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS)

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

> You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?

> WITH tbi (ry_cd) AS (

> SELECT tbl_inf.ry_cd

> FROM tbl_inf tbi

> WHERE tbi.ms_cd = 'MLD009'

> AND tbl_inf.ry_cd = '00000001'

> ) SELECT COUNT(et_cd) FROM tbl_sha tbs

> JOIN tbi ON tbi.ry_cd = tbs .etrys

> WHERE tbs .ms_cd = 'MLD009';

The SQL execution encountered an error, so I made some modifications.

Please refer to the execution plan.

WHERE tbi.ms_cd = 'MLD009'

→

WHERE tbl_inf.ms_cd = 'MLD009'

QUERY PLAN

Limit (cost=2668811.76..2668811.77 rows=1 width=8) (actual time=133555.074..133557.729 rows=1 loops=1)

-> Aggregate (cost=2668811.76..2668811.77 rows=1 width=8) (actual time=133555.072..133557.726 rows=1 loops=1)

-> Nested Loop (cost=1000.29..2664512.83 rows=1719572 width=9) (actual time=29657.638..133341.053 rows=2113500 loops=1)

-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.316..1.321 rows=1 loops=1)

Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))

Heap Fetches: 1

-> Gather (cost=1000.00..2647308.80 rows=1719572 width=18) (actual time=29656.318..132969.910 rows=2113500 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on tbl_sha tbs (cost=0.00..2474351.60 rows=716488 width=18) (actual time=29654.184..132876.292 rows=704500 loops=3)

Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))

Rows Removed by Filter: 14678996

Planning Time: 0.164 ms

Execution Time: 133557.767 ms

> Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.

I strongly agree with your viewpoint, but I currently don't have a solution in mind for the problem.

At 2023-07-28 04:38:39, "Charly" <carlbsb@gmail.com> wrote:

Hi "gzh",

Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you have:
> TBL_SHA
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
Assuming a composed index here by the 3 columns.

> TBL_INF
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> ry_cd character(8) NOT NULL -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2 columns.

The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:
- One is that the index may not be exactly the same as described above;
- Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;
There may be other problems happening there, but those are the ones I see more evident from the description of the problem.

Giving a closer look to your query I really didn't understand the reasoning to have that subselect as it seems only to validate the two relations have common items or to validate a parent relationship, in this case you can probably use a join to get the same result:

SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
    AND tbl_inf.ry_cd = '00000001';

You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?

WITH tbi (ry_cd) AS (
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs .etrys
    WHERE tbs .ms_cd = 'MLD009';

You can alternatively try the CTE using the both columns in the JOIN clause.

On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:
Thank you for your suggestion.
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.
Could the low performance be due to the low configuration of AWS RDS?
We are considering trying a higher configuration instance.

Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.

I hope it helps.

--
Regards,

Charly Batista
Shanghai, China
carlbsb@gmail.com
https://www.linkedin.com/in/charlybatista/
Linux user #391083

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
George Bernard Shaw (1856-1950)

Re: How to improve the performance of my SQL query？

From

"Peter J. Holzer"

Date:

29 July 2023, 11:59:07

On 2023-07-26 15:46:16 +0800, gzh wrote:
> SET enable_seqscan TO off;
[...]
>                     ->  Parallel Bitmap Heap Scan on tbl_sha  (cost=92112.45..2663789.14 rows=800650 width=18)
(actualtime=260.540..21442.169 rows=804500 loops=3) 
>                           Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>                           Rows Removed by Index Recheck: 49
>                           Filter: (etrys = '00000001'::bpchar)
>                           Rows Removed by Filter: 295500
>                           Heap Blocks: exact=13788 lossy=10565
>                           ->  Bitmap Index Scan on index_search_04_mscd_cdate  (cost=0.00..91632.06 rows=3402599
width=0)(actual time=249.718..249.718 rows=3300000 loops=1) 
>                                 Index Cond: (ms_cd = 'MLD009'::bpchar)

So now it's using index_search_04_mscd_cdate which contains only ms_cd
(and - judging from the name, other fields not relevant to this query),
but it still doesn't use index_search_01 which would fit the query
exactly. I can understand that Postgres prefers a sequential scan over
an index scan (the number of matching rows is about 10% of the total
table size which is a lot), but why would it prefer a less specific
index to a more specific one?

Can you get Postgres to use that index at all?

Find a combination of ms_cd and etrys which doesn't cover millions of
rows and try that.

Also try lowering random_page_cost.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: How to improve the performance of my SQL query？

From

Alban Hertroys

Date:

29 July 2023, 12:37:55

> On 29 Jul 2023, at 10:59, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> On 2023-07-26 15:46:16 +0800, gzh wrote:
>> SET enable_seqscan TO off;
> [...]
>>                    ->  Parallel Bitmap Heap Scan on tbl_sha  (cost=92112.45..2663789.14 rows=800650 width=18)
(actualtime=260.540..21442.169 rows=804500 loops=3) 
>>                          Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>>                          Rows Removed by Index Recheck: 49
>>                          Filter: (etrys = '00000001'::bpchar)
>>                          Rows Removed by Filter: 295500
>>                          Heap Blocks: exact=13788 lossy=10565
>>                          ->  Bitmap Index Scan on index_search_04_mscd_cdate  (cost=0.00..91632.06 rows=3402599
width=0)(actual time=249.718..249.718 rows=3300000 loops=1) 
>>                                Index Cond: (ms_cd = 'MLD009'::bpchar)
>
> So now it's using index_search_04_mscd_cdate which contains only ms_cd
> (and - judging from the name, other fields not relevant to this query),
> but it still doesn't use index_search_01 which would fit the query
> exactly. I can understand that Postgres prefers a sequential scan over
> an index scan (the number of matching rows is about 10% of the total
> table size which is a lot), but why would it prefer a less specific
> index to a more specific one?
>
> Can you get Postgres to use that index at all?
>
> Find a combination of ms_cd and etrys which doesn't cover millions of
> rows and try that.
>
> Also try lowering random_page_cost.

Wasn’t this an RDS server with just 4GB of memory?

How large are those multi-column indices? Perhaps they don’t (all) fit into available cache memory and the server
decidedto use the one that it had cached? 

I’m frankly not at all certain how the server would behave around such resource shortage situations, but I suppose
loadingan uncached index into cache could get a higher cost than using a less optimal (costlier) index that’s already
cached.

Regarding lowering random_page_cost; If your index files are on SSD storage, lowering that sufficiently (to a realistic
value)could then sufficiently lower the cost of loading that uncached index into memory, evicting the index it was
usingin above plan to make room (unless other active sessions are using it). 

Alban Hertroys
--
There is always an exception to always.