Re: How to improve the performance of my SQL query? - Mailing list pgsql-general

From Charly
Subject Re: How to improve the performance of my SQL query?
Date
Msg-id CAD-uDDG1ruPdA45X+0Q+53YUt8R+TBLFBthCsqKs-+it=Fb0aA@mail.gmail.com
Whole thread Raw
In response to Re: How to improve the performance of my SQL query?  (gzh <gzhcoder@126.com>)
Responses Re: How to improve the performance of my SQL query?
List pgsql-general
Hi "gzh",

Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you have:
> TBL_SHA
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
Assuming a composed index here by the 3 columns.

> TBL_INF
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> ry_cd character(8) NOT NULL       -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2 columns.

The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:
 - One is that the index may not be exactly the same as described above;
 - Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;
 There may be other problems happening there, but those are the ones I see more evident from the description of the problem.

Giving a closer look to your query I really didn't understand the reasoning to have that subselect as it seems only to validate the two relations have common items or to validate a parent relationship, in this case you can probably use a join to get the same result:
 
SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
    AND tbl_inf.ry_cd = '00000001';

You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?

WITH tbi (ry_cd) AS (
    SELECT tbl_inf.ry_cd
    FROM tbl_inf tbi
    WHERE tbi.ms_cd = 'MLD009'
        AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs .etrys
    WHERE tbs .ms_cd = 'MLD009';

You can alternatively try the CTE using the both columns in the JOIN clause.

On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:
Thank you for your suggestion. 
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB. 
Could the low performance be due to the low configuration of AWS RDS? 
We are considering trying a higher configuration instance.

Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.

I hope it helps.

--
Regards,

Charly Batista
Shanghai, China
Linux user #391083

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
      George Bernard Shaw (1856-1950)

pgsql-general by date:

Previous
From: Atul Kumar
Date:
Subject: password error in batch script
Next
From: Adrian Klaver
Date:
Subject: Re: password error in batch script