Thread: How to improve the performance of my SQL query?
Hi everyone,
I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.
explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
--------------------------------------------------------------------------------
The index is defined as follows.
CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
When I take the following sql statement, the index works fine and the query is fast.
select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'
The amount of data in the table is as follows.
TBL_SHA 38700325
TBL_INF 35546
Any suggestions for improving the performance of the query would be greatly appreciated.
Thanks in advance!
> On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote: > > I'm running into some performance issues with my SQL query. > The following SQL query is taking a long time to execute. > > explain analyze > select COUNT(ET_CD) > from TBL_SHA > WHERE TBL_SHA.MS_CD = '009' > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = '009' > AND TBL_INF.RY_CD = '000001' > ) > ----- Execution Plan ----- > Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) > -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) > -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) > -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1) > Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar)) > Heap Fetches: 1 > -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1) > Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) > Rows Removed by Filter: 32000325 > Planning Time: 0.162 ms > Execution Time: 124168.838 ms > -------------------------------------------------------------------------------- > > The index is defined as follows. > > CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS); > > > When I take the following sql statement, the index works fine and the query > is fast. > > select COUNT(ET_CD) > from TBL_SHA > WHERE MS_CD = '009' > AND ETRYS = '000001' > > The amount of data in the table is as follows. > TBL_SHA 38700325 > TBL_INF 35546 > > Any suggestions for improving the performance of the query would be greatly > appreciated. You can try EXISTS instead of IN to correlate the subquery and the outer query: SELECT count(et_cd) FROM tbl_sha WHERE tbl_sha.ms_cd = '009' AND tbl_sha.etrys = '000001' AND EXISTS ( SELECT FROM tbl_inf WHERE tbl_inf.ms_cd = tbl_sha.ms_cd AND tbl_inf.ry_cd = tbl_sha.etrys ) -- Erik
On 2023-07-20 15:09:22 +0800, gzh wrote: > Hi everyone, > > > I'm running into some performance issues with my SQL query. > > The following SQL query is taking a long time to execute. > > > explain analyze > select COUNT(ET_CD) > from TBL_SHA > WHERE TBL_SHA.MS_CD = '009' > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = '009' > AND TBL_INF.RY_CD = '000001' > ) > > ----- Execution Plan ----- [...] > -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) > (actual time=97264.138..123554.792 rows=3200000 loops=1) > Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) > Rows Removed by Filter: 32000325 [...] > -------------------------------------------------------------------------------- > > > The index is defined as follows. > > > CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS); > > > > When I take the following sql statement, the index works fine and the query is > fast. > > > select COUNT(ET_CD) > from TBL_SHA > WHERE MS_CD = '009' > AND ETRYS = '000001' What's the plan for that query? hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Thank you very much for taking the time to reply to my question.
Sorry, I provided incorrect information.
The index also does not work in the following query statement.
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '000001'
QUERY PLAN
Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)
-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)
-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 11833442
Planning Time: 0.118 ms
Execution Time: 128668.290 ms
The TBL_SHA table has another index, as shown below.
CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)
When I take the following query statement, the result is returned quickly.
Why does index_search_01 always not work?
explain analyze
select TO_CHAR(MAX(TBL_SHA.ET_DAY),'YYYYMMDD') AS ET_DAY
from TBL_SHA
WHERE MS_CD = '008'
AND ET_CD = '000003'
QUERY PLAN
Limit (cost=4.11..4.13 rows=1 width=32) (actual time=0.043..0.044 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.56..4.11 rows=1 width=8) (actual time=0.032..0.033 rows=1 loops=1)
-> Index Scan Backward using index_search_01 on TBL_SHA (cost=0.56..10836962.40 rows=3054052 width=8) (actual time=0.032..0.032 rows=1 loops=1)
Index Cond: ((MS_CD = '008'::bpchar) AND (ent_day IS NOT NULL))
Filter: (ET_CD = '000003'::bpchar)
-> Result (cost=4.11..4.13 rows=1 width=32) (actual time=0.042..0.042 rows=1 loops=1)
Planning Time: 0.228 ms
Execution Time: 0.070 ms
At 2023-07-20 16:07:15, "Erik Wienhold" <ewie@ewie.name> wrote: >> On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote: >> >> I'm running into some performance issues with my SQL query. >> The following SQL query is taking a long time to execute. >> >> explain analyze >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE TBL_SHA.MS_CD = '009' >> and TBL_SHA.ETRYS in >> (select TBL_INF.RY_CD >> from TBL_INF >> WHERE TBL_INF.MS_CD = '009' >> AND TBL_INF.RY_CD = '000001' >> ) >> ----- Execution Plan ----- >> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) >> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) >> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) >> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1) >> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar)) >> Heap Fetches: 1 >> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1) >> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) >> Rows Removed by Filter: 32000325 >> Planning Time: 0.162 ms >> Execution Time: 124168.838 ms >> -------------------------------------------------------------------------------- >> >> The index is defined as follows. >> >> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS); >> >> >> When I take the following sql statement, the index works fine and the query >> is fast. >> >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE MS_CD = '009' >> AND ETRYS = '000001' >> >> The amount of data in the table is as follows. >> TBL_SHA 38700325 >> TBL_INF 35546 >> >> Any suggestions for improving the performance of the query would be greatly >> appreciated. > >You can try EXISTS instead of IN to correlate the subquery and the outer query: > > SELECT count(et_cd) > FROM tbl_sha > WHERE tbl_sha.ms_cd = '009' > AND tbl_sha.etrys = '000001' > AND EXISTS ( > SELECT > FROM tbl_inf > WHERE tbl_inf.ms_cd = tbl_sha.ms_cd > AND tbl_inf.ry_cd = tbl_sha.etrys > ) > >-- >Erik
On Thu, 2023-07-20 at 15:09 +0800, gzh wrote: > I'm running into some performance issues with my SQL query. > The following SQL query is taking a long time to execute. > > explain analyze > select COUNT(ET_CD) > from TBL_SHA > WHERE TBL_SHA.MS_CD = '009' > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = '009' > AND TBL_INF.RY_CD = '000001' > ) > ----- Execution Plan ----- > Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) > -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) > -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) > -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030rows=1 loops=1) > Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar)) > Heap Fetches: 1 > -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792rows=3200000 loops=1) > Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) > Rows Removed by Filter: 32000325 > Planning Time: 0.162 ms > Execution Time: 124168.838 ms > -------------------------------------------------------------------------------- > > The index is defined as follows. > > CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS); Actual rows = 3200000, rows removed by filter is ten times as much. It should use an index. > When I take the following sql statement, the index works fine and the query is fast. > > select COUNT(ET_CD) > from TBL_SHA > WHERE MS_CD = '009' > AND ETRYS = '000001' > > The amount of data in the table is as follows. > TBL_SHA 38700325 > TBL_INF 35546 This looks very much like it is a problem with the data types. I see that you are using "character", which you shouldn't do. What I cannot see is if the columns are defined as "character" or whether you bind the parameters as "character". Can you show us the table definition of "TBL_SHA" and "TBL_INF"? Yours, Laurenz Albe
>What I cannot see is if the columns are defined as "character" or whether you bind >the parameters as "character". Can you show us the table definition of "TBL_SHA">and "TBL_INF"?For information security reasons, I can't provide the table definition, these columns are defined as "character".
At 2023-07-20 19:58:59, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote: >On Thu, 2023-07-20 at 15:09 +0800, gzh wrote: >> I'm running into some performance issues with my SQL query. >> The following SQL query is taking a long time to execute. >> >> explain analyze >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE TBL_SHA.MS_CD = '009' >> and TBL_SHA.ETRYS in >> (select TBL_INF.RY_CD >> from TBL_INF >> WHERE TBL_INF.MS_CD = '009' >> AND TBL_INF.RY_CD = '000001' >> ) >> ----- Execution Plan ----- >> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) >> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) >> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) >> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1) >> Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar)) >> Heap Fetches: 1 >> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1) >> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) >> Rows Removed by Filter: 32000325 >> Planning Time: 0.162 ms >> Execution Time: 124168.838 ms >> -------------------------------------------------------------------------------- >> >> The index is defined as follows. >> >> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS); > >Actual rows = 3200000, rows removed by filter is ten times as much. >It should use an index. > >> When I take the following sql statement, the index works fine and the query is fast. >> >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE MS_CD = '009' >> AND ETRYS = '000001' >> >> The amount of data in the table is as follows. >> TBL_SHA 38700325 >> TBL_INF 35546 > >This looks very much like it is a problem with the data types. >I see that you are using "character", which you shouldn't do. > >What I cannot see is if the columns are defined as "character" or whether you bind >the parameters as "character". Can you show us the table definition of "TBL_SHA" >and "TBL_INF"? > >Yours, >Laurenz Albe
On Thu, 2023-07-20 at 22:14 +0800, gzh wrote: > The information I provided is incorrect, please see my previous reply. My question remains: I would like to see the table definitions. Also, did you ANALYZE the tables? Yours, Laurenz Albe
On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote: > > > Thank you very much for taking the time to reply to my question. > > > Sorry, I provided incorrect information. > > The index also does not work in the following query statement. > > > > select COUNT(ET_CD) > > > from TBL_SHA > > > WHERE MS_CD = '009' > > > AND ETRYS = '000001' > > > QUERY PLAN > > Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1) > > -> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1) > > -> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1loops=3) > > -> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615rows=1066667 loops=3) > > Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) > > Rows Removed by Filter: 11833442 > > Planning Time: 0.118 ms > > Execution Time: 128668.290 ms > > > The TBL_SHA table has another index, as shown below. > > > CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS) > > CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD) > > Rows Removed by Filter: 11833442 select (38700325 - 11833442) /38700325.0; is 0.69 approx. So I think it says around 69% of rows satisfy the query condition. but I am not sure in the following 2 cases, whether the actual rows are noisy or not. I can not find the doc explaining it. > Partial Aggregate (actual time=128655.256..128655.258 rows=1 loops=3) > Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)
The definitions of the columns used in SQL are as follows.
TBL_SHA
ms_cd character(6) NOT NULL -- PRIMARY KEY
et_cd character(8)
etrys character(8)
TBL_INF
ms_cd character(6) NOT NULL -- PRIMARY KEY
ry_cd character(8) NOT NULL -- PRIMARY KEY
I made some modifications to the data, and I realized that I should not change the length of the data.
The actual data and its corresponding execution plan are shown below.
explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
At 2023-07-20 22:48:19, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote: >On Thu, 2023-07-20 at 22:14 +0800, gzh wrote: >> The information I provided is incorrect, please see my previous reply. > >My question remains: I would like to see the table definitions. > >Also, did you ANALYZE the tables? > >Yours, >Laurenz Albe
>select (38700325 - 11833442) /38700325.0; >is 0.69 approx. >So I think it says around 69% of rows satisfy the query condition.
Thank you for your reply. I have learned a lot from it.
At 2023-07-20 23:20:16, "jian he" <jian.universality@gmail.com> wrote: >On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote: >> >> >> Thank you very much for taking the time to reply to my question. >> >> >> Sorry, I provided incorrect information. >> >> The index also does not work in the following query statement. >> >> >> > select COUNT(ET_CD) >> >> > from TBL_SHA >> >> > WHERE MS_CD = '009' >> >> > AND ETRYS = '000001' >> >> >> QUERY PLAN >> >> Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1) >> >> -> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1) >> >> -> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1) >> >> Workers Planned: 2 >> >> Workers Launched: 2 >> >> -> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3) >> >> -> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3) >> >> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) >> >> Rows Removed by Filter: 11833442 >> >> Planning Time: 0.118 ms >> >> Execution Time: 128668.290 ms >> >> >> The TBL_SHA table has another index, as shown below. >> >> >> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS) >> >> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD) >> > >> Rows Removed by Filter: 11833442 >select (38700325 - 11833442) /38700325.0; >is 0.69 approx. >So I think it says around 69% of rows satisfy the query condition. > >but I am not sure in the following 2 cases, whether the actual rows >are noisy or not. I can not find the doc explaining it. >> Partial Aggregate (actual time=128655.256..128655.258 rows=1 loops=3) >> Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)
On Fri, 2023-07-21 at 09:43 +0800, gzh wrote: > The definitions of the columns used in SQL are as follows. > > TBL_SHA > > ms_cd character(6) NOT NULL -- PRIMARY KEY > et_cd character(8) > etrys character(8) > > TBL_INF > > ms_cd character(6) NOT NULL -- PRIMARY KEY > ry_cd character(8) NOT NULL -- PRIMARY KEY > > I made some modifications to the data, and I realized that I should not change the length of the data. > The actual data and its corresponding execution plan are shown below. > > explain analyze > select COUNT(ET_CD) > from TBL_SHA > WHERE TBL_SHA.MS_CD = 'MLD009' > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = 'MLD009' > AND TBL_INF.RY_CD = '00000001' > ) > ----- Execution Plan ----- > Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) > -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) > -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) > -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030rows=1 loops=1) > Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar)) > Heap Fetches: 1 > -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792rows=3200000 loops=1) > Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar)) > Rows Removed by Filter: 32000325 > Planning Time: 0.162 ms > Execution Time: 124168.838 ms Thanks. That should definitely use a b-tree index defined on (ms_cd, etrsys). Did you change any parameters that have an impact on query planning? You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS). Yours, Laurenz Albe
>Did you change any parameters that have an impact on query planning?
>You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).
I added some parameters and re-executed the Execution Plan.
Except for the index not taking effect, I still don't know the reason why the index is not working.
Is it because there is too much data that meets the conditions?
EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)
Output: (count(tbl_sha.et_cd))
Buffers: shared hit=58948 read=2112758
I/O Timings: read=357249.120
-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)
Output: count(tbl_sha.et_cd)
Buffers: shared hit=58948 read=2112758
I/O Timings: read=357249.120
-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)
Output: tbl_sha.et_cd
Buffers: shared hit=58948 read=2112758
I/O Timings: read=357249.120
-> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)
Output: tbl_inf.ms_cd, tbl_inf.ry_cd
Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))
Heap Fetches: 1
Buffers: shared hit=4
-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)
Output: tbl_sha.et_cd, tbl_sha.etrys
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=58944 read=2112758
I/O Timings: read=357249.120
-> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)
Output: tbl_sha.et_cd, tbl_sha.etrys
Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))
Rows Removed by Filter: 14062278
Buffers: shared hit=58944 read=2112758
I/O Timings: read=357249.120
Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1
Buffers: shared hit=17875 read=706862
I/O Timings: read=119193.744
Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1
Buffers: shared hit=19813 read=706359
I/O Timings: read=119386.899
Planning:
Buffers: shared hit=42
Planning Time: 1.024 ms
Execution Time: 128717.731 ms
At 2023-07-24 13:43:46, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote: >On Fri, 2023-07-21 at 09:43 +0800, gzh wrote: >> The definitions of the columns used in SQL are as follows. >> >> TBL_SHA >> >> ms_cd character(6) NOT NULL -- PRIMARY KEY >> et_cd character(8) >> etrys character(8) >> >> TBL_INF >> >> ms_cd character(6) NOT NULL -- PRIMARY KEY >> ry_cd character(8) NOT NULL -- PRIMARY KEY >> >> I made some modifications to the data, and I realized that I should not change the length of the data. >> The actual data and its corresponding execution plan are shown below. >> >> explain analyze >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE TBL_SHA.MS_CD = 'MLD009' >> and TBL_SHA.ETRYS in >> (select TBL_INF.RY_CD >> from TBL_INF >> WHERE TBL_INF.MS_CD = 'MLD009' >> AND TBL_INF.RY_CD = '00000001' >> ) >> ----- Execution Plan ----- >> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) >> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) >> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) >> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1) >> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar)) >> Heap Fetches: 1 >> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1) >> Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar)) >> Rows Removed by Filter: 32000325 >> Planning Time: 0.162 ms >> Execution Time: 124168.838 ms > >Thanks. That should definitely use a b-tree index defined on (ms_cd, etrsys). > >Did you change any parameters that have an impact on query planning? >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS). > >Yours, >Laurenz Albe
On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote: > > >Did you change any parameters that have an impact on query planning? > > >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS). > > I added some parameters and re-executed the Execution Plan. > > Except for the index not taking effect, I still don't know the reason why the index is not working. > > Is it because there is too much data that meets the conditions? > > > EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) > > select COUNT(ET_CD) > > from TBL_SHA > > WHERE TBL_SHA.MS_CD = 'MLD009' > > and TBL_SHA.ETRYS in > > (select TBL_INF.RY_CD > > from TBL_INF > > WHERE TBL_INF.MS_CD = 'MLD009' > > AND TBL_INF.RY_CD = '00000001' > > ) > > > ----- Execution Plan ----- > > Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1) > > Output: (count(tbl_sha.et_cd)) > > Buffers: shared hit=58948 read=2112758 > > I/O Timings: read=357249.120 > > -> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1) > > Output: count(tbl_sha.et_cd) > > Buffers: shared hit=58948 read=2112758 > > I/O Timings: read=357249.120 > > -> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1) > > Output: tbl_sha.et_cd > > Buffers: shared hit=58948 read=2112758 > > I/O Timings: read=357249.120 > > -> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051rows=1 loops=1) > > Output: tbl_inf.ms_cd, tbl_inf.ry_cd > > Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar)) > > Heap Fetches: 1 > > Buffers: shared hit=4 > > -> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500loops=1) > > Output: tbl_sha.et_cd, tbl_sha.etrys > > Workers Planned: 2 > > Workers Launched: 2 > > Buffers: shared hit=58944 read=2112758 > > I/O Timings: read=357249.120 > > -> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089rows=871167 loops=3) > > Output: tbl_sha.et_cd, tbl_sha.etrys > > Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar)) > > Rows Removed by Filter: 14062278 > > Buffers: shared hit=58944 read=2112758 > > I/O Timings: read=357249.120 > > Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1 > > Buffers: shared hit=17875 read=706862 > > I/O Timings: read=119193.744 > > Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1 > > Buffers: shared hit=19813 read=706359 > > I/O Timings: read=119386.899 > > Planning: > > Buffers: shared hit=42 > > Planning Time: 1.024 ms > > Execution Time: 128717.731 ms > > I think the whole query can just: select COUNT(ET_CD) from TBL_SHA WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001'; > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = 'MLD009' > AND TBL_INF.RY_CD = '00000001' > ) if subquery after IN clause part, no rows returned then the whole query would return zero row. if many duplicates rows returned, then there is no point of evaluate something like {1 in (1,1,1,1,1,1,)}
On Mon, 2023-07-24 at 17:54 +0800, gzh wrote: > EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ... Or do you have a version that is too old for SETTINGS? One other idea: check if the index is INVALID (this will be visible if you run "\d tablenane" in "psql"). Invalid indexes won't be used. Yours, Laurenz Albe
>I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
>Or do you have a version that is too old for SETTINGS?
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Buffers: shared hit=13 read=2171693
I/O Timings: read=365863.877
-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=131595.624..131599.529 rows=1 loops=1)
Buffers: shared hit=13 read=2171693
I/O Timings: read=365863.877
-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.341..131256.445 rows=2513500 loops=1)
Buffers: shared hit=13 read=2171693
I/O Timings: read=365863.877
-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.471..1.476 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))
Heap Fetches: 1
Buffers: shared hit=2 read=2
I/O Timings: read=1.412
-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=0.866..130696.440 rows=2513500 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=11 read=2171691
I/O Timings: read=365862.464
-> Parallel Seq Scan on tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=0.215..130476.981 rows=837833 loops=3)
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))
Rows Removed by Filter: 13728945
Buffers: shared hit=11 read=2171691
I/O Timings: read=365862.464
Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb'
Planning:
Buffers: shared hit=167 read=7
I/O Timings: read=2.735
Planning Time: 3.733 ms
Execution Time: 131599.594 ms
At 2023-07-24 23:58:50, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote: >On Mon, 2023-07-24 at 17:54 +0800, gzh wrote: >> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) > >I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ... >Or do you have a version that is too old for SETTINGS? > >One other idea: check if the index is INVALID (this will >be visible if you run "\d tablenane" in "psql"). >Invalid indexes won't be used. > >Yours, >Laurenz Albe
>I think the whole query can just: >select COUNT(ET_CD) >from TBL_SHA >WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';
This is related to the business logic.
>if many duplicates rows returned, then there is no point of evaluate >something like {1 in (1,1,1,1,1,1,)}
Because the primary key of the 'tbl_inf' table only consists of 'ms_cd' and 'ry_cd' columns, the subquery will not return duplicate rows.
At 2023-07-24 22:42:01, "jian he" <jian.universality@gmail.com> wrote: >On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote: >> >> >Did you change any parameters that have an impact on query planning? >> >> >You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS). >> >> I added some parameters and re-executed the Execution Plan. >> >> Except for the index not taking effect, I still don't know the reason why the index is not working. >> >> Is it because there is too much data that meets the conditions? >> >> >> EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON) >> >> select COUNT(ET_CD) >> >> from TBL_SHA >> >> WHERE TBL_SHA.MS_CD = 'MLD009' >> >> and TBL_SHA.ETRYS in >> >> (select TBL_INF.RY_CD >> >> from TBL_INF >> >> WHERE TBL_INF.MS_CD = 'MLD009' >> >> AND TBL_INF.RY_CD = '00000001' >> >> ) >> >> >> ----- Execution Plan ----- >> >> Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1) >> >> Output: (count(tbl_sha.et_cd)) >> >> Buffers: shared hit=58948 read=2112758 >> >> I/O Timings: read=357249.120 >> >> -> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1) >> >> Output: count(tbl_sha.et_cd) >> >> Buffers: shared hit=58948 read=2112758 >> >> I/O Timings: read=357249.120 >> >> -> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1) >> >> Output: tbl_sha.et_cd >> >> Buffers: shared hit=58948 read=2112758 >> >> I/O Timings: read=357249.120 >> >> -> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1) >> >> Output: tbl_inf.ms_cd, tbl_inf.ry_cd >> >> Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar)) >> >> Heap Fetches: 1 >> >> Buffers: shared hit=4 >> >> -> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1) >> >> Output: tbl_sha.et_cd, tbl_sha.etrys >> >> Workers Planned: 2 >> >> Workers Launched: 2 >> >> Buffers: shared hit=58944 read=2112758 >> >> I/O Timings: read=357249.120 >> >> -> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3) >> >> Output: tbl_sha.et_cd, tbl_sha.etrys >> >> Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar)) >> >> Rows Removed by Filter: 14062278 >> >> Buffers: shared hit=58944 read=2112758 >> >> I/O Timings: read=357249.120 >> >> Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1 >> >> Buffers: shared hit=17875 read=706862 >> >> I/O Timings: read=119193.744 >> >> Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1 >> >> Buffers: shared hit=19813 read=706359 >> >> I/O Timings: read=119386.899 >> >> Planning: >> >> Buffers: shared hit=42 >> >> Planning Time: 1.024 ms >> >> Execution Time: 128717.731 ms >> >> > >I think the whole query can just: >select COUNT(ET_CD) >from TBL_SHA >WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001'; > >> and TBL_SHA.ETRYS in >> (select TBL_INF.RY_CD >> from TBL_INF >> WHERE TBL_INF.MS_CD = 'MLD009' >> AND TBL_INF.RY_CD = '00000001' >> ) > >if subquery after IN clause part, no rows returned then the whole >query would return zero row. >if many duplicates rows returned, then there is no point of evaluate >something like {1 in (1,1,1,1,1,1,)}
On Tue, 2023-07-25 at 11:11 +0800, gzh wrote: > > I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ... > > Or do you have a version that is too old for SETTINGS? > Sorry. Please refer to the following execution plan. > > [...] > Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb' > [...] No unusual settings. So that theory is dead. > > One other idea: check if the index is INVALID (this will > > be visible if you run "\d tablenane" in "psql"). > > Invalid indexes won't be used. Did you check that? After that, I am out of ideas. Yours, Laurenz Albe
On Thu, 20 Jul 2023 at 23:36, gzh <gzhcoder@126.com> wrote: > > > Thank you very much for taking the time to reply to my question. > > > Sorry, I provided incorrect information. > > The index also does not work in the following query statement. > > > > select COUNT(ET_CD) > > from TBL_SHA > > WHERE MS_CD = '009' > > AND ETRYS = '000001' > > QUERY PLAN > Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1) > -> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1) > -> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1loops=3) > -> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615rows=1066667 loops=3) > Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar)) > Rows Removed by Filter: 11833442 > Planning Time: 0.118 ms > Execution Time: 128668.290 ms > > The TBL_SHA table has another index, as shown below. > > CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS) > CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD) > > When I take the following query statement, the result is returned quickly. > Why does index_search_01 always not work? The method to access the table is decided by the query planner based on costs. The costs are driven off the row estimates which are driven from table statistics. If the table statistics, for example say that 99% of rows have MS_CD = '009', then scanning an index on MS_CD is unlikely to be a good idea as that would likely require random access to the heap. It's likely better to perform a table scan and then just filter out the 1% of rows that don't match. Try executing the query after having done: SET enable_seqscan TO off; What plan does it use now? Is that plan faster or slower than the seq scan plan? David
On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote: > > The definitions of the columns used in SQL are as follows. > > TBL_SHA > ms_cd character(6) NOT NULL -- PRIMARY KEY > et_cd character(8) > etrys character(8) > explain analyze > select COUNT(ET_CD) > from TBL_SHA > WHERE TBL_SHA.MS_CD = 'MLD009' > and TBL_SHA.ETRYS in > (select TBL_INF.RY_CD > from TBL_INF > WHERE TBL_INF.MS_CD = 'MLD009' > AND TBL_INF.RY_CD = '00000001' > ) > ----- Execution Plan ----- > Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) > -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) > -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) > -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030rows=1 loops=1) > Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar)) > Heap Fetches: 1 > -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792rows=3200000 loops=1) > Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar)) > Rows Removed by Filter: 32000325 > Planning Time: 0.162 ms > Execution Time: 124168.838 ms You might want to check your description of the table definitions. Going by the above EXPLAIN ANALYZE output, it very much does not look like ms_cd is the primary key of TBL_SHA. If it is then it's very weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some data corruption if that's the case. I suspect you've just not accurately described the table definition, however. David
Thank you very much for taking the time to reply to my question.>You might want to check your description of the table definitions.>Going by the above EXPLAIN ANALYZE output, it very much does not look >like ms_cd is the primary key of TBL_SHA. If it is then it's very >weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some >data corruption if that's the case. I suspect you've just not>accurately described the table definition, however.The primary key of the SHA table has six fields, and ms_cd is just one of them.I'm sorry, I didn't make that clear.
>Try executing the query after having done: > >SET enable_seqscan TO off; > >What plan does it use now? >>Is that plan faster or slower than the seq scan plan?There's improvement, but it's still quite slow. QUERY PLAN (enable_seqscan=on)Limit (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.822..167183.058 rows=1 loops=1) -> Aggregate (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.820..167183.056 rows=1 loops=1) -> Nested Loop (cost=1000.29..2688558.85 rows=1983209 width=9) (actual time=43544.753..166906.304 rows=2413500 loops=1) -> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.034..1.038 rows=1 loops=1) Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar)) Heap Fetches: 1 -> Gather (cost=1000.00..2668718.45 rows=1983209 width=18) (actual time=43543.714..166447.333 rows=2413500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbl_sha (cost=0.00..2469397.55 rows=826337 width=18) (actual time=43537.056..166225.162 rows=804500 loops=3) Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar)) Rows Removed by Filter: 15362328 Planning Time: 2.942 ms Execution Time: 167183.133 ms SET enable_seqscan TO off; QUERY PLAN (enable_seqscan=off) Limit (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.419..22320.102 rows=1 loops=1) -> Aggregate (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.418..22320.100 rows=1 loops=1) -> Nested Loop (cost=93112.74..2876169.16 rows=1921561 width=9) (actual time=265.880..22000.432 rows=2413500 loops=1) -> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.013..0.020 rows=1 loops=1) Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar)) Heap Fetches: 1 -> Gather (cost=93112.45..2856945.24 rows=1921561 width=18) (actual time=265.864..21535.325 rows=2413500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Bitmap Heap Scan on tbl_sha (cost=92112.45..2663789.14 rows=800650 width=18) (actual time=260.540..21442.169 rows=804500 loops=3) Recheck Cond: (ms_cd = 'MLD009'::bpchar) Rows Removed by Index Recheck: 49 Filter: (etrys = '00000001'::bpchar) Rows Removed by Filter: 295500 Heap Blocks: exact=13788 lossy=10565 -> Bitmap Index Scan on index_search_04_mscd_cdate (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 rows=3300000 loops=1) Index Cond: (ms_cd = 'MLD009'::bpchar) Planning Time: 0.670 ms Execution Time: 22320.153 ms
At 2023-07-25 21:04:16, "David Rowley" <dgrowleyml@gmail.com> wrote: >On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote: >> >> The definitions of the columns used in SQL are as follows. >> >> TBL_SHA >> ms_cd character(6) NOT NULL -- PRIMARY KEY >> et_cd character(8) >> etrys character(8) > >> explain analyze >> select COUNT(ET_CD) >> from TBL_SHA >> WHERE TBL_SHA.MS_CD = 'MLD009' >> and TBL_SHA.ETRYS in >> (select TBL_INF.RY_CD >> from TBL_INF >> WHERE TBL_INF.MS_CD = 'MLD009' >> AND TBL_INF.RY_CD = '00000001' >> ) >> ----- Execution Plan ----- >> Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1) >> -> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1) >> -> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1) >> -> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1) >> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar)) >> Heap Fetches: 1 >> -> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1) >> Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar)) >> Rows Removed by Filter: 32000325 >> Planning Time: 0.162 ms >> Execution Time: 124168.838 ms > >You might want to check your description of the table definitions. >Going by the above EXPLAIN ANALYZE output, it very much does not look >like ms_cd is the primary key of TBL_SHA. If it is then it's very >weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some >data corruption if that's the case. I suspect you've just not >accurately described the table definition, however. > >David
On Wed, 26 Jul 2023 at 19:46, gzh <gzhcoder@126.com> wrote: > QUERY PLAN (enable_seqscan=on) > Execution Time: 167183.133 ms > QUERY PLAN (enable_seqscan=off) > Execution Time: 22320.153 ms effective_cache_size and random_page_cost are the settings you should be adjusting to coax the planner into using the index. A rule of thumb for effective_cache_size would be to set it to about 75% of RAM. There are certainly cases where lower would make more sense, certainly, 75% will make more sense than the default 4GB value in the majority of cases. For random_page_cost, the default of 4.0 has been the default since HDDs were common. SSDs are common now and, comparatively to sequential I/O, their random I/O is faster than that of an HDD, so you may get better results by lowering random_page_cost. David
At 2023-07-27 11:36:20, "David Rowley" <dgrowleyml@gmail.com> wrote: >On Wed, 26 Jul 2023 at 19:46, gzh <gzhcoder@126.com> wrote: >> QUERY PLAN (enable_seqscan=on) > >> Execution Time: 167183.133 ms > >> QUERY PLAN (enable_seqscan=off) > >> Execution Time: 22320.153 ms > >effective_cache_size and random_page_cost are the settings you should >be adjusting to coax the planner into using the index. > >A rule of thumb for effective_cache_size would be to set it to about >75% of RAM. There are certainly cases where lower would make more >sense, certainly, 75% will make more sense than the default 4GB value >in the majority of cases. > >For random_page_cost, the default of 4.0 has been the default since >HDDs were common. SSDs are common now and, comparatively to sequential >I/O, their random I/O is faster than that of an HDD, so you may get >better results by lowering random_page_cost. > >David
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
> TBL_INF
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> ry_cd character(8) NOT NULL -- PRIMARY KEY
JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001';
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs .etrys
WHERE tbs .ms_cd = 'MLD009';
Thank you for your suggestion.The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.Could the low performance be due to the low configuration of AWS RDS?We are considering trying a higher configuration instance.
--
“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
George Bernard Shaw (1856-1950)
Thank you very much for taking the time to reply to my question.
> The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:
> - One is that the index may not be exactly the same as described above;
The primary key of the tbl_sha table consists of several fields, and ms_cd is just one of them. I just explained the definitions of the fields used in the query SQL.
> - Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;
The index of TBL_SHA table is defined as follows.
CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS)
CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)
> You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?
>
> WITH tbi (ry_cd) AS (
> SELECT tbl_inf.ry_cd
> FROM tbl_inf tbi
> WHERE tbi.ms_cd = 'MLD009'
> AND tbl_inf.ry_cd = '00000001'
> ) SELECT COUNT(et_cd) FROM tbl_sha tbs
> JOIN tbi ON tbi.ry_cd = tbs .etrys
> WHERE tbs .ms_cd = 'MLD009';
The SQL execution encountered an error, so I made some modifications.
Please refer to the execution plan.
WHERE tbi.ms_cd = 'MLD009'
→
WHERE tbl_inf.ms_cd = 'MLD009'
QUERY PLAN
Limit (cost=2668811.76..2668811.77 rows=1 width=8) (actual time=133555.074..133557.729 rows=1 loops=1)
-> Aggregate (cost=2668811.76..2668811.77 rows=1 width=8) (actual time=133555.072..133557.726 rows=1 loops=1)
-> Nested Loop (cost=1000.29..2664512.83 rows=1719572 width=9) (actual time=29657.638..133341.053 rows=2113500 loops=1)
-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.316..1.321 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))
Heap Fetches: 1
-> Gather (cost=1000.00..2647308.80 rows=1719572 width=18) (actual time=29656.318..132969.910 rows=2113500 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_sha tbs (cost=0.00..2474351.60 rows=716488 width=18) (actual time=29654.184..132876.292 rows=704500 loops=3)
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))
Rows Removed by Filter: 14678996
Planning Time: 0.164 ms
Execution Time: 133557.767 ms
> Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.
I strongly agree with your viewpoint, but I currently don't have a solution in mind for the problem.
At 2023-07-28 04:38:39, "Charly" <carlbsb@gmail.com> wrote:
Hi "gzh",Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you have:> TBL_SHA
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)Assuming a composed index here by the 3 columns.
> TBL_INF
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> ry_cd character(8) NOT NULL -- PRIMARY KEYHere it's more clear that there is a composed index based on those 2 columns.The problem is none of the explains you sent match with the description above. The last one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your standard of 2 characters column name). There may have a couple of explanations to this:- One is that the index may not be exactly the same as described above;- Another one is the order in the index. Because you have a composed index the order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the following order: (ms_cd, etrys, et_cd) and run the same query;There may be other problems happening there, but those are the ones I see more evident from the description of the problem.Giving a closer look to your query I really didn't understand the reasoning to have that subselect as it seems only to validate the two relations have common items or to validate a parent relationship, in this case you can probably use a join to get the same result:SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001';You can also try to trick the optimizer, for example, what is the result (and explain) of the below query?WITH tbi (ry_cd) AS (
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs .etrys
WHERE tbs .ms_cd = 'MLD009';You can alternatively try the CTE using the both columns in the JOIN clause.On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:Thank you for your suggestion.The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.Could the low performance be due to the low configuration of AWS RDS?We are considering trying a higher configuration instance.Well, adding more resources tends to improve performance, but it's usually not linear and the improvement may not be as large as you want for the extra price you are paying. I would first try to understand the performance problem because using the "add more resources" approach may just delay the problem and it tends to get worse with time as the dataset increases.I hope it helps.
--Regards,Charly BatistaShanghai, ChinaLinux user #391083
“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas."
George Bernard Shaw (1856-1950)
On 2023-07-26 15:46:16 +0800, gzh wrote: > SET enable_seqscan TO off; [...] > -> Parallel Bitmap Heap Scan on tbl_sha (cost=92112.45..2663789.14 rows=800650 width=18) (actualtime=260.540..21442.169 rows=804500 loops=3) > Recheck Cond: (ms_cd = 'MLD009'::bpchar) > Rows Removed by Index Recheck: 49 > Filter: (etrys = '00000001'::bpchar) > Rows Removed by Filter: 295500 > Heap Blocks: exact=13788 lossy=10565 > -> Bitmap Index Scan on index_search_04_mscd_cdate (cost=0.00..91632.06 rows=3402599 width=0)(actual time=249.718..249.718 rows=3300000 loops=1) > Index Cond: (ms_cd = 'MLD009'::bpchar) So now it's using index_search_04_mscd_cdate which contains only ms_cd (and - judging from the name, other fields not relevant to this query), but it still doesn't use index_search_01 which would fit the query exactly. I can understand that Postgres prefers a sequential scan over an index scan (the number of matching rows is about 10% of the total table size which is a lot), but why would it prefer a less specific index to a more specific one? Can you get Postgres to use that index at all? Find a combination of ms_cd and etrys which doesn't cover millions of rows and try that. Also try lowering random_page_cost. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
> On 29 Jul 2023, at 10:59, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > On 2023-07-26 15:46:16 +0800, gzh wrote: >> SET enable_seqscan TO off; > [...] >> -> Parallel Bitmap Heap Scan on tbl_sha (cost=92112.45..2663789.14 rows=800650 width=18) (actualtime=260.540..21442.169 rows=804500 loops=3) >> Recheck Cond: (ms_cd = 'MLD009'::bpchar) >> Rows Removed by Index Recheck: 49 >> Filter: (etrys = '00000001'::bpchar) >> Rows Removed by Filter: 295500 >> Heap Blocks: exact=13788 lossy=10565 >> -> Bitmap Index Scan on index_search_04_mscd_cdate (cost=0.00..91632.06 rows=3402599 width=0)(actual time=249.718..249.718 rows=3300000 loops=1) >> Index Cond: (ms_cd = 'MLD009'::bpchar) > > So now it's using index_search_04_mscd_cdate which contains only ms_cd > (and - judging from the name, other fields not relevant to this query), > but it still doesn't use index_search_01 which would fit the query > exactly. I can understand that Postgres prefers a sequential scan over > an index scan (the number of matching rows is about 10% of the total > table size which is a lot), but why would it prefer a less specific > index to a more specific one? > > Can you get Postgres to use that index at all? > > Find a combination of ms_cd and etrys which doesn't cover millions of > rows and try that. > > Also try lowering random_page_cost. Wasn’t this an RDS server with just 4GB of memory? How large are those multi-column indices? Perhaps they don’t (all) fit into available cache memory and the server decidedto use the one that it had cached? I’m frankly not at all certain how the server would behave around such resource shortage situations, but I suppose loadingan uncached index into cache could get a higher cost than using a less optimal (costlier) index that’s already cached. Regarding lowering random_page_cost; If your index files are on SSD storage, lowering that sufficiently (to a realistic value)could then sufficiently lower the cost of loading that uncached index into memory, evicting the index it was usingin above plan to make room (unless other active sessions are using it). Alban Hertroys -- There is always an exception to always.