Thread: PostgreSQL 10.0 SELECT LIMIT performance problem
Hi!
We have a serious performance problem with SELECTs when adding a LIMIT. For example, execution time without the limit is ~250 ms (316 rows returned), but with LIMIT 20 the execution time is 15 to 50 seconds.
The select uses a subselect filter:
select a.id, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
The select is used to fetch data for a table-view window in the user interface. The table view has several filters that users can apply, and the subselect implements one of them. The subselect's execution time is always fine, but the main select has serious performance problems. When the subselect is replaced with static values, the select executes quickly.
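For illustration, the static-values variant looks like this (the literal dn values here are made up):
select a.id, a.jdata
from oss_alarms a
where a.jdata->>'dn' in ('DN-0001', 'DN-0002', 'DN-0003')
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;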
The oss_alarms table is updated very often, but only with new records: about 10,000 to 30,000 new records per day, and once they have been processed there are no further changes.
Version string: PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit
I have gathered some technical information about the DB structure and execution (see attachment).
Mareks Kalnačs
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv
www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058
Hi Pavel!
We know how to trick the SQL, but that is not a solution. We have different filters on fields with different distributions. For example, the jdata->>'dn' distribution is 2.8%, but jdata->>'tech' is 25%, which means we would have to get a count first and modify the select based on that count. This is the DB server's job, not the application server's job.
Best regards,
Mareks Kalnačs
From: Pavel Stehule <pavel.stehule@gmail.com>
Sent: Wednesday, 12 September 2018 15:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
Sometimes a LIMIT clause can confuse the optimizer when the data are not uniform.
You can try the OFFSET 0 trick:
Original query: SELECT * FROM t ORDER BY c LIMIT 10
transform to:
SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;
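Applied to the problem query above, the transformed statement would look like this (a sketch using the same tables and functions):
select x.id, x.jdata
from (
    select a.id, a.jdata
    from oss_alarms a
    where a.jdata->>'dn' in
    (
        select o.jdata->>'ossDn'
        from oss_objects o, tvc_entity e
        where e.jtype='object'
        and o.jdata->>'sid'=e.jdata->>'siteId'
        and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
    )
    order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
    offset 0
) x
limit 20;
The OFFSET 0 keeps the inner query from being flattened into the outer one, so it is planned without knowledge of the outer LIMIT.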
Hi Victor!
I will disagree with you that this is not a bug. I understand that foreign keys will solve some issues, and we are already making changes to add FKs, but that only works around the problem rather than solving it; under different conditions we would hit the problem again. The main issue is that the optimizer does not take the index distribution into account. In the current case the main cause is actually the subselect. I played around a bit with indexes and different select conditions and got very strange behavior. What we did: we added a new index on the tvc_entity table (idx_tvc_entity_sid_u) and ran a full analyze on all tables, and got the problem in reverse; now the optimizer thinks it will get fewer rows from the subselect.
Subselect:
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and e.jdata->>'siteId' IS NOT NULL
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%za%')
The oss_objects.jdata->>'sid' distribution is 4.2%, with 319 null values out of 31693 total rows, but the execution plan always thinks that fewer rows will be returned than actually are:
Gather (cost=1014.16..5280.19 rows=9 width=32) (actual time=1.304..18.982 rows=1291 loops=1)
-> Nested Loop (cost=14.16..4279.28 rows=5 width=32) (actual time=3.172..14.497 rows=646 loops=2)
-> Parallel Bitmap Heap Scan on tvcis.tvc_entity e (cost=13.87..1694.08 rows=149 width=208) (actual time=3.017..11.597 rows=36 loops=2)
-> Index Scan using idx_oss_objects_sid on tvcis.oss_objects o (cost=0.29..17.12 rows=23 width=951) (actual time=0.015..0.054 rows=18 loops=71)
If the oss_objects.jdata->>'sid' distribution is 4.2% (about 1331 distinct values among 31693 rows), then for every unique tvc_entity.jdata->>'siteId' roughly 23 rows (31693 / 1331) will be selected. This means the join must return more rows than the select from tvc_entity alone, which matches the actual result, but the optimizer concludes the reverse.
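The distribution figures above can be double-checked directly with something like (a sketch):
select count(*) as total_rows,
       count(distinct jdata->>'sid') as distinct_sids,
       count(*) filter (where jdata->>'sid' is null) as null_sids
from oss_objects;
With 31693 total rows and about 1331 distinct values, the average of roughly 23 rows per siteId matches the per-loop rows=23 estimate in the idx_oss_objects_sid scan above.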
We have another example that we don't understand how to solve, and this one may not be a bug. Here we have a full-text search index within a single table, and when we hit a value with a low hit count, the select runs slowly:
select a.id, a.jdata
from tvcis.oss_alarms a
where (
to_tsvector(
'simple'::regconfig,
a.ts_vector_fields
) @@ to_tsquery(
'simple',
' (decodedType5b6f0753ossAlarmsGenerator) '
)
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) DESC
limit 20;
Bad case:
Sort (cost=8787.37..8799.58 rows=4885 width=809) (actual time=7.351..7.423 rows=586 loops=1)
Without limit:
Planning time: 0.301 ms
Execution time: 7.574 ms
With limit:
Planning time: 0.312 ms
Execution time: 4972.245 ms
Good case:
From the count query:
-> Parallel Bitmap Heap Scan on tvcis.oss_alarms a (cost=5080.23..455246.09 rows=248615 width=0) (actual time=1142.280..175706.097 rows=199195 loops=3)
Without limit:
Too many rows to test, get count instead
With limit:
Planning time: 0.370 ms
Execution time: 43.131 ms
But counts are a nightmare:
Planning time: 0.236 ms
Execution time: 176343.730 ms
A couple of example explain plans are attached.
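As an aside on the slow counts: when an exact number is not needed, the planner's estimate can be read instead of running a full count(*). A sketch of the well-known count-estimate trick (the function name is arbitrary):
create or replace function count_estimate(query text) returns bigint as $$
declare
    plan jsonb;
begin
    -- read the planner's row estimate from EXPLAIN output instead of scanning
    execute 'explain (format json) ' || query into plan;
    return (plan->0->'Plan'->>'Plan Rows')::bigint;
end;
$$ language plpgsql;
This returns in planning time rather than execution time, at the cost of accuracy.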
Best regards,
Mareks Kalnačs
From: Victor Yegorov <vyegorov@gmail.com>
Sent: Wednesday, 12 September 2018 15:32
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
Hi, Mareks.
As this is not actually a bug, it's better to use pgsql-performance or pgsql-general for such questions next time.
As the execution plan without the limit shows, your join conditions yield 316 rows; an explicit sort is fast for this amount.
For the plan with the limit, the planner prefers to use the `idx_oss_alarms_alarm_time` index to avoid the extra sort.
This leads to:
Rows Removed by Join Filter: 10717797
I.e. you're reading 10M rows via an index scan and later throwing them away, as they do not match your join condition:
Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->> 'ossDn'::text))
It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated, although the planner doesn't know that.
PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not expressions.
You can try disabling `idx_oss_alarms_alarm_time` by using an expression in `ORDER BY`:
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) + INTERVAL '0' desc
But be warned — this can help in some situations and make things worse in others.
I would recommend moving the JOIN and ORDER BY columns out of JSON and making them plain table columns.
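A minimal sketch of that suggestion (the column names dn and alarm_time are invented here, and it assumes tvc_convert_array_to_date returns a timestamp-compatible value; the backfill strategy and keeping the columns in sync with jdata, e.g. via a trigger, are left out):
alter table oss_alarms
    add column dn text,
    add column alarm_time timestamptz;
-- backfill from the existing JSON document
update oss_alarms
   set dn = jdata->>'dn',
       alarm_time = tvc_convert_array_to_date(jdata -> 'alarmTime');
create index idx_oss_alarms_dn_col on oss_alarms (dn);
create index idx_oss_alarms_alarm_time_col on oss_alarms (alarm_time desc);
-- refresh statistics on the new columns
analyze oss_alarms;
With plain columns the planner gets ordinary per-column statistics for the join and the sort, and CREATE STATISTICS on (dn, alarm_time) becomes an option for the correlation mentioned above.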
--
Victor Yegorov
Victor Yegorov <vyegorov@gmail.com> writes:
> The fact, that planner is not accurate on the estimates of JSON internal
> keys is expected, PostgreSQL is not parsing JSON values when gathering
> stats.
> You cannot expect planner to be picky about all possible corner cases, it
> would make planning time enormously huge.
Right. The fact that it doesn't make the right guesses without help can't be considered to be a bug in all cases. These are engineering tradeoffs we have to make.
> That is the reason I outlined, that important keys should be extracted into
> plain columns.
If that seems infeasible from an application standpoint, another possibility is to make expression indexes on those important keys. ANALYZE will gather stats on the values of indexed expressions, and then perhaps the planner will have enough info to make better decisions.
In the other case mentioned, where the problem is a poor guess about the selectivity of
where (
    to_tsvector(
        'simple'::regconfig,
        a.ts_vector_fields
    ) @@ to_tsquery(
        'simple',
        ' (decodedType5b6f0753ossAlarmsGenerator) '
    )
)
it's the same problem: the planner has no stats that would let it figure out the selectivity. It can't reasonably extract an estimate on the fly --- if it did, you'd be complaining that planning time was too long. The only way to get reasonable behavior is to set things up so that ANALYZE will accumulate stats about the values of "to_tsvector('simple', a.ts_vector_fields)". You can arrange that either by extracting that into a column, or by making an index on it.
regards, tom lane
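For the full-text case that would be something along these lines (index name invented; GIN is the usual index type for @@ searches, and the expression is spelled exactly as in the query):
create index idx_oss_alarms_tsv
    on oss_alarms
    using gin (to_tsvector('simple'::regconfig, ts_vector_fields));
-- ANALYZE then collects statistics on the indexed expression
analyze oss_alarms;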
On 9/12/18 7:04 PM, Tom Lane wrote:
> If that seems infeasible from an application standpoint, another
> possibility is to make expression indexes on those important keys.
> ANALYZE will gather stats on the values of indexed expressions, and then
> perhaps the planner will have enough info to make better decisions.
Hi,
For what it can help, I wrote this article explaining that:
https://blog.anayrat.info/en/2017/11/26/postgresql---jsonb-and-statistics/
Regards,
Hi!
But we are using value (expression) indexes, not a JSON index:
CREATE INDEX idx_oss_alarms_dn
ON oss_alarms
USING btree
((jdata ->> 'dn'::text) COLLATE pg_catalog."default");
CREATE INDEX idx_oss_objects_sid
ON oss_objects
USING btree
((jdata ->> 'sid'::text) COLLATE pg_catalog."default");
CREATE UNIQUE INDEX idx_tvc_entity_sid_u
ON tvc_entity
USING btree
((jdata ->> 'siteId'::text) COLLATE pg_catalog."default")
WHERE (jdata ->> 'siteId'::text) IS NOT NULL;
Maybe we don't understand how these indexes behave?
Mareks
From: Victor Yegorov <vyegorov@gmail.com>
Sent: Wednesday, 12 September 2018 19:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
The fact that the planner is not accurate in its estimates of JSON internal keys is expected: PostgreSQL does not parse JSON values when gathering stats.
You cannot expect the planner to account for every possible corner case; that would make planning time enormous.
That is the reason I outlined that important keys should be extracted into plain columns.
If you still consider this a bug, please send an isolated, reproducible test case that demonstrates it.
--
Victor Yegorov
Hi!
My point is: why does the planner ignore the per-table row counts and, when joining the tables, think that fewer rows will be returned than in reality:
Table1: Index Scan using idx_tvc_entity_sid_u on tvcis.tvc_entity e (cost=0.28..1856.72 rows=253 width=208) (actual time=0.091..13.944 rows=71 loops=1)
Table2: Index Scan using idx_oss_objects_sid on tvcis.oss_objects o (cost=0.29..16.84 rows=23 width=951) (actual time=0.009..0.033 rows=18 loops=71)
Join: Nested Loop (cost=0.57..6175.55 rows=9 width=951) (actual time=0.112..17.355 rows=1291 loops=1)
Why doesn't the planner take the results from each table into account:
Join rows = Table1 rows * Table2 rows = 253 * 23 = 5819? The actual result (1291 rows) would be about 22% of that estimate, but the planner estimated 9 rows, which is 0.7% of the actual result.
As we can see, the actual row count is almost exactly the product of the per-table actual rows: 71 * 18 = 1278, which is close to 1291.
Mareks
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, 12 September 2018 20:04
To: Victor Yegorov <vyegorov@gmail.com>
Cc: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>; pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
Mareks Kalnačs <Mareks.Kalnacs@datakom.lv> writes:
> But we are using value indexes not a json index:
> CREATE INDEX idx_oss_alarms_dn
>   ON oss_alarms
>   USING btree
>   ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");
I think you're outsmarting yourself by including those COLLATE clauses. They don't do anything, since they're just selecting the default behavior --- but they're enough to make the planner not realize that stats collected on the index expression would be applicable to a plain reference to oss_alarms.jdata ->> 'dn'. In general you want the index expression to be spelled exactly the same way that you refer to the value in queries, else the system may not realize it's relevant.
regards, tom lane
Actually Postgres is trying to outsmart itself; I did not add the COLLATE ;) The original CREATE INDEX text was:
CREATE INDEX idx_oss_alarms_dn ON oss_alarms ((jdata ->> 'dn'));
Mareks
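Once the index expression and the query spelling match, the effect can be verified after an ANALYZE: statistics for indexed expressions show up in pg_stats under the index's name (a sketch):
analyze oss_alarms;
select tablename, attname, n_distinct, null_frac
from pg_stats
where tablename = 'idx_oss_alarms_dn';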