Thread: Aggregate and many LEFT JOIN
Hi all,
I need to optimize the query I use to create a materialized view; I think there is room for improvement.
* EXPLAIN: https://explain.depesz.com/s/6nf
I tried
SET join_collapse_limit TO 15;
with no real difference.
EXPLAIN shows that the GroupAggregate and the Sort it needs kill the performance.
Do you have any hints on how to optimize this?
Regards
Michaël
On Fri, Feb 22, 2019 at 04:14:05PM +0100, kimaidou wrote:
> EXPLAIN shows that the GroupAggregate and the Sort it needs kill the performance.
> Do you have any hints on how to optimize this?
> https://explain.depesz.com/s/6nf

This is writing a 2GB tempfile, so perhaps the query would benefit from larger work_mem:

|Sort (cost=3,014,498.66..3,016,923.15 rows=969,796 width=1,818) (actual time=21,745.193..22,446.561 rows=1,212,419 loops=1)
| Sort Method: external sort Disk: 1782200kB
| Buffers: shared hit=5882951, temp read=230958 written=230958

This is apparently joining without indices:

|Nested Loop Left Join (cost=1.76..360,977.37 rows=321,583 width=1,404) (actual time=0.080..1,953.007 rows=321,849 loops=1)
| Join Filter: (tgc1.groupe_nom = t.group1_inpn)
| Rows Removed by Join Filter: 965547
| Buffers: shared hit=1486327

This should perhaps have an index on tgc2.groupe_type:

|Index Scan using t_group_categorie_pkey on taxon.t_group_categorie tgc2 (cost=0.14..0.42 rows=1 width=28) (actual time=0.002..0.002 rows=1 loops=321,849)
| Index Cond: (tgc2.groupe_nom = t.group2_inpn)
| Filter: (tgc2.groupe_type = 'group2_inpn'::text)
| Buffers: shared hit=643687

This would perhaps benefit from an index on tv.cd_ref:

|Index Scan using taxref_consolide_non_filtre_cd_nom_idx on taxon.taxref_consolide_non_filtre tv (cost=0.42..0.63 rows=1 width=94) (actual time=0.002..0.002 rows=1 loops=690,785)
| Index Cond: (tv.cd_nom = t.cd_ref)
| Filter: (tv.cd_nom = tv.cd_ref)
| Buffers: shared hit=2764875

I don't think it's causing a significant fraction of the issue, but for some reason this overestimates the rowcount by a factor of roughly 2000. Do you need to VACUUM ANALYZE the table?

|Seq Scan on occtax.personne p_1 (cost=0.00..78.04 ROWS=2,204 width=56) (actual time=0.011..0.011 ROWS=1 loops=1)

Justin
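(A minimal sketch of the kinds of changes suggested above; the object names come from the plan fragments quoted here and may not match the real schema, so treat them as assumptions to verify rather than a definitive fix:)

-- give this session more sort memory before re-running EXPLAIN ANALYZE
SET work_mem = '256MB';

-- candidate index covering the groupe_type filter and groupe_nom join seen in the plan
CREATE INDEX ON taxon.t_group_categorie (groupe_type, groupe_nom);

-- candidate index for the tv.cd_ref lookup
CREATE INDEX ON taxon.taxref_consolide_non_filtre (cd_ref);

-- refresh statistics on the table with the off estimate
VACUUM ANALYZE occtax.personne;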
Curious: is geqo_threshold still set to 12? Would increasing join_collapse_limit to be higher than geqo_threshold have a noticeable impact?
The disk sorts are the killer, as Justin says. I wonder how it performs with work_mem increased significantly. Is the storage SSD or traditional hard disks?
Michael Lewis
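(To answer that, the values actually in effect can be checked directly in the session; the defaults are shown in the comments:)

SHOW geqo_threshold;       -- default is 12
SHOW join_collapse_limit;  -- default is 8
SHOW work_mem;             -- default is 4MB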
Thanks for your answers.
I have tried the following:
-- show work_mem returned "10485kB" (the initial work_mem for my first post)
set session work_mem = '100000kB';
set session geqo_threshold = 12;
set session join_collapse_limit = 15;
I have a small machine, with an SSD and 8GB of RAM. I cannot really increase work_mem up to 2GB (or more). There are only about 300,000 rows in occtax.observation, and that will grow (possibly up to 3 million...).
I am running PostgreSQL 9.6. I should probably test against PostgreSQL 11, as many improvements have been made since.
I even tried removing all non-aggregated columns and keeping only o.cle_obs (the primary key), so that the query ends with a simple
GROUP BY o.cle_obs
and the query plan still does not show a HashAggregate, only a GroupAggregate.
Obviously I have already run VACUUM ANALYSE.
My current PostgreSQL settings:
max_connections = 100
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 10485kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
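(Given the 8GB of RAM, one option is to raise work_mem only for the transaction that builds the materialized view rather than globally; a sketch, with a hypothetical view name:)

BEGIN;
SET LOCAL work_mem = '512MB';            -- only visible inside this transaction
SET LOCAL join_collapse_limit = 20;
REFRESH MATERIALIZED VIEW occtax.vm_observation;  -- hypothetical view name
COMMIT;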
Does the plan change significantly with this:
set session work_mem='250MB';
set session geqo_threshold = 20;
set session join_collapse_limit = 20;
With that expensive sort spilling to disk and the aggregation happening after it, significantly increasing work_mem seems likely to make the critical difference. Unless it could fetch the data already sorted via an index, but that doesn't seem likely.
I would suggest increasing default_statistics_target, but you already have good estimates for the most part. Hopefully someone else will chime in with more.
Michael Lewis
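(If some estimates do drift later, statistics can also be raised per column rather than globally; the table and column here are only illustrative:)

ALTER TABLE occtax.observation ALTER COLUMN cd_nom SET STATISTICS 500;
ANALYZE occtax.observation;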
Michael Lewis <mlewis@entrata.com> writes:
> Does the plan change significantly with this:
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

Yeah ... by my count there are 16 tables in this query, so raising join_collapse_limit to 15 is not enough to ensure that the planner considers all join orders. Whether use of GEQO is a big problem is harder to say, but it might be.

regards, tom lane
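(A sketch of session settings that force an exhaustive join-order search for a 16-table query; the exact values are illustrative:)

-- make both limits at least as large as the number of relations in the query
SET join_collapse_limit = 20;
SET from_collapse_limit = 20;
-- and either raise geqo_threshold above that, or disable GEQO for the session
SET geqo = off;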
Thanks for your answers. I tried with:
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;
It seems to have no real impact.
Indeed, an index cannot really be used for sorting here, given the complexity of the returned fields.
What strikes me is that if I simplify the query a lot, removing everything but the main table's (occtax.observation) primary key cd_nom and the aggregate, the planner should be able to use the cd_nom index for sorting and give a better plan (a HashAggregate), but it does not seem to:
* EXPLAIN: https://explain.depesz.com/s/FR3h -> a GroupAggregate is still used, accounting for 10,639.313 ms (72.6 % of the total).
It is better, but 10 s for such a query still seems like poor performance to me.
Regards
Michaël
I get better results with this version. Basically, I run a first query that only does the aggregation, and then JOIN it to get the other needed data.
* SQL : http://paste.debian.net/1070007/
* EXPLAIN: https://explain.depesz.com/s/D0l
Not really "fast", but I gained 30%
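(The linked SQL is not reproduced here, but the shape of the rewrite described above is roughly the following; the column names, join key and aggregate are placeholders, not the real view definition:)

WITH agg AS (
    -- first pass: aggregate against the main table's key only
    SELECT o.cle_obs,
           string_agg(DISTINCT p.nom, ', ') AS observateurs      -- placeholder aggregate
    FROM occtax.observation o
    LEFT JOIN occtax.personne p ON p.id_personne = o.id_personne -- hypothetical join key
    GROUP BY o.cle_obs
)
-- second pass: join back to pick up the non-aggregated columns
SELECT o.*, agg.observateurs
FROM agg
JOIN occtax.observation o USING (cle_obs);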
On Mon, Feb 25, 2019 at 2:44 AM kimaidou <kimaidou@gmail.com> wrote:
> I get better results with this version. Basically, I run a first query that only does the aggregation, and then JOIN it to get the other needed data.
> * SQL : http://paste.debian.net/1070007/
> * EXPLAIN: https://explain.depesz.com/s/D0l
> Not really "fast", but I gained 30%.
It still seems that the disk sort and everything after it is where the query plan dies. It seems odd that it went to disk if work_mem was already 250MB. Can you allocate more as a test? As an alternative, if this is frequently needed data, can you aggregate it and keep a summarized copy that is updated periodically?
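(A sketch of the "summarized copy" idea using a materialized view; the view name and the aggregate are placeholders. REFRESH ... CONCURRENTLY needs a unique index, and readers keep seeing the old contents while it runs:)

CREATE MATERIALIZED VIEW occtax.vm_observation_agg AS
SELECT o.cle_obs, count(*) AS nb          -- placeholder for the real aggregation query
FROM occtax.observation o
GROUP BY o.cle_obs;

CREATE UNIQUE INDEX ON occtax.vm_observation_agg (cle_obs);

-- re-run periodically, e.g. from cron
REFRESH MATERIALIZED VIEW CONCURRENTLY occtax.vm_observation_agg;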
I managed to avoid the disk sort after performing a VACUUM ANALYSE,
and with a session work_mem = '250MB':
* SQL: http://paste.debian.net/1070207/
* EXPLAIN: https://explain.depesz.com/s/nJ2y
It still takes 16 s.
It seems this kind of query will need better hardware to scale...
Thanks for your help
On Tue, Feb 26, 2019 at 01:54:00PM +0100, kimaidou wrote:
> I managed to avoid the disk sort after performing a VACUUM ANALYSE,
> and with a session work_mem = '250MB':
> * SQL: http://paste.debian.net/1070207/
> * EXPLAIN: https://explain.depesz.com/s/nJ2y
> It still takes 16 s.
> It seems this kind of query will need better hardware to scale...

Once you've exhausted other ideas, you could consider making that a TEMPORARY TABLE, creating an index on it (and analyzing it), and then aggregating. It'd be several separate queries.

Justin
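(A sketch of the temporary-table variant being suggested; the joined columns and the join key are placeholders:)

-- 1. materialize the joined rows once
CREATE TEMPORARY TABLE obs_tmp AS
SELECT o.cle_obs, p.nom AS observateur                          -- placeholder columns
FROM occtax.observation o
LEFT JOIN occtax.personne p ON p.id_personne = o.id_personne;   -- hypothetical join key

-- 2. index and analyze it so the final step has good access paths and estimates
CREATE INDEX ON obs_tmp (cle_obs);
ANALYZE obs_tmp;

-- 3. aggregate from the temporary table
SELECT cle_obs, string_agg(DISTINCT observateur, ', ') AS observateurs
FROM obs_tmp
GROUP BY cle_obs;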
On Mon, Feb 25, 2019 at 3:54 AM kimaidou <kimaidou@gmail.com> wrote:
> What strikes me is that if I simplify the query a lot, removing everything but the main table's (occtax.observation) primary key cd_nom and the aggregate, the planner should be able to use the cd_nom index for sorting and give a better plan (a HashAggregate), but it does not seem to.
HashAggregate doesn't support aggregates with DISTINCT. I don't think there is any reason it can't; it is just that no one has gotten around to it.
Aggregates with DISTINCT also kill your ability to get parallel queries.
Cheers,
Jeff
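(One common workaround, sketched with placeholder names: de-duplicate in a subquery first, so the outer aggregate no longer needs DISTINCT and hashed or parallel aggregation becomes possible:)

SELECT cle_obs, string_agg(observateur, ', ') AS observateurs
FROM (
    SELECT DISTINCT o.cle_obs, p.nom AS observateur                 -- placeholder columns
    FROM occtax.observation o
    LEFT JOIN occtax.personne p ON p.id_personne = o.id_personne    -- hypothetical join key
) AS dedup
GROUP BY cle_obs;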