Thread: Slow statement using parallelism after 9.6>11 upgrade
Hi list,

We have upgraded our database from 9.6 to 11 (and updated PostGIS from 2.3 to 2.5 as well). We are using it among other things to store an OpenStreetMap database, populated by the osm2pgsql program and updated on a daily basis.

The process used to take ~1h30 before the upgrade, but after the upgrade it looks like it will never run to completion in a 24h time frame. The problem is apparently that the planner chooses to run some statements in parallel, but these take a lot longer than when run without parallelism.

Here are the EXPLAIN ANALYZE outputs of the same statement, with parallelism on and off:

Statement:
PREPARE mark_ways_by_node(int8) AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
EXPLAIN EXECUTE mark_ways_by_node(1);

max_parallel_workers_per_gather = 2:

 Gather  (cost=12545.61..2357352.25 rows=1420982 width=8) (actual time=2.577..64.028 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..2214254.05 rows=592076 width=8) (actual time=0.213..0.213 rows=0 loops=3)
         Recheck Cond: (nodes && '{1}'::bigint[])
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on planet_osm_ways_nodes_idx  (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.173..0.174 rows=1 loops=1)
               Index Cond: (nodes && '{1}'::bigint[])
 Planning Time: 8.596 ms
 Execution Time: 64.135 ms

max_parallel_workers_per_gather = 0:

 Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..3462154.46 rows=1420982 width=8) (actual time=0.677..0.679 rows=1 loops=1)
   Recheck Cond: (nodes && '{1}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes_idx  (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 rows=1 loops=1)
         Index Cond: (nodes && '{1}'::bigint[])
 Planning Time: 0.193 ms
 Execution Time: 0.773 ms

So this Parallel Bitmap Heap Scan seems to be quite problematic here. What runtime setting should I change to fix this, without losing the benefit of parallelism?

Thanks a lot!

Cheers
--
Arnaud
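For comparing the two plans within a single session, the parallel plan can be toggled off and the prepared statement re-run; a minimal sketch using the standard GUC already shown in the plans above:

```sql
-- Force the serial plan for this session only
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE EXECUTE mark_ways_by_node(1);

-- Restore the configured parallel behaviour
RESET max_parallel_workers_per_gather;
```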
"Arnaud L." <arnaud.listes@codata.eu> writes:
> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
> 2.3 to 2.5 as well).
> ...

Have you re-ANALYZEd the database?  The problem with this query
seems to be the spectacularly awful rowcount estimate here:

>          ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
>              (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 rows=1 loops=1)
>                Index Cond: (nodes && '{1}'::bigint[])

The planner should be able to do better than that, given up-to-date
statistics on the "nodes" column.

			regards, tom lane
On 03/09/2019 15:43, Tom Lane wrote:
> "Arnaud L." <arnaud.listes@codata.eu> writes:
>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
>> 2.3 to 2.5 as well).
>> ...
>
> Have you re-ANALYZEd the database?  The problem with this query
> seems to be the spectacularly awful rowcount estimate here:

You mean after the upgrade process? Yes I have.
I've just re-run "ANALYZE table" to rule this out; the estimates are the same.

Maybe some statistics target problem? The estimated number of rows is 284,196,352.
Also, this is a GIN index on a bigint[] column.

I've set parallel_tuple_cost to 1.0 and parallel_setup_cost to 5000.0 for the time being, which solves this specific problem. These values don't look very sensible though; they are very high compared to the defaults.

Cheers
--
Arnaud
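The workaround described above amounts to raising the two parallel cost GUCs well past their defaults; a sketch (the defaults noted are PostgreSQL 11's):

```sql
SET parallel_tuple_cost = 1.0;     -- default 0.1
SET parallel_setup_cost = 5000.0;  -- default 1000.0
```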
> On Sep 3, 2019, at 7:06 AM, Arnaud L. <arnaud.listes@codata.eu> wrote:
>
> On 03/09/2019 15:43, Tom Lane wrote:
>> "Arnaud L." <arnaud.listes@codata.eu> writes:
>>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from 2.3 to 2.5 as well).
>>> ...
>> Have you re-ANALYZEd the database?  The problem with this query
>> seems to be the spectacularly awful rowcount estimate here:
>
> You mean after the upgrade process? Yes I have.
> I've just re-run "ANALYZE table" to rule this out; the estimates are the same.
>
> Maybe some statistics target problem? The estimated number of rows is 284,196,352.
> Also, this is a GIN index on a bigint[] column.
>
> I've set parallel_tuple_cost to 1.0 and parallel_setup_cost to 5000.0 for the time being, which solves this specific problem. These values don't look very sensible though; they are very high compared to the defaults.

You can also leave that setting unchanged and just change the behaviour on your one table:

ALTER TABLE nodes SET ( parallel_workers = 0);

P.
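Note that in this thread "nodes" is the array column and the table being scanned is planet_osm_ways, so the per-table form would presumably target that table (an assumption based on the query earlier in the thread):

```sql
-- Disable parallel plans for this one table only,
-- leaving the global parallel cost settings untouched
ALTER TABLE planet_osm_ways SET (parallel_workers = 0);
```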
> We have upgraded our database from 9.6 to 11

Is this now the latest PG, i.e. PG11.5? (For example, PG11.5 fixed: "Fix failure to resolve deadlocks involving multiple parallel worker processes".)

> populated by the osm2pgsql program and updated on a daily basis.
> What runtime setting should I change to fix this, without losing the benefit of parallelism ?

- osm2pgsql has its own parallelization (osm2pgsql --number-processes ...),
  so be careful about adding more parallelism on the PG side with the same osm2pgsql parameters! (Check the memory usage!)
- check the benchmarks and tunings: https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
- you can ask for help on the OSM dev mailing list ("dev - OpenStreetMap developer discussion", https://lists.openstreetmap.org/listinfo/dev)

Imre
On 03/09/2019 16:39, Paul Ramsey wrote:
>> On Sep 3, 2019, at 7:06 AM, Arnaud L. <arnaud.listes@codata.eu> wrote:
>> I've set parallel_tuple_cost to 1.0 and parallel_setup_cost to 5000.0 for the time being, which solves this specific problem. These values don't look very sensible though; they are very high compared to the defaults.
>
> You can also leave that setting unchanged and just change the behaviour on your one table:
>
> ALTER TABLE nodes SET ( parallel_workers = 0);

Nice, this will fix this precise query and leave parallelism in place for the other ones. Thanks!

--
Arnaud
On 03/09/2019 17:02, Imre Samu wrote:
> > We have upgraded our database from 9.6 to 11
>
> Is this now the latest PG, i.e. PG11.5?

Yes, PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit.

> - osm2pgsql has its own parallelization (osm2pgsql --number-processes ...),
>   so be careful about adding more parallelism on the PG side with the same osm2pgsql parameters! (Check the memory usage!)
> - check the benchmarks and tunings: https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
> - you can ask for help on the OSM dev mailing list ("dev - OpenStreetMap developer discussion", https://lists.openstreetmap.org/listinfo/dev)

Yes, I know. It has been fine-tuned over some time now to fit our server's capacity. Disabling parallelism has allowed the update to run in ~5000s, which is consistent with what we saw before the upgrade. I had started writing an issue on the osm2pgsql GitHub before realizing it was probably a PG-only issue.

Cheers
--
Arnaud
On 03/09/2019 15:43, Tom Lane wrote:
> "Arnaud L." <arnaud.listes@codata.eu> writes:
>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
>> 2.3 to 2.5 as well).
>> ...
>
> Have you re-ANALYZEd the database?  The problem with this query
> seems to be the spectacularly awful rowcount estimate here:
>
>>          ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
>>              (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 rows=1 loops=1)
>>                Index Cond: (nodes && '{1}'::bigint[])
>
> The planner should be able to do better than that, given up-to-date
> statistics on the "nodes" column.

Tom, I can confirm that with up-to-date statistics the planner is still lost.
I did a REINDEX to rule out a broken index, and the estimate is still in the 100k+ range.

Regards
--
Arnaud
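The checks described above correspond to something like the following sketch (index and table names are taken from the plans in this thread):

```sql
REINDEX INDEX planet_osm_ways_nodes_idx;  -- rule out a corrupted index
ANALYZE planet_osm_ways;                  -- refresh the table's statistics
```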
On 04/09/2019 09:04, Arnaud L. wrote:
> Tom, I can confirm that with up-to-date statistics the planner is still lost.
> I did a REINDEX to rule out a broken index, and the estimate is still in the 100k+ range.

Sorry, I meant the 1M+ range.

EXPLAIN ANALYZE select id from planet_osm_ways WHERE nodes && ARRAY[123456789::bigint];

(parallel_workers = 0 on the table, as per Paul's recommendation):

 Bitmap Heap Scan on planet_osm_ways  (cost=11582.45..3535447.30 rows=1419000 width=8) (actual time=0.198..0.199 rows=1 loops=1)
   Recheck Cond: (nodes && '{123456789}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes_idx  (cost=0.00..11227.70 rows=1419000 width=0) (actual time=0.151..0.151 rows=1 loops=1)
         Index Cond: (nodes && '{123456789}'::bigint[])
 Planning Time: 0.260 ms
 Execution Time: 0.249 ms

Regards
--
Arnaud
On 04/09/2019 09:04, Arnaud L. wrote:
> On 03/09/2019 15:43, Tom Lane wrote:
>> The planner should be able to do better than that, given up-to-date
>> statistics on the "nodes" column.
>
> Tom, I can confirm that with up-to-date statistics the planner is still lost.
> I did a REINDEX to rule out a broken index, and the estimate is still in the 100k+ range.

Setting the STATISTICS target to 10,000 on the nodes column and re-analyzing did not help either. Estimates are still ~1.4M rows when the result is 1 row.

Regards
--
Arnaud
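The statistics-target experiment above corresponds to something like this sketch (the exact statements were not posted, so this is an assumption):

```sql
-- Raise the per-column statistics target well above the
-- default_statistics_target of 100, then re-gather stats
ALTER TABLE planet_osm_ways ALTER COLUMN nodes SET STATISTICS 10000;
ANALYZE planet_osm_ways (nodes);
```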
On 03/09/2019 15:43, Tom Lane wrote:
> "Arnaud L." <arnaud.listes@codata.eu> writes:
>>          ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
>>              (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268 rows=1 loops=1)
>>                Index Cond: (nodes && '{1}'::bigint[])
>
> The planner should be able to do better than that, given up-to-date
> statistics on the "nodes" column.

Sorry to bump this thread, but is there anything I can do to help the planner in this particular case? REINDEXing did not help, nor did ANALYZEing with a different STATISTICS target for this specific column.

Regards
--
Arnaud
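One diagnostic that could help here: the element-level statistics the planner consults for && selectivity are visible in the standard pg_stats view, so it is possible to check whether anything was actually collected for the column (a sketch; table and column names taken from the thread):

```sql
-- Inspect what ANALYZE actually gathered for the array column
SELECT null_frac, n_distinct,
       most_common_elems, most_common_elem_freqs, elem_count_histogram
FROM pg_stats
WHERE tablename = 'planet_osm_ways' AND attname = 'nodes';
```

If most_common_elems is NULL here, the planner is falling back to default selectivity assumptions for the && operator, which would be consistent with the ~1.4M row estimate.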