Thread: How does Postgres decide if to use additional workers?

How does Postgres decide if to use additional workers?

From

Thorsten Schöning

Date:

09 February 2021, 15:52:02

Hi all,

I have the following table with an added BTREE-index on "captured_at".

> CREATE TABLE datagram
> (
>   id            bigserial                 NOT NULL,
>   src_re        integer                   NOT NULL,
>   src_clt       integer                   NOT NULL,
>   src_meter     integer                   NOT NULL,
>   captured_at   timestamp with time zone  NOT NULL,
>   captured_rssi smallint                  NOT NULL,
>   oms_status    smallint                  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

I have a query containing the following condition:

> AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D'
ASINTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL)) 

When that query is applied to the above table with it's index on
"captured_at", that results in the following query plan. The important
thing to note is that NO parallel workers a re used.

> ->  Hash Join  (cost=246164.35..2004405.07 rows=11323 width=51) (actual time=93.802..5776.755 rows=104607 loops=1)
>     Hash Cond: (meter.meter_bcd = meter_bcd.id)
>     ->  Hash Join  (cost=246019.19..2003889.83 rows=68494 width=37) (actual time=93.067..5744.787 rows=104607
loops=1)
>         Hash Cond: (datagram.src_meter = meter.id)
>         ->  Index Scan using idx_datagram_captured_at_btree on datagram  (cost=0.57..1756571.73 rows=495033 width=20)
(actualtime=0.054..5451.417 rows=514369 loops=1) 
>               Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5
days'::interval))AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval))) 

For various reasons I tested the above table as a partitioned one as
well, with individual partitions containing the rows of one year only.
The important thing to note is that I simply kept the same index on
"captured_at" like before, though the query plan looks different now:

> Workers Planned: 2
> Workers Launched: 2
> ->  Hash Join  (cost=245966.53..272335.67 rows=5419 width=51) (actual time=625.846..1560.103 rows=34869 loops=3)
>     Hash Cond: (datagram_y2020_h2.src_meter = meter.id)
>     ->  Parallel Append  (cost=4.19..25430.72 rows=236911 width=20) (actual time=2.827..863.298 rows=171456 loops=3)
>           Subplans Removed: 23
>           ->  Parallel Index Scan using datagram_y2020_h2_captured_at_idx on datagram_y2020_h2  (cost=0.44..24051.22
rows=236888width=20) (actual time=2.826..848.388 rows=171456 loops=3) 
>                 Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5
days'::interval))AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval))) 

It seems that only because of a different number of rows per
individual table, additional workers are used. Though, in the past I
had all of those "captured_at" in one table as well, only with far
less columns and for that table additional workers have been used,
pretty much like is the case now:

> Workers Planned: 2
> Workers Launched: 2
> ->  Hash Join  (cost=264793.42..1666293.23 rows=4332 width=51) (actual time=96.080..638.802 rows=34869 loops=3)
>     Hash Cond: (oms_rec.meter = meter.id)
>     ->  Nested Loop  (cost=1.14..1400747.39 rows=189399 width=20) (actual time=0.145..496.366 rows=171456 loops=3)
>     ->  Hash  (cost=264709.53..264709.53 rows=6620 width=39) (actual time=95.521..95.528 rows=40044 loops=3)
>           Buckets: 65536 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 3016kB
>           ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..14853.95 rows=189399
width=24)(actual time=0.098..81.556 rows=171456 loops=3) 
>           ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.32 rows=1 width=12) (actual time=0.002..0.002
rows=1loops=514369) 
>           ->  Hash Join  (cost=145.59..264709.53 rows=6620 width=39) (actual time=9.883..86.390 rows=40044 loops=3)
>                 Index Cond: (id = clt_rec.oms_rec)
>                 Index Cond: ((captured_at >= ('2020-08-28 10:34:32.855+02'::timestamp with time zone - '5
days'::interval))AND (captured_at <= ('2020-08-28 10:34:32.855+02'::timestamp with time zone + '00:00:00'::interval))) 
>                 Hash Cond: (meter.meter_bcd = meter_bcd.id)

So, based on which facts does Postgres decide if to use aadditional
workers or not? Can I see those decisions explained somewhere? I don't
see anything in the query plan. Thanks!

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska

Re: How does Postgres decide if to use additional workers?

From

luis.roberto@siscobra.com.br

Date:

09 February 2021, 16:23:06

De: "Thorsten Schöning" <tschoening@am-soft.de>
Para: "pgsql-general" <pgsql-general@lists.postgresql.org>
Enviadas: Terça-feira, 9 de fevereiro de 2021 12:52:02
Assunto: How does Postgres decide if to use additional workers?

So, based on which facts does Postgres decide if to use aadditional
workers or not? Can I see those decisions explained somewhere? I don't
see anything in the query plan. Thanks!

Mit freundlichen Grüßen

Thorsten Schöning

Hi!

I think you should look into min_parallel_table_scan_size and min_parallel_index_scan_size GUCs.

Re: How does Postgres decide if to use additional workers?

From

Philip Semanchuk

Date:

10 February 2021, 14:22:58

> On Feb 9, 2021, at 10:52 AM, Thorsten Schöning <tschoening@am-soft.de> wrote:
>
> So, based on which facts does Postgres decide if to use aadditional
> workers or not? Can I see those decisions explained somewhere? I don't
> see anything in the query plan. Thanks!

Hi Thorsten,
This is an interesting topic for me too. here’s a formula for the max number of workers that Postgres will consider for
atable. Of course, most queries use more than just one table, and I don’t yet understand how Postgres handles the
situationwhere the formula suggests multiple workers for some tables and a single worker for others. 

There was some conversation about this on the performance mailing list in June. Here’s a link to the message that
containsthe formula; there’s more items of interest in the whole thread: 
https://www.postgresql.org/message-id/89423FD3-0F13-447D-8D9E-EB1722150F94%40americanefficient.com

Hope this helps
Philip