Thread: BUG #15614: Query plan: buffer stats from workers in child operations discarded.

BUG #15614: Query plan: buffer stats from workers in child operations discarded.

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15614
Logged by:          Dave Conlin
Email address:      dc345@cantab.net
PostgreSQL version: 10.4
Operating system:   Windows 10
Description:

Hi folks -

In the output of an `EXPLAIN` query, normally child operations are included
in the totals for the buffers values, eg "Shared Hit Blocks". It seems that
when a parent operation is single-threaded, but a child operation starts
workers, then only the buffers stats from the main thread in the child
operation are included in the totals for the parent operation.

A worked example:

I've run the following query. The "towns" table is a massive table that I
created for testing purposes - don't worry, this isn't a real query, just a
silly example to get some big numbers on a parallel query.

EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE)
SELECT
 name, code, article
FROM
 towns
ORDER BY
 name ASC,
 code DESC;

The output looks like this:

```
[
  {
    "Plan": {
      "Node Type": "Gather Merge",
      "Parallel Aware": false,
      "Startup Cost": 1013948.54,
      "Total Cost": 1986244.55,
      "Plan Rows": 8333384,
      "Plan Width": 77,
      "Actual Startup Time": 42978.838,
      "Actual Total Time": 60628.982,
      "Actual Rows": 10000010,
      "Actual Loops": 1,
      "Output": ["name", "code", "article"],
      "Workers Planned": 2,
      "Workers Launched": 2,
      "Shared Hit Blocks": 29,
      "Shared Read Blocks": 47641,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 91342,
      "Temp Written Blocks": 91479,
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Startup Cost": 1012948.52,
          "Total Cost": 1023365.25,
          "Plan Rows": 4166692,
          "Plan Width": 77,
          "Actual Startup Time": 42765.496,
          "Actual Total Time": 48526.168,
          "Actual Rows": 3333337,
          "Actual Loops": 3,
          "Output": ["name", "code", "article"],
          "Sort Key": ["towns.name", "towns.code DESC"],
          "Sort Method": "external merge",
          "Sort Space Used": 283856,
          "Sort Space Type": "Disk",
          "Shared Hit Blocks": 170,
          "Shared Read Blocks": 142762,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 273289,
          "Temp Written Blocks": 273700,
          "Workers": [
            {
              "Worker Number": 0,
              "Actual Startup Time": 42588.662,
              "Actual Total Time": 48456.662,
              "Actual Rows": 3277980,
              "Actual Loops": 1,
              "Shared Hit Blocks": 72,
              "Shared Read Blocks": 46794,
              "Shared Dirtied Blocks": 0,
              "Shared Written Blocks": 0,
              "Local Hit Blocks": 0,
              "Local Read Blocks": 0,
              "Local Dirtied Blocks": 0,
              "Local Written Blocks": 0,
              "Temp Read Blocks": 89067,
              "Temp Written Blocks": 89202
            },
            {
              "Worker Number": 1,
              "Actual Startup Time": 42946.705,
              "Actual Total Time": 48799.414,
              "Actual Rows": 3385130,
              "Actual Loops": 1,
              "Shared Hit Blocks": 69,
              "Shared Read Blocks": 48327,
              "Shared Dirtied Blocks": 0,
              "Shared Written Blocks": 0,
              "Local Hit Blocks": 0,
              "Local Read Blocks": 0,
              "Local Dirtied Blocks": 0,
              "Local Written Blocks": 0,
              "Temp Read Blocks": 92880,
              "Temp Written Blocks": 93019
            }
          ],
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": true,
              "Relation Name": "towns",
              "Schema": "public",
              "Alias": "towns",
              "Startup Cost": 0.00,
              "Total Cost": 184524.92,
              "Plan Rows": 4166692,
              "Plan Width": 77,
              "Actual Startup Time": 0.322,
              "Actual Total Time": 8305.886,
              "Actual Rows": 3333337,
              "Actual Loops": 3,
              "Output": ["name", "code", "article"],
              "Shared Hit Blocks": 96,
              "Shared Read Blocks": 142762,
              "Shared Dirtied Blocks": 0,
              "Shared Written Blocks": 0,
              "Local Hit Blocks": 0,
              "Local Read Blocks": 0,
              "Local Dirtied Blocks": 0,
              "Local Written Blocks": 0,
              "Temp Read Blocks": 0,
              "Temp Written Blocks": 0,
              "Workers": [
                {
                  "Worker Number": 0,
                  "Actual Startup Time": 0.105,
                  "Actual Total Time": 8394.629,
                  "Actual Rows": 3277980,
                  "Actual Loops": 1,
                  "Shared Hit Blocks": 35,
                  "Shared Read Blocks": 46794,
                  "Shared Dirtied Blocks": 0,
                  "Shared Written Blocks": 0,
                  "Local Hit Blocks": 0,
                  "Local Read Blocks": 0,
                  "Local Dirtied Blocks": 0,
                  "Local Written Blocks": 0,
                  "Temp Read Blocks": 0,
                  "Temp Written Blocks": 0
                },
                {
                  "Worker Number": 1,
                  "Actual Startup Time": 0.113,
                  "Actual Total Time": 8139.382,
                  "Actual Rows": 3385130,
                  "Actual Loops": 1,
                  "Shared Hit Blocks": 32,
                  "Shared Read Blocks": 48327,
                  "Shared Dirtied Blocks": 0,
                  "Shared Written Blocks": 0,
                  "Local Hit Blocks": 0,
                  "Local Read Blocks": 0,
                  "Local Dirtied Blocks": 0,
                  "Local Written Blocks": 0,
                  "Temp Read Blocks": 0,
                  "Temp Written Blocks": 0
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 22.898,
    "Triggers": [
    ],
    "Execution Time": 61133.161
  }
]
```

The Sort operation has a "Shared Read Blocks" value of 142,762, with a
breakdown of:
46,794 in worker 0
48,327 in worker 1
47,641 in the main thread.

Its parent, the Gather Merge operation, has a "Shared Read Blocks" value of
47,641. This clearly doesn't include all the total blocks from its children
(it's less than the number for its only child), but it is equal to the value
for the main thread for the Sort operation. This would make sense if worker
values are being discarded/lost, since I wouldn't expect the gather merge to
be reading any new data.

This pattern is identical for all 10 of the buffers values - the value for
the Gather Merge operation is equal to the value for the main thread in the
Sort child operation.

So it looks like the buffers values for worker threads are discarded for
operations which don't manage the workers - is this intended/desired
behaviour?

Other info about my setup:

    Postgres version: "PostgreSQL 10.4, compiled by Visual C++ build 1800,
64-bit"
    I installed Postgres using the EnterpriseDB one-click installer
    OS: Windows 10, v1803
    I'm using Jetbrains Datagrip to connect to Postgres
    No errors are logged.
    Altered config settings:


    application_name     PostgreSQL JDBC Driver     session
    client_encoding     UTF8     client
    DateStyle     ISO, MDY     client
    default_text_search_config     pg_catalog.english     configuration file
    dynamic_shared_memory_type     windows     configuration file
    extra_float_digits     3     session
    lc_messages     English_United States.1252     configuration file
    lc_monetary     English_United States.1252     configuration file
    lc_numeric     English_United States.1252     configuration file
    lc_time     English_United States.1252     configuration file
    listen_addresses     *     configuration file
    log_destination     stderr     configuration file
    logging_collector     on     configuration file
    max_connections     100     configuration file
    max_stack_depth     2MB     environment variable
    port     5432     configuration file
    shared_buffers     128MB     configuration file
    TimeZone     UTC     client

Thanks in advance

Dave


On 1/31/19 10:33 AM, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15614
> Logged by:          Dave Conlin
> Email address:      dc345@cantab.net
> PostgreSQL version: 10.4
> Operating system:   Windows 10
> Description:
> 
> Hi folks -
> 
> In the output of an `EXPLAIN` query, normally child operations are included
> in the totals for the buffers values, eg "Shared Hit Blocks". It seems that
> when a parent operation is single-threaded, but a child operation starts
> workers, then only the buffers stats from the main thread in the child
> operation are included in the totals for the parent operation.
> 


Hello,

This should have been fixed in 10.5 :

https://www.postgresql.org/docs/current/release-10-5.html
 > Fix EXPLAIN's accounting for resource usage, particularly buffer 
accesses, in parallel workers (Amit Kapila, Robert Haas)

Could you apply the minor update and test again?

Regards,



It is, thanks a lot.

Sorry, I should have checked - I thought the minor versions were getting
bugfixes, but I was just confused with the pre v10 way of numbering I think

Thanks

Dave

On 31/01/2019 10:28, Adrien NAYRAT wrote:
> On 1/31/19 10:33 AM, PG Bug reporting form wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      15614
>> Logged by:          Dave Conlin
>> Email address:      dc345@cantab.net
>> PostgreSQL version: 10.4
>> Operating system:   Windows 10
>> Description:
>>
>> Hi folks -
>>
>> In the output of an `EXPLAIN` query, normally child operations are
>> included
>> in the totals for the buffers values, eg "Shared Hit Blocks". It
>> seems that
>> when a parent operation is single-threaded, but a child operation starts
>> workers, then only the buffers stats from the main thread in the child
>> operation are included in the totals for the parent operation.
>>
>
>
> Hello,
>
> This should have been fixed in 10.5 :
>
> https://www.postgresql.org/docs/current/release-10-5.html
> > Fix EXPLAIN's accounting for resource usage, particularly buffer
> accesses, in parallel workers (Amit Kapila, Robert Haas)
>
> Could you apply the minor update and test again?
>
> Regards,
>