synchronize_seqscans' description is a bit misleading - Mailing list pgsql-hackers

From Gurjeet Singh
Subject synchronize_seqscans' description is a bit misleading
Date
Msg-id CABwTF4VwxS+jjT2RZSzHny5LArW+jFjFn5uiGH8cTRCXETGNag@mail.gmail.com
Responses Re: [DOCS] synchronize_seqscans' description is a bit misleading
List pgsql-hackers
If I'm reading the code right [1], this GUC does not actually *synchronize* the scans, but instead just makes sure that a new scan starts from a block that was reported by some other backend performing a scan on the same relation.

Because the backends scanning the relation may process it at different speeds, they can drift out of sync with each other even though each one took the hint when starting its scan. Even within a single query, different scan nodes may be scanning different parts of the same relation, and they don't synchronize with each other either (and for good reason).
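To make the drift concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the logic in heapam.c: the function names, the "blocks per tick" rates, and the assumption that only the first scan reports its position are all made up for illustration.

```python
def scan_positions(nblocks, start_block, blocks_per_tick, ticks):
    """Block position of a circular scan at each tick (hypothetical model)."""
    return [(start_block + blocks_per_tick * t) % nblocks
            for t in range(ticks + 1)]


def drift(nblocks, fast_rate, slow_rate, join_tick, ticks):
    """Gap in blocks between two scans at each tick after the second one joins."""
    # Scan A starts at block 0 and is the one whose position gets reported.
    a = scan_positions(nblocks, 0, fast_rate, join_tick + ticks)
    # Scan B starts later, at the block A had reached at join_tick (the hint).
    hint = a[join_tick]
    b = scan_positions(nblocks, hint, slow_rate, ticks)
    # Nothing after the start forces B to keep up with A.
    return [(a[join_tick + t] - b[t]) % nblocks for t in range(ticks + 1)]


if __name__ == "__main__":
    # Same speed: the scans stay together after the hinted start.
    print(drift(1000, 10, 10, 5, 4))
    # Different speeds: the gap grows by 3 blocks every tick.
    print(drift(1000, 10, 7, 5, 4))
```

In this model the hint only fixes the starting block; the gap after that depends solely on the difference in processing rates.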

Imagining that all scans on a table are always synchronized may lead some to wrongly believe that adding more backends scanning the same table will not incur any extra I/O; that is, that only one stream of blocks will be read from disk no matter how many backends you add to the mix. I noticed this while creating partition tables, each with a CREATE TABLE AS SELECT FROM original_table (to avoid WAL generation): running more than 3 such transactions at once caused the disk read throughput to behave unpredictably, sometimes even dipping below 1 MB/s for a few seconds at a stretch.
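A rough way to see why extra scans can cost extra I/O is to count cache misses in a toy model. The sketch below assumes a small LRU cache shared by all scans and fixed per-scan rates; both are simplifications chosen for illustration and do not reflect how PostgreSQL actually manages buffers. The name physical_reads is hypothetical.

```python
from collections import OrderedDict


def physical_reads(nblocks, rates, cache_blocks):
    """Count cache misses when several scans read blocks 0..nblocks-1.

    rates[i] is how many blocks scan i reads per tick (assumed constant).
    All scans start together at block 0, as if each had taken the hint.
    """
    cache = OrderedDict()          # toy LRU cache keyed by block number
    pos = [0] * len(rates)         # next block each scan will read
    misses = 0
    while any(p < nblocks for p in pos):
        for i, rate in enumerate(rates):
            for _ in range(rate):
                if pos[i] >= nblocks:
                    break          # this scan has finished
                blk = pos[i]
                if blk in cache:
                    cache.move_to_end(blk)         # hit: refresh LRU position
                else:
                    misses += 1                    # miss: "read from disk"
                    cache[blk] = True
                    if len(cache) > cache_blocks:
                        cache.popitem(last=False)  # evict least recently used
                pos[i] += 1
    return misses


if __name__ == "__main__":
    print(physical_reads(1000, [5, 5], 16))  # equal speeds
    print(physical_reads(1000, [5, 1], 16))  # one scan falls behind
```

With equal rates, the second scan always finds its blocks in the cache, so the table is read once. Once one scan falls behind by more than the cache can hold, its reads become misses again and the two scans behave like separate I/O streams.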

Please note that I am not complaining about the implementation, which I think is the best we can do without making backends wait for each other. It's just that the documentation [2] implies that the scans stay synchronized throughout the entire run, which is clearly not the case. So I'd like the docs to be improved to reflect that.

How about something like:

<doc>
synchronize_seqscans (boolean)
    This allows sequential scans of large tables to start from a point in the table that is already being read by another backend. This increases the probability that concurrent scans read the same block at about the same time and hence share the I/O workload. Note that, because backends may process the table at different speeds, they may eventually get out of sync and hence stop sharing the I/O workload.

    When this is enabled, ... The default is on.
</doc>

Best regards,

[1] src/backend/access/heap/heapam.c
[2] http://www.postgresql.org/docs/9.2/static/runtime-config-compatible.html#GUC-SYNCHRONIZE-SEQSCANS

--
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.
