Re: old synchronized scan patch - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: old synchronized scan patch
Msg-id 1165451514.3839.458.camel@silverbirch.site
In response to Re: old synchronized scan patch  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2006-12-06 at 15:12 -0500, Tom Lane wrote:

> I think all we need as far as buffer management goes is what I suggested
> before, namely have seqscans on large tables tell bufmgr not to
> increment the usage counter for their accesses.  If it stays zero then
> the buffers will be recycled as soon as the sweep gets back to them,
> which is exactly what we want.  

This is good, yet it addresses only the non-cache spoiling behaviour.
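To make Tom's suggestion concrete, here is a toy clock-sweep buffer pool in C. It is a minimal sketch, not bufmgr code: ToyPool, access_page, MAX_USAGE and the pool size are all hypothetical. A large seqscan passes bump_usage = false, so its pages stay at usage 0 and are the first thing the sweep recycles, while pages touched by normal accesses survive longer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names and sizes are hypothetical, not PostgreSQL's. */
#define NBUFFERS 8
#define MAX_USAGE 5

typedef struct {
    int page;   /* page held by this buffer, -1 if empty */
    int usage;  /* usage counter, decremented each time the hand passes */
} ToyBuffer;

typedef struct {
    ToyBuffer buf[NBUFFERS];
    int hand;   /* next buffer the clock sweep will inspect */
} ToyPool;

static void pool_init(ToyPool *p)
{
    for (int i = 0; i < NBUFFERS; i++) {
        p->buf[i].page = -1;
        p->buf[i].usage = 0;
    }
    p->hand = 0;
}

/* Advance the hand until a buffer with usage 0 is found, decrementing
 * nonzero counters on the way; that buffer becomes the victim. */
static int clock_sweep_victim(ToyPool *p)
{
    for (;;) {
        int idx = p->hand;
        p->hand = (p->hand + 1) % NBUFFERS;
        if (p->buf[idx].usage == 0)
            return idx;
        p->buf[idx].usage--;
    }
}

/* Access a page; a large seqscan calls this with bump_usage = false. */
static int access_page(ToyPool *p, int page, bool bump_usage)
{
    assert(page >= 0);
    for (int i = 0; i < NBUFFERS; i++) {
        if (p->buf[i].page == page) {
            if (bump_usage && p->buf[i].usage < MAX_USAGE)
                p->buf[i].usage++;
            return i;
        }
    }
    int v = clock_sweep_victim(p);
    p->buf[v].page = page;
    p->buf[v].usage = bump_usage ? 1 : 0;
    return v;
}

static bool pool_contains(const ToyPool *p, int page)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (p->buf[i].page == page)
            return true;
    return false;
}
```

In this model, four "hot" pages touched twice each survive a 12-page seqscan that does not bump usage, but are evicted when the same seqscan bumps usage like any other access. As Simon notes, this only covers the cache-spoiling side, not synchronisation.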

BTW, we would need something special to spot an Append node with
multiple SeqScans occurring in sequence on partitioned tables.
Individual scans may not be that large, but the set as a whole could be
huge. There's not much point implementing behaviour such as "table must
be bigger than 10x shared_buffers" because it works directly against the
role of partitioning.
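One way to read the Append point: apply the size test to the total of all child scans rather than to each partition. The sketch below assumes a hypothetical threshold of factor x shared_buffers (the 10x figure is only the example used above); scan_is_large and append_is_large are illustrative names, not planner functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size test: is a scan of `blocks` pages "large"? */
static bool scan_is_large(long blocks, long shared_buffers, long factor)
{
    return blocks > factor * shared_buffers;
}

/* Apply the same test to the sum of all SeqScans under an Append, so a
 * set of modest partitions is treated as the large scan it really is. */
static bool append_is_large(const long *child_blocks, size_t nchildren,
                            long shared_buffers, long factor)
{
    long total = 0;
    for (size_t i = 0; i < nchildren; i++) {
        assert(child_blocks[i] >= 0);
        total += child_blocks[i];
    }
    return scan_is_large(total, shared_buffers, factor);
}
```

With four 3000-block partitions and shared_buffers = 1000 blocks, no single partition passes a 10x test, but the Append as a whole does.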

> The window for additional backends to
> pick up on the sync scan is then however much of shared_buffers aren't
> occupied by blocks being accessed normally.

Non-cache-spoiling behaviour shrinks that window, so a new scan is much
less likely to catch up with an existing one by chance.

> If we have syncscan members that are spread out over any significant
> range of the table's blocks, then the problem of setting the hint
> properly becomes a lot more pressing.  We'd like incoming joiners to
> start at a fairly low block number, ie not become one of the "pack
> leaders" but one of the "trailers" --- since the higher they start,
> the more blocks they'll need to re-read at the end of their cycles,
> yet those are blocks they could have had "for free" if they'd started
> as a trailer.  I don't see any cheap way to bias the behavior in that
> direction, though.

Well, that's the problem. I agree about the pack leaders/trailers.
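If every member's current block were visible, biasing joiners toward the trailers would be simple: start at the lowest reported position. The sketch below shows that rule and the cost it minimises, but note it assumes exactly the per-member position tracking that Tom doubts can be done cheaply. The function names are hypothetical, and wraparound is ignored (all members are assumed to be on their first pass).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: join at the trailer (lowest block), not the leader.
 * Assumes no member has wrapped around the end of the table yet. */
static long choose_start_block(const long *member_pos, size_t nmembers)
{
    if (nmembers == 0)
        return 0;  /* nobody to join: start at block 0 */
    long start = member_pos[0];
    for (size_t i = 1; i < nmembers; i++)
        if (member_pos[i] < start)
            start = member_pos[i];
    return start;
}

/* Blocks 0 .. start-1 must be read alone at the end of the joiner's
 * cycle; a trailer start keeps this number as small as possible. */
static long blocks_to_reread(long start)
{
    assert(start >= 0);
    return start;
}
```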

What's the solution?

You can synchronise every block or every N blocks, but otherwise, how
will you know the optimal point to start the scan? Getting that right
requires *some* synchronisation.
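"Every N blocks" could look like the sketch below: each scanning backend publishes its position to a shared hint only once per N blocks, so the shared location is touched rarely and joiners read it to pick a start point. ScanHint, ScanState and scan_advance are hypothetical names; a C11 atomic stands in for whatever shared-memory mechanism a real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared hint; in a real system it would live in shared memory. */
typedef struct {
    atomic_long hint_block;  /* last position published by any member */
} ScanHint;

typedef struct {
    long next_block;         /* next block this backend will read */
    long report_interval;    /* N: publish once every N blocks */
} ScanState;

/* Called after each block is read; publishes the position only when
 * next_block is a multiple of report_interval. */
static void scan_advance(ScanState *s, ScanHint *h)
{
    assert(s->report_interval > 0);
    s->next_block++;
    if (s->next_block % s->report_interval == 0)
        atomic_store(&h->hint_block, s->next_block);
}
```

Larger N means less contention on the hint but a staler starting point for joiners.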

Scans don't naturally proceed at the same rate: different plans do
different amounts of work between page requests. Letting each scan go
its own way would waste a lot of I/O, so making some scans wait for
others is an essential aspect of efficiency, just as it is in the OS.

Synchronisation costs very little in comparison with the I/O it saves.
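A simple form of "making some scans wait for others" is a pacing rule: a member may read its next block only while it is within some maximum lead of the slowest member, otherwise it pauses so the others can catch up and share its reads. This is a sketch under assumptions; may_proceed and max_lead are hypothetical, and a real scheme would also need the waiting mechanism itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pacing rule: allow the next read only if this member is
 * at most max_lead blocks ahead of the slowest member of the group. */
static bool may_proceed(long my_pos, const long *member_pos, size_t n,
                        long max_lead)
{
    assert(n > 0 && max_lead >= 0);
    long slowest = member_pos[0];
    for (size_t i = 1; i < n; i++)
        if (member_pos[i] < slowest)
            slowest = member_pos[i];
    return my_pos - slowest <= max_lead;
}
```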

Perhaps there are ways of doing this without central control, so that
backend error conditions don't need special handling. A simple timeout
to exclude backends from the Conga seems sufficient to cover error cases
and other special cases.
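The timeout idea might look like this: each member refreshes a timestamp whenever it reports its position, and anyone scanning the membership treats members silent for longer than the timeout as gone, so a backend that errored out never holds up the Conga. Member and expire_members are hypothetical names, and times are passed in as milliseconds to keep the sketch clock-independent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical membership slot; no central coordinator owns it. */
typedef struct {
    long pos;             /* last reported block */
    long last_report_ms;  /* time of the member's last report */
    bool in_use;
} Member;

/* Drop members silent for longer than timeout_ms (e.g. their backend
 * hit an error) and return how many members remain active. */
static size_t expire_members(Member *m, size_t n, long now_ms, long timeout_ms)
{
    assert(timeout_ms > 0);
    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        if (!m[i].in_use)
            continue;
        if (now_ms - m[i].last_report_ms > timeout_ms)
            m[i].in_use = false;
        else
            active++;
    }
    return active;
}
```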

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



