Re: Sync Scan update - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Sync Scan update
Msg-id 1166553441.24294.30.camel@dogma.v10.wvs
In response to Re: Sync Scan update  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Sync Scan update  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
On Tue, 2006-12-19 at 18:05 +0000, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> 
> > Like to see some tests with 2 parallel threads, since that is the most
> > common case. I'd also like to see some tests with varying queries,
> > rather than all use select count(*). My worry is that these tests all
> > progress along their scans at exactly the same rate, so are likely to
> > stay in touch. What happens when we have significantly more CPU work to
> > do on one scan - does it fall behind??
> 
> If it's just CPU then I would expect the cache to help the followers keep up
> pretty easily. What concerns me is queries that involve more I/O. For example
> if the leader is doing a straight sequential scan and the follower is doing a
> nested loop join driven by the sequential scan. Or worse, what happens if the

That would be one painful query: scanning two tables in a nested loop,
neither of which fits into physical memory! ;)

If one table does fit into memory, it's likely to stay there since a
nested loop will keep the pages so hot.

I can't think of a way to test two big tables in a nested loop because
it would take so long. However, it would be worth trying it with an
index, because that would cause random I/O during the scan.

> leader is doing a nested loop and the follower which is just doing a straight
> sequential scan is being held back?
> 

The follower will never be held back in my current implementation.

My current implementation relies on the scans to stay close together
once they start close together. If one falls seriously behind, it will
fall outside of the main "cache trail" and cause the performance to
degrade due to disk seeking and lower cache efficiency.

I think Simon is concerned about CPU because that will be a common case:
if one scan is CPU bound and another is I/O bound, they will progress at
different rates. That's bound to cause seeking and poor cache
efficiency.
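
To make the "cache trail" argument concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not the patch's code: the
function name simulate, its parameters, and the single shared LRU cache are
all hypothetical stand-ins for the real buffer cache. A leader scan starts
start_gap pages ahead of a follower, each scan advances a fixed number of
pages per tick, and the function reports what fraction of the follower's
reads are served from the cache.

```python
from collections import OrderedDict


def simulate(table_pages, cache_pages, leader_speed, follower_speed, start_gap):
    """Return the follower's cache hit ratio in a toy two-scan model.

    All names and parameters are hypothetical; this is not PostgreSQL code.
    """
    cache = OrderedDict()  # LRU cache: page number -> None

    def touch(page):
        # Read one page through the cache; return True on a cache hit.
        hit = page in cache
        if hit:
            cache.move_to_end(page)
        else:
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
        return hit

    # The leader gets a head start of start_gap pages before the follower begins.
    for page in range(start_gap):
        touch(page)
    leader_pos = start_gap
    follower_pos = 0

    hits = reads = 0
    while follower_pos < table_pages:
        # Each tick, the leader reads leader_speed pages ...
        for _ in range(leader_speed):
            if leader_pos < table_pages:
                touch(leader_pos)
                leader_pos += 1
        # ... and the follower reads follower_speed pages.
        for _ in range(follower_speed):
            if follower_pos < table_pages:
                hits += touch(follower_pos)
                reads += 1
                follower_pos += 1
    return hits / reads


if __name__ == "__main__":
    # Same rate: the follower stays inside the trail left by the leader.
    print(simulate(10000, 100, 1, 1, 10))
    # Follower at half the leader's rate: the gap soon exceeds the cache size.
    print(simulate(10000, 100, 2, 1, 10))
```

In this model, a follower that keeps pace stays within the pages the leader
recently read, while a slower follower drifts out of the trail once the gap
grows past the cache size and then reads almost everything from disk. The
model ignores seek cost, read-ahead, and other backends, so it only shows
the direction of the effect, not its size.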

Although I don't think either of these cases will be worse than current
behavior, it warrants more testing.

Regards,
Jeff Davis


