Re: old synchronized scan patch - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: old synchronized scan patch
Date	December 4, 2006 19:38:42
Msg-id	1165275500.25371.63.camel@dogma.v10.wvs
In response to	Re: old synchronized scan patch ("Luke Lonergan" <llonergan@greenplum.com>)
List	pgsql-hackers

On Mon, 2006-12-04 at 15:03 -0800, Luke Lonergan wrote:
> Jeff,
> > Now that 8.3 is open, I was considering a revival of this old patch:
> > 
> > http://archives.postgresql.org/pgsql-hackers/2005-02/msg00832.php
> > 
> > I could probably clean it up with a little help from someone on this
> > list.
> > 
> > 
> > Is there some interest in this patch?
> 
> Yes.
> 

<snip>

> Where I think sync scan could have a big benefit is for multi-user business
> intelligence workloads where there are a few huge fact tables of interest to
> a wide audience.  Example: 5 business analysts come to work at 9AM and start
> ad-hoc queries expected to run in about 15 minutes each.  Each query
> sequential scans a 10 billion row fact table once, which takes 10 minutes of
> the query runtime.  With sync scan the last one completes in 35 minutes.
> Without sync scan the last completes in 75 minutes.  In this case sync scan
> significantly improves the experience of 5 people.
> 

Thank you for your input. 

> > How would I go about proving whether it's useful enough or not?
> 
> Can you run the above scenario on a table whose size is ten times the memory
> on the machine?  As a simple starting point, a simple "SELECT COUNT(*) FROM
> BIGTABLE" should be sufficient, but the scans need to be separated by enough
> time to invalidate the OS I/O cache.
> 

I'll try to run a test like that this week. I will be doing this on my
home hardware (bad, consumer-grade stuff), so if I give you a patch
against HEAD, could you test it on more realistic hardware (and
data)?

To open up the implementation topic: 

My current patch starts a new sequential scan on a given relation at the
page that an already-running scan is currently reading. It makes no
guarantee that the scans stay together, but in practice I don't think
they drift apart much. I fear that trying to enforce synchronization of
the scans would do more harm than good. Thoughts?
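
To make the starting-page idea concrete, here is a minimal sketch of the page order a late-joining scan might follow: begin at the page the running scan has reached, read to the end of the relation, then wrap around to cover the pages that were skipped. The names scan_order and BlockNumber are stand-ins chosen for illustration; this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* illustrative stand-in for a page number type */

/*
 * Fill 'order' with the page visit order for a scan over 'nblocks' pages
 * that starts at page 'start' and wraps around to the pages before it.
 * Every page is visited exactly once; only the starting point changes.
 */
void
scan_order(BlockNumber start, BlockNumber nblocks, BlockNumber *order)
{
    assert(nblocks > 0 && start < nblocks);

    for (BlockNumber i = 0; i < nblocks; i++)
        order[i] = (start + i) % nblocks;
}
```

Nothing here keeps the two scans together after the start, which matches the "no guarantee" behaviour described above.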

Also, it's more of a "hint" system that uses a direct mapping of the
relation's Oid to hold the position of the scan. That means that, in rare
cases, the page offset could be wrong, in which case it will degenerate
to the current performance characteristics with no cost. The benefit of
doing it this way is that it's simple code, with essentially no
performance penalty or additional locking. Also, I can use a fixed
amount of shared memory (1 page is about right).
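
A rough sketch of what such a hint table could look like, under stated assumptions: the "direct mapping" is taken to be the Oid modulo a fixed slot count, the slot count is sized so the array fills one 8 kB page, and the names hint_report, hint_start_page and HINT_SLOTS are hypothetical. The patch itself may differ on all of these points.

```c
#include <stdint.h>

typedef uint32_t Oid;           /* illustrative stand-ins for the real types */
typedef uint32_t BlockNumber;

#define HINT_SLOTS 2048         /* assumption: 2048 * 4 bytes = one 8 kB page */

/* In a real backend this array would live in shared memory. */
static BlockNumber hint_page[HINT_SLOTS];

/* A running scan records the page it is reading. No lock is taken. */
void
hint_report(Oid relid, BlockNumber page)
{
    hint_page[relid % HINT_SLOTS] = page;
}

/*
 * A new scan asks where to start. Two relations whose Oids map to the same
 * slot overwrite each other's hint, so the value may belong to another
 * relation; it is only checked against the relation's size.
 */
BlockNumber
hint_start_page(Oid relid, BlockNumber nblocks)
{
    BlockNumber page = hint_page[relid % HINT_SLOTS];

    return (page < nblocks) ? page : 0;
}
```

A stale or colliding hint only means the new scan starts at a page where no other scan happens to be, which is the same situation as today.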

Regards,
Jeff Davis
