Re: old synchronized scan patch - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: old synchronized scan patch |
Date | |
Msg-id | 1165275500.25371.63.camel@dogma.v10.wvs |
In response to | Re: old synchronized scan patch ("Luke Lonergan" <llonergan@greenplum.com>) |
Responses |
Re: old synchronized scan patch
Re: old synchronized scan patch |
List | pgsql-hackers |
On Mon, 2006-12-04 at 15:03 -0800, Luke Lonergan wrote:
> Jeff,
> > Now that 8.3 is open, I was considering a revival of this old patch:
> >
> > http://archives.postgresql.org/pgsql-hackers/2005-02/msg00832.php
> >
> > I could probably clean it up with a little help from someone on this
> > list.
> >
> >
> > Is there some interest in this patch?
>
> Yes.
>

<snip>

> Where I think sync scan could have a big benefit is for multi-user business
> intelligence workloads where there are a few huge fact tables of interest to
> a wide audience. Example: 5 business analysts come to work at 9AM and start
> ad-hoc queries expected to run in about 15 minutes each. Each query
> sequential scans a 10 billion row fact table once, which takes 10 minutes of
> the query runtime. With sync scan the last one completes in 35 minutes.
> Without sync scan the last completes in 75 minutes. In this case sync scan
> significantly improves the experience of 5 people.
>

Thank you for your input.

> > How would I go about proving whether it's useful enough or not?
>
> Can you run the above scenario on a table whose size is ten times the memory
> on the machine? As a simple starting point, a simple "SELECT COUNT(*) FROM
> BIGTABLE" should be sufficient, but the scans need to be separated by enough
> time to invalidate the OS I/O cache.
>

I'll try to run a test like that this week. I will be doing this on my home hardware (bad, consumer-grade stuff), so if I gave you a patch against HEAD, could you test it against some more realistic hardware (and data)?

To open up the implementation topic:

My current patch starts a new sequential scan on a given relation at the page of an already-running scan. It makes no guarantee that the scans stay together, but in practice I don't think they deviate much. I fear that trying to enforce synchronization of the scans would do more harm than good. Thoughts?

Also, it's more of a "hint" system that uses a direct mapping of the relation's Oid to hold the position of the scan. That means that, in rare cases, the page offset could be wrong, in which case it will degenerate to the current performance characteristics with no cost. The benefit of doing it this way is that it's simple code, with essentially no performance penalty or additional locking. Also, I can use a fixed amount of shared memory (1 page is about right).

Regards,
Jeff Davis
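
For readers following the thread, the "hint" scheme described above can be sketched roughly as follows. This is not the patch's code, only a minimal illustration under stated assumptions: the names `ScanHintTable`, `report`, `start_page` and `scan_order` are hypothetical, the slot is chosen by taking the Oid modulo the table size, the table is sized as one 8 kB page of 4-byte block numbers, and a scan that starts mid-table is assumed to wrap around to cover the earlier pages.

```python
PAGE_SIZE = 8192   # assumed size of the one shared-memory page, in bytes
SLOT_SIZE = 4      # assumed size of one stored block number, in bytes


class ScanHintTable:
    """Fixed-size, direct-mapped table of scan positions (hypothetical sketch).

    Slots hold only a page number and are never locked, so two relations
    whose Oids map to the same slot can overwrite each other's hint.
    """

    def __init__(self, nslots=PAGE_SIZE // SLOT_SIZE):
        self.slots = [0] * nslots

    def _slot(self, rel_oid):
        # Direct mapping: the Oid alone picks the slot (modulo is an assumption).
        return rel_oid % len(self.slots)

    def report(self, rel_oid, page):
        # A running scan records the page it is currently reading.
        self.slots[self._slot(rel_oid)] = page

    def start_page(self, rel_oid, nblocks):
        # A new scan reads the hint. If it is out of range for this relation
        # (for example after a slot collision), fall back to page 0, which is
        # simply the behaviour of an unsynchronized scan.
        hint = self.slots[self._slot(rel_oid)]
        return hint if hint < nblocks else 0


def scan_order(start, nblocks):
    # Visit every page exactly once, beginning at `start` and wrapping around
    # (the wrap-around is an assumption; the message does not spell it out).
    return [(start + i) % nblocks for i in range(nblocks)]
```

With this layout a wrong hint is harmless: `ScanHintTable().start_page(oid, nblocks)` always returns a valid page, and `scan_order` still covers the whole relation, so the worst case is a scan that does not share I/O with any other scan.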