Re: Synchronized Scan benchmark results - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Synchronized Scan benchmark results
Date
Msg-id 1175707429.4152.64.camel@dogma.v10.wvs
In response to Re: Synchronized Scan benchmark results  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: Synchronized Scan benchmark results
List pgsql-hackers
On Wed, 2007-04-04 at 10:40 +0100, Simon Riggs wrote:
> > That makes no sense to me, so it's probably a fluke (by which I mean
> > some other activity on the system, perhaps swapping some large
> > applications). The second two tests are consistent with all the other
> > numbers I got, but the first one took 40 seconds longer than I would
> > expect. I'll do a simple re-test tonight.
> 
> What did you set scan_recycle_buffers to? The default was zero.
> 
> I think v2 of the patch interpreted that setting as meaning attempt to
> reuse the same buffer again immediately, which probably wouldn't be
> optimal. Which is why I issued v3... I think you'll need to set
> scan_recycle_buffers = 0 (==off in v3) and scan_recycle_buffers = 32 to
> get sensible comparison figures.
> 

I used v2 with default in those tests, so I think that means it used the
same buffer.

By the way, in another test I ran, the result came out at 165s, which
is consistent with the other results. I think that when I ran the slow
test, the machine must have been swapping out applications or
something... who knows.

> So please can you use v3 for any further testing. Thanks.

I'll use v3 of the patch as located here:

http://archives.postgresql.org/pgsql-hackers/2007-03/msg00709.php

By the way, it might be easier to find the right one if the archives
contained filenames for the attachments. Am I missing something obvious?

> > > I would like to see some tests with different queries that have varying
> > > I/O and CPU requirements to see if they stay together too. That won't
> > > block the patch, but it will help everybody understand what the range of
> > > real world applicability there is in this. I'd guess this can benefit us
> > > sufficiently frequently in most cases that it's worth it.
> > 
> > I'll do some more varied tests. The best idea I've come up with so far
> > is to do something that requires random seeking going concurrently with
> > the scans. 
> 
> No, what I mean is different kinds of scans:
> - a simple scan like count(*)

I'll use the same "scan.rb" benchmark.

> - a more complex one that does buckets of cycles per tuple

I'll use a modified "scan.rb" that does a computation in the select list
(I'll declare the function volatile so that it is recomputed for each
tuple).
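
As a rough sketch of what such a per-tuple computation could look like,
the snippet below builds the SQL as strings. The names burn_cycles,
bigtable and id are hypothetical and do not come from "scan.rb"; the
loop body is just an arbitrary way to spend cycles, and the SQL itself
has not been run against 8.2.3.

```python
# Hypothetical names throughout: burn_cycles, bigtable, id.
# None of them come from the original scan.rb benchmark.

def volatile_function_sql(iterations: int = 100) -> str:
    """Return DDL for a PL/pgSQL function that spends CPU on every call."""
    return f"""
CREATE OR REPLACE FUNCTION burn_cycles(x integer) RETURNS integer AS $$
DECLARE
    acc integer := 0;
BEGIN
    FOR i IN 1..{iterations} LOOP
        acc := (acc * 31 + (x % 1000) + i) % 1000003;
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql VOLATILE;
""".strip()


def cpu_heavy_scan_sql(table: str = "bigtable", column: str = "id") -> str:
    """Return a sequential-scan query that calls the function once per tuple."""
    return f"SELECT sum(burn_cycles({column})) FROM {table};"


if __name__ == "__main__":
    print(volatile_function_sql())
    print(cpu_heavy_scan_sql())
```

Raising the iterations argument would be one way to vary the CPU cost
per tuple between runs.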

> - a hash join

This is where I got stuck.

* If it's one big ( > NBuffers/2 ) table and one small table, the small
table will only serve to occupy some shared_buffers (right?)
* If it's two big tables, a join would be a major operation. I don't
think it would even choose a hash join in that situation, right?


To summarize, in the next round of testing, I will
* disable sync_seqscan_offset completely
* use scan_recycle_buffers = 0 and 32
* still test against 8.2.3 for consistency, unless you suggest
otherwise.
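
One possible way to enumerate the runs this plan implies is sketched
below. The workload labels and the "patched-v3" server label are
placeholders of mine, not names used by the patch or by "scan.rb", and
the hash join case is left out because it is still undecided.

```python
from itertools import product

# Placeholder labels; only the values 0 and 32 come from the plan above.
RECYCLE_SETTINGS = [0, 32]
WORKLOADS = ["count_star", "cpu_heavy"]


def build_matrix():
    """Return one dict per planned run: baseline 8.2.3 plus patched settings."""
    # Unpatched 8.2.3 has no scan_recycle_buffers setting, hence None.
    runs = [
        {"server": "8.2.3", "scan_recycle_buffers": None, "workload": w}
        for w in WORKLOADS
    ]
    runs += [
        {"server": "patched-v3", "scan_recycle_buffers": n, "workload": w}
        for n, w in product(RECYCLE_SETTINGS, WORKLOADS)
    ]
    return runs


if __name__ == "__main__":
    for run in build_matrix():
        print(run)
```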

Regards,
	Jeff Davis


