Re: Review: Revise parallel pg_restore's scheduling heuristic - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Review: Revise parallel pg_restore's scheduling heuristic
Msg-id 4A61ED14020000250002897C@gw.wicourts.gov
In response to Review: Revise parallel pg_restore's scheduling heuristic  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests.  What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big, complex
databases, and it just takes a fair amount of time to throw around
that much data.  (This patch didn't seem likely to make a significant
difference on smaller databases.)
My current plan is to test this on a web server class machine and a
distributed application class machine.  Both database types have over
300 tables, with widely ranging row counts, widths, and index counts.
It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8-core, 64GB machine would give meaningful
results.  Any sense of what numbers of parallel jobs I should use for
the tests?  I would be tempted to try 1 (with the -1 switch), 8, 12,
and 16 -- and maybe keep going if 16 beats 12.  My plan here would be
to have the dump on one machine, run pg_restore there, and push the
data to a database on another machine over a 1Gb LAN connection. 
(This seems most likely to be what we'd be doing in real life.)  I
would run each test with the CVS trunk tip with and without the patch
applied.  The database is currently 1.1TB.
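For what it's worth, the invocations I have in mind would look roughly
like this -- host, user, database, and dump file names are
placeholders, and I haven't settled on the final job counts.  (The -1
and -j switches can't be combined, which is why the single-job run
uses the former.)

  # single-job baseline, wrapped in one transaction
  pg_restore -1 -h dbserver -U postgres -d targetdb dump.custom

  # parallel runs against the same custom-format dump; repeat for -j 12, 16, ...
  pg_restore -j 8 -h dbserver -U postgres -d targetdb dump.custom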
The application machine would have 2 cores and about 4GB RAM.  I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space. 
It's 89GB. I'd probably try job counts starting at one and going up by
one until performance starts to drop off.  (At one job I would use the
-1 switch.)
In all cases I was planning on using a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.
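To be concrete, something along these lines on the target server --
the first three settings correspond to "fsync, archiving, statistics"
above; the last two are just candidates, and the values are
illustrative:

  # append "conversion" settings to the target's postgresql.conf
  cat >> $PGDATA/postgresql.conf <<'EOF'
  fsync = off
  archive_mode = off
  track_counts = off            # statistics collector; also effectively disables autovacuum
  maintenance_work_mem = 1GB    # illustrative; speeds index builds
  checkpoint_segments = 64      # illustrative
  EOF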
Does this sound like a sane approach to testing whether this patch
actually improves performance?  Any suggestions before I start, to
ensure the most meaningful results?
-Kevin

