Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Rebased to correct for pg_indent changes.

Applies cleanly.
Compiles cleanly.
Passes regression tests.
Comments and format look good.
No documentation changes needed.
No regression test changes needed.

Performance tests to follow in a day or two.

-Kevin

Attachment

Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests.  What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data.  (This patch didn't seem likely to make a significant
difference on smaller databases.)
My current plan is to test this on a web server class machine and a
distributed application class machine.  Both database types have over
300 tables, with widely ranging row counts, widths, and
index counts.
It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results.  Any sense what numbers of parallel jobs I should use for
tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12.  My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection. 
(This seems most likely to be what we'd be doing in real life.)  I
would run each test with the CVS trunk tip with and without the patch
applied.  The database is currently 1.1TB.
The application machine would have 2 cores and about 4GB RAM.  I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space. 
It's 89GB. I'd probably try job counts starting at one and going up by
one until performance starts to drop off.  (At one I would use the -1
switch.)
In all cases I was planning on using a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.
Does this sound like a sane approach to testing whether this patch
actually improves performance?  Any suggestions before I start this,
to ensure most meaningful results?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Sat, Jul 18, 2009 at 4:41 PM, Kevin
Grittner<Kevin.Grittner@wicourts.gov> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>
>> Performance tests to follow in a day or two.
>
> I'm looking to beg another week or so on this to run more tests.  What
> I can have by the end of today is pretty limited, mostly because I
> decided it made the most sense to test this with big complex
> databases, and it just takes a fair amount of time to throw around
> that much data.  (This patch didn't seem likely to make a significant
> difference on smaller databases.)

No worries.  We have a limited number of people who can test these
kinds of things and who have volunteered to serve as reviewers (two
that I'm aware of, and one of those I haven't heard from lately...).
So we'll be patient.  :-)

> My current plan is to test this on a web server class machine and a
> distributed application class machine.  Both database types have over
> 300 tables, with widely ranging row counts, widths, and
> index counts.
>
> It would be hard to schedule the requisite time on our biggest web
> machines, but I assume an 8 core 64GB machine would give meaningful
> results.  Any sense what numbers of parallel jobs I should use for
> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
> 16 -- maybe keep going if 16 beats 12.  My plan here would be to have
> the dump on one machine, and run pg_restore there, and push it to a
> database on another machine through the LAN on a 1Gb connection.
> (This seems most likely to be what we'd be doing in real life.)  I
> would run each test with the CVS trunk tip with and without the patch
> applied.  The database is currently 1.1TB.
>
> The application machine would have 2 cores and about 4GB RAM.  I'm
> tempted to use Milwaukee County's database there, as it has the most
> rows per table, even though some of the counties doing a lot of
> document scanning now have bigger databases in terms of disk space.
> It's 89GB. I'd probably try job counts starting at one and going up by
> one until performance starts to drop off.  (At one I would use the -1
> switch.)
>
> In all cases I was planning on using a "conversion" postgresql.conf
> file, turning off fsync, archiving, statistics, etc.
>
> Does this sound like a sane approach to testing whether this patch
> actually improves performance?  Any suggestions before I start this,
> to ensure most meaningful results?

This all sounds reasonable to me, although it might be worth testing
with default settings too.

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Stefan Kaltenbrunner
Date:
Kevin Grittner wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
>  
>> Performance tests to follow in a day or two.
>  
> I'm looking to beg another week or so on this to run more tests.  What
> I can have by the end of today is pretty limited, mostly because I
> decided it made the most sense to test this with big complex
> databases, and it just takes a fair amount of time to throw around
> that much data.  (This patch didn't seem likely to make a significant
> difference on smaller databases.)
>  
> My current plan is to test this on a web server class machine and a
> distributed application class machine.  Both database types have over
> 300 tables, with widely ranging row counts, widths, and
> index counts.
>  
> It would be hard to schedule the requisite time on our biggest web
> machines, but I assume an 8 core 64GB machine would give meaningful
> results.  Any sense what numbers of parallel jobs I should use for
> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
> 16 -- maybe keep going if 16 beats 12.  My plan here would be to have
> the dump on one machine, and run pg_restore there, and push it to a
> database on another machine through the LAN on a 1Gb connection. 
> (This seems most likely to be what we'd be doing in real life.)  I
> would run each test with the CVS trunk tip with and without the patch
> applied.  The database is currently 1.1TB.


you need to be careful here - in my latest round of benchmarking I 
actually had to test with the workload generator on the same box because 
on fast boxes we can easily achieve >100MB/s total load rate these days.
At these load rates you are very close to or over the practical limits 
of GigE...
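As a rough sanity check on that ceiling (back-of-the-envelope arithmetic only, assuming a standard 1500-byte MTU and no TCP options):

```python
# Back-of-the-envelope TCP goodput over gigabit Ethernet, assuming a
# standard 1500-byte MTU and no TCP options (an approximation, not a
# measurement).
LINK_RATE = 125.0                 # 1 Gbit/s = 125 MB/s on the wire

payload = 1500 - 20 - 20          # MTU minus IPv4 and TCP headers
on_wire = 1500 + 14 + 4 + 8 + 12  # + Ethernet header, FCS, preamble, gap

goodput = LINK_RATE * payload / on_wire
print(f"practical ceiling ~{goodput:.0f} MB/s")           # ~119 MB/s
print(f"100 MB/s is {100 / goodput:.0%} of that ceiling")
```

So a sustained load rate above 100 MB/s already sits within roughly 15% of what a single GigE link can deliver to TCP.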


Stefan


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Andrew Dunstan
Date:

Robert Haas wrote:
> On Sat, Jul 18, 2009 at 4:41 PM, Kevin
> Grittner<Kevin.Grittner@wicourts.gov> wrote:
>   
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>>
>>     
>>> Performance tests to follow in a day or two.
>>>       
>> I'm looking to beg another week or so on this to run more tests.  What
>> I can have by the end of today is pretty limited, mostly because I
>> decided it made the most sense to test this with big complex
>> databases, and it just takes a fair amount of time to throw around
>> that much data.  (This patch didn't seem likely to make a significant
>> difference on smaller databases.)
>>     
>
> No worries.  We have a limited number of people who can test these
> kinds of things and who have volunteered to serve as reviewers (two
> that I'm aware of, and one of those I haven't heard from lately...).
> So we'll be patient.  :-)
>
>   
>> My current plan is to test this on a web server class machine and a
>> distributed application class machine.  Both database types have over
>> 300 tables, with widely ranging row counts, widths, and
>> index counts.
>>
>> It would be hard to schedule the requisite time on our biggest web
>> machines, but I assume an 8 core 64GB machine would give meaningful
>> results.  Any sense what numbers of parallel jobs I should use for
>> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
>> 16 -- maybe keep going if 16 beats 12.  My plan here would be to have
>> the dump on one machine, and run pg_restore there, and push it to a
>> database on another machine through the LAN on a 1Gb connection.
>> (This seems most likely to be what we'd be doing in real life.)  I
>> would run each test with the CVS trunk tip with and without the patch
>> applied.  The database is currently 1.1TB.
>>
>> The application machine would have 2 cores and about 4GB RAM.  I'm
>> tempted to use Milwaukee County's database there, as it has the most
>> rows per table, even though some of the counties doing a lot of
>> document scanning now have bigger databases in terms of disk space.
>> It's 89GB. I'd probably try job counts starting at one and going up by
>> one until performance starts to drop off.  (At one I would use the -1
>> switch.)
>>
>> In all cases I was planning on using a "conversion" postgresql.conf
>> file, turning off fsync, archiving, statistics, etc.
>>
>> Does this sound like a sane approach to testing whether this patch
>> actually improves performance?  Any suggestions before I start this,
>> to ensure most meaningful results?
>>     
>
> This all sounds reasonable to me, although it might be worth testing
> with default settings too.
>
>
>
>   

To performance test this properly you might need to devise a test that 
will use a sufficiently different order of queueing items to show the 
difference.

One thing I am particularly interested in is seeing whether queuing FK 
items for a table as soon as they become available, which is most likely 
when the referred-to index is created, performs better than doing them 
all together (assuming they are named with the table name as a prefix), 
as restoring in TOC order would.

cheers

andrew


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Josh Berkus
Date:
Kevin,

> It would be hard to schedule the requisite time on our biggest web
> machines, but I assume an 8 core 64GB machine would give meaningful
> results.  Any sense what numbers of parallel jobs I should use for
> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
> 16 -- maybe keep going if 16 beats 12.

Personally, I wouldn't go over the number of cores.  But if you do find 
some gain that way, I'd be very interested to know it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Andrew Dunstan <andrew@dunslane.net> wrote:
> To performance test this properly you might need to devise a test
> that will use a sufficiently different order of queueing items to
> show the difference.
> 
> One thing I am particularly interested in is seeing whether queuing
> FK items for a table as soon as they become available, which is most
> likely when the referred-to index is created, performs better than
> doing them all together (assuming they are named with the table name
> as a prefix), as restoring in TOC order would.
Hmmm....  I'm reevaluating my database choice.  The 1.1TB database
does not have foreign key constraints as a matter of policy.  It is a
replica of the county databases, and we want to replicate whatever we
can of the county data -- failure for some reason of part of the data
to replicate should not block replication of something else, to
minimize differences.  Is there still value in using such a database
at all, or should I focus on databases in the 50GB to 90GB range with
FK constraints defined?
When you suggest devising a test to show a difference, in what way
would it be likely that I would need to modify the real-life database
to get such a test?  Our FKs do start with "<TableName>_".  We don't
have underscores in our table names, although we use similar naming
for our indexes.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: 
> it might be worth testing with default settings too.
OK.  I'll do that too, if time allows.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
>> My plan here would be to have
>> the dump on one machine, and run pg_restore there, and push it to a
>> database on another machine through the LAN on a 1Gb connection.
>> (This seems most likely to be what we'd be doing in real life.)
> you need to be careful here - in my latest round of benchmarking I
> actually had to test with the workload generator on the same box
> because on fast boxes we can easily achieve >100MB/s total load rate
> these days.  At these load rates you are very close to or over the
> practical limits of GigE...
Yeah, I was concerned about that.  Problem was, with the 1.1TB
database it is hard to fit a dump and an extra copy of the database
onto the drives along with the production data for which they exist. 
We would likely face the same constraints with real data if using the
parallel restore, since it requires an interim backup file (i.e., you
can't stream directly from the source database).  There's also the
issue of reading from the same RAID you're targeting with the restore,
which is sometimes not optimal.
If I'm dropping down an order of magnitude or more in the databases I
will use, I could put the backup file on a separate RAID on the same
machine.  This leaves a lot of options.  I'm not sure which
combinations of configuration, file placement, and job count yield the
most useful results.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Andrew Dunstan <andrew@dunslane.net> wrote:
> To performance test this properly you might need to devise a test
> that will use a sufficiently different order of queueing items to
> show the difference.
It would appear that I need help with devising a proper test.  So far,
all tests have shown no difference in performance based on the patch;
I get almost twice the speed of a single job running in one database
transaction either way.  Can someone explain what I should try to set
up to get a "best case" and a "worst case" for the patch?  Our
production databases don't expose any difference, but I'm willing to
try to use them to "seed" an artificial case which will.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Andrew Dunstan
Date:

Kevin Grittner wrote:
> Andrew Dunstan <andrew@dunslane.net> wrote:
>  
>   
>> To performance test this properly you might need to devise a test
>> that will use a sufficiently different order of queueing items to
>> show the difference.
>>     
>  
> It would appear that I need help with devising a proper test.  So far,
> all tests have shown no difference in performance based on the patch;
> I get almost twice the speed of a single job running in one database
> transaction either way.  Can someone explain what I should try to set
> up to get a "best case" and a "worst case" for the patch?  Our
> production databases don't expose any difference, but I'm willing to
> try to use them to "seed" an artificial case which will.
>  
>   

Does your test case have lots of foreign keys?

cheers

andrew


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Andrew Dunstan <andrew@dunslane.net> wrote:
> Does your test case have lots of foreign keys?
488 of them.
There is some variation on individual tests, but the results look to
be "in the noise."  When I add them all up, the patch comes out
0.0036% slower -- but that is so far into the noise as to be
considered "no difference" in my book.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
I wrote: 
> So far, all tests have shown no difference in performance based on
> the patch;
My testing to that point had been on a "big" machine with 16 CPUs and
128 GB RAM and dozens of spindles.  Last night I tried with a dual
core machine with 4 GB RAM and 5 spindles in RAID 5.  Still no
difference with the patch.
Any suggestions besides the foreign keys?  Should 488 FKs be enough to
matter here?  (Barring better suggestions, I'll try the small machine
again tonight with the default configuration, rather than the
optimized one.)
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Tue, Jul 28, 2009 at 10:28 AM, Kevin
Grittner<Kevin.Grittner@wicourts.gov> wrote:
> I wrote:
>
>> So far, all tests have shown no difference in performance based on
>> the patch;
>
> My testing to that point had been on a "big" machine with 16 CPUs and
> 128 GB RAM and dozens of spindles.  Last night I tried with a dual
> core machine with 4 GB RAM and 5 spindles in RAID 5.  Still no
> difference with the patch.
>
> Any suggestions besides the foreign keys?  Should 488 FKs be enough to
> matter here?  (Barring better suggestions, I'll try the small machine
> again tonight with the default configuration, rather than the
> optimized one.)

The other possibility here is that this just doesn't work.  :-)

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The other possibility here is that this just doesn't work.  :-)

That's why we wanted to test it ;-).

I don't have time to look right now, but ISTM the original discussion
that led to making that patch had ideas about scenarios where it would
be faster.  It'd be worth digging that up and seeing if the current
tests covered the case or not.
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The other possibility here is that this just doesn't work.  :-)
>
> That's why we wanted to test it ;-).
>
> I don't have time to look right now, but ISTM the original discussion
> that led to making that patch had ideas about scenarios where it would
> be faster.  It'd be worth digging that up and seeing if the current
> tests covered the case or not.

This is what I've been able to find on a quick look:

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php

Sounds like Kevin may want to try renaming some of his indices to
produce intermingling...

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I don't have time to look right now, but ISTM the original discussion
>> that led to making that patch had ideas about scenarios where it would
>> be faster.

> This is what I've been able to find on a quick look:
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php

> Sounds like Kevin may want to try renaming some of his indices to
> produce intermingling...

Also, the followup to that message points out that the 8.4.0 code has a
potential O(N^2) dependency on the total number of TOC items in the
dump.  So it might be interesting to check the behavior with very large
numbers of tables/indexes.
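To see where the quadratic term comes from, here is a toy model (an illustrative Python sketch, not pg_restore's actual C code) contrasting a scheduler that rescans every pending TOC item after each completion with one that tracks readiness incrementally:

```python
from collections import deque

def schedule_by_rescan(deps):
    # deps maps item -> list of items it depends on (a DAG).
    # After each round of completions, rescan every pending item to see
    # which have become ready: O(N^2) inspections overall.
    done, order, inspections = set(), [], 0
    pending = set(deps)
    while pending:
        ready_now = []
        for item in pending:              # full scan of pending items
            inspections += 1
            if all(d in done for d in deps[item]):
                ready_now.append(item)
        for item in sorted(ready_now):
            done.add(item)
            order.append(item)
            pending.discard(item)
    return order, inspections

def schedule_incremental(deps):
    # Track unmet-dependency counts instead; each dependency edge is
    # inspected exactly once, so the work is linear in items + edges.
    remaining = {i: len(d) for i, d in deps.items()}
    dependents = {}
    for item, ds in deps.items():
        for d in ds:
            dependents.setdefault(d, []).append(item)
    ready = deque(sorted(i for i, c in remaining.items() if c == 0))
    order, inspections = [], 0
    while ready:
        item = ready.popleft()
        order.append(item)
        for dep in dependents.get(item, ()):
            inspections += 1
            remaining[dep] -= 1
            if remaining[dep] == 0:
                ready.append(dep)
    return order, inspections

# A dependency chain of 1000 TOC items:
chain = {i: ([i - 1] if i else []) for i in range(1000)}
print(schedule_by_rescan(chain)[1])    # 500500 inspections (quadratic)
print(schedule_incremental(chain)[1])  # 999 inspections (one per edge)
```

Both schedulers emit the same restore order; only the bookkeeping cost differs, which is why the effect needs a very large number of TOC entries to show up in wall-clock time.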
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: 
> This is what I've been able to find on a quick look:
> 
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php
> 
> Sounds like Kevin may want to try renaming some of his indices to
> produce intermingling...
Thanks, I'll give that a try.  Renaming them is one thing, getting a
new dump is another, though.  I probably won't be able to test that
theory until tomorrow night.
Last night's test yielded a couple interesting results.
For one thing, while the "optimized" postgresql.conf was 2.5% faster
than the default file for a single job in one database transaction, it
was 10% *slower* than the default for multi-job restores.  I'll check
on that more later, to see what might be helping and what is hurting.
For another thing, with the default settings, the patched version ran
an additional 1% faster than the unpatched; although I don't have
enough samples to have a high degree of confidence it wasn't noise. 
I'll run another slew of tests tonight with the existing dump file to
confirm or debunk that result, while I create a new dump file to test
with name intermingling on later nights.
For the record, the "default" postgresql.conf:
port = 5678
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
standard_conforming_strings = on
The "optimized" file adds these:
max_connections = 100
shared_buffers = 256MB
work_mem = 50MB
maintenance_work_mem = 500MB
bgwriter_lru_maxpages = 600
bgwriter_lru_multiplier = 10.0
fsync = off
full_page_writes = off
wal_buffers = 4MB
random_page_cost = 2.0
effective_cache_size = 3GB
logging_collector = on
log_line_prefix = '[%m] %p %q<%u %d %r> '
track_counts = off
autovacuum = off
sql_inheritance = off
I'm sure that there is a wealth of opinion on which of these are
slowing things down, but I'm going to withhold any guesses in favor of
testing them.  (They all proved themselves neutral or beneficial in
objective testing for the single-job restores under 8.3.)
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> Also, the followup to that message points out that the 8.4.0 code
> has a potential O(N^2) dependency on the total number of TOC items
> in the dump.  So it might be interesting to check the behavior with
> very large numbers of tables/indexes.
I've got 431 user tables with 578 indexes.  How high should I push
this?  Can I just create a bunch of randomly named empty tables with
primary keys to provoke this effect?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>> Also, the followup to that message points out that the 8.4.0 code
>> has a potential O(N^2) dependency on the total number of TOC items
>> in the dump.  So it might be interesting to check the behavior with
>> very large numbers of tables/indexes.
> I've got 431 user tables with 578 indexes.  How high should I push
> this?  Can I just create a bunch of randomly named empty tables with
> primary keys to provoke this effect?

Yeah, just add a bunch of empty tables.  Ten thousand or so, perhaps.
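A sketch of one way to generate that schema bloat (the `bloat_t` prefix and the table layout are made up for illustration):

```python
# Emit DDL for n empty tables, each with a primary key so every table
# also contributes an index entry to the dump's TOC. Names are arbitrary.
def make_bloat_ddl(n, prefix="bloat_t"):
    return "\n".join(
        f"CREATE TABLE {prefix}{i} (id integer PRIMARY KEY);"
        for i in range(n)
    )

# e.g. feed make_bloat_ddl(10000) to psql, dump, then time the restore
print(make_bloat_ddl(2))
```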
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> with the default settings, the patched version ran an additional 1%
> faster than the unpatched; although I don't have enough samples to
> have a high degree of confidence it wasn't noise.  I'll run another
> slew of tests tonight with the existing dump file to confirm or
> debunk that result
The timings vary by up to 2.5% between runs, so that's the noise
level.  Five runs of each (alternating between the two) last night
give an average performance of 1.89% faster for the patched version. 
Combining that with yesterday's results starts to give me pretty good
confidence that the patch is beneficial for this database with this
configuration.  I haven't found any database or configuration where it
hurts.  (For most tests, adding up the results gave a net difference
measured in thousandths of a percent.)
Is that good enough, or is it still worth the effort of constructing
the artificial case where it might *really* shine?  Or should I keep
running with the "real" database a few more nights to get a big enough
sample to further increase the confidence level with this test?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The timings vary by up to 2.5% between runs, so that's the noise
> level.  Five runs of each (alternating between the two) last night
> give an average performance of 1.89% faster for the patched version. 
> Combining that with yesterday's results starts to give me pretty good
> confidence that the patch is beneficial for this database with this
> configuration.  I haven't found any database or configuration where it
> hurts.  (For most tests, adding up the results gave a net difference
> measured in thousandths of a percent.)
> Is that good enough, or is it still worth the effort of constructing
> the artificial case where it might *really* shine?  Or should I keep
> running with the "real" database a few more nights to get a big enough
> sample to further increase the confidence level with this test?

I think we've pretty much established that it doesn't make things
*worse*, so I'm sort of inclined to go ahead and apply it.  The
theoretical advantage of eliminating O(N^2) search behavior seems
like reason enough, even if it takes a ridiculous number of tables
for that to become significant.
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Thu, Jul 30, 2009 at 1:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The timings vary by up to 2.5% between runs, so that's the noise
>> level.  Five runs of each (alternating between the two) last night
>> give an average performance of 1.89% faster for the patched version.
>> Combining that with yesterday's results starts to give me pretty good
>> confidence that the patch is beneficial for this database with this
>> configuration.  I haven't found any database or configuration where it
>> hurts.  (For most tests, adding up the results gave a net difference
>> measured in thousandths of a percent.)
>
>> Is that good enough, or is it still worth the effort of constructing
>> the artificial case where it might *really* shine?  Or should I keep
>> running with the "real" database a few more nights to get a big enough
>> sample to further increase the confidence level with this test?
>
> I think we've pretty much established that it doesn't make things
> *worse*, so I'm sort of inclined to go ahead and apply it.  The
> theoretical advantage of eliminating O(N^2) search behavior seems
> like reason enough, even if it takes a ridiculous number of tables
> for that to become significant.

That makes sense to me, but OTOH if Kevin's willing to do more testing
on some artificial cases, particularly the interleaved-index-names
case, I think those results would be interesting too.  We already know
that the slowness of dump + restore is a big issue, so any data we can
gather to understand it better (and perhaps eventually design further
improvements) seems like it would be time well spent.

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> I think we've pretty much established that it doesn't make things
> *worse*, so I'm sort of inclined to go ahead and apply it.  The
> theoretical advantage of eliminating O(N^2) search behavior seems
> like reason enough, even if it takes a ridiculous number of tables
> for that to become significant.
Agreed, although I'm having some concerns about whether this should
proceed based exclusively on my benchmarks.  On a thread on the
performance list, people are talking about restores which go several
times faster with parallel restore (compared to a single job).  On my
hardware, I haven't even gotten it to run twice as fast.  This means
that parallel restore is not a good fit for servers like we have, at
least with databases like we have, which means it's probably a poor
environment to get benchmarks for this patch.  :-(
Can we get someone who has benchmarks showing parallel restore to be
eight times the speed of a single job to benchmark with this patch,
just for confirmation?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Rebased to correct for pg_indent changes.

Thanks for doing that.  Attached is a further small improvement that
gets rid of the find_ready_items() scans.  After re-reading the patch
I realized that it wasn't *really* avoiding O(N^2) behavior ... but
this version does.

            regards, tom lane


Attachment

Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Rebased to correct for pg_indent changes.
> 
> Thanks for doing that.
No problem.  I think I still owe you a few.  :-)
> Attached is a further small improvement that gets rid of the
> find_ready_items() scans.  After re-reading the patch I realized
> that it wasn't *really* avoiding O(N^2) behavior ... but this
> version does.
I'll run a fresh set of benchmarks.
By the way, I muffed the setup of last night's benchmarks, so no new
information there, except that in the process of reviewing the attempt
I discovered I was guilty of making a false assertion yesterday, based
on remembering incorrectly.  The logs show that the six hour dump to
custom format was over the LAN.  :-(  I hope I didn't waste too much
of people's time by saying otherwise.  I'll try to get some numbers on
the same-box dump soon.  (Note to self: never, ever trust your memory;
always confirm.)
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
daveg
Date:
On Thu, Jul 30, 2009 at 12:29:34PM -0500, Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>  
> > I think we've pretty much established that it doesn't make things
> > *worse*, so I'm sort of inclined to go ahead and apply it.  The
> > theoretical advantage of eliminating O(N^2) search behavior seems
> > like reason enough, even if it takes a ridiculous number of tables
> > for that to become significant.
>  
> Agreed, although I'm having some concerns about whether this should
> proceed based exclusively on my benchmarks.  On a thread on the
> performance list, people are talking about restores which go several
> times faster with parallel restore (compared to a single job).  On my
> hardware, I haven't even gotten it to run twice as fast.  This means
> that parallel restore is not a good fit for servers like we have, at
> least with databases like we have, which means it's probably a poor
> environment to get benchmarks for this patch.  :-(
>  
> Can we get someone who has benchmarks showing parallel restore to be
> eight times the speed of a single job to benchmark with this patch,
> just for confirmation?

I have a couple "spare" 32GB 4 core and 64GB 8 core servers with 15 scsi/sas
drives and dumps of production dbs in the 100GB to 500 GB range. These have
several hundred tables, most with an index or few and an fkey or too many.

It will take a couple days to run a variety of tests I suppose, and I will
be away starting mid next week, but maybe I could get some done before I go.

Will the patch apply to a vanilla 8.4.0?


-dg

-- 
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
daveg <daveg@sonic.net> writes:
> Will the patch apply to a vanilla 8.4.0?

Yeah, it should.  The line numbers in the version I just posted might
be off a little bit for 8.4.0, but patch should cope.

Be sure to "make clean" and recompile all of src/bin/pg_dump, else you
might have some issues.
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
I wrote: 
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>> Attached is a further small improvement that gets rid of the
>> find_ready_items() scans.  After re-reading the patch I realized
>> that it wasn't *really* avoiding O(N^2) behavior ... but this
>> version does.
>  
> I'll run a fresh set of benchmarks.
Over the weekend I ran 40 restores of Milwaukee County's production
data using Friday's snapshot with and without the patch.  I alternated
between patched and unpatched.  It appears that this latest version is
slightly slower for our production database on the same machine and
configuration where the previous patch appeared to be 1% to 2% faster
than unpatched (although I had fewer samples of that).
Although the actual runs were interleaved, I've separated them for
this email, because it seems easier to look them over this way:
8.5devel pg_restore -j2

real    77m13.737s
real    78m21.701s
real    78m21.536s
real    77m37.541s
real    79m10.068s
real    81m37.111s
real    77m52.252s
real    80m49.110s
real    76m50.093s
real    78m0.701s
real    77m30.674s
real    77m22.875s
real    80m43.914s
real    78m51.525s
real    80m46.329s
real    76m56.703s
real    78m44.960s
real    82m37.757s
real    84m12.938s
real    82m27.591s

8.5devel-alt pg_restore -j2

real    78m10.846s
real    78m5.584s
real    78m13.673s
real    79m43.939s
real    84m39.593s
real    80m25.240s
real    78m10.045s
real    82m34.320s
real    78m43.528s
real    77m12.211s
real    77m39.171s
real    79m58.421s
real    84m50.816s
real    85m56.248s
real    79m17.361s
real    79m0.778s
real    77m3.525s
real    77m22.750s
real    78m22.882s
real    78m51.617s

That's about 0.52% slower with the patch.  Because there was over 10%
variation in the numbers with the patch, I tried leaving out the four
highest outliers on both, in case it was the result of some other
activity on the system (even though this machine should have been
pretty quiet over the weekend) and the difference fell to 0.09%.
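For reference, the 0.52% figure can be reproduced mechanically from the
timings above.  A sketch in Python (the variable names are mine, and I'm
assuming the "-alt" runs are the patched build, which is what the
arithmetic implies):

```python
# Timings copied verbatim from the two runs above; "alt" is assumed to
# be the patched build.
unpatched = """77m13.737s 78m21.701s 78m21.536s 77m37.541s 79m10.068s
81m37.111s 77m52.252s 80m49.110s 76m50.093s 78m0.701s
77m30.674s 77m22.875s 80m43.914s 78m51.525s 80m46.329s
76m56.703s 78m44.960s 82m37.757s 84m12.938s 82m27.591s""".split()

patched = """78m10.846s 78m5.584s 78m13.673s 79m43.939s 84m39.593s
80m25.240s 78m10.045s 82m34.320s 78m43.528s 77m12.211s
77m39.171s 79m58.421s 84m50.816s 85m56.248s 79m17.361s
79m0.778s 77m3.525s 77m22.750s 78m22.882s 78m51.617s""".split()

def seconds(t):
    """Convert a time(1) 'real' figure like '77m13.737s' to seconds."""
    m, s = t.rstrip('s').split('m')
    return int(m) * 60 + float(s)

mean_unpatched = sum(map(seconds, unpatched)) / len(unpatched)  # ~4758.46 s
mean_patched = sum(map(seconds, patched)) / len(patched)        # ~4783.13 s
pct_slower = 100 * (mean_patched - mean_unpatched) / mean_unpatched
print(round(pct_slower, 2))  # 0.52
```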
I don't know if this difference is enough to worry about; it might
depend on whether we're comparing to the unpatched version or the
previous patch.  If it comes to choosing between a 1% to 2%
performance gain for a "normal" case versus elimination of O(N^2)
behavior for a worst-case scenario, I'm not sure which should win --
especially in the absence of benchmarks showing the pessimal case. 
What would be the most productive focus for benchmarks at this point? 
The artificial pessimal case?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Andrew Dunstan
Date:
> That's about 0.52% slower with the patch.  Because there was over 10%
> variation in the numbers with the patch, I tried leaving out the four
> highest outliers on both, in case it was the result of some other
> activity on the system (even though this machine should have been
> pretty quiet over the weekend) and the difference fell to 0.09%.
>  
> I don't know if this difference is enough to worry about; it might
> depend on whether we're comparing to the unpatched version or the
> previous patch.  If it comes to choosing between a 1% to 2%
> performance gain for a "normal" case versus elimination of O(N^2)
> behavior for a worst-case scenario, I'm not sure which should win --
> especially in the absence of benchmarks showing the pessimal case. 
> What would be the most productive focus for benchmarks at this point? 
> The artificial pessimal case?
>  
>
>   

My instinct says that the variation is probably just noise, of no great 
significance. I'm personally not terribly worried about the O(n^2) case, 
but I think the patch is probably useful anyway, as a basis for other 
people to try to improve the item selection algorithm further.

cheers

andrew



Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Over the weekend I ran 40 restores of Milwaukee County's production
> data using Friday's snapshot with and without the patch.  I alternated
> between patched and unpatched.  It appears that this latest version is
> slightly slower for our production database on the same machine and
> configuration where the previous patch appeared to be 1% to 2% faster
> than unpatched (although I had fewer samples of that).

I think we can conclude that for this particular test case, the effects
of the patch are pretty much masked by noise.  I definitely see no way
that the latest version of the patch could really be slower than the
original; it has the same job-scheduling behavior and strictly less
list-munging overhead.  Now the patch could be slower than unpatched
as a result of different job-scheduling behavior ... but there's no
evidence here of a consistently measurable benefit or loss from that.

IIRC daveg was volunteering to do some tests with his own data; maybe
we should wait for those results.
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Josh Berkus
Date:
> IIRC daveg was volunteering to do some tests with his own data; maybe
> we should wait for those results.

Unfortunately, I've lost access to the client's data which was showing 
bad behaviour under the first heuristic.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
daveg
Date:
On Mon, Aug 03, 2009 at 11:21:43AM -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Over the weekend I ran 40 restores of Milwaukee County's production
> > data using Friday's snapshot with and without the patch.  I alternated
> > between patched and unpatched.  It appears that this latest version is
> > slightly slower for our production database on the same machine and
> > configuration where the previous patch appeared to be 1% to 2% faster
> > than unpatched (although I had fewer samples of that).
> 
> I think we can conclude that for this particular test case, the effects
> of the patch are pretty much masked by noise.  I definitely see no way
> that the latest version of the patch could really be slower than the
> original; it has the same job-scheduling behavior and strictly less
> list-munging overhead.  Now the patch could be slower than unpatched
> as a result of different job-scheduling behavior ... but there's no
> evidence here of a consistently measurable benefit or loss from that.
> 
> IIRC daveg was volunteering to do some tests with his own data; maybe
> we should wait for those results.

I have run extensive tests with three trials of each configuration on two
hosts with a variety of db sizes from 3GB to 142GB. These just finished,
and I will send a more detailed summary later, but at the moment I don't
see any significant difference between the patched and vanilla pg_restore.

-dg

-- 
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Mon, Aug 03, 2009 at 10:03:47AM -0500, Kevin Grittner wrote:
> That's about 0.52% slower with the patch.  Because there was over 10%
> variation in the numbers with the patch, I tried leaving out the four
> highest outliers on both, in case it was the result of some other
> activity on the system (even though this machine should have been
> pretty quiet over the weekend) and the difference fell to 0.09%.

What do people do when testing this?  I think I'd look to something like
Student's t-test to check for statistical significance.  My working
would go something like:
  I assume the variance is the same because it's being tested on the
  same machine.

  samples = 20
  stddev  = 144.26
  avg1    = 4783.13
  avg2    = 4758.46
  t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))

We then have to choose how certain we want to be that they're actually
different, 90% is a reasonably easy level to hit (i.e. one part in ten,
with 95% being more commonly quoted).  For 20 samples we have 19 degrees
of freedom--giving us a cut-off[1] of 1.328.  0.54 is obviously well
below this allowing us to say that there's no "statistical significance"
between the two samples at a 90% level.
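A minimal Python translation of the working above, with the numbers
taken straight from it:

```python
import math

samples = 20        # runs per group
stddev = 144.26     # assumed common standard deviation (seconds)
avg1 = 4783.13      # mean of the patched runs (seconds)
avg2 = 4758.46      # mean of the unpatched runs (seconds)

# Equal-variance two-sample t statistic with equal group sizes; the
# denominator is the standard error of the difference of two means.
t = (avg1 - avg2) / (stddev * math.sqrt(2 / samples))
print(round(t, 2))  # 0.54

# One-tailed cut-off for 19 degrees of freedom at the 90% level.
assert t < 1.328    # not significant at the 90% level
```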

--  Sam  http://samason.me.uk/
[1] http://en.wikipedia.org/wiki/Student's_t-distribution#Table_of_selected_values


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
>   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))

> We then have to choose how certain we want to be that they're actually
> different, 90% is a reasonably easy level to hit (i.e. one part in ten,
> with 95% being more commonly quoted).  For 20 samples we have 19 degrees
> of freedom--giving us a cut-off[1] of 1.328.  0.54 is obviously well
> below this allowing us to say that there's no "statistical significance"
> between the two samples at a 90% level.

Hmm, so what about 95% or 99% confidence?
        regards, tom lane


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Tue, Aug 04, 2009 at 10:45:52AM -0400, Tom Lane wrote:
> Sam Mason <sam@samason.me.uk> writes:
> >   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> 
> > We then have to choose how certain we want to be that they're actually
> > different, 90% is a reasonably easy level to hit (i.e. one part in ten,
> > with 95% being more commonly quoted).  For 20 samples we have 19 degrees
> > of freedom--giving us a cut-off[1] of 1.328.  0.54 is obviously well
> > below this allowing us to say that there's no "statistical significance"
> > between the two samples at a 90% level.
> 
> Hmm, so what about 95% or 99% confidence?

The cut-off goes up to 1.729 for 95% and to 2.539 for 99%.  These values
are only really for 20 samples with the above calculation; the link I
gave above gives a nice table for different values.

I've also realized that I did the standard deviation all wrong.  I
should have calculated them independently and then got the mean:
  stddev1 = 159.9699
  stddev2 = 129.6466
  stddev  = 144.8083  ((stddev1+stddev2) / 2)

Here it makes absolutely no difference, but when they were really
different distributions it would.
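For what it's worth, the textbook pooled estimate for equal sample
sizes averages the *variances* rather than the standard deviations; a
sketch using the figures above shows the two come out within a second
of each other here:

```python
import math

stddev1 = 159.9699  # patched runs
stddev2 = 129.6466  # unpatched runs

# Estimate above: mean of the two standard deviations.
stddev_mean = (stddev1 + stddev2) / 2  # ~144.81

# Textbook pooled form for equal sample sizes: mean of the variances.
stddev_pooled = math.sqrt((stddev1**2 + stddev2**2) / 2)  # ~145.60
```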

--  Sam  http://samason.me.uk/


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Sam Mason <sam@samason.me.uk> wrote: 
> What do people do when testing this?  I think I'd look to something
> like Student's t-test to check for statistical significance.  My
> working would go something like:
> 
>   I assume the variance is the same because it's being tested on the
>   same machine.
> 
>   samples = 20
>   stddev  = 144.26
>   avg1    = 4783.13
>   avg2    = 4758.46
>   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> 
> We then have to choose how certain we want to be that they're
> actually different, 90% is a reasonably easy level to hit (i.e. one
> part in ten, with 95% being more commonly quoted).  For 20 samples
> we have 19 degrees of freedom--giving us a cut-off[1] of 1.328. 
> 0.54 is obviously well below this allowing us to say that there's no
> "statistical significance" between the two samples at a 90% level.
Thanks for the link; that looks useful.  To confirm that I understand
what this has established (or get a bit of help putting it in
perspective), what this says to me, in the least technical jargon I
can muster, is "With this many samples and this degree of standard
deviation, the average difference is not large enough to have a 90%
confidence level that the difference is significant."  In fact,
looking at the chart, it isn't enough to reach a 75% confidence level
that the difference is significant.  Significance here would seem to
mean that at least the given percentage of the time, picking this many
samples from an infinite set with an average difference that really
was this big or bigger would generate a value for t this big or
bigger.
Am I close?
I like to be clear, because it's easy to get confused and take the
above to mean that there's a 90% confidence that there is no actual
significant difference in performance based on that sampling.  (Given
Tom's assurance that this version of the patch should have similar
performance to the last, and the samples from the prior patch went the
other direction, I'm convinced there is not a significant difference,
but if I'm going to use the referenced calculations, I want to be
clear how to interpret the results.)
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Fri, Aug 07, 2009 at 10:19:20AM -0500, Kevin Grittner wrote:
> Sam Mason <sam@samason.me.uk> wrote: 
>  
> > What do people do when testing this?  I think I'd look to something
> > like Student's t-test to check for statistical significance.  My
> > working would go something like:
> > 
> >   I assume the variance is the same because it's being tested on the
> >   same machine.
> > 
> >   samples = 20
> >   stddev  = 144.26
> >   avg1    = 4783.13
> >   avg2    = 4758.46
> >   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> > 
> > We then have to choose how certain we want to be that they're
> > actually different, 90% is a reasonably easy level to hit (i.e. one
> > part in ten, with 95% being more commonly quoted).  For 20 samples
> > we have 19 degrees of freedom--giving us a cut-off[1] of 1.328. 
> > 0.54 is obviously well below this allowing us to say that there's no
> > "statistical significance" between the two samples at a 90% level.
>  
> Thanks for the link; that looks useful.  To confirm that I understand
> what this has established (or get a bit of help putting it in
> perspective), what this says to me, in the least technical jargon I
> can muster, is "With this many samples and this degree of standard
> deviation, the average difference is not large enough to have a 90%
> confidence level that the difference is significant."  In fact,
> looking at the chart, it isn't enough to reach a 75% confidence level
> that the difference is significant.  Significance here would seem to
> mean that at least the given percentage of the time, picking this many
> samples from an infinite set with an average difference that really
> was this big or bigger would generate a value for t this big or
> bigger.
>  
> Am I close?

Yes, all that sounds as though you've got it.  Note that running the
test more times will tend to reduce the standard error a bit as
well, so it may well become significant.  In this case it's unlikely to
affect it much though.

> I like to be clear, because it's easy to get confused and take the
> above to mean that there's a 90% confidence that there is no actual
> significant difference in performance based on that sampling.  (Given
> Tom's assurance that this version of the patch should have similar
> performance to the last, and the samples from the prior patch went the
> other direction, I'm convinced there is not a significant difference,
> but if I'm going to use the referenced calculations, I want to be
> clear how to interpret the results.)

All we're saying is that we're less than 90% confident that there's
something "significant" going on.  All the fiddling with standard
deviations and sample sizes is just the easiest way (that I know of) that
statistics currently gives us of determining this more formally than a
hand-wavy "it looks OK to me".  Science tells us that humans are liable
to say things are OK when they're not, as well as vice versa; statistics
gives us a way to work past these limitations in some common and useful
situations.

--  Sam  http://samason.me.uk/


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Sam Mason <sam@samason.me.uk> wrote: 
> Yes, all that sounds as though you've got it.
Thanks.  I read through it carefully a few times, but I was still only
80% confident that I had it more-or-less right.  ;-)
That does seem like a good test, with the advantage of being
relatively easy to calculate.  Thanks again for suggesting it.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Fri, Aug 07, 2009 at 10:39:19AM -0500, Kevin Grittner wrote:
> Sam Mason <sam@samason.me.uk> wrote: 
> > Yes, all that sounds as though you've got it.
>  
> Thanks.  I read through it carefully a few times, but I was still only
> 80% confident that I had it more-or-less right.  ;-)

And which method did you use to determine that you're 80% confident? :)

--  Sam  http://samason.me.uk/


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Sam Mason <sam@samason.me.uk> wrote: 
> All we're saying is that we're less than 90% confident that there's
> something "significant" going on.  All the fiddling with standard
> deviations and sample sizes is just the easiest way (that I know of)
> that statistics currently gives us of determining this more formally
> than a hand-wavy "it looks OK to me".  Science tells us that humans
> are liable to say things are OK when they're not, as well as vice
> versa; statistics gives us a way to work past these limitations in
> some common and useful situations.
Following up, I took the advice offered in the referenced article, and
used a spreadsheet with a TDIST function for more accurate results
than available through the table included in the article.  That allows
what I think is a more meaningful number: the probability that taking
a sample that big would have resulted in a t-statistic larger than was
actually achieved if there was no real difference.
With the 20 samples from that last round of tests, the answer (rounded
to the nearest percent) is 60%, so "probably noise" is a good summary.
Combined with the 12 samples from earlier comparable runs with the
prior version of the patch, it goes to a 90% probability that noise
would generate a difference at least that large, so I think we've
gotten to "almost certainly noise".  :-)
To me, that seems more valuable for this situation than saying "we
haven't reached 90% confidence that it's a real difference."  I used
the same calculations up through the t-statistic.
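Lacking a spreadsheet, that same two-tailed probability can be
estimated with a quick Monte Carlo over the t distribution (a sketch;
the sample count and seed are arbitrary choices of mine):

```python
import math
import random

random.seed(42)

t_obs = 0.54   # observed t statistic from the earlier working
df = 19        # degrees of freedom used in that working
n = 100_000

# A Student-t variate is Z / sqrt(chi2_df / df); count how often pure
# noise would produce a |t| at least as large as the one observed.
hits = 0
for _ in range(n):
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    if abs(z / math.sqrt(chi2 / df)) >= t_obs:
        hits += 1

p_two_tailed = hits / n
print(p_two_tailed)  # close to 0.6, matching the 60% "probably noise" figure
```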
The one question I have left for this technique is why you went with
((avg1 - avg2) / (stddev * sqrt(2/samples)))
instead of
((avg1 - avg2) / (stddev / sqrt(samples)))
I assume that it's because the baseline was a set of samples rather
than a fixed mark, but I couldn't pick out a specific justification
for this in the literature (although I might have just missed it), so
I'd feel more comfy if you could clarify.
Given the convenience of capturing benchmarking data in a database,
has anyone tackled implementation of something like the spreadsheet
TDIST function within PostgreSQL?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Sam Mason <sam@samason.me.uk> wrote:
>
>> All we're saying is that we're less than 90% confident that there's
>> something "significant" going on.  All the fiddling with standard
>> deviations and sample sizes is just easiest way (that I know of)
>> that statistics currently gives us of determining this more formally
>> than a hand-wavy "it looks OK to me".  Science tells us that humans
>> are liable to say things are OK when they're not, as well as vice
>> versa; statistics gives us a way to work past these limitations in
>> some common and useful situations.
>
> Following up, I took the advice offered in the referenced article, and
> used a spreadsheet with a TDIST function for more accurate results
> than available through the table included in the article.  That allows
> what I think is a more meaningful number: the probability that taking
> a sample that big would have resulted in a t-statistic larger than was
> actually achieved if there was no real difference.
>
> With the 20 samples from that last round of tests, the answer (rounded
> to the nearest percent) is 60%, so "probably noise" is a good summary.
> Combined with the 12 samples from earlier comparable runs with the
> prior version of the patch, it goes to a 90% probability that noise
> would generate a difference at least that large, so I think we've
> gotten to "almost certainly noise".  :-)

So should we give up on this patch?

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Fri, Aug 07, 2009 at 02:08:21PM -0500, Kevin Grittner wrote:
> With the 20 samples from that last round of tests, the answer (rounded
> to the nearest percent) is 60%, so "probably noise" is a good summary.
> Combined with the 12 samples from earlier comparable runs with the
> prior version of the patch, it goes to a 90% probability that noise
> would generate a difference at least that large, so I think we've
> gotten to "almost certainly noise".  :-)
>  
> To me, that seems more valuable for this situation than saying "we
> haven't reached 90% confidence that it's a real difference."  I used
> the same calculations up through the t-statistic.

The stats people in our group just tend to say that things are
significant or not at a specific level; never bothered to find out why,
I'll ask someone when I get a chance.

> The one question I have left for this technique is why you went with
>  
> ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> instead of
> ((avg1 - avg2) / (stddev / sqrt(samples)))

I was just doing a literal translation of what was on the Wikipedia
page:
 http://en.wikipedia.org/wiki/Student's_t-test#Independent_two-sample_t-test

If you really want to find out, there should be much better
implementations in the pl/r language already in PG.  I'd trust R much
more than Wikipedia, but for things like this Wikipedia is reasonable.

> I assume that it's because the baseline was a set of samples rather
> than a fixed mark, but I couldn't pick out a specific justification
> for this in the literature (although I might have just missed it), so
> I'd feel more comfy if you could clarify.

Sorry, that's about my limit!  I've never studied stats, I'm a computer
science person who just happens to be around people who use stats on a
day-to-day basis and think it needs more use in the software world.  I
think you're right and you're aggregating the errors from two (assumed
independent) datasets hence you want to keep a bit more of the error in
there.  As to the formal justification (and probably proof) I've no real
idea.
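Concretely, variances of independent estimates add: each sample mean
has standard error sigma/sqrt(n), so the difference of two independent
sample means has standard error sigma*sqrt(2/n).  A quick simulation
(arbitrary sigma and seed, chosen by me) bears this out:

```python
import math
import random
import statistics

random.seed(1)
sigma, n, trials = 144.0, 20, 20_000

# Each sample mean has variance sigma**2 / n; the difference of two
# independent sample means adds the variances, giving sigma * sqrt(2/n).
diffs = []
for _ in range(trials):
    m1 = statistics.fmean(random.gauss(0, sigma) for _ in range(n))
    m2 = statistics.fmean(random.gauss(0, sigma) for _ in range(n))
    diffs.append(m1 - m2)

observed_se = statistics.stdev(diffs)
predicted_se = sigma * math.sqrt(2 / n)  # ~45.5
print(observed_se, predicted_se)
```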

> Given the convenience of capturing benchmarking data in a database,
> has anyone tackled implementation of something like the spreadsheet
> TDIST function within PostgreSQL?

Again, pl/r is what you want!

--  Sam  http://samason.me.uk/


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Sam Mason
Date:
On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote:
> On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> > With the 20 samples from that last round of tests, the answer (rounded
> > to the nearest percent) is 60%, so "probably noise" is a good summary.
> 
> So should we give up on this patch?

That's the joy of stats, it only tells you *very* precisely about the
*exact* thing you've chosen to test.  Interpreting the result is still
awkward, but it does remove one problem!

If you think the tests that've been done cover the use cases that the
new code was designed to help with and you're not showing any
benefit I'd probably give up and put it down to a learning experience.
Sorry, but I've not been following enough to comment on this much more.

--  Sam  http://samason.me.uk/


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: 
> So should we give up on this patch?
No, this is not news; just confirmation of the earlier gut feelings
and less convincing statistics that there is no problem.  Tom's
argument, that preventing O(N^2) behavior for extreme cases is reason
enough so long as there's no slowdown for common cases, is compelling
for me, and we've
beaten up on it enough for me to feel comfortable that it doesn't
break anything.
I held off on investigating the artificial extreme cases when I
thought we might possibly have a small performance problem here; but
this statistics exercise has just gotten me from "gut feel" that it
was noise and "not having 90% confidence that we made things worse" to
"90% of the time noise would produce a bigger difference".  It's one
thing to require 90% confidence that an improvement is real before
accepting something, it's another to accept a change on the basis of
not having 90% confidence that there is degradation -- so I wanted to
see a more compelling statistic.
Personally, I'm happy with it being in "Ready for Committer" status. 
I remember someone else on the thread saying that besides the
elimination of O(N^2) behavior, it provided better structure for
future enhancements.
I'll run a few more benchmarks over the next few weeks to try to
characterize the improvements in extreme cases, "just for the record",
but I don't think we want to wait for that; we've got justification
enough as is.
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Robert Haas
Date:
On Fri, Aug 7, 2009 at 3:36 PM, Sam Mason<sam@samason.me.uk> wrote:
> On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote:
>> On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> > With the 20 samples from that last round of tests, the answer (rounded
>> > to the nearest percent) is 60%, so "probably noise" is a good summary.
>>
>> So should we give up on this patch?
>
> That's the joy of stats, it only tells you *very* precisely about the
> *exact* thing you've chosen to test.  Interpreting the result is still
> awkward, but it does remove one problem!
>
> If you think the tests that've been done cover the use cases that the
> new code was designed to help with and you're not showing any
> benefit I'd probably give up and put it down to a learning experience.
> Sorry, but I've not been following enough to comment on this much more.

Yeah, I more wanted to hear from Tom or Kevin or anyone else who had a
technical thought about this.  I haven't looked at the patch, but
there may be reasons other than performance to commit it - or there
may not.  Tom posted a note on the commitfest suggesting that maybe we
should give up on this, and I don't care one way or the other, except
that in my role as CommitFest manager (or Mom) I'm trying to push the
remaining patches to some sort of conclusion: either committed, or
rejected, or needs more work please resubmit for the next CommitFest.

...Robert


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
"Kevin Grittner"
Date:
I wrote: 
> I remember someone else on the thread saying [...]
> it provided better structure for future enhancements.
Found the reference:
http://archives.postgresql.org/pgsql-hackers/2009-08/msg00078.php
This was the email which I thought confirmed that the changes were
worth it, even in the absence of benchmarks showing performance
improvement.  I guess the counter-argument is that so far this has
been framed as a performance patch, and I've seen posts before which
say that we don't accept those without benchmarks to show an actual
performance improvement.  Accepting it on the basis of evidence and
comments so far would, I suppose, put it more in the category of a
refactoring for cleaner code.
Should we leave it to the code committer to make the final call?
-Kevin


Re: Review: Revise parallel pg_restore's scheduling heuristic

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Robert Haas <robertmhaas@gmail.com> wrote: 
>> So should we give up on this patch?
> No, this is not news; just confirmation of the earlier gut feelings
> and less convincing statistics that there is no problem.  Tom's
> argument that if there's no slowdown for common cases, preventing
> O(N^2) behavior for extreme cases is compelling for me, and we've
> beaten up on it enough for me to feel comfortable that it doesn't
> break anything.

Yeah.  I had hoped to see some evidence of actual improvement for common
cases, but we haven't got that.  What we do have is evidence that it's
not making common cases worse, so avoiding the possible O(N^2) behavior
for extreme cases seems like a reasonable argument for committing it
anyway.

I'll go ahead and commit it just to get it out of the queue ... we
can always revert the commit if new info surfaces.
        regards, tom lane