Thread: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
"Todd A. Cook"
Date:
Hi,

First, the numbers:

  PG Version    Load time     pg_database_size     autovac
----------------------------------------------------------
   8.2.13        179 min       92,807,992,820        on
   8.3.7         180 min       84,048,744,044        on (defaults)
   8.4b2         206 min       84,028,995,344        on (defaults)
   8.4b2         183 min       84,028,839,696        off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.  The dump file was 14GB
compressed.

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.

-- todd

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Vick Khera
Date:
On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
Cook<tcook@blackducksoftware.com> wrote:
> The loads were all done on the same machine, with the DB going on a pair
> of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
> 8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.

What if you double or triple the number of checkpoint segments?

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Alan McKay
Date:
There was an interesting presentation at PG Con from a guy at Sun who
did a series of load tests on 8.3 vs 8.4

http://www.pgcon.org/2009/schedule/events/124.en.html

There is a link to the video from that page so you can watch it.  But
he found a strange "corner case" where 8.4 performed way worse.  After
he did a bit of digging he found a  couple of default settings that
had changed in 8.4, and when he set them back to their old 8.3 values
and re-ran the tests, there was a huge difference in outcome.



On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
Cook<tcook@blackducksoftware.com> wrote:
> Hi,
>
> First, the numbers:
>
>  PG Version    Load time     pg_database_size     autovac
> ----------------------------------------------------------
>  8.2.13        179 min       92,807,992,820        on
>  8.3.7         180 min       84,048,744,044        on (defaults)
>  8.4b2         206 min       84,028,995,344        on (defaults)
>  8.4b2         183 min       84,028,839,696        off
>
> The bulk of the data is in 16 tables, each having about 55 million rows of
> the form (int, int, smallint, smallint, int, int, int).  Each table has a
> single partial index on one of the integer columns.  The dump file was 14GB
> compressed.
>
> The loads were all done on the same machine, with the DB going on a pair
> of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
> 8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.
>
> -- todd
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
“Don't eat anything you've ever seen advertised on TV”
         - Michael Pollan, author of "In Defense of Food"

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
"Todd A. Cook"
Date:
Vick Khera wrote:
> On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
> Cook<tcook@blackducksoftware.com> wrote:
>> The loads were all done on the same machine, with the DB going on a pair
>> of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
>> 8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.
>
> What if you double or triple the number of checkpoint segments?
>

checkpoint_segments was set to 128 for all tests.

-- todd


Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Tom Lane
Date:
"Todd A. Cook" <tcook@blackducksoftware.com> writes:
> First, the numbers:

>   PG Version    Load time     pg_database_size     autovac
> ----------------------------------------------------------
>    8.2.13        179 min       92,807,992,820        on
>    8.3.7         180 min       84,048,744,044        on (defaults)
>    8.4b2         206 min       84,028,995,344        on (defaults)
>    8.4b2         183 min       84,028,839,696        off

> The bulk of the data is in 16 tables, each having about 55 million rows of
> the form (int, int, smallint, smallint, int, int, int).  Each table has a
> single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded.  If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

            regards, tom lane

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> "Todd A. Cook" <tcook@blackducksoftware.com> writes:
>> First, the numbers:
>
>>   PG Version    Load time     pg_database_size     autovac
>> ----------------------------------------------------------
>>    8.2.13        179 min       92,807,992,820        on
>>    8.3.7         180 min       84,048,744,044        on (defaults)
>>    8.4b2         206 min       84,028,995,344        on (defaults)
>>    8.4b2         183 min       84,028,839,696        off
>
>> The bulk of the data is in 16 tables, each having about 55 million rows of
>> the form (int, int, smallint, smallint, int, int, int).  Each table has a
>> single partial index on one of the integer columns.
>
> Given that it's multiple tables, it'd be possible for autovacuum to
> kick in and ANALYZE the data inserted into earlier tables while the
> later ones were still being loaded.  If so, the discrepancy might be
> explained by 8.4's more-aggressive statistics target, which means that
> a background ANALYZE will take about 10x more work than before.
>
> If you have time to repeat the experiments, it would be interesting to
> see what happens with consistent default_statistics_target across 8.3
> and 8.4.

given that this was likely a single-thread restore and therefor wal
logged I wonder if the 206min one might be affected by the issue
discussed here
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php


Stefan

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> If you have time to repeat the experiments, it would be interesting to
>> see what happens with consistent default_statistics_target across 8.3
>> and 8.4.

> given that this was likely a single-thread restore and therefor wal
> logged I wonder if the 206min one might be affected by the issue
> discussed here
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php

It might be, but the current theory about that (ie that it's the
bulkwrite patch doing it) doesn't explain the interaction with
autovacuum on/off.  I'd still like to hear whether changing the stats
target changes Todd's results.

            regards, tom lane

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
"Todd A. Cook"
Date:
Tom Lane wrote:
> "Todd A. Cook" <tcook@blackducksoftware.com> writes:
>> First, the numbers:
>
>>   PG Version    Load time     pg_database_size     autovac
>> ----------------------------------------------------------
>>    8.2.13        179 min       92,807,992,820        on
>>    8.3.7         180 min       84,048,744,044        on (defaults)
>>    8.4b2         206 min       84,028,995,344        on (defaults)
>>    8.4b2         183 min       84,028,839,696        off
>
>> The bulk of the data is in 16 tables, each having about 55 million rows of
>> the form (int, int, smallint, smallint, int, int, int).  Each table has a
>> single partial index on one of the integer columns.
>
> Given that it's multiple tables, it'd be possible for autovacuum to
> kick in and ANALYZE the data inserted into earlier tables while the
> later ones were still being loaded.  If so, the discrepancy might be
> explained by 8.4's more-aggressive statistics target, which means that
> a background ANALYZE will take about 10x more work than before.
>
> If you have time to repeat the experiments, it would be interesting to
> see what happens with consistent default_statistics_target across 8.3
> and 8.4.

That would seem to be it:

      8.4b2          183 min      84,028,897,040         on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

-- todd


>
>             regards, tom lane
> .
>


Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
"Todd A. Cook"
Date:
Todd A. Cook wrote:
> Tom Lane wrote:
>> "Todd A. Cook" <tcook@blackducksoftware.com> writes:
>>> First, the numbers:
>>>   PG Version    Load time     pg_database_size     autovac
>>> ----------------------------------------------------------
>>>    8.2.13        179 min       92,807,992,820        on
>>>    8.3.7         180 min       84,048,744,044        on (defaults)
>>>    8.4b2         206 min       84,028,995,344        on (defaults)
>>>    8.4b2         183 min       84,028,839,696        off
>>> The bulk of the data is in 16 tables, each having about 55 million rows of
>>> the form (int, int, smallint, smallint, int, int, int).  Each table has a
>>> single partial index on one of the integer columns.
>> Given that it's multiple tables, it'd be possible for autovacuum to
>> kick in and ANALYZE the data inserted into earlier tables while the
>> later ones were still being loaded.  If so, the discrepancy might be
>> explained by 8.4's more-aggressive statistics target, which means that
>> a background ANALYZE will take about 10x more work than before.
>>
>> If you have time to repeat the experiments, it would be interesting to
>> see what happens with consistent default_statistics_target across 8.3
>> and 8.4.
>
> That would seem to be it:
>
>       8.4b2          183 min      84,028,897,040         on (defaults, default_statistics_target=10)
>
> I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

The results for this are also consistent with Tom's theory:

        8.3.7           205 min      84,048,866,924         on (defaults, default_statistics_target=100)

-- todd

Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

From
Tom Lane
Date:
"Todd A. Cook" <tcook@blackducksoftware.com> writes:
> Todd A. Cook wrote:
>> Tom Lane wrote:
>>> If you have time to repeat the experiments, it would be interesting to
>>> see what happens with consistent default_statistics_target across 8.3
>>> and 8.4.
>>
>> That would seem to be it:
>> 8.4b2          183 min      84,028,897,040         on (defaults, default_statistics_target=10)
>>
>> I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

> The results for this are also consistent with Tom's theory:
>         8.3.7           205 min      84,048,866,924         on (defaults, default_statistics_target=100)

OK, thanks for following up.  So this is a different effect from the
COPY ring buffer size issue being argued about over on pgsql-hackers.
I think we can just say that this one is a price being paid
intentionally for better statistics, and if you don't need better
statistics you can back off the target setting ...

            regards, tom lane