Thread: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4
Hi, First, the numbers: PG Version Load time pg_database_size autovac ---------------------------------------------------------- 8.2.13 179 min 92,807,992,820 on 8.3.7 180 min 84,048,744,044 on (defaults) 8.4b2 206 min 84,028,995,344 on (defaults) 8.4b2 183 min 84,028,839,696 off The bulk of the data is in 16 tables, each having about 55 million rows of the form (int, int, smallint, smallint, int, int, int). Each table has a single partial index on one of the integer columns. The dump file was 14GB compressed. The loads were all done on the same machine, with the DB going on a pair of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and 8GB RAM. maintenance_work_mem was set to 512MB in all three cases. -- todd
On Wed, Jun 17, 2009 at 10:50 AM, Todd A. Cook<tcook@blackducksoftware.com> wrote: > The loads were all done on the same machine, with the DB going on a pair > of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and > 8GB RAM. maintenance_work_mem was set to 512MB in all three cases. What if you double or triple the number of checkpoint segments?
There was an interesting presentation at PG Con from a guy at Sun who did a series of load tests on 8.3 vs 8.4 http://www.pgcon.org/2009/schedule/events/124.en.html There is a link to the video from that page so you can watch it. But he found a strange "corner case" where 8.4 performed way worse. After he did a bit of digging he found a couple of default settings that had changed in 8.4, and when he set them back to their old 8.3 values and re-ran the tests, there was a huge difference in outcome. On Wed, Jun 17, 2009 at 10:50 AM, Todd A. Cook<tcook@blackducksoftware.com> wrote: > Hi, > > First, the numbers: > > PG Version Load time pg_database_size autovac > ---------------------------------------------------------- > 8.2.13 179 min 92,807,992,820 on > 8.3.7 180 min 84,048,744,044 on (defaults) > 8.4b2 206 min 84,028,995,344 on (defaults) > 8.4b2 183 min 84,028,839,696 off > > The bulk of the data is in 16 tables, each having about 55 million rows of > the form (int, int, smallint, smallint, int, int, int). Each table has a > single partial index on one of the integer columns. The dump file was 14GB > compressed. > > The loads were all done on the same machine, with the DB going on a pair > of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and > 8GB RAM. maintenance_work_mem was set to 512MB in all three cases. > > -- todd > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- “Don't eat anything you've ever seen advertised on TV” - Michael Pollan, author of "In Defense of Food"
Vick Khera wrote: > On Wed, Jun 17, 2009 at 10:50 AM, Todd A. > Cook<tcook@blackducksoftware.com> wrote: >> The loads were all done on the same machine, with the DB going on a pair >> of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and >> 8GB RAM. maintenance_work_mem was set to 512MB in all three cases. > > What if you double or triple the number of checkpoint segments? > checkpoint_segments was set to 128 for all tests. -- todd
"Todd A. Cook" <tcook@blackducksoftware.com> writes: > First, the numbers: > PG Version Load time pg_database_size autovac > ---------------------------------------------------------- > 8.2.13 179 min 92,807,992,820 on > 8.3.7 180 min 84,048,744,044 on (defaults) > 8.4b2 206 min 84,028,995,344 on (defaults) > 8.4b2 183 min 84,028,839,696 off > The bulk of the data is in 16 tables, each having about 55 million rows of > the form (int, int, smallint, smallint, int, int, int). Each table has a > single partial index on one of the integer columns. Given that it's multiple tables, it'd be possible for autovacuum to kick in and ANALYZE the data inserted into earlier tables while the later ones were still being loaded. If so, the discrepancy might be explained by 8.4's more-aggressive statistics target, which means that a background ANALYZE will take about 10x more work than before. If you have time to repeat the experiments, it would be interesting to see what happens with consistent default_statistics_target across 8.3 and 8.4. regards, tom lane
Tom Lane wrote: > "Todd A. Cook" <tcook@blackducksoftware.com> writes: >> First, the numbers: > >> PG Version Load time pg_database_size autovac >> ---------------------------------------------------------- >> 8.2.13 179 min 92,807,992,820 on >> 8.3.7 180 min 84,048,744,044 on (defaults) >> 8.4b2 206 min 84,028,995,344 on (defaults) >> 8.4b2 183 min 84,028,839,696 off > >> The bulk of the data is in 16 tables, each having about 55 million rows of >> the form (int, int, smallint, smallint, int, int, int). Each table has a >> single partial index on one of the integer columns. > > Given that it's multiple tables, it'd be possible for autovacuum to > kick in and ANALYZE the data inserted into earlier tables while the > later ones were still being loaded. If so, the discrepancy might be > explained by 8.4's more-aggressive statistics target, which means that > a background ANALYZE will take about 10x more work than before. > > If you have time to repeat the experiments, it would be interesting to > see what happens with consistent default_statistics_target across 8.3 > and 8.4. given that this was likely a single-thread restore and therefor wal logged I wonder if the 206min one might be affected by the issue discussed here http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > Tom Lane wrote: >> If you have time to repeat the experiments, it would be interesting to >> see what happens with consistent default_statistics_target across 8.3 >> and 8.4. > given that this was likely a single-thread restore and therefor wal > logged I wonder if the 206min one might be affected by the issue > discussed here > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php It might be, but the current theory about that (ie that it's the bulkwrite patch doing it) doesn't explain the interaction with autovacuum on/off. I'd still like to hear whether changing the stats target changes Todd's results. regards, tom lane
Tom Lane wrote: > "Todd A. Cook" <tcook@blackducksoftware.com> writes: >> First, the numbers: > >> PG Version Load time pg_database_size autovac >> ---------------------------------------------------------- >> 8.2.13 179 min 92,807,992,820 on >> 8.3.7 180 min 84,048,744,044 on (defaults) >> 8.4b2 206 min 84,028,995,344 on (defaults) >> 8.4b2 183 min 84,028,839,696 off > >> The bulk of the data is in 16 tables, each having about 55 million rows of >> the form (int, int, smallint, smallint, int, int, int). Each table has a >> single partial index on one of the integer columns. > > Given that it's multiple tables, it'd be possible for autovacuum to > kick in and ANALYZE the data inserted into earlier tables while the > later ones were still being loaded. If so, the discrepancy might be > explained by 8.4's more-aggressive statistics target, which means that > a background ANALYZE will take about 10x more work than before. > > If you have time to repeat the experiments, it would be interesting to > see what happens with consistent default_statistics_target across 8.3 > and 8.4. That would seem to be it: 8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10) I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend. -- todd > > regards, tom lane > . >
Todd A. Cook wrote: > Tom Lane wrote: >> "Todd A. Cook" <tcook@blackducksoftware.com> writes: >>> First, the numbers: >>> PG Version Load time pg_database_size autovac >>> ---------------------------------------------------------- >>> 8.2.13 179 min 92,807,992,820 on >>> 8.3.7 180 min 84,048,744,044 on (defaults) >>> 8.4b2 206 min 84,028,995,344 on (defaults) >>> 8.4b2 183 min 84,028,839,696 off >>> The bulk of the data is in 16 tables, each having about 55 million rows of >>> the form (int, int, smallint, smallint, int, int, int). Each table has a >>> single partial index on one of the integer columns. >> Given that it's multiple tables, it'd be possible for autovacuum to >> kick in and ANALYZE the data inserted into earlier tables while the >> later ones were still being loaded. If so, the discrepancy might be >> explained by 8.4's more-aggressive statistics target, which means that >> a background ANALYZE will take about 10x more work than before. >> >> If you have time to repeat the experiments, it would be interesting to >> see what happens with consistent default_statistics_target across 8.3 >> and 8.4. > > That would seem to be it: > > 8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10) > > I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend. The results for this are also consistent with Tom's theory: 8.3.7 205 min 84,048,866,924 on (defaults, default_statistics_target=100) -- todd
"Todd A. Cook" <tcook@blackducksoftware.com> writes: > Todd A. Cook wrote: >> Tom Lane wrote: >>> If you have time to repeat the experiments, it would be interesting to >>> see what happens with consistent default_statistics_target across 8.3 >>> and 8.4. >> >> That would seem to be it: >> 8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10) >> >> I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend. > The results for this are also consistent with Tom's theory: > 8.3.7 205 min 84,048,866,924 on (defaults, default_statistics_target=100) OK, thanks for following up. So this is a different effect from the COPY ring buffer size issue being argued about over on pgsql-hackers. I think we can just say that this one is a price being paid intentionally for better statistics, and if you don't need better statistics you can back off the target setting ... regards, tom lane