Thread: Faster pg_resore with autovacuum off?

Faster pg_resore with autovacuum off?

From
Wells Oliver
Date:
Maybe it's an old wives' tale, but I swear I've read a recommendation to turn autovacuum off when doing pg_restore into a new server.. Would it make the restore faster? I think I've restored and then run vacuumdb + analyze, but I wanted to check.

--

Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
I found this link last year when Googling how to speed up pg_restore:
https://www.databasesoup.com/2014/09/settings-for-fast-pgrestore.html

On Fri, Jul 26, 2024 at 8:58 PM Wells Oliver <wells.oliver@gmail.com> wrote:
Maybe it's an old wives' tale, but I swear I've read a recommendation to turn autovacuum off when doing pg_restore into a new server.. Would it make the restore faster? I think I've restored and then run vacuumdb + analyze, but I wanted to check.

--

Re: Faster pg_resore with autovacuum off?

From
Laurenz Albe
Date:
On Fri, 2024-07-26 at 17:57 -0700, Wells Oliver wrote:
> Maybe it's an old wives' tale, but I swear I've read a recommendation to turn
> autovacuum off when doing pg_restore into a new server.. Would it make the
> restore faster? I think I've restored and then run vacuumdb + analyze,
> but I wanted to check.

If your machine is too weak to handle the combined workload of restore +
autovacuum, that might get you something.  Otherwise, you are just cheating:
the restore might be faster, but the database is not usable before autovacuum
and autoanalyze have proessed all restored tables.

I wouldn't recommend turning autovacuum off, because it is a dangerous thing
to do.  If you forget to enable it, your database will be in big trouble.

Yours,
Laurenz Albe



Re: Faster pg_resore with autovacuum off?

From
Wells Oliver
Date:
ha, thanks, my machine is plenty strong, fit and hale, new 16 CPU/64GB RDS instance... I was just curious if there were any performance options to change during a restore for any gains. Seems like good reasons to avoid disabling autovacuum, though.

On Sat, Jul 27, 2024 at 7:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-07-26 at 17:57 -0700, Wells Oliver wrote:
> Maybe it's an old wives' tale, but I swear I've read a recommendation to turn
> autovacuum off when doing pg_restore into a new server.. Would it make the
> restore faster? I think I've restored and then run vacuumdb + analyze,
> but I wanted to check.

If your machine is too weak to handle the combined workload of restore +
autovacuum, that might get you something.  Otherwise, you are just cheating:
the restore might be faster, but the database is not usable before autovacuum
and autoanalyze have proessed all restored tables.

I wouldn't recommend turning autovacuum off, because it is a dangerous thing
to do.  If you forget to enable it, your database will be in big trouble.

Yours,
Laurenz Albe


--

Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
On Sat, Jul 27, 2024 at 10:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-07-26 at 17:57 -0700, Wells Oliver wrote:
> Maybe it's an old wives' tale, but I swear I've read a recommendation to turn
> autovacuum off when doing pg_restore into a new server.. Would it make the
> restore faster? I think I've restored and then run vacuumdb + analyze,
> but I wanted to check.

If your machine is too weak to handle the combined workload of restore +
autovacuum, that might get you something.  Otherwise, you are just cheating:
the restore might be faster, but the database is not usable before autovacuum
and autoanalyze have proessed all restored tables.

I 100% disagree with this: autovacuum is not necessary during pg_restore (presuming there's only one database in the instance).

Heck, it might not even be vital if you carefully monitor the other databases in the instance.

I wouldn't recommend turning autovacuum off, because it is a dangerous thing
to do.  If you forget to enable it, your database will be in big trouble.
 
You can't forget to enable autovacuum when it's in the same script as the pg_restore.

Re: Faster pg_resore with autovacuum off?

From
Scott Ribe
Date:
> On Jul 27, 2024, at 1:48 PM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
>
> You can't forget to enable autovacuum when it's in the same script as the pg_restore.

Similar argument applies to turning off fsync, which I have found to sometimes make a significant difference (depending
onhardware). 

The other argument I've seen, that if there's a crash during restore you'll have a corrupted database, is bogus. What
areyou going to try to do with a database if there's a crash during restore??? 


Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
On Sat, Jul 27, 2024 at 4:06 PM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Jul 27, 2024, at 1:48 PM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
>
> You can't forget to enable autovacuum when it's in the same script as the pg_restore.

Similar argument applies to turning off fsync, which I have found to sometimes make a significant difference (depending on hardware).

The other argument I've seen, that if there's a crash during restore you'll have a corrupted database, is bogus. What are you going to try to do with a database if there's a crash during restore???

 Josh Berkus' blog post, which I linked to earlier, recommended that, and a lot more, 10 years ago.  It all still holds up.  Bigger servers just means some values need enlarging.

Re: Faster pg_resore with autovacuum off?

From
Laurenz Albe
Date:
On Sat, 2024-07-27 at 15:48 -0400, Ron Johnson wrote:
> > If your machine is too weak to handle the combined workload of restore +
> > autovacuum, that might get you something.  Otherwise, you are just cheating:
> > the restore might be faster, but the database is not usable before autovacuum
> > and autoanalyze have proessed all restored tables.
>
> I 100% disagree with this: autovacuum is not necessary during pg_restore
> (presuming there's only one database in the instance).

It is not necessary during restore, but it is necessary if you want to use
the database after the restore.  Well, you can manually VACUUM and ANALYZE the
database, but why not do it automatically?

I think that there are very few good reasons to ever disable autovacuum, and
a restore is not among them.

> Heck, it might not even be vital if you carefully monitor the other databases
> in the instance.

?

Yours,
Laurenz Albe



Re: Faster pg_resore with autovacuum off?

From
Laurenz Albe
Date:
On Sat, 2024-07-27 at 14:05 -0600, Scott Ribe wrote:
> Similar argument applies to turning off fsync, which I have found to sometimes make a
> significant difference (depending on hardware).

That's bad advice.  Very bad advice.
That is, unless you are ready to delete the cluster and run a new "initdb" after an OS crash.

But why risk that, if you can get virtually the same positive effect by disabling
"synchronous_commit".  But all that shouldn't have a big effect on "pg_restore".
To tune "pg_restore", increate "max_wal_size", "checkpoint_timeout" and "maintenance_work_mem".

> The other argument I've seen, that if there's a crash during restore you'll have a
> corrupted database, is bogus. What are you going to try to do with a database if there's
> a crash during restore???

Drop it?
You are wrong: it is not the database that is broken after a crash, but the entire cluster.

Yours,
Laurenz Albe



Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
On Sun, Jul 28, 2024 at 8:34 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sat, 2024-07-27 at 15:48 -0400, Ron Johnson wrote:
> > If your machine is too weak to handle the combined workload of restore +
> > autovacuum, that might get you something.  Otherwise, you are just cheating:
> > the restore might be faster, but the database is not usable before autovacuum
> > and autoanalyze have proessed all restored tables.
>
> I 100% disagree with this: autovacuum is not necessary during pg_restore
> (presuming there's only one database in the instance).

It is not necessary during restore, but it is necessary if you want to use
the database after the restore.  Well, you can manually VACUUM and ANALYZE the
database, but why not do it automatically?

I think that there are very few good reasons to ever disable autovacuum, and
a restore is not among them.

My experience is that autovacuum=off pg_restore -Fd -j$BigNum + vacuumdb -Zj$BigNum is faster than letting autoanalyze=on pg_restore -Fd -j$BigNum.
 
> Heck, it might not even be vital if you carefully monitor the other databases
> in the instance.

?
 
Tables in a freshly vacuum database won't need vacuuming again "for a while" unless they're really, really busy.  Analyzing is a different story, but targeted manual analyzes work just as well.

Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
On Sun, Jul 28, 2024 at 8:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sat, 2024-07-27 at 14:05 -0600, Scott Ribe wrote:
> Similar argument applies to turning off fsync, which I have found to sometimes make a
> significant difference (depending on hardware).

That's bad advice.  Very bad advice.
That is, unless you are ready to delete the cluster and run a new "initdb" after an OS crash.

Which I am, if there's only one database in the cluster.
 

But why risk that, if you can get virtually the same positive effect by disabling
"synchronous_commit".  But all that shouldn't have a big effect on "pg_restore".
To tune "pg_restore", increate "max_wal_size", "checkpoint_timeout" and "maintenance_work_mem".

I do that too.  
 
> The other argument I've seen, that if there's a crash during restore you'll have a
> corrupted database, is bogus. What are you going to try to do with a database if there's
> a crash during restore???

Drop it?
You are wrong: it is not the database that is broken after a crash, but the entire cluster.

Maybe I'm spoiled by high-quality hardware and SANs, plus VMware, but crashes are damned rare in my environment.

I'll take that risk to restore a database faster in a single-database cluster.

Re: Faster pg_resore with autovacuum off?

From
Scott Ribe
Date:
> On Jul 28, 2024, at 6:40 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> That's bad advice.  Very bad advice.
> That is, unless you are ready to delete the cluster and run a new "initdb" after an OS crash.

Exactly.

> You are wrong: it is not the database that is broken after a crash, but the entire cluster.

Good clarification. I personally have never had occasion to move a partial cluster, so my use of "database" in my
questionwas sloppy, I meant "cluster". So yes, I'd delete the cluster and initdb if I ever actually had an OS crash
duringa pg_restore--which in 20 years of using PG has never happened. I suppose it might matter more if one were forced
torun one's db on an unstable platform ;-) 


Re: Faster pg_resore with autovacuum off?

From
Wells Oliver
Date:
fwiw, we have a lot of materialized views, so restoring a DB on non-vacuumed tables caused the materialization to take a lot longer than it would have with autovacuum running as normal. Seems worth experimenting though.

On Sun, Jul 28, 2024 at 6:58 AM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Jul 28, 2024, at 6:40 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> That's bad advice.  Very bad advice.
> That is, unless you are ready to delete the cluster and run a new "initdb" after an OS crash.

Exactly.

> You are wrong: it is not the database that is broken after a crash, but the entire cluster.

Good clarification. I personally have never had occasion to move a partial cluster, so my use of "database" in my question was sloppy, I meant "cluster". So yes, I'd delete the cluster and initdb if I ever actually had an OS crash during a pg_restore--which in 20 years of using PG has never happened. I suppose it might matter more if one were forced to run one's db on an unstable platform ;-)



--

Re: Faster pg_resore with autovacuum off?

From
Ron Johnson
Date:
It's obvious why unanalyzed tables could cause materialization to be slow, but why would unvacuumed tables cause slowness when creating materialized views?

Also, how can you be sure that the underlying tables have been analyzed before pg_restore gets to the CREATE MATERIALIZED VIEW statements?

On Sun, Jul 28, 2024 at 12:36 PM Wells Oliver <wells.oliver@gmail.com> wrote:
fwiw, we have a lot of materialized views, so restoring a DB on non-vacuumed tables caused the materialization to take a lot longer than it would have with autovacuum running as normal. Seems worth experimenting though.
 

Re: Faster pg_resore with autovacuum off?

From
Laurenz Albe
Date:
On Sun, 2024-07-28 at 16:55 -0400, Ron Johnson wrote:
> It's obvious why unanalyzed tables could cause materialization to be slow,
> but why would unvacuumed tables cause slowness when creating materialized views?

Because of missing hint bits and visibility map?
But you are right, ANALYZE is more important for query performance.

However, if you disable autovacuum, you also disable autoanalyze...

Yours,
Laurenz Albe