Thread: vacuum performance on insert

vacuum performance on insert

From
Sean Chen
Date:
Hi, I'm curious -- does "vacuum analyze e.g. table1" improve
performance on "insert into table1 ...". I understand the vacuum
analyze helps out the query -- select, etc., but just not quite sure
on insert.

Specifically, I'm doing the following.

1, delete records ...
2, insert records ...

if I add "vacuum analyze" in-between this two steps, will it help on
the performance on the insert?
More importantly, If so, why?

Thanks,
Sean

Re: vacuum performance on insert

From
"Kevin Grittner"
Date:
Sean Chen <zyschen@gmail.com> wrote:

> 1, delete records ...
> 2, insert records ...
>
> if I add "vacuum analyze" in-between this two steps, will it help
> on the performance on the insert?

Assuming there are no long-running transactions which would still be
able to see the deleted rows, a VACUUM between those statements
would allow the INSERT to re-use the space previously occupied by
the deleted rows, rather than possibly needing to allocate new space
from the OS.

-Kevin

Re: vacuum performance on insert

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Sean Chen <zyschen@gmail.com> wrote:
>> 1, delete records ...
>> 2, insert records ...
>>
>> if I add "vacuum analyze" in-between this two steps, will it help
>> on the performance on the insert?

> Assuming there are no long-running transactions which would still be
> able to see the deleted rows, a VACUUM between those statements
> would allow the INSERT to re-use the space previously occupied by
> the deleted rows, rather than possibly needing to allocate new space
> from the OS.

But on the other side of the coin, the ANALYZE step is probably not very
helpful there.  Better to do that after you've loaded the new data.

            regards, tom lane

Re: vacuum performance on insert

From
Sean Chen
Date:
hi, thank you for the reply.

I ran a number of tests to try to make sense of this.

When I ran with or without vacuum, the number of disk io operations,
cache operations etc. gathered from pg_stat table for the insertions
are pretty much the same.

So I don't see vacuum reduce disk io operations.

Now from what you mentioned below, do you know what's the cost of
postgres requesting new disk space from OS?

I'm seeing a big performance difference with vacuum, but I need a
proof to show it's the requesting new space operation that was the
problem, not disk io, etc. since I would think disk could be expensive
as well.

Thanks,
Sean

On Thu, Aug 5, 2010 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Sean Chen <zyschen@gmail.com> wrote:
>>> 1, delete records ...
>>> 2, insert records ...
>>>
>>> if I add "vacuum analyze" in-between this two steps, will it help
>>> on the performance on the insert?
>
>> Assuming there are no long-running transactions which would still be
>> able to see the deleted rows, a VACUUM between those statements
>> would allow the INSERT to re-use the space previously occupied by
>> the deleted rows, rather than possibly needing to allocate new space
>> from the OS.
>
> But on the other side of the coin, the ANALYZE step is probably not very
> helpful there.  Better to do that after you've loaded the new data.
>
>                        regards, tom lane
>

Re: vacuum performance on insert

From
"Kevin Grittner"
Date:
Sean Chen <zyschen@gmail.com> wrote:

> Now from what you mentioned below, do you know what's the cost of
> postgres requesting new disk space from OS?

Depending on your OS and its version, your file system, your mount
options, and your disk subsystem (and its firmware revision), there
could be various effects -- the one likely to be biting you is write
barriers.  When you allocate additional space from the OS, and it
extends a file or creates a new file, there might be a write barrier
to ensure that the file system catalog entries are persisted.  This
could cause all writes (and possibly even reads) to pause until the
data is written to disk.

That's just a guess, of course.   If you have a profiler you can run
you might be able to pin it down with that.

-Kevin