Thread: explanation for seeks in VACUUM

explanation for seeks in VACUUM

From

Jeff Davis

Date:

14 December 2007, 15:30:19

"bigtable" has about 60M records, about 2M of which are dead at the time
of VACUUM. Shared_buffers are about 1GB, and the machine has 4GB of
memory.

If I run a "SELECT COUNT(*) FROM bigtable", and I ktrace that (FreeBSD)
for 10 seconds, I see only a handful of lseek calls (33), which is no
surprise since I am asking for sequential I/O. I assume those lseeks are
just to skip over pages that already happen to be in shared_buffers.

However, If I have several indexes on that table, and I run a VACUUM, I
observe a lot of seeking. In a 10 second interval, I saw about 5000
lseek calls in the ktrace to the same file descriptor (which is an
index). That's about one every 2ms, so I'm sure a large portion of the
file must have been in the OS buffer cache.

I just don't quite understand what's causing the lseeks.

My understanding is that vacuum uses maintenance_work_mem to hold the
list of dead tuples. In my case that's 2M row versions, times about 6
bytes per entry (in the list of dead tuples) equals about 12MB, which is
much less than 128MB maintenance_work_mem. So it doesn't appear that
maintenance_work_mem is too small.

Even if maintenance_work_mem was the limiting factor, wouldn't the
VACUUM still be operating mostly sequentially, even if it takes multiple
passes?

The only seeking that it seems like VACUUM would need to do in an index
file is when an index page completely empties out, but that wouldn't
account for 5000 lseeks in 10 seconds, would it?

Where am I going wrong? Are many of these lseeks no-ops or something?

Regards,
    Jeff Davis

Re: explanation for seeks in VACUUM (8.2.4)

From

Jeff Davis

Date:

14 December 2007, 15:40:50

On Fri, 2007-12-14 at 11:29 -0800, Jeff Davis wrote:
> "bigtable" has about 60M records, about 2M of which are dead at the time
> of VACUUM. Shared_buffers are about 1GB, and the machine has 4GB of
> memory.

Forgot to mention: version 8.2.4

Regards,
    Jeff Davis

Re: explanation for seeks in VACUUM

From

Tom Lane

Date:

14 December 2007, 20:05:00

Jeff Davis <pgsql@j-davis.com> writes:
> Where am I going wrong? Are many of these lseeks no-ops or something?

They're not supposed to be, but if you only tracked seeks and not
reads or writes, it's hard to be sure what's going on.

8.2's VACUUM should process a btree index (this is a btree index no?)
in physical order, so I'd expect lseeks only when a page is already in
buffers --- at least on the read side.  On the write side things might
be a great deal less predictable.  You're cleaning out about one tuple
in 30, so the odds are that nearly every index page is getting dirtied,
and they're going to need to be written sometime.

            regards, tom lane

Re: explanation for seeks in VACUUM

From

Jeff Davis

Date:

14 December 2007, 23:53:25

On Fri, 2007-12-14 at 19:04 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Where am I going wrong? Are many of these lseeks no-ops or something?
>
> They're not supposed to be, but if you only tracked seeks and not
> reads or writes, it's hard to be sure what's going on.

The seeks were comparable to reads and writes, which is what surprised
me:

$ ktrace -p 42440; sleep 5; ktrace -C
$ kdump -f ktrace.out | grep "GIO   fd 38 read" | wc -l
   11479
$ kdump -f ktrace.out | grep "GIO   fd 38 wrote" | wc -l
   11480
$ kdump -f ktrace.out | grep "lseek.0x26" | wc -l
   22960

> 8.2's VACUUM should process a btree index (this is a btree index no?)
> in physical order, so I'd expect lseeks only when a page is already in
> buffers --- at least on the read side.  On the write side things might
> be a great deal less predictable.  You're cleaning out about one tuple
> in 30, so the odds are that nearly every index page is getting dirtied,
> and they're going to need to be written sometime.

Actually, the table I was VACUUMing had both btree and GIN indexes, and
I'm not entirely sure which one was happening at the time. I will
investigate more.

My intuition tells me that, if the pages are being read and dirtied
sequentially, it would be able to write mostly sequentially (at least in
theory).

In the box that was causing the problem (which had more constrained
memory than my reproduced case), it seemed to be swamped by random I/O
-- low CPU usage, and low disk usage (about 1-2 MB/s write), yet VACUUM
would take forever and the box would appear very sluggish.

Regards,
    Jeff Davis