Thread: AW: AW: Plans for solving the VACUUM problem

AW: AW: Plans for solving the VACUUM problem

From: Zeugswetter Andreas SB
> A particular point worth making is that in the common case where you've
> updated the same row N times (without changing its index key), the above
> approach has O(N^2) runtime.  The indexscan will find all N index tuples
> matching the key ... only one of which is the one you are looking for on
> this iteration of the outer loop.

It was my understanding that the heap xtid is now part of the key, and thus,
with a somewhat modified access, it would find the one exact row directly.
And in the above case the keys (since they are identical except for the xtid)
will stick close together, so caching will be good.
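
(For illustration, a minimal sketch of the comparison rule such an access
path would need, with the heap TID acting as a trailing key column so that
a probe on (key, tid) lands on exactly one entry.  The struct and function
below are hypothetical, not the existing btree code.)

    typedef struct
    {
        int      key;          /* user key, simplified to one int column */
        unsigned heap_blkno;   /* heap block of the pointed-to tuple */
        unsigned heap_offset;  /* line pointer within that block */
    } IndexEntry;

    /* Compare by key first, then by heap TID as a tie-breaker, so
     * duplicates of the same key are kept in TID order and an exact
     * (key, TID) probe descends to a single entry. */
    static int
    entry_cmp(const IndexEntry *a, const IndexEntry *b)
    {
        if (a->key != b->key)
            return (a->key < b->key) ? -1 : 1;
        if (a->heap_blkno != b->heap_blkno)
            return (a->heap_blkno < b->heap_blkno) ? -1 : 1;
        if (a->heap_offset != b->heap_offset)
            return (a->heap_offset < b->heap_offset) ? -1 : 1;
        return 0;
    }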

Andreas


Re: AW: AW: Plans for solving the VACUUM problem

From: Tom Lane
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> It was my understanding that the heap xtid is now part of the key,

It is not.

There was some discussion of doing that, but it fell down on the little
problem that in normal index-search cases you *don't* know the heap tid
you are looking for.

> And in the above case the keys (since they are identical except for the xtid)
> will stick close together, so caching will be good.

Even without key-collision problems, deleting N tuples out of a total of
M index entries will require search costs like this:

bulk delete in linear scan way:
    O(M)          I/O costs (read all the pages)
    O(M log N)    CPU costs (lookup each TID in sorted list)

successive index probe way:
    O(N log M)    I/O costs for probing index
    O(N log M)    CPU costs for probing index (key comparisons)
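
A rough sketch of the two strategies, run against an in-memory array
standing in for the index (illustrative C only; the Entry struct and the
function names are invented for this example, not the actual nbtree code):

    #include <stdlib.h>

    typedef struct { int key; long heap_tid; int dead; } Entry;

    static int
    tid_cmp(const void *a, const void *b)
    {
        long x = *(const long *) a;
        long y = *(const long *) b;
        return (x > y) - (x < y);
    }

    /* Linear-scan way: visit all M entries in physical order (O(M)
     * sequential page reads) and binary-search each entry's heap TID
     * in the sorted dead-TID list (O(M log N) comparisons). */
    static void
    bulk_delete(Entry *ix, long m, long *dead_tids, long n)
    {
        qsort(dead_tids, n, sizeof(long), tid_cmp);
        for (long i = 0; i < m; i++)
            if (bsearch(&ix[i].heap_tid, dead_tids, n,
                        sizeof(long), tid_cmp))
                ix[i].dead = 1;
    }

    /* Successive-probe way: one descent per dead tuple (O(N log M)
     * random page reads), then a walk along the run of equal keys
     * until the wanted TID turns up.  That walk is what degenerates
     * to O(N^2) when one row has been updated N times under the
     * same key. */
    static void
    probe_delete(Entry *ix, long m, const int *keys,
                 const long *tids, long n)
    {
        for (long i = 0; i < n; i++)
        {
            long lo = 0, hi = m;    /* find first entry >= keys[i] */

            while (lo < hi)
            {
                long mid = (lo + hi) / 2;

                if (ix[mid].key < keys[i])
                    lo = mid + 1;
                else
                    hi = mid;
            }
            for (long j = lo; j < m && ix[j].key == keys[i]; j++)
                if (ix[j].heap_tid == tids[i])
                {
                    ix[j].dead = 1;
                    break;
                }
        }
    }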

For N << M, the latter looks like a win, but you have to keep in mind
that the constant factors hidden by the O() notation are a lot different
in the two cases.  In particular, if there are T index entries per page,
the former I/O cost is really M/T * sequential read cost whereas the
latter is N log M * random read cost, yielding a difference in constant
factors of probably a thousand or two.  You get some benefit in the
latter case from caching the upper btree levels, but that's by
definition not a large part of the index bulk.  So where's the breakeven
point in reality?  I don't know but I suspect that it's at pretty small
N.  Certainly far less than one percent of the table, whereas I would
think that people would try to schedule VACUUMs at an interval where
they'd be reclaiming several percent of the table.
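
To put illustrative numbers on that (assumed figures, not measurements):
with M = 10 million index entries, T = 250 entries per page, and a random
read costing about 6 times a sequential one, the linear scan reads
M/T = 40,000 pages, while the probes cost about N * log2(M), roughly 23N,
random reads, i.e. about 138N in sequential-read units.  Break-even is then
around N = 290, or roughly 0.003% of the index, which is indeed far below
one percent.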

So, as I said to Hiroshi, this alternative looks to me like a possible
future refinement, not something we need to do in the first version.
        regards, tom lane