Re: AW: AW: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: AW: AW: Plans for solving the VACUUM problem
Date	May 18, 2001 11:56:33
Msg-id	15440.990196536@sss.pgh.pa.us
In response to	AW: AW: Plans for solving the VACUUM problem (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>)
List	pgsql-hackers

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
> It was my understanding, that the heap xtid is part of the key now,

It is not.

There was some discussion of doing that, but it fell down on the little
problem that in normal index-search cases you *don't* know the heap tid
you are looking for.

> And in above case, the keys (since identical except xtid) will stick close
> together, thus caching will be good.

Even without key-collision problems, deleting N tuples out of a total of
M index entries will require search costs like this:

bulk delete in linear scan way:
    O(M) I/O costs (read all the pages)
    O(M log N) CPU costs (lookup each TID in sorted list)

successive index probe way:
    O(N log M) I/O costs for probing index
    O(N log M) CPU costs for probing index (key comparisons)
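
For illustration only, the two strategies can be sketched on a toy in-memory model. This is not PostgreSQL code: the index is assumed to be a Python list of (key, tid) pairs sorted by key alone, and the function names bulk_delete_linear and bulk_delete_by_probe are hypothetical. The probe variant has to walk all entries with an equal key, because the heap TID is not part of the key.

```python
from bisect import bisect_left

def bulk_delete_linear(index_entries, dead_tids):
    """Linear scan way: visit all M entries, look each TID up in a sorted list of N dead TIDs."""
    dead = sorted(dead_tids)
    kept = []
    for key, tid in index_entries:        # M iterations ("read all the pages")
        i = bisect_left(dead, tid)        # O(log N) lookup in the sorted TID list
        if i < len(dead) and dead[i] == tid:
            continue                      # entry points at a dead tuple: drop it
        kept.append((key, tid))
    return kept

def bulk_delete_by_probe(index_entries, dead_tuples):
    """Successive probe way: for each of N dead (key, tid) pairs, search the index by key."""
    entries = list(index_entries)         # assumed sorted by key only
    for key, tid in dead_tuples:          # N iterations
        # (key,) sorts before every (key, tid), so this finds the first entry with that key
        i = bisect_left(entries, (key,))  # O(log M) key comparisons
        # the TID is not part of the key, so walk the duplicates until it matches
        while i < len(entries) and entries[i][0] == key:
            if entries[i][1] == tid:
                del entries[i]
                break
            i += 1
    return entries
```

The list deletion in the probe variant is itself linear in this toy model; a real btree would not pay that cost, so only the search pattern is meant to be representative.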

For N << M, the latter looks like a win, but you have to keep in mind
that the constant factors hidden by the O() notation are a lot different
in the two cases. In particular, if there are T index entries per page,
the former I/O cost is really M/T * sequential read cost whereas the
latter is N log M * random read cost, yielding a difference in constant
factors of probably a thousand or two. You get some benefit in the
latter case from caching the upper btree levels, but that's by
definition not a large part of the index bulk. So where's the breakeven
point in reality? I don't know but I suspect that it's at pretty small
N. Certainly far less than one percent of the table, whereas I would
think that people would try to schedule VACUUMs at an interval where
they'd be reclaiming several percent of the table.
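
A back-of-envelope version of this comparison, as a minimal sketch: the model below only encodes the two I/O terms named above (M/T sequential reads versus N log M random reads). The default cost ratio of 10 between a random and a sequential page read, the use of a base-2 logarithm, and the function names are all assumptions for illustration, and the benefit of caching the upper btree levels is ignored.

```python
import math

def linear_scan_io(m, entries_per_page, seq_read_cost=1.0):
    """I/O cost of reading every index page once: M/T * sequential read cost."""
    return (m / entries_per_page) * seq_read_cost

def index_probe_io(n, m, random_read_cost=10.0):
    """I/O cost of N separate probes: N log M * random read cost (log base 2 assumed)."""
    return n * math.log2(m) * random_read_cost

def breakeven_n(m, entries_per_page, seq_read_cost=1.0, random_read_cost=10.0):
    """Number of deleted tuples N at which the two I/O costs are equal in this model."""
    scan = linear_scan_io(m, entries_per_page, seq_read_cost)
    return scan / (math.log2(m) * random_read_cost)

if __name__ == "__main__":
    m, t = 10_000_000, 200   # hypothetical index size and entries per page
    n = breakeven_n(m, t)
    print(f"breakeven N = {n:.0f} ({100 * n / m:.4f}% of {m} entries)")
```

Under these assumed numbers the breakeven lands well below one percent of the index, which agrees with the estimate above; different values of T or of the cost ratio move the point but not the general shape of the argument.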

So, as I said to Hiroshi, this alternative looks to me like a possible
future refinement, not something we need to do in the first version.

			regards, tom lane
