Re: [SQL] Questions about vacuum analyze - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] Questions about vacuum analyze
Date
Msg-id 7659.939652362@sss.pgh.pa.us
In response to Re: [SQL] Questions about vacuum analyze  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [SQL] Questions about vacuum analyze  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-sql
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> I didn't know sorting algorithms for tape and disk had different
> optimizations.  I figured the paging in of disk blocks had a similar
> penalty to tape rewinding.  None of us really knows a lot about the best
> algorithm for that job.  Nice you recognized it.

I didn't know much about it either until the comments in psort.c led me
to read the relevant parts of Knuth.  Tape rewind and disk seek times
are not remotely comparable, especially when you're talking about N tape
units versus one disk unit --- using more files (tapes) to minimize
rewind time can be a big win on tapes, but put those same files on a
disk and you're just spreading the seek time around differently.  More
to the point, though, tape-oriented algorithms tend to assume that tape
space is free.  Polyphase merge doesn't care in the least that it has
several copies of any given record lying about on different tapes, only
one of which is of interest anymore.

I know how to get the sort space requirement down to 2x the actual data
volume (just use a balanced merge instead of the "smarter" polyphase)
but I am thinking about ways to get it down to data volume + a few
percent with an extra layer of bookkeeping.  The trick is to release
and recycle space as soon as we have read in the tuples stored in it...
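
To make the space argument concrete, here is a toy sketch of a balanced
two-way merge over fixed-size blocks, run with and without recycling of
input blocks.  It is not the psort.c code: BlockStore, BLOCK_SIZE,
write_run, read_run, merge_pass and external_sort are hypothetical names
invented for this illustration, the "disk" is just a Python dict, and the
extra bookkeeping a real implementation would need is not modeled.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

BLOCK_SIZE = 4  # records per block; hypothetical and tiny, for illustration only


class BlockStore:
    """Toy stand-in for disk space: hands out block slots, tracks peak usage."""

    def __init__(self) -> None:
        self.blocks: Dict[int, List[int]] = {}  # block id -> records
        self.free_ids: List[int] = []           # released ids available for reuse
        self.next_id = 0
        self.peak = 0                           # most blocks ever in use at once

    def write(self, records: List[int]) -> int:
        # Prefer a recycled slot; only grow the store when none is free.
        if self.free_ids:
            block_id = self.free_ids.pop()
        else:
            block_id = self.next_id
            self.next_id += 1
        self.blocks[block_id] = list(records)
        self.peak = max(self.peak, len(self.blocks))
        return block_id

    def release(self, block_id: int) -> None:
        del self.blocks[block_id]
        self.free_ids.append(block_id)


def write_run(store: BlockStore, records: List[int]) -> List[int]:
    """Store already-sorted records as a run, i.e. a list of block ids."""
    return [store.write(records[i:i + BLOCK_SIZE])
            for i in range(0, len(records), BLOCK_SIZE)]


def read_run(store: BlockStore, run: List[int], recycle: bool) -> Iterator[int]:
    """Yield a run's records block by block."""
    for block_id in run:
        records = list(store.blocks[block_id])  # tuples are now in memory
        if recycle:
            store.release(block_id)             # give the space back right away
        yield from records


def merge_pass(store: BlockStore, runs: List[List[int]],
               recycle: bool) -> List[List[int]]:
    """One balanced pass: merge runs pairwise into half as many runs."""
    out_runs = []
    for i in range(0, len(runs), 2):
        streams = [read_run(store, r, recycle) for r in runs[i:i + 2]]
        out_blocks: List[int] = []
        buf: List[int] = []
        for rec in heapq.merge(*streams):
            buf.append(rec)
            if len(buf) == BLOCK_SIZE:
                out_blocks.append(store.write(buf))
                buf = []
        if buf:
            out_blocks.append(store.write(buf))
        out_runs.append(out_blocks)
    if not recycle:
        # Without recycling, the inputs are only dropped once the pass is done.
        for run in runs:
            for block_id in run:
                store.release(block_id)
    return out_runs


def external_sort(data: List[int], run_len: int,
                  recycle: bool) -> Tuple[List[int], int, int]:
    """Return (sorted records, peak blocks in use, blocks holding the data)."""
    store = BlockStore()
    runs = [write_run(store, sorted(data[i:i + run_len]))
            for i in range(0, len(data), run_len)]
    if not runs:
        return [], 0, 0
    data_blocks = len(store.blocks)
    while len(runs) > 1:
        runs = merge_pass(store, runs, recycle)
    return list(read_run(store, runs[0], recycle=False)), store.peak, data_blocks


if __name__ == "__main__":
    data = [(i * 37) % 64 for i in range(64)]  # a fixed permutation of 0..63
    for recycle in (False, True):
        _, peak, data_blocks = external_sort(data, run_len=8, recycle=recycle)
        print(f"recycle={recycle}: data blocks={data_blocks}, peak blocks={peak}")
```

With recycle=False the input runs of a pass stay allocated until the pass
finishes, so the peak comes out at about twice the data blocks.  With
recycle=True each input block is handed back as soon as its tuples are in
memory, and the output reuses those slots, so in this toy model the peak
stays close to the data volume itself.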
        regards, tom lane

