Re: [SQL] Questions about vacuum analyze - Mailing list pgsql-sql

From Bruce Momjian
Subject Re: [SQL] Questions about vacuum analyze
Date
Msg-id 199910110312.XAA06290@candle.pha.pa.us
Whole thread Raw
In response to Re: [SQL] Questions about vacuum analyze  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [SQL] Questions about vacuum analyze
List pgsql-sql
> A further thought: 40M rows could well be more than 4gig of data.
> Although Postgres supports tables that large (by segmenting them into
> 1gig-apiece files to avoid OS limitations on file size), we currently
> do not cope with temporary sort files that exceed 4 gig, which means
> a "select distinct" on that much data will fail regardless of whether
> you have enough free disk space :-(.  We have a TODO item to fix this.
> 
> After looking at psort.c, I see a bunch of other shortcomings, one being
> that it's using a polyphase merge algorithm --- which was hot stuff back
> when people did this sort of thing on magnetic tape drives, but it's
> less than appropriate for sorting on disk.  The worst problem with it is
> that the space consumption is about 4x the actual data volume, which is
> not too cool when you're talking about a huge file to begin with...
> 
> Meanwhile, there is a completely separate implementation of external
> sorting over in nbtsort.c (where it's only used for CREATE INDEX,
> AFAICS), with a somewhat different set of deficiencies.
> 
> I think I'll add "rewrite sort code from the ground up" to my to-do
> list ;-)

I have on TODO:
Make index creation use psort code, because it is now faster(Vadim)

I didn't know sorting algorithms for tape and disk had different
optimizations.  I figured the paging in of disk blocks had a similar
penalty to tape rewinding.  None of us really knows a lot about the best
algorithm for that job.  Nice you recognized it.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [SQL] Questions about vacuum analyze
Next
From: Martin Dolog
Date:
Subject: unsubscribe