Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Compression and on-disk sorting
Date
Msg-id 20060607143542.GS45331@pervasive.com
In response to Re: Compression and on-disk sorting  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Compression and on-disk sorting  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Wed, Jun 07, 2006 at 11:59:50AM +0100, Simon Riggs wrote:
> On Wed, 2006-06-07 at 01:33 -0500, Jim C. Nasby wrote:
> > On Fri, May 26, 2006 at 09:21:44PM +0100, Simon Riggs wrote:
> > > On Fri, 2006-05-26 at 14:47 -0500, Jim C. Nasby wrote:
> > > 
> > > > But the meat is:
> > > >                                         -- work_mem --
> > > >                         Scale           2000    20000
> > > > not compressed          150             805.7   797.7
> > > > not compressed          3000            17820   17436
> > > > compressed              150             371.4   400.1
> > > > compressed              3000            8152    8537
> > > > compressed, no headers  3000            7325    7876
> > > 
> > > Since Tom has committed the header-removing patch, we need to test
> > > 
> > >     not compressed, no headers v compressed, no headers
> > 
> >                                         -- work_mem --
> >                         Scale           2000    20000
> > not compressed          150             805.7   797.7
> > not compressed          3000            17820   17436
> > not compressed, no hdr  3000            14470   14507
> > compressed              150             371.4   400.1
> > compressed              3000            8152    8537
> > compressed, no headers  3000            7325    7876
> 
> That looks fairly conclusive. Can we try tests with data in reverse
> order, so we use more tapes? We're still using a single tape, so the
> additional overhead of compression doesn't cause any pain.
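
As a quick check on "fairly conclusive", the scale-3000 ratios can be worked
out directly from the table above. This is only a sketch: the numbers are
copied from the table, the dictionary and helper names are made up for the
illustration, and only the ratios are meaningful since the table does not
state units.

```python
# Timings copied from the scale-3000 rows of the table above.
# Each tuple is (work_mem = 2000, work_mem = 20000).
timings = {
    "not compressed": (17820, 17436),
    "not compressed, no hdr": (14470, 14507),
    "compressed": (8152, 8537),
    "compressed, no headers": (7325, 7876),
}


def ratios(slower, faster):
    """Return slower/faster for each work_mem column (hypothetical helper)."""
    return tuple(s / f for s, f in zip(timings[slower], timings[faster]))


# The comparison Simon asked for: both sides without headers.
no_hdr_speedup = ratios("not compressed, no hdr", "compressed, no headers")
# The original comparison, both sides with headers.
hdr_speedup = ratios("not compressed", "compressed")

for label, r in (("no headers", no_hdr_speedup), ("with headers", hdr_speedup)):
    print(f"{label}: {r[0]:.2f}x at work_mem=2000, {r[1]:.2f}x at work_mem=20000")
```

With the headers removed on both sides, compression still comes out at
roughly 1.8x to 2x faster on this test case.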

Would simply changing the ORDER BY to DESC suffice for this? FWIW, bid is
currently perfectly correlated with the physical row order:

bench=# select correlation from pg_stats where tablename='accounts' and attname='bid';
 correlation 
-------------
           1
(1 row)
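
To illustrate why input order should matter for the number of runs (and
hence tapes), here is a toy model of run formation. It assumes a textbook
replacement-selection heap; that this matches what the sort code actually
does in detail is an assumption, and heap_size is a hypothetical stand-in
for work_mem measured in tuples, not a real setting.

```python
import heapq

_DONE = object()  # sentinel marking the end of the input


def count_runs(values, heap_size):
    """Count initial runs produced by textbook replacement selection.

    heap_size is the number of values held in memory at once
    (a hypothetical stand-in for work_mem, counted in tuples).
    """
    if heap_size < 1:
        raise ValueError("heap_size must be at least 1")

    it = iter(values)
    # Heap entries are (run_number, value), so anything assigned to the
    # next run sorts after everything still in the current run.
    heap = []
    for v in it:
        heap.append((0, v))
        if len(heap) == heap_size:
            break
    heapq.heapify(heap)

    runs = 0
    current_run = -1
    while heap:
        run, value = heapq.heappop(heap)
        if run != current_run:
            current_run = run
            runs += 1
        nxt = next(it, _DONE)
        if nxt is not _DONE:
            # A value smaller than the one just written out cannot
            # extend the current run, so it is held for the next run.
            next_run = run if nxt >= value else run + 1
            heapq.heappush(heap, (next_run, nxt))
    return runs


presorted = count_runs(range(1, 1001), 100)
reversed_input = count_runs(range(1000, 0, -1), 100)
reversed_big_mem = count_runs(range(1000, 0, -1), 1000)
print(presorted, reversed_input, reversed_big_mem)
```

In this model, presorted input always forms a single run regardless of
heap_size, while reverse-ordered input forms runs of only heap_size values
each, so the run count then depends directly on how much memory is available.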

> > > There is a noticeable rise in sort time with increasing work_mem, but
> > > that needs to be offset from the benefit that in-general comes from
> > > using a large Heap for the sort. With the data you're using that always
> > > looks like a loss, but that isn't true with all input data orderings.
> > 
> > I thought that a change had been made to the on-disk sort specifically to
> > eliminate the problem of more work_mem making the sort take longer. 
> 
> There was a severe non-optimal piece of code...but the general effect
> still exists. As does the effect that having higher work_mem produces
> fewer runs which speeds up the final stages of the sort.
> 
> > I also
> > thought that there was something about that fix that was tunable.
> 
> Increasing work_mem makes *this* test case take longer. 
> 
> -- 
>   Simon Riggs             
>   EnterpriseDB   http://www.enterprisedb.com
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

