Re: [HACKERS] \dt and disk access - Mailing list pgsql-hackers

From Vadim B. Mikheev
Subject Re: [HACKERS] \dt and disk access
Date
Msg-id 3c40e99a98b2fce67dd381273dd3ec86
Whole thread Raw
In response to [HACKERS] \dt and disk access  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian wrote:
>
> > I don't fully understand what do you mean...
> > btree-build uses qsort to sort items on a page and has own
> > priority queue methods (instead of lselect.c - leftist tree
> > selection algorithm).
> > Do you want use qsort for in-memory sorts ?
>
> When I said btree-build, I meant the leftist tree sort, so I am
> considering replacing the leftist-tree selection algorithm with a qsort
> of an array of HeapTuple pointers.  Any comments on this?  The only
> performance problem I see is re-allocing the array as the number of
> entries grows, but if I double the size each time I exceed the current
> size, I don't believe it will be much of a problem.  The leftist tree
> looked very funny to me, with recursion and stuff.  I saw the new
> FASTBUILD sort used qsort, so I thought it may be a good idea.

Ok. What do you suppose to use for merging tapes - lselect.c or
btree queue methods ?

>
> >
> > BTW, I may suggest additional trick:
> > if size of sort-keys is (much?) smaller then size of tuple
> > (select _int4_field_, _big_text_field_ from ... order by _int4_field_)
> > and you are going to switch to tape-methods (i.e. you've pallocted
> > all sort-memory) then you may try to create temp-file to store
> > (unsorted) tuples there and to deal with sort-keys only. After
> > you'll got sort-keys (or better say - records which keep sort-keys
> > and pointers - temp-file offsets - to original tuples) sorted, you
> > may use these pointers to get requested tuple from temp-file.
>
...
>
> The problem I see is that the final read of sorted rows from the full
> tape file will be seeking all over the file, causing poor performance.

I see it.
BTW, using indices for sorts has the same problem!
As work around you may try to reduce seek randomness by addition offset to
sort-keys (at the end) - but this may help in the case of duplicates only,
of course.

On the other hand, when you are merging tape-files don't you do
random seeks by switching from one tape to another ?
I'm not sure but it seems that more than one pass over all
tape-files may be required...

> Now, not having to move around extra data during the sort is going to be
> good for performance, but I am not sure what the result will be.

(Note that I'm too.)

>
> I think you are correct that it will be a win to only sort the needed
> columns, but considering that any sort that uses tape files will be big,
> and hence will not fit in the file system cache, and considering the
> extra code to do this, I am uncertain about its usefulness.

Ok - select int4 ... order by int4 using 4M for in-memory sort:

1. entire-tuple method
   in-memory sort will be used for 4 000 000 / 56 =~ 71428 tuples
   in result.
2. sort-records method
   in-memory sort for 4 000 000 / 12 = 333333 tuples in result
   (12 = 8 + 4: 4 bytes to hold NULLs bitmask)
   + writing 333333*56 = 18 666 648 bytes into temp_file
     (btw, we may exclude 8 bytes of int4 + aligning from tuples
      in temp-file - we have attribute values in sort-records, -
      so it may be reduced to 15 999 984: not so big in this case,
      but may have sence for bigger sort-keys)
   + 333333 seeks over temp-file when returning results
3. entire-tuple method for 333333 tuples result
   writing of 6 tape-files with 18 666 648 bytes total size
   + ??? seeks in merging
   + writing 18 666 648 bytes of final tape-file
   is it all ? or there may be more than one pass (read/write)
   over tape-files ?

Vadim

------------------------------

pgsql-hackers by date:

Previous
From: Raymond Toy
Date:
Subject: Re: [HACKERS] TR: Problems with Large Objects !!
Next
From: Bruce Momjian
Date:
Subject: Re: Re[2]: [HACKERS] \dt and disk access