Re: [HACKERS] \dt and disk access - Mailing list pgsql-hackers
| From | Vadim B. Mikheev |
|---|---|
| Subject | Re: [HACKERS] \dt and disk access |
| Date | |
| Msg-id | 3c40e99a98b2fce67dd381273dd3ec86 Whole thread Raw |
| In response to | [HACKERS] \dt and disk access (Bruce Momjian <maillist@candle.pha.pa.us>) |
| List | pgsql-hackers |
Bruce Momjian wrote:
>
> > I don't fully understand what do you mean...
> > btree-build uses qsort to sort items on a page and has own
> > priority queue methods (instead of lselect.c - leftist tree
> > selection algorithm).
> > Do you want use qsort for in-memory sorts ?
>
> When I said btree-build, I meant the leftist tree sort, so I am
> considering replacing the leftist-tree selection algorithm with a qsort
> of an array of HeapTuple pointers. Any comments on this? The only
> performance problem I see is re-allocing the array as the number of
> entries grows, but if I double the size each time I exceed the current
> size, I don't believe it will be much of a problem. The leftist tree
> looked very funny to me, with recursion and stuff. I saw the new
> FASTBUILD sort used qsort, so I thought it may be a good idea.
Ok. What do you suppose to use for merging tapes - lselect.c or
btree queue methods ?
>
> >
> > BTW, I may suggest additional trick:
> > if size of sort-keys is (much?) smaller then size of tuple
> > (select _int4_field_, _big_text_field_ from ... order by _int4_field_)
> > and you are going to switch to tape-methods (i.e. you've pallocted
> > all sort-memory) then you may try to create temp-file to store
> > (unsorted) tuples there and to deal with sort-keys only. After
> > you'll got sort-keys (or better say - records which keep sort-keys
> > and pointers - temp-file offsets - to original tuples) sorted, you
> > may use these pointers to get requested tuple from temp-file.
>
...
>
> The problem I see is that the final read of sorted rows from the full
> tape file will be seeking all over the file, causing poor performance.
I see it.
BTW, using indices for sorts has the same problem!
As work around you may try to reduce seek randomness by addition offset to
sort-keys (at the end) - but this may help in the case of duplicates only,
of course.
On the other hand, when you are merging tape-files don't you do
random seeks by switching from one tape to another ?
I'm not sure but it seems that more than one pass over all
tape-files may be required...
> Now, not having to move around extra data during the sort is going to be
> good for performance, but I am not sure what the result will be.
(Note that I'm too.)
>
> I think you are correct that it will be a win to only sort the needed
> columns, but considering that any sort that uses tape files will be big,
> and hence will not fit in the file system cache, and considering the
> extra code to do this, I am uncertain about its usefulness.
Ok - select int4 ... order by int4 using 4M for in-memory sort:
1. entire-tuple method
in-memory sort will be used for 4 000 000 / 56 =~ 71428 tuples
in result.
2. sort-records method
in-memory sort for 4 000 000 / 12 = 333333 tuples in result
(12 = 8 + 4: 4 bytes to hold NULLs bitmask)
+ writing 333333*56 = 18 666 648 bytes into temp_file
(btw, we may exclude 8 bytes of int4 + aligning from tuples
in temp-file - we have attribute values in sort-records, -
so it may be reduced to 15 999 984: not so big in this case,
but may have sence for bigger sort-keys)
+ 333333 seeks over temp-file when returning results
3. entire-tuple method for 333333 tuples result
writing of 6 tape-files with 18 666 648 bytes total size
+ ??? seeks in merging
+ writing 18 666 648 bytes of final tape-file
is it all ? or there may be more than one pass (read/write)
over tape-files ?
Vadim
------------------------------
pgsql-hackers by date: