Re: polyphase merge? - Mailing list pgsql-hackers

From Don Marvick
Subject Re: polyphase merge?
Date
Msg-id d18e24870902032242v4d868296o39774f928ad76ff0@mail.gmail.com
Whole thread Raw
In response to polyphase merge?  (Don Marvick <donmarvick@gmail.com>)
Responses Re: polyphase merge?
Re: polyphase merge?
List pgsql-hackers
Dear All,

Since nobody replied, I would give it a try. I am going to implement the merge pattern described in Knuth Page 365 (5.4.9), essentially it is as follow:
- create initial runs using replacement selection (basically this is as in the current implementation)
- add enough dummy runs of size 0 so that the number of sorted run minus one can be divided by k-1 (k is merge fan in)
- repetitively merge k smallest runs until only 1 run left

I am new to postgresql, hence any advice would be appreciated.

Can anybody give me some advice on how it can done?
1. how a run should be represented? should I reuse the tape mechanism? e.g. 1 tape 1 run
- or should I use a temp buffer?

2. How memory should be allocated? I assume I divide the memory equally to k runs, hence each run will get M/k memory. Each read of a run would be of size M/k bytes.

3. Prefetching. Then, we can precompute the read sequence of blocks of run during the entire merge process. Based on this, we know the blocks of run to be prefetched at a point of time.

3. Is it possible to perform read/write I/O while the merge is being performed? Hence we overlap I/O and computation.

4. any other issue needs consideration?

Best regards,

Don

On Thu, Jan 29, 2009 at 4:11 PM, Don Marvick <donmarvick@gmail.com> wrote:
Dear All,

I apologize if this has been discussed before. I have tried to search to the mailing list and could not find an exact answer.

Currently, Postgres uses Knuth's Algorithm 5.4.2D to perform external merge sort. IMHO the algorithm is designed for tapes.

Why don't the simple text book merge pattern described in Knuth Page 365 (5.4.9) is used for disk based system? The same merge pattern is also described in Ramakrishnan text book and it is optimal if seek time is not counted, which of course not the real-world case.

Best regards,

Don


pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Hot standby, recovery infra
Next
From: Fujii Masao
Date:
Subject: Re: Synch Replication