Re: Replacement Selection - Mailing list pgsql-hackers

From
Subject Re: Replacement Selection
Date
Msg-id BAY132-DS194113DD78CD5D3842F97E6750@phx.gbl
In response to Autovacuum and OldestXmin  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Replacement Selection  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Replacement Selection  (<mac_man2005@hotmail.it>)
List pgsql-hackers
Sorry.

I'm trying to integrate my code into PostgreSQL. At the moment I have 
working standalone code, with its own main() and so on.
The code is supposed to perform run generation during external sorting. 
That's all: my code won't do any merging, just run generation.

I'm studying the code and I don't know where to put my code. Which parts do 
I need to replace, and which others are absolutely "untouchable"?
I admit I'm not an excellent programmer. I've always written my own code, 
simple code. Now I have some ideas that could possibly help PostgreSQL get 
better, and for the first time I have to integrate code into someone else's 
code. I say this just to apologize in case some things that are obvious to 
someone else are not obvious to me.

Anyway... back to work.
My code has the following structure.

1) Generate a random input stream to sort.
For this part, I just generate a stream of integers, not a stream of db 
records. I say "stream" because I'm considering the general case in which 
the input source can be unknown and we cannot even know how many elements 
there are to sort.

2) Fill the available memory with the first M elements from the stream. They 
are arranged into a heap structure.

3) Start run generation. For this phase, I see that the PostgreSQL code (like 
Knuth's algorithm) marks each element with the run it belongs to, in order 
to know which run it goes into and when the heap has finished building the 
current run. I don't store this kind of info. I just output from the heap to 
the run all of the elements going into the current run. The elements that 
must go into the next run (I call them "dead records") are still stored in 
main memory, but as leaves of the heap. This means the heap size shrinks, so 
a smaller number of elements is heapified each time I get a dead record 
(dead records do not need to be sorted). When the heap size reaches zero, a 
new run is started by heapifying all the dead records currently in main 
memory.
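
A minimal sketch of points 1) to 3), written in Python for brevity. It is 
an illustration under assumptions, not the code discussed in this message: 
the names generate_runs, memory_size, _sift_down and _heapify are 
hypothetical, and "dead records as leaves of the heap" is read here as 
"kept in the array slots just past the end of the shrinking heap".

```python
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input stream


def _sift_down(a: List[int], i: int, n: int) -> None:
    """Restore the min-heap property of a[0:n], starting at index i."""
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def _heapify(a: List[int], n: int) -> None:
    """Turn a[0:n] into a min-heap."""
    for i in range(n // 2 - 1, -1, -1):
        _sift_down(a, i, n)


def generate_runs(stream: Iterable[int], memory_size: int) -> Iterator[List[int]]:
    """Yield sorted runs; memory layout is [heap | dead records]."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    it = iter(stream)
    mem: List[int] = []
    for x in it:                      # point 2: fill memory with M elements
        mem.append(x)
        if len(mem) == memory_size:
            break
    heap_size = len(mem)
    _heapify(mem, heap_size)
    run: List[int] = []
    while mem:
        if heap_size == 0:            # only dead records left: start a new run
            yield run
            run = []
            heap_size = len(mem)
            _heapify(mem, heap_size)
        smallest = mem[0]
        run.append(smallest)          # output the root to the current run
        nxt = next(it, _END)
        last = heap_size - 1
        if nxt is _END:               # input exhausted: memory shrinks
            mem[0] = mem[last]        # last heap element becomes the root
            mem[last] = mem[-1]       # last dead record fills the gap
            mem.pop()
            heap_size -= 1
        elif nxt >= smallest:         # still fits into the current run
            mem[0] = nxt
        else:                         # dead record: park it past the heap
            mem[0] = mem[last]
            mem[last] = nxt
            heap_size -= 1
        _sift_down(mem, 0, heap_size)
    if run:
        yield run


# Example with M = 3:
print(list(generate_runs([5, 3, 8, 1, 9, 2, 7], 3)))
# prints [[3, 5, 8, 9], [1, 2, 7]]
```

No run number is stored per element here: the boundary between the heap and 
the dead records (heap_size) is the only bookkeeping.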

I haven't seen anything similar in tuplesort.c: apparently no heapify is 
called, no new run is created, and so on.
Do you see any parallel between the PostgreSQL code and what I described in 
the previous points?
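
For comparison, the marking approach mentioned in point 3 (each element 
tagged with the run it belongs to) can be sketched as below. This is not 
the tuplesort.c code, only an illustration of the idea in Python; the names 
generate_runs_tagged and memory_size are hypothetical. The heap is ordered 
by (run number, value), so elements of the next run sink below those of the 
current run and no separate heapify step per run is needed.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking the end of the input stream


def generate_runs_tagged(stream: Iterable[int], memory_size: int) -> Iterator[List[int]]:
    """Replacement selection with each element tagged by its run number."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    it = iter(stream)
    heap: List[Tuple[int, int]] = []
    for x in it:                       # fill memory; everything starts in run 0
        heap.append((0, x))
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)
    current_run = 0
    run: List[int] = []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no != current_run:      # the root belongs to the next run
            yield run
            run = []
            current_run = run_no
        run.append(value)
        nxt = next(it, _END)
        if nxt is not _END:
            # Smaller than the value just written: it cannot join this run.
            tag = current_run if nxt >= value else current_run + 1
            heapq.heappush(heap, (tag, nxt))
    if run:
        yield run


print(list(generate_runs_tagged([5, 3, 8, 1, 9, 2, 7], 3)))
# prints [[3, 5, 8, 9], [1, 2, 7]]
```

The trade-off is one extra tag per element in memory, in exchange for never 
having to rebuild the heap when a run ends.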

Thanks for your attention.

--------------------------------------------------
From: "Heikki Linnakangas" <heikki@enterprisedb.com>
Sent: Monday, November 26, 2007 5:42 PM
To: <mac_man2005@hotmail.it>
Cc: <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replacement Selection

> mac_man2005@hotmail.it wrote:
>> Unfortunately I'm lost into the code... any good soul helping me to 
>> understand what should be the precise part to be modified?
>
> You haven't given any details on what you're trying to do. What are you 
> trying to do?
>
> -- 
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
> 

