Table clustering idea - Mailing list pgsql-hackers

From Dawid Kuroczko
Subject Table clustering idea
Date
Msg-id 758d5e7f0606251648h4d518ca6k7e1c511ba316bb8b@mail.gmail.com
List pgsql-hackers
There is a well-known command called CLUSTER which organizes a table in the order of a specified index. It has a drawback: new tuples added afterwards are not kept in this order. Last night I had an idea which, I hope, could be interesting.

The idea is to make use of the collected 'histogram_bounds' statistics. Instead of inserting a row into the first suitable spot in a table, the table would be "divided" into sections, one for each of the histogram_bounds ranges. When inserting, the database would try to find the most suitable section to insert into (using the histogram_bounds), and if there were free spots there, it would insert the row there. If not, it would either look for a free spot in nearby sections, or take the first suitable place.

What would it do? It would try to keep the table somewhat organized, keeping rows of similar values close together (within the SET STATISTICS resolution, so a common scenario would be 50 or 100 "sections"). It would make it a bit hard for a table to shrink (since new rows would be added throughout the table, not at the beginning).

Another idea, instead of using histogram_bounds, would be to use the position of the key inside the index to determine the "ideal" place of the row inside the table, and find the closest free spot there. This would of course be much more precise and wouldn't rely on statistics.
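
To make the two placement strategies concrete, here is a toy Python model. It is only a sketch under stated assumptions, not how PostgreSQL's heap or free space map actually works: pages are plain lists with a fixed row capacity, `ClusteredHeap` and its methods are hypothetical names, the `radius` search window is measured in pages, and the fallback order (nearby pages, then the first free page anywhere, then a new page at the end) is one possible reading of "nearby sections, or first suitable place". The second variant uses a sorted list of existing keys as a stand-in for the index.

```python
from bisect import bisect_left, bisect_right


class ClusteredHeap:
    """Toy heap of fixed-capacity pages whose layout tries to follow key order.

    Hypothetical model of the proposal, not PostgreSQL code: each page is a
    plain list of keys and holds at most `rows_per_page` rows.
    """

    def __init__(self, n_pages, rows_per_page):
        if n_pages < 1:
            raise ValueError("need at least one page")
        self.rows_per_page = rows_per_page
        self.pages = [[] for _ in range(n_pages)]

    def _has_room(self, p):
        return len(self.pages[p]) < self.rows_per_page

    def _nearest_free(self, lo, hi, radius):
        """Return a page in [lo, hi) with free space, else widen the window
        one page at a time on both sides, up to `radius` pages; else None."""
        for p in range(lo, hi):
            if self._has_room(p):
                return p
        for d in range(1, radius + 1):
            for p in (lo - d, hi - 1 + d):
                if 0 <= p < len(self.pages) and self._has_room(p):
                    return p
        return None

    def _place(self, key, lo, hi, radius):
        p = self._nearest_free(lo, hi, radius)
        if p is None:
            # Fallback: first page with free space anywhere, like a plain heap.
            p = next((q for q in range(len(self.pages)) if self._has_room(q)), None)
        if p is None:
            # No free space at all: extend the table with a new page at the end.
            self.pages.append([])
            p = len(self.pages) - 1
        self.pages[p].append(key)
        return p

    def insert_by_histogram(self, key, histogram_bounds, radius=2):
        """Idea 1: one section of pages per histogram_bounds range.

        `histogram_bounds` is a sorted list with at least two entries,
        giving len(histogram_bounds) - 1 sections.
        """
        n_sections = len(histogram_bounds) - 1
        # Binary search for the range containing `key`; clamp out-of-range keys.
        s = min(max(bisect_right(histogram_bounds, key) - 1, 0), n_sections - 1)
        n_pages = len(self.pages)
        pages_per_section = max(n_pages // n_sections, 1)
        lo = min(s * pages_per_section, n_pages - 1)
        hi = min(lo + pages_per_section, n_pages)
        return self._place(key, lo, hi, radius)

    def insert_by_index_position(self, key, sorted_keys, radius=2):
        """Idea 2: ideal page proportional to the key's position in the index.

        `sorted_keys` stands in for the index: the keys already stored, in order.
        """
        rank = bisect_left(sorted_keys, key)
        ideal = rank * len(self.pages) // (len(sorted_keys) + 1)
        return self._place(key, ideal, ideal + 1, radius)


if __name__ == "__main__":
    heap = ClusteredHeap(n_pages=4, rows_per_page=2)
    bounds = [0, 25, 50, 75, 100]  # made-up histogram_bounds: 4 sections
    for key in (10, 60, 12, 15):
        print(key, "->", heap.insert_by_histogram(key, bounds))
    # 10 -> 0, 60 -> 2, 12 -> 0, 15 -> 1 (page 0 is full, so page 1 is used)
```

The fallback order is the main design choice here: trying the target section first and then widening the window keeps similar keys physically close, while falling back to the first free page (and finally extending the table) means an insert never fails. As noted above, scattering new rows across the whole table is also what makes it harder for the table to shrink.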

Regards,
Dawid
