Thread: Most efficient way to hard-sort records

Most efficient way to hard-sort records

From
Miroslav Šulc
Date:
Hello,

I have a table with cca 100,000 records. I need to hard-sort the records
by a key from second table. By hard-sorting I mean storing position of
each row in the main table. Here is what my tables look like:

main_table: id, name, position
key_table: id, main_table_id, key, value

Here is how I need to sort the records:
SELECT * FROM main_table
INNER JOIN key_table ON main_table.id = key_table.main_table_id
WHERE key = 'param'
ORDER BY value

I currently collect all ids from main_table in sorted order and then
update the position field for each row in the main_table one-by-one. Is
there a better/faster/more efficient solution?

Thank you for your suggestions.

--
Miroslav Šulc


Attachment

Re: Most efficient way to hard-sort records

From
Markus Schaber
Date:
Hi, Miroslav,

Miroslav Šulc schrieb:

> I have a table with cca 100,000 records. I need to hard-sort the records
> by a key from second table. By hard-sorting I mean storing position of
> each row in the main table. Here is what my tables look like:
> 
> main_table: id, name, position
> key_table: id, main_table_id, key, value
> 
> Here is how I need to sort the records:
> SELECT * FROM main_table
> INNER JOIN key_table ON main_table.id = key_table.main_table_id
> WHERE key = 'param'
> ORDER BY value
> 
> I currently collect all ids from main_table in sorted order and then
> update the position field for each row in the main_table one-by-one. Is
> there a better/faster/more efficient solution?

Create an SQL function that selects the sort value from the key table
when given id as parameter, and then create a functional index on the
table, and CLUSTER the table on the index.

Scratch-Code (untested):

CREATE FUNCTION getvalue (ID int4) RETURNS int4 AS
" SELECT value FROM key_table WHERE value=$1 LIMIT 1"
LANGUAGE SQL STRICT;

CREATE INDEX main_table_order_idx ON main_table (getvalue(id));

CLUSTER main_table_order_idx ON main_table;


HTH,
Markus





Re: Most efficient way to hard-sort records

From
"Ben K."
Date:
> main_table: id, name, position
> key_table: id, main_table_id, key, value
>
> Here is how I need to sort the records:
> SELECT * FROM main_table
> INNER JOIN key_table ON main_table.id = key_table.main_table_id
> WHERE key = 'param'
> ORDER BY value
>
> I currently collect all ids from main_table in sorted order and then
> update the position field for each row in the main_table one-by-one. Is
> there a better/faster/more efficient solution?


A cheap solution if you don't care about the position value as long as 
sort order is ok.

1)
# SELECT main_table.id into temp_table FROM main_table INNER JOIN 
key_table ON main_table.id = key_table.main_table_id ORDER BY value;

2)
# update main_table set position = (select oid from temp_table where id = 
main_table.id );

I guess I'll get a set of consecutive oids by this.

You can make the number begin at arbitrary number, by

2-a)
# update main_table set position = ( (select oid::int4 from temp_table 
where id = main_table.id ) - (select min(oid::int4) from temp_table) + 1) 
;

I read that oid wraps around (after ~ billions) so you might want to check 
your current oid.




Regards,

Ben K.
Developer
http://benix.tamu.edu


Re: Most efficient way to hard-sort records

From
PFC
Date:
Is it possible to do this :
CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...)
INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN  
key_table ON main_table.id = key_table.main_table_id WHERE key = 'param'  
ORDER BY value SELECT
The SERIAL will automatically generate the order_no you want, which  
corresponds to the position in the sorted set.

Then, to get the records in-order :
SELECT * FROM sorted ORDER BY order_no
As the records have been inserted in-order in the "sorted" table, this  
table is, in fact, clustered, so a full table scan using the index on  
"order_no" will be very fast.Of course this is only interesting if this data is quite static, because  
you'll have to re-generate the table when the data changes.

There is another solution :
CREATE INDEX on key_table( key, value )
Now, the index can optimize ordering by (key,value), which is equivalent  
to ordering by value if key = constant. A bit of query manipulation might  
get you what you want ; I suppose all rows in "key_table" reference a row  
in "main_table" ; so it is faster to sort (and limit) first on key_table,  
then grab the rows from main_table :

SELECT k.value, m.* FROM key_table k LEFT JOIN main_table m ON  
m.id=k.main_table_id WHERE k.key='param' ORDER BY k.key, k.value
If key_table REFERENCES main_table, LEFT JOIN is equivalent to INNER JOIN  
; however if the planner is smart enough, it might notice that it can  
index-scan key_table in key,value order, grabbing rows from main_table in  
order and skip the sort entirely.






On Sun, 07 May 2006 08:53:46 +0200, Ben K. <bkim@coe.tamu.edu> wrote:

>> main_table: id, name, position
>> key_table: id, main_table_id, key, value
>>
>> Here is how I need to sort the records:
>> SELECT * FROM main_table
>> INNER JOIN key_table ON main_table.id = key_table.main_table_id
>> WHERE key = 'param'
>> ORDER BY value
>>
>> I currently collect all ids from main_table in sorted order and then
>> update the position field for each row in the main_table one-by-one. Is
>> there a better/faster/more efficient solution?
>
>
> A cheap solution if you don't care about the position value as long as  
> sort order is ok.
>
> 1)
> # SELECT main_table.id into temp_table FROM main_table INNER JOIN  
> key_table ON main_table.id = key_table.main_table_id ORDER BY value;
>
> 2)
> # update main_table set position = (select oid from temp_table where id  
> = main_table.id );
>
> I guess I'll get a set of consecutive oids by this.
>
> You can make the number begin at arbitrary number, by
>
> 2-a)
> # update main_table set position = ( (select oid::int4 from temp_table  
> where id = main_table.id ) - (select min(oid::int4) from temp_table)  
> + 1) ;
>
> I read that oid wraps around (after ~ billions) so you might want to  
> check your current oid.
>
>
>
>
> Regards,
>
> Ben K.
> Developer
> http://benix.tamu.edu
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings




Re: Most efficient way to hard-sort records

From
"Ben K."
Date:
>     CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...)
>     INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN 
> key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER 
> BY value SELECT
>     The SERIAL will automatically generate the order_no you want, which 
> corresponds to the position in the sorted set.
> Then, to get the records in-order :
>     SELECT * FROM sorted ORDER BY order_no

Good ... I just got myself into the habit of not recreating a table since 
I have to clean up permissions and what not. I guess it depends.

Another version along that line ?

# create sequence counterseq start 1;
-- (set/reset whenever a counter is needed)

# select main_table.*, nextval('counterseq') as position2  into sorted_main_table  from main_table, keytable where
main_table.id=  keytable.main_table_id  order by value;
 




Regards,

Ben K.
Developer
http://benix.tamu.edu


Re: Most efficient way to hard-sort records

From
PFC
Date:
> Another version along that line ?
>
> # create sequence counterseq start 1;
> -- (set/reset whenever a counter is needed)
>
> # select main_table.*, nextval('counterseq') as position2
>    into sorted_main_table
>    from main_table, keytable where main_table.id =
>    keytable.main_table_id
>    order by value;
You could also use generate_series(), but I don't know if it can generate  
unbounded series...


Re: Most efficient way to hard-sort records

From
Miroslav Šulc
Date:
Hi PFC,

the generate_series() seems to be the right thing I need in my scenario
but I didn't figure out how to create a SELECT stattement that will
contain sorted main_table.id in first column and generated series in
second column which I would use to update the main_table.position. Is it
possible? How can I do that? Or is there another approach to employ
generate_series()?

Miroslav Šulc


PFC napsal(a):
>
>> Another version along that line ?
>>
>> # create sequence counterseq start 1;
>> -- (set/reset whenever a counter is needed)
>>
>> # select main_table.*, nextval('counterseq') as position2
>>    into sorted_main_table
>>    from main_table, keytable where main_table.id =
>>    keytable.main_table_id
>>    order by value;
>
>     You could also use generate_series(), but I don't know if it can
> generate unbounded series...
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Attachment

Re: Most efficient way to hard-sort records

From
Michael Glaesemann
Date:
On May 6, 2006, at 17:13 , Miroslav Šulc wrote:

> I have a table with cca 100,000 records. I need to hard-sort the
> records
> by a key from second table. By hard-sorting I mean storing position of
> each row in the main table. Here is what my tables look like:
>
> main_table: id, name, position
> key_table: id, main_table_id, key, value
>
> Here is how I need to sort the records:
> SELECT * FROM main_table
> INNER JOIN key_table ON main_table.id = key_table.main_table_id
> WHERE key = 'param'
> ORDER BY value

I don't know about faster or more efficient, but here's an
alternative that I haven't seen yet:




Michael Glaesemann
grzm seespotcode net