Thread: Most efficient way to hard-sort records
Hello, I have a table with cca 100,000 records. I need to hard-sort the records by a key from second table. By hard-sorting I mean storing position of each row in the main table. Here is what my tables look like: main_table: id, name, position key_table: id, main_table_id, key, value Here is how I need to sort the records: SELECT * FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER BY value I currently collect all ids from main_table in sorted order and then update the position field for each row in the main_table one-by-one. Is there a better/faster/more efficient solution? Thank you for your suggestions. -- Miroslav Šulc
Attachment
Hi, Miroslav, Miroslav Šulc schrieb: > I have a table with cca 100,000 records. I need to hard-sort the records > by a key from second table. By hard-sorting I mean storing position of > each row in the main table. Here is what my tables look like: > > main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value > > I currently collect all ids from main_table in sorted order and then > update the position field for each row in the main_table one-by-one. Is > there a better/faster/more efficient solution? Create an SQL function that selects the sort value from the key table when given id as parameter, and then create a functional index on the table, and CLUSTER the table on the index. Scratch-Code (untested): CREATE FUNCTION getvalue (ID int4) RETURNS int4 AS " SELECT value FROM key_table WHERE value=$1 LIMIT 1" LANGUAGE SQL STRICT; CREATE INDEX main_table_order_idx ON main_table (getvalue(id)); CLUSTER main_table_order_idx ON main_table; HTH, Markus
> main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value > > I currently collect all ids from main_table in sorted order and then > update the position field for each row in the main_table one-by-one. Is > there a better/faster/more efficient solution? A cheap solution if you don't care about the position value as long as sort order is ok. 1) # SELECT main_table.id into temp_table FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id ORDER BY value; 2) # update main_table set position = (select oid from temp_table where id = main_table.id ); I guess I'll get a set of consecutive oids by this. You can make the number begin at arbitrary number, by 2-a) # update main_table set position = ( (select oid::int4 from temp_table where id = main_table.id ) - (select min(oid::int4) from temp_table) + 1) ; I read that oid wraps around (after ~ billions) so you might want to check your current oid. Regards, Ben K. Developer http://benix.tamu.edu
Is it possible to do this : CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...) INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER BY value SELECT The SERIAL will automatically generate the order_no you want, which corresponds to the position in the sorted set. Then, to get the records in-order : SELECT * FROM sorted ORDER BY order_no As the records have been inserted in-order in the "sorted" table, this table is, in fact, clustered, so a full table scan using the index on "order_no" will be very fast.Of course this is only interesting if this data is quite static, because you'll have to re-generate the table when the data changes. There is another solution : CREATE INDEX on key_table( key, value ) Now, the index can optimize ordering by (key,value), which is equivalent to ordering by value if key = constant. A bit of query manipulation might get you what you want ; I suppose all rows in "key_table" reference a row in "main_table" ; so it is faster to sort (and limit) first on key_table, then grab the rows from main_table : SELECT k.value, m.* FROM key_table k LEFT JOIN main_table m ON m.id=k.main_table_id WHERE k.key='param' ORDER BY k.key, k.value If key_table REFERENCES main_table, LEFT JOIN is equivalent to INNER JOIN ; however if the planner is smart enough, it might notice that it can index-scan key_table in key,value order, grabbing rows from main_table in order and skip the sort entirely. On Sun, 07 May 2006 08:53:46 +0200, Ben K. <bkim@coe.tamu.edu> wrote: >> main_table: id, name, position >> key_table: id, main_table_id, key, value >> >> Here is how I need to sort the records: >> SELECT * FROM main_table >> INNER JOIN key_table ON main_table.id = key_table.main_table_id >> WHERE key = 'param' >> ORDER BY value >> >> I currently collect all ids from main_table in sorted order and then >> update the position field for each row in the main_table one-by-one. Is >> there a better/faster/more efficient solution? > > > A cheap solution if you don't care about the position value as long as > sort order is ok. > > 1) > # SELECT main_table.id into temp_table FROM main_table INNER JOIN > key_table ON main_table.id = key_table.main_table_id ORDER BY value; > > 2) > # update main_table set position = (select oid from temp_table where id > = main_table.id ); > > I guess I'll get a set of consecutive oids by this. > > You can make the number begin at arbitrary number, by > > 2-a) > # update main_table set position = ( (select oid::int4 from temp_table > where id = main_table.id ) - (select min(oid::int4) from temp_table) > + 1) ; > > I read that oid wraps around (after ~ billions) so you might want to > check your current oid. > > > > > Regards, > > Ben K. > Developer > http://benix.tamu.edu > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
> CREATE TABLE sorted (order_no SERIAL PRIMARY KEY, other columns...) > INSERT INTO sorted (columns) SELECT * FROM main_table INNER JOIN > key_table ON main_table.id = key_table.main_table_id WHERE key = 'param' ORDER > BY value SELECT > The SERIAL will automatically generate the order_no you want, which > corresponds to the position in the sorted set. > Then, to get the records in-order : > SELECT * FROM sorted ORDER BY order_no Good ... I just got myself into the habit of not recreating a table since I have to clean up permissions and what not. I guess it depends. Another version along that line ? # create sequence counterseq start 1; -- (set/reset whenever a counter is needed) # select main_table.*, nextval('counterseq') as position2 into sorted_main_table from main_table, keytable where main_table.id= keytable.main_table_id order by value; Regards, Ben K. Developer http://benix.tamu.edu
> Another version along that line ? > > # create sequence counterseq start 1; > -- (set/reset whenever a counter is needed) > > # select main_table.*, nextval('counterseq') as position2 > into sorted_main_table > from main_table, keytable where main_table.id = > keytable.main_table_id > order by value; You could also use generate_series(), but I don't know if it can generate unbounded series...
Hi PFC, the generate_series() seems to be the right thing I need in my scenario but I didn't figure out how to create a SELECT stattement that will contain sorted main_table.id in first column and generated series in second column which I would use to update the main_table.position. Is it possible? How can I do that? Or is there another approach to employ generate_series()? Miroslav Šulc PFC napsal(a): > >> Another version along that line ? >> >> # create sequence counterseq start 1; >> -- (set/reset whenever a counter is needed) >> >> # select main_table.*, nextval('counterseq') as position2 >> into sorted_main_table >> from main_table, keytable where main_table.id = >> keytable.main_table_id >> order by value; > > You could also use generate_series(), but I don't know if it can > generate unbounded series... > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster
Attachment
On May 6, 2006, at 17:13 , Miroslav Šulc wrote: > I have a table with cca 100,000 records. I need to hard-sort the > records > by a key from second table. By hard-sorting I mean storing position of > each row in the main table. Here is what my tables look like: > > main_table: id, name, position > key_table: id, main_table_id, key, value > > Here is how I need to sort the records: > SELECT * FROM main_table > INNER JOIN key_table ON main_table.id = key_table.main_table_id > WHERE key = 'param' > ORDER BY value I don't know about faster or more efficient, but here's an alternative that I haven't seen yet: Michael Glaesemann grzm seespotcode net