Thanks for all the hints I've got!
What I've learned from this question is, that not everything
which seems to be a database problem should be solved using
databases.
(select * from data where <<prio is minimal per id pair>>)
The problem wasn't that there is no statement to get this data,
the problem was the large number of rows. Everything involving
a join of data with itself took plainly too long. Nested
selects only seem to be a syntactical escape.
However, there is one solution (PostgreSQL specific) using the
feature SELECT DISTINCT ON extending the SQL standard. This
select retrieves only the first occurrence obeying the implied order.
However, one has to be distinct on one *single* column.
(Feature request: SELECT DISTINCT ON (id1,id2) ... ):
alter table data add column combined text;
update data set combined = ( id1 || '|' ) || id2;
select
distinct on combined
id1,id2,<<data>>
from data
order by prio
But, the update already takes more than 30 min.
The following perl script (paraphrased) solved the same
problem in 40 sec (not counting "copy into" which I had
to do anyhow):
while( <> ) {
my( $id1, $id2, $prio, $data ) = split( /\t/, $_ );
my( $key ) = "$id1\t$id2";
my( $entry ) = $table{$key};
if( $entry eq "" ) {
$table{$key} = "$prio\t$data";
} else {
my( $oldprio, $junk ) = split( /\t/, $entry );
if( $prio < $oldprio ) {
$table{$key} = "$prio\t$data";
}
}
}
foreach $key (keys %table) {
printf "%s\t%s", $key, $table{$key};
}
Regards,
Holger Klawitter
--
Holger Klawitter +49 (0)251 484 0637
holger@klawitter.de http://www.klawitter.de/