Thread: How to do this in Postgres

How to do this in Postgres

From
Holger Klawitter
Date:
24 November 1999, 04:47:22
Thanks for all the hints I've got!

What I've learned from this question is, that not everything
which seems to be a database problem should be solved using
databases.

(select * from data where <<prio is minimal per id pair>>)

The problem wasn't that there is no statement to get this data,
the problem was the large number of rows. Everything involving
a join of data with itself took plainly too long. Nested
selects only seem to be a syntactical escape.

However, there is one solution (PostgreSQL specific) using the
feature SELECT DISTINCT ON extending the SQL standard. This
select retrieves only the first occurrence obeying the implied order.
However, one has to be distinct on one *single* column.
(Feature request: SELECT DISTINCT ON (id1,id2) ... ):

    alter table data add column combined text;
        update data set combined = ( id1 || '|' ) || id2;
        select
                distinct on combined
                id1,id2,<<data>>
                from data
                order by prio

But, the update already takes more than 30 min.

The following perl script (paraphrased) solved the same
problem in 40 sec (not counting "copy into" which I had
to do anyhow):

while( <> ) {
    my( $id1, $id2, $prio, $data ) = split( /\t/, $_ );
    my( $key ) = "$id1\t$id2";
    my( $entry ) = $table{$key};
    if( $entry eq "" ) {
        $table{$key} = "$prio\t$data";
    } else {
        my( $oldprio, $junk ) = split( /\t/, $entry );
        if( $prio < $oldprio ) {
            $table{$key} = "$prio\t$data";
        }
    }
}
foreach $key (keys %table) {
    printf "%s\t%s", $key, $table{$key};
}

Regards,
    Holger Klawitter
--
Holger Klawitter                                     +49 (0)251 484 0637
holger@klawitter.de                             http://www.klawitter.de/