It seems to me this is a simple problem, but the solution eludes me.
I have a table:
bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not
nulldefault now(), amount float not null, balance float not null, primary key(transaction_id)
)
I need to get the most recent transaction for each customer. I need only
the transaction ID, but the entire row would be best.
I have two solutions, both of which are too slow for use in my
interactive web-based interface:
Solution1: Outer left self join:
SELECT ba1.*
FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2 ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
WHERE ba2.ts IS NULL;
This query works great on tables of less than a few thousand rows. For
my 300k row table, it takes several hours.
Solution2: max-concat trick
SELECT split_part(max( extract(EPOCH from ts)::VARCHAR || ' ' || transaction_id::VARCHAR), ' ', 2)::INT
FROM bank_account
GROUP BY customer_id
This is an ugly and obviously inefficient solution, but it does the job
in about 1/2 hour. Still too long though.
I've been working on this problem for days, and consulting friends. No
elegant, fast solution is presenting itself. As I said, I feel I'm not
seeing the obvious solution in front of my face.
In the mean-time I can use this query to do the job on a per-customer
basis:
select *
from bank_account
where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
bank_account.customer_id);
However, doing this for all 40,000 customers is not workable as a manual
process. My last resort is to do it this way to pre-generate the report,
but I'd far rather do it real-time.
Help! My brain hurts!
--
Erik G. Burrows - KG6HEA www.erikburrows.com
PGP Key: http://www.erikburrows.com/egb@erikburrows.com.pgpkey