Re: SQL problem: bank account - Mailing list pgsql-sql
From | Dmitry Tkach |
---|---|
Subject | Re: SQL problem: bank account |
Date | |
Msg-id | 3EDB5F00.8080407@openratings.com Whole thread Raw |
In response to | SQL problem: bank account ("Erik G. Burrows" <eburrows@erikburrows.com>) |
List | pgsql-sql |
I am afraid, this looks even uglier then your second solution, but should work, and be quicker... -- You need this to avoid having to rescan the whole table for each customerid every time and resort the results create index customer_txstamp_idx on bank_account (customer_id, ts); select ba.* from bank_account ba where transaction_id = (select transaction_id from bank_account where customer_id = ba.customer_id order by customer_id desc, ts desc limit 1); Now, note that, if you have lots of different customers in that table, it will still take a while to fetch them all (although, it should still be a lot quicker then half an hour) - in that case, you may consider either getting them one-by-one (by adding ... and customer_id=? to the above query) or using cursors... I hope, it helps... Dima Erik G. Burrows wrote: >It seems to me this is a simple problem, but the solution eludes me. > >I have a table: > >bank_account ( > transaction_id int not null serial, > customer_id int not null references customer(id), > ts timestamp not null default now(), > amount float not null, > balance float not null, > primary key(transaction_id) >) > >I need to get the most recent transaction for each customer. I need only >the transaction ID, but the entire row would be best. > > > >I have two solutions, both of which are too slow for use in my >interactive web-based interface: > >Solution1: Outer left self join: >SELECT > ba1.* >FROM > bank_account ba1 > LEFT OUTER JOIN bank_account ba2 > ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts >WHERE > ba2.ts IS NULL; > >This query works great on tables of less than a few thousand rows. For >my 300k row table, it takes several hours. > > >Solution2: max-concat trick >SELECT > split_part(max( > extract(EPOCH from ts)::VARCHAR || ' ' || > transaction_id::VARCHAR), ' ', 2)::INT >FROM > bank_account >GROUP BY > customer_id > >This is an ugly and obviously inefficient solution, but it does the job >in about 1/2 hour. Still too long though. > > > >I've been working on this problem for days, and consulting friends. No >elegant, fast solution is presenting itself. As I said, I feel I'm not >seeing the obvious solution in front of my face. > >In the mean-time I can use this query to do the job on a per-customer >basis: > >select > * >from > bank_account >where > id = <the customer's ID> > and ts = (select max(ts) from bank_account ba2 where ba2.customer_id = >bank_account.customer_id); > > >However, doing this for all 40,000 customers is not workable as a manual >process. My last resort is to do it this way to pre-generate the report, >but I'd far rather do it real-time. > >Help! My brain hurts! > > >