Re: SQL problem: bank account - Mailing list pgsql-sql

From Dmitry Tkach
Subject Re: SQL problem: bank account
Date
Msg-id 3EDB5F00.8080407@openratings.com
Whole thread Raw
In response to SQL problem: bank account  ("Erik G. Burrows" <eburrows@erikburrows.com>)
List pgsql-sql
I am afraid, this looks even uglier then your second solution, but 
should work, and be quicker...

-- You need this to avoid having to rescan the whole table for each 
customerid every time and resort the results
create index customer_txstamp_idx on bank_account (customer_id, ts);

select ba.* from bank_account ba where
transaction_id = (select transaction_id from bank_account where 
customer_id = ba.customer_id order by customer_id desc, ts desc limit 1);

Now, note that, if you have lots of different customers in that table, 
it will still take a while to fetch them all (although, it should still 
be a lot quicker then half an hour) - in that case, you may consider 
either getting them one-by-one (by adding ... and customer_id=? to the 
above query) or using cursors...

I hope, it helps...

Dima


Erik G. Burrows wrote:

>It seems to me this is a simple problem, but the solution eludes me.
>
>I have a table:
>
>bank_account (
>  transaction_id int not null serial,
>  customer_id int not null references customer(id),
>  ts timestamp not null default now(),
>  amount float not null,
>  balance float not null,
>  primary key(transaction_id)
>)
>
>I need to get the most recent transaction for each customer. I need only
>the transaction ID, but the entire row would be best.
>
>
>
>I have two solutions, both of which are too slow for use in my
>interactive web-based interface:
>
>Solution1: Outer left self join:
>SELECT
>  ba1.*
>FROM
>  bank_account ba1
>  LEFT OUTER JOIN bank_account ba2
>    ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
>WHERE
>  ba2.ts IS NULL;
>
>This query works great on tables of less than a few thousand rows. For
>my 300k row table, it takes several hours.
>
>
>Solution2: max-concat trick
>SELECT
>  split_part(max(
>      extract(EPOCH from ts)::VARCHAR || '    ' ||
>      transaction_id::VARCHAR), '    ', 2)::INT
>FROM
>  bank_account
>GROUP BY
>  customer_id
>
>This is an ugly and obviously inefficient solution, but it does the job
>in about 1/2 hour. Still too long though.
>
>
>
>I've been working on this problem for days, and consulting friends. No
>elegant, fast solution is presenting itself. As I said, I feel I'm not
>seeing the obvious solution in front of my face.
>
>In the mean-time I can use this query to do the job on a per-customer
>basis:
>
>select
>  *
>from
>  bank_account
>where
>  id = <the customer's ID>
>  and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
>bank_account.customer_id);
>
>
>However, doing this for all 40,000 customers is not workable as a manual
>process. My last resort is to do it this way to pre-generate the report,
>but I'd far rather do it real-time.
>
>Help! My brain hurts!
>
>  
>




pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: SQL problem: bank account
Next
From: Dmitry Tkach
Date:
Subject: Re: Using & - operator