Re: Need help building this query - Mailing list pgsql-sql

From David Johnston
Subject Re: Need help building this query
Date
Msg-id 00e401cd4fdd$589b1e60$09d15b20$@yahoo.com
Whole thread Raw
In response to Need help building this query  (Rihad <rihad@stream.az>)
Responses Re: Need help building this query  (rihad@stream.az)
Re: Need help building this query  (rihad@stream.az)
List pgsql-sql
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-
> owner@postgresql.org] On Behalf Of Rihad
> Sent: Thursday, June 21, 2012 1:49 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] Need help building this query
> 
> Hi, folks. I currently need to join two tables that lack primary keys, and
> columns used to distinguish each record can be duplicated. I need to build
> statistics over the data in those tables. Consider this:
> 
> 
> TableA:
> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date row
2:
> foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date row 3: foo:
123,
> bar: 456, baz: 789, amount: 10.99, date_of_op: date
> 
> TableB:
> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
> 
> 
> Columns foo + bar + baz are used to distinguish a performed "operation":
> TableA.date_of_op isn't, because it can lag behind TableB.
> 
> Not all different "operations" are in table B.
> Table B is just there so we know which "operations" are complete, so to
> speak (happening under external means and not under any of my control).
> 
> Now, for each operation (foo+bar+baz) in table A, only *one* row should be
> matched in table B, because it only has one matching row there.
> The other two in TableA should be considered unmatched.
> 
> Now the query should be able to get count(*) and sum(amount) every day
> for that day, considering that matched and unmatched operations should be
> counted separately. The report would look something like this:
> 
> TableA.date_of_op  TableB.date_of_op
> 2012-06-21            [empty]                  [count(*) and sum(amount)
> of all data in TableA for this day unmatched in TableB]
> 2012-06-21            2012-06-20            [count(*) and sum(amount) of
> all data in TableA matched in TableB for the 20-th]
> 2012-06-21            2012-06-19            [count(*) and sum(amount) of
> all data in TableA matched in TableB for the 19-th]
> 
> 
> Can this awkward thing be done in pure SQL, or I'd be better off using
> programming for this?
> 
> Thanks, I hope I could explain this.
> 


You seem to be describing a straight reconciliation between two tables.  My
current means of doing this are programmatically but for the simple case
pure SQL should be doable.  The main thing is that you have to distinguish
between "duplicate" records first and then match them up:

TableA Keys:

AA
AA
AA
AB
AB
AC

TableB Keys:
AA
AA
AB

First you use "ROW_NUMBER() OVER (PARTITION BY key)" to assign an integer
"sub-id" to every set of possible keys in both tables:

TableA-Sub:
AA-1
AA-2
AA-3
AB-1
AB-2
AC-1

TableB-Sub:
AA-1
AA-2
AB-1

Now, with these newly constructed key+sub-key values in place, you can
perform a simple LEFT (or possibly FULL) JOIN between tables A & B.

This makes no allowances for any of kind of desired date restriction on the
matching nor does it consider the eventual report that you wish to generate.
What this gives you is a listing of ALL rows in both tables with matched
records joined together into a single (NULL-less) row while unmatched
records will have one of the two resultant columns NULLed

SELECT tableA.subid_a, tableB.subid_b
FROM tableA FULL OUTER JOIN tableB ON (tableA.subid_a = tableB.subid_b)

Requires at least version 8.4

David J.




pgsql-sql by date:

Previous
From: Rihad
Date:
Subject: Need help building this query
Next
From: "Oliver d'Azevedo Christina"
Date:
Subject: Re: Need help building this query