Greetings,
I have 3 tables. Two of which (ta and tb) are different
"attributes" the third table (tc) can have.
tc is allowed up to three of each kind of "attributes".
e.g., 3 ta values and 2 tb values.
By assigning ta and tb attributes to each entry in tc you
are also "matching" tb attributes with corresponding ta
ones.
e.g.,
c_id = 8 ta1 = 1 <-- match --> tb1 = 3 ta2 = 3 <-- match --> tb2 = 2 ta3 = 13 <-- match
--> tb3 = 20
c_id = 9 ta1 = 2 <-- match --> tb1 = 3 ta2 = 5 <-- match --> tb2 = 3 ta3 = 9 <-- match
--> tb3 = 2
c_id = 10 ta1 = 3 <-- match --> tb1 = 2 ta2 = 1 <-- match --> tb2 = 3 ta3 = 9 <-- match
--> tb3 = 2
The task at hand is to figure out the number of ta
attributes each tb attribute corresponds with.
For above example I want a result set showing me:
tb.b_id ta.a_id count -------------------------- 2 2 -- matched w/ta.a_id = 3, 9 3
3 -- matched w/ta.a_id = 1, 2, 5 20 1 -- only matched w/ta.a_id = 13
-- tables
create table ta ( a_id int unique );
create table tb ( b_id int unique );
create table tc ( c_id int unique, ta1 int references ta( a_id ), ta2 int references ta(
a_id), ta3 int references ta( a_id ), tb1 int references tb( b_id ), tb2 int references tb(
b_id), tb3 int references tb( b_id ) );
-- sequences
create sequence ta_seq;
create sequence tb_seq;
create sequence tc_seq;
-- populate tables with some data
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into ta values ( NEXTVAL( 'ta_seq' ) );
insert into tb values ( NEXTVAL( 'tb_seq' ) );
insert into tb values ( NEXTVAL( 'tb_seq' ) );
insert into tb values ( NEXTVAL( 'tb_seq' ) );
insert into tb values ( NEXTVAL( 'tb_seq' ) );
-- ...
insert into tb values ( NEXTVAL( 'tb_seq' ) );
insert into tb values ( NEXTVAL( 'tb_seq' ) );
Don't read too much into the data as i don't think the
values chosen have anything to do with the problem. I
am just using a set of data sequences from my live
database.
Though, i'm willing to be surprised if someone's willing
to show correlation between the problem and the values
used.
-- ta tb
-- 1 2 3 1 2 3
insert into tc values ( NEXTVAL( 'tc_seq' ), 1, 1, 1, 1, 3, 2 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 3, 3, 3, 2, 1, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 4, 4, 4, 2, 1, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 5, 5, 5, 1, 2, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 5, 5, 5, 1, 2, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 1, 1, 1, 1, 2, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 4, 4, 4, 2, 1, 3 );
insert into tc values ( NEXTVAL( 'tc_seq' ), 3, 3, 3, 2, 1, 3 );
/*
insert into tc values ( NEXTVAL( 'tc_seq' ), 1, null, null, 1, null, null );
*/
This last insert is to show it is possible to have null
values for either of the ta or tb attributes in table tc.
But it is omitted to simplify the problem for now.
I am no SQL expert, as in I don't do it for a living.
Therefore, I would not doubt that there exist more
efficient ways to do this but this is one way I came up
with and it is quite possible that a SQL mistake is
causing the strange outputs I am observing.
But let me describe in English what i am attempting to
do in the SQL.
As I said earlier the task is to determine the unique
count of ta entries each tb.b_id is associated with in
the tc table.
I'm selecting on the tb table and doing sub selects on
the tc table where I attempt to count unique counts of
tb to ta matches.
First sub-select (aa) does a distinct count of tc.ta1
where tc.tb1 equals tb.b_id and the corresponding tc.ta1
field is not null.
Second sub-select (bb) I'm again doing a count of
distinct tc.ta2 where tc.tb2 is equal to tb.b_id, tc.ta2
is not null and tc.ta2 was not counted in the aa
sub-select.
Next (cc), once again do a count of distinct tc.ta3
where tc.tb3 is equal to tb.b_id, tc.ta3 is not null and
not in either of the two previous sub-selects (aa, bb).
The sum of sub-selects aa, bb and cc should be the value
I'm interested in (not shown in SQL statement).
The problem:
The result of the select is not the expected counts of ta
to tb.
The twist:
If the select statement is provided a where-clause where
each tb.b_id is restricted to individual tb.b_id values
then the correct/expected count of ta to tb is obtained
(See examples below).
-- selects
select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id
and tc.ta1 is not null ) as aa, ( select count( distinct tc.ta2 )
from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and
tc.ta2 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null ) )
asbb, ( select count( distinct tc.ta3 ) from tc where tc.tb3 = tb.b_id and
tc.ta3 is not null and tc.ta3 not in ( select
distincttc.ta1 from tc where tc.tb1 = tb.b_id and
tc.ta1 is not null ) and tc.ta3 not in (
select distinct tc.ta2 from tc where tc.tb2 = tb.b_id
and tc.ta2 is not null ) ) as cc from tb
;
-- results from selects
=# select * from tc order by c_id;c_id | ta1 | ta2 | ta3 | tb1 | tb2 | tb3
------+-----+-----+-----+-----+-----+----- 1 | 1 | 1 | 1 | 1 | 3 | 2 2 | 3 | 3 | 3 | 2 | 1 |
3 3 | 4 | 4 | 4 | 2 | 1 | 3 4 | 5 | 5 | 5 | 1 | 2 | 3 5 | 5 | 5 | 5 | 1 | 2 |
3 6 | 1 | 1 | 1 | 1 | 2 | 3 7 | 4 | 4 | 4 | 2 | 1 | 3 8 | 3 | 3 | 3 | 2 | 1 |
3
(8 rows)
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id < 4;b_id | aa | bb | cc
------+----+----+---- 1 | 2 | 2 | 0 2 | 2 | 0 | 0 3 | 0 | 0 | 0
(3 rows)
Obviously not the correct result for rows 2 and 3.
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id = 1;b_id | aa | bb | cc
------+----+----+---- 1 | 2 | 2 | 0
(1 row)
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id = 2;b_id | aa | bb | cc
------+----+----+---- 2 | 2 | 2 | 0
(1 row)
This is what we should've gotten in the first query for
the second row.
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id = 3;b_id | aa | bb | cc
------+----+----+---- 3 | 0 | 1 | 3
(1 row)
This is what we should've gotten in the first query for
the third row.
Further strangeness:
I would think that the following two select statements
should yeild the same results.
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id = 1 or tb.b_id = 2 or tb.b_id = 3;b_id | aa | bb | cc
------+----+----+---- 1 | 2 | 2 | 0 2 | 2 | 0 | 0 3 | 0 | 0 | 0
(3 rows)
Incorrect output. Same as the very first query.
=# select tb.b_id, ( select count( distinct tc.ta1 ) from tc where tc.tb1 = tb.b_id and tc.ta1 is not null) as
aa,( select count( distinct tc.ta2 ) from tc where tc.tb2 = tb.b_id and tc.ta2 is not null and tc.ta2 not in (
select distinct tc.ta1 from tc where tc.tb1 = tb.b_id and tc.ta1 is not null)) as bb, ( select count( distinct
tc.ta3) from tc where tc.tb3 = tb.b_id and tc.ta3 is not null and tc.ta3 not in ( select distinct tc.ta1 from tc
where tc.tb1 = tb.b_id and tc.ta1 is not null) and tc.ta3 not in ( select distinct tc.ta2 from tc where tc.tb2 =
tb.b_idand tc.ta2 is not null)) as cc from tb where tb.b_id = 3 or tb.b_id = 2 or tb.b_id = 1;b_id | aa | bb | cc
------+----+----+---- 3 | 0 | 1 | 3 2 | 2 | 2 | 0 1 | 2 | 2 | 0
(3 rows)
Strangely, this is the result we were expecting from our
original query!
Is it possible that the sub-selects are somehow
affecting the result sets?
Seen on:
% postmaster --version
postmaster (PostgreSQL) 7.1.3
and
% postmaster --version
postmaster (PostgreSQL) 7.1.2
-- drops
drop sequence tc_seq;
drop sequence tb_seq;
drop sequence ta_seq;
drop table tc;
drop table tb;
drop table ta;
I'd appreciate any comments you may have regarding this
post :-)
sidster
--
They who would sacrifice freedom for security will have neither. -Ben Franklin