Home > mailing lists

Thread: SQL Question

SQL Question

From

Ketema Harris

Date:

08 February 2006, 15:43:27

Hey guys I have a SQL teaser I could use a little assistance with.

Imagine a table like below:

rowid|typeid|personid

---------------------------

1 3 1

2 3 1

3 1 2

4 1 1

5 3 2

my desired output is:

person|# of rows of type 3|# of total rows|% of type 3

----------------------------------------------------------------

1 2 3 66.7

2 1 2 50

how can I achieve this output with one single query? sub selects are fine.

Thanks

Re: SQL Question

From

Jeffrey Melloy

Date:

08 February 2006, 16:56:34

Ketema Harris wrote:

> Hey guys I have a SQL teaser I could use a little assistance with.
>
> Imagine a table like below:
>
> rowid|typeid|personid
> ---------------------------
> 1                3            1
> 2                3            1
> 3                1            2
> 4                1            1
> 5                3            2
>
> my desired output is:
>
> person|# of rows of type 3|# of total rows|% of type 3
> ----------------------------------------------------------------
> 1                            2
> 3                        66.7
> 2                            1
> 2                        50
>
> how can I achieve this output with one single query?  sub selects are
> fine.
>
> Thanks

select person_id, (select count(*) from table b where typeid = 3 and
b.personid = a.personid) as type3, count(*) as total, type3 / total
from table a
group by person_id, (select count(*) from table b where typeid = 3 and
b.personid = a.personid);

Should do the trick.

And this, slightly strange version, should also do the trick.  Possibly
a little faster.

SELECT ev.person_id,
    count(*) - count(ev2.person_id),
    count(*),
    (count(*) - count(ev2.person_id)) / count(*)
FROM per.employment_v ev LEFT JOIN per.employment_v ev2
  ON(ev.person_id = ev2.person_id
     AND ev.position_category_id = ev2.position_category_id
     AND ev2.typeid <> 3)
group by ev.person_id;

Jeff

SQL Query gone wrong.. please help

From

Srinivas Iyyer

Date:

09 February 2006, 13:08:51

Dear group,

I have three tables:


table: sequence
seq_id | seq_name | seq_refseq
     1 | EI24     | NM_004879.3
     2 | NDRG2    | NM_201540.1
     3 | RARRES2  | NM_002889.2



table: gos
gos_id | go_id | go_term | go_class
11041 | GO:0006807 | nitrogen compound metabolism |P
11049 | GO:0004263 | chymotrypsin activity| F

table : go_seq


gos_id | seq_id
  13819 |  11408
  13819 |  11950
  13819 |  12822
  13819 |  11350




I want to query go_seq table, giving seq_name = 'EI24'


select seq_name, go_term
from sequence, gos, go_seq


where  go_seq.seq_id = go_seq.gos_id and
           sequence.seq_id = sequence.seq_name and
            sequence_seq_name = 'EI24';



I am wrong here,  unable to get it correctly.  Could
you please help here.


thanks
srini

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: SQL Query gone wrong.. please help

From

Murat Tasan

Date:

09 February 2006, 13:24:51

On 9 Feb 2006, at 12:08, Srinivas Iyyer wrote:

> Dear group,
>
> I have three tables:
>
>
> table: sequence
> seq_id | seq_name | seq_refseq
>      1 | EI24     | NM_004879.3
>      2 | NDRG2    | NM_201540.1
>      3 | RARRES2  | NM_002889.2
>
>
>
> table: gos
> gos_id | go_id | go_term | go_class
> 11041 | GO:0006807 | nitrogen compound metabolism |P
> 11049 | GO:0004263 | chymotrypsin activity| F
>
> table : go_seq
>
>
> gos_id | seq_id
>   13819 |  11408
>   13819 |  11950
>   13819 |  12822
>   13819 |  11350
>
>
>
>
> I want to query go_seq table, giving seq_name = 'EI24'
>
>
> select seq_name, go_term
> from sequence, gos, go_seq
>
>
> where  go_seq.seq_id = go_seq.gos_id and
>            sequence.seq_id = sequence.seq_name and
>             sequence_seq_name = 'EI24';
>

you are not joining over the relationship table (go_seq) correctly.
you want:

WHERE
    sequence.seq_name = 'EI24' AND
    sequence.seq_id = go_seq.seq_id AND
    go_seq.gos_id = gos.gos_id;

what you previously had was wrong in two locations:
"sequence.seq_id = sequence.seq_name" (i assume no sequence ID will
ever be quite like it's name) and
"go_seq.seq_id = go_seq.gos_id" (i assume no GO id will have the same
ID as a sequence).

>
> I am wrong here,  unable to get it correctly.  Could
> you please help here.
>

unfortunately (by the looks of your query), your understanding of SQL
is fundamentally flawed, which will lead to future confusion and
trouble.  i'd recommend picking up a decent book on SQL, or checking
out some online tutorials.

> thanks
> srini

hope that helped a bit!

murat

Re: SQL Query gone wrong.. please help

From

Srinivas Iyyer

Date:

09 February 2006, 13:50:37

Hi,
thank you for correcting me.  I agree, I am very weak
in connecting tables, I did not digest the connecting
logic, although based on my progra. experience i am
able to write plpgsql functions etc not relating to
sql query part.
I am trying to learn from Bruce momjian's book.

I have one more question, hope that is not problem for
you to help me out again.

How do I put another conditional that gos.go_cat = 'F'
only.

In the corrected query, I am getting all gos.go_term
but I want only if gos.go_cat is 'F'.


My version:

select seq_name , go_term,go_cat from sequence, gos,
go_seq where
            sequence.seq_name = 'EI24' AND
            gos.go_cat = 'F'AND
            sequence.seq_id = go_seq.seq_id AND
            go_seq.gos_id = gos.go_id AND
            gos.gos_id = gos.go_cat;




Thanks.
Sri





--- Murat Tasan <murat.tasan@cwru.edu> wrote:

>
> On 9 Feb 2006, at 12:08, Srinivas Iyyer wrote:
>
> > Dear group,
> >
> > I hav  e three tables:
> >
> >
> > table: sequence
> > seq_id | seq_name | seq_refseq
> >      1 | EI24     | NM_004879.3
> >      2 | NDRG2    | NM_201540.1
> >      3 | RARRES2  | NM_002889.2
> >
> >
> >
> > table: gos
> > gos_id | go_id | go_term | go_class
> > 11041 | GO:0006807 | nitrogen compound metabolism
> |P
> > 11049 | GO:0004263 | chymotrypsin activity| F
> >
> > table : go_seq
> >
> >
> > gos_id | seq_id
> >   13819 |  11408
> >   13819 |  11950
> >   13819 |  12822
> >   13819 |  11350
> >
> >
> >
> >
> > I want to query go_seq table, giving seq_name =
> 'EI24'
> >
> >
> > select seq_name, go_term
> > from sequence, gos, go_seq
> >
> >
> > where  go_seq.seq_id = go_seq.gos_id and
> >            sequence.seq_id = sequence.seq_name and
> >             sequence_seq_name = 'EI24';
> >
>
> you are not joining over the relationship table
> (go_seq) correctly.
> you want:
>
> WHERE
>     sequence.seq_name = 'EI24' AND
>     sequence.seq_id = go_seq.seq_id AND
>     go_seq.gos_id = gos.gos_id;
>
> what you previously had was wrong in two locations:
> "sequence.seq_id = sequence.seq_name" (i assume no
> sequence ID will
> ever be quite like it's name) and
> "go_seq.seq_id = go_seq.gos_id" (i assume no GO id
> will have the same
> ID as a sequence).
>
>
> >
> > I am wrong here,  unable to get it correctly.
> Could
> > you please help here.
> >
>
> unfortunately (by the looks of your query), your
> understanding of SQL
> is fundamentally flawed, which will lead to future
> confusion and
> trouble.  i'd recommend picking up a decent book on
> SQL, or checking
> out some online tutorials.
>
>
> > thanks
> > srini
>
>
> hope that helped a bit!
>
> murat
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: SQL Query gone wrong.. please help

From

Murat Tasan

Date:

09 February 2006, 15:00:15

On 9 Feb 2006, at 12:50, Srinivas Iyyer wrote:

> Hi,
> thank you for correcting me.  I agree, I am very weak
> in connecting tables, I did not digest the connecting
> logic, although based on my progra. experience i am
> able to write plpgsql functions etc not relating to
> sql query part.
> I am trying to learn from Bruce momjian's book.
>
> I have one more question, hope that is not problem for
> you to help me out again.
>
> How do I put another conditional that gos.go_cat = 'F'
> only.
>
> In the corrected query, I am getting all gos.go_term
> but I want only if gos.go_cat is 'F'.
>
>
> My version:
>
> select seq_name , go_term,go_cat from sequence, gos,
> go_seq where
>             sequence.seq_name = 'EI24' AND
>             gos.go_cat = 'F'AND
>             sequence.seq_id = go_seq.seq_id AND
>             go_seq.gos_id = gos.go_id AND
>             gos.gos_id = gos.go_cat;
>

remove that last line "gos.gos_id = gos.go_cat".  you've already done
the appropriate selection with "gos.go_cat = 'F'".

as you have it written now, i would suspect you are getting zero
results, because it's doubtful that the gos_id will ever equal the
go_cat.

Re: SQL Question

From

Ketema Harris

Date:

09 February 2006, 22:11:58

Thanks I eventually came to the same conclusion as your first
suggested answer, but don't really understand the second one.  Can you
explain why the second one works?  It is faster.  I looked at the
query plans and the second does seem like it would be much faster.

First:
Sort  (cost=99164.54..99165.04 rows=200 width=2)
  Sort Key: personid
  ->  HashAggregate  (cost=82905.40..99156.90 rows=200 width=2)
        ->  Seq Scan on test a  (cost=0.00..82885.00 rows=2040 width=2)
              SubPlan
                ->  Aggregate  (cost=40.61..40.62 rows=1 width=0)
                      ->  Seq Scan on test b  (cost=0.00..40.60 rows=1 width=0)
                            Filter: ((typeid = 3) AND (personid = $0))
        SubPlan
          ->  Aggregate  (cost=40.61..40.62 rows=1 width=0)
                ->  Seq Scan on test b  (cost=0.00..40.60 rows=1 width=0)
                      Filter: ((typeid = 3) AND (personid = $0))

Second:
Sort  (cost=158.59..159.09 rows=200 width=4)
  Sort Key: ev.personid
  ->  HashAggregate  (cost=142.45..150.95 rows=200 width=4)
        ->  Hash Left Join  (cost=45.65..106.75 rows=2040 width=4)
              Hash Cond: (("outer".personid = "inner".personid) AND
("outer".rowid = "inner".rowid))
              ->  Seq Scan on test ev  (cost=0.00..30.40 rows=2040 width=4)
              ->  Hash  (cost=35.50..35.50 rows=2030 width=4)
                    ->  Seq Scan on test ev2  (cost=0.00..35.50
rows=2030 width=4)
                          Filter: (typeid <> 3)

Ketema J. Harris
www.ketema.net
ketema@ketema.net

Re: SQL Question

From

Todd Lewis

Date:

10 February 2006, 01:49:46

Short answer the first query is in essence performing 3 queries to group
the data, you take a big performance hit having the select within a
select. In most cases you can drop the select within a select down to
the where section, either by a sub select or in this case a
relationship. Counting elements based upon a relationship that are true
is a lot quicker.

If you were dealing with a larger data set (say 1 million records) you
would see a real big difference where the 1st query could take hours or
never complete because it has sucked up all database resources. The
second one would still complete within minutes.

Ketema Harris wrote:
> Thanks I eventually came to the same conclusion as your first
> suggested answer, but don't really understand the second one.  Can you
> explain why the second one works?  It is faster.  I looked at the
> query plans and the second does seem like it would be much faster.
>
> First:
> Sort  (cost=99164.54..99165.04 rows=200 width=2)
>   Sort Key: personid
>   ->  HashAggregate  (cost=82905.40..99156.90 rows=200 width=2)
>         ->  Seq Scan on test a  (cost=0.00..82885.00 rows=2040 width=2)
>               SubPlan
>                 ->  Aggregate  (cost=40.61..40.62 rows=1 width=0)
>                       ->  Seq Scan on test b  (cost=0.00..40.60 rows=1 width=0)
>                             Filter: ((typeid = 3) AND (personid = $0))
>         SubPlan
>           ->  Aggregate  (cost=40.61..40.62 rows=1 width=0)
>                 ->  Seq Scan on test b  (cost=0.00..40.60 rows=1 width=0)
>                       Filter: ((typeid = 3) AND (personid = $0))
>
> Second:
> Sort  (cost=158.59..159.09 rows=200 width=4)
>   Sort Key: ev.personid
>   ->  HashAggregate  (cost=142.45..150.95 rows=200 width=4)
>         ->  Hash Left Join  (cost=45.65..106.75 rows=2040 width=4)
>               Hash Cond: (("outer".personid = "inner".personid) AND
> ("outer".rowid = "inner".rowid))
>               ->  Seq Scan on test ev  (cost=0.00..30.40 rows=2040 width=4)
>               ->  Hash  (cost=35.50..35.50 rows=2030 width=4)
>                     ->  Seq Scan on test ev2  (cost=0.00..35.50
> rows=2030 width=4)
>                           Filter: (typeid <> 3)
>
> Ketema J. Harris
> www.ketema.net
> ketema@ketema.net
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>