Thread: rewrite in to exists?

rewrite in to exists?

From
LN Cisneros
Date:
I'm on 7.3.4 and this query gets horrible performance.  Is there a way to rewrite it with an exists or some way to get
betterperformance? 

select code, id, name, date_of_service
  from tbl
where date_of_service in
      (select date_of_service
         from tbl
        where xxx >= '29800'
          and xxx <= '29909'
          and code = 'XX')
  and client_code = 'XX'
order by  id, date_of_service;

Thanks!




Re: rewrite in to exists?

From
"Christopher Kings-Lynne"
Date:
> I'm on 7.3.4 and this query gets horrible performance.  Is there a way to
rewrite it with an exists or some way to get better performance?
>
> select code, id, name, date_of_service
>   from tbl
> where date_of_service in
>       (select date_of_service
>          from tbl
>         where xxx >= '29800'
>           and xxx <= '29909'
>           and code = 'XX')
>   and client_code = 'XX'
> order by  id, date_of_service;

????

Why can't you just go:

select code, id, name, date_of_service from tbl where xxx <= 29800 and xx >=
29909 and code='XX' and client_code='XX' order by id, date_of_service;

Or use a between clause is nice:

select code, id, name, date_of_service from tbl where xxx between 29800 and
29909 and code='XX' and client_code='XX' order by id, date_of_service;

But seriously - your query above is referencing 'tbl' twice - is that
correct, or is the tbl in the subselect supposed to be something different?

Chris


Many fields in one table or many tables?

From
"Alexander Priem"
Date:
Hi guys,

I am in the process of creating a database design in which LOTS of data need
to be modelled.

For instance, I need to store data about products. Every product has LOTS of
properties, well over a hundred.

So I'm wondering. What's the best approach here, performance wise? Just
create one Product table with well over a hundred columns? Or would it be
better to divide this over more tables and link them together via ID's? I
could for instance create tables Product, PriceInfo, Logistics, StorageInfo,
PackagingInfo and link them together via the same ID. This would be easier
to document (try to visualize a 100+ column table in a document!), but would
it impact performance? I tihnk maybe it would impact Select performance, but
Updating of products would maybe speed up a little...

All info about a product is unique for this product so records in PriceInfo,
Logistics, StorageInfo, PackagingInfo tables would map one to one to records
in the Product table.

Do any of you know if and how PostgreSQL would prefer one approach over the
other?

Thanks in advance,
Alexander Priem.


Re: rewrite in to exists?

From
Manfred Koizar
Date:
On Thu, 18 Sep 2003 13:23:37 +0800, "Christopher Kings-Lynne"
<chriskl@familyhealth.com.au> wrote:
>Why can't you just go:
>
>select code, id, name, date_of_service from tbl where xxx <= 29800 and xx >=
>29909 and code='XX' and client_code='XX' order by id, date_of_service;

Because (ignoring conditions on code and client_code for a moment) if
for a given date there is at least one row satisfying the condition on
xxx, the original query returns *all* rows having this date,
regardless of their xxx value.  For example:

 id |  date      |  xxx
----+------------+-------
  1 | 2003-01-01 | 10000        *
  2 | 2003-01-01 | 29800   *    *
  3 | 2003-01-01 | 30000        *
  4 | 2003-02-02 | 20000
  5 | 2003-03-03 | 29900   *    *


>> select code, id, name, date_of_service
>>   from tbl
>> where date_of_service in
>>       (select date_of_service
>>          from tbl
>>         where xxx >= '29800'
>>           and xxx <= '29909'
>>           and code = 'XX')
>>   and client_code = 'XX'
>> order by  id, date_of_service;

To the original poster:  You did not provide a lot of information, but
the following suggestions might give you an idea ...

SELECT code, id, date_of_service
  FROM tbl
 WHERE EXISTS (SELECT *
                 FROM tbl t2
                WHERE t2.xxx >= '29800' AND t2.xxx <= '29909'
                  AND t2.code = 'XX'
                  AND tbl.date_of_service = t2.date_of_service)
   AND client_code = 'XX'
 ORDER BY id, date_of_service;

SELECT t1.code, t1.id, t1.date_of_service
  FROM tbl t1 INNER JOIN
       (SELECT DISTINCT date_of_service
          FROM tbl
         WHERE xxx >= '29800' AND xxx <= '29909'
           AND code = 'XX'
       ) AS t2 ON (t1.date_of_service = t2.date_of_service)
 WHERE t1.client_code = 'XX'
 ORDER BY id, date_of_service;

SELECT DISTINCT t1.code, t1.id, t1.date_of_service
  FROM tbl AS t1 INNER JOIN tbl AS t2
       ON (t1.date_of_service = t2.date_of_service
       AND t2.xxx >= '29800' AND t2.xxx <= '29909'
       AND t2.code = 'XX')
 WHERE t1.client_code = 'XX'  -- might as well put this
                              -- condition into the ON clause
 ORDER BY id, date_of_service;

The last one assumes that there are no duplicates on code, id,
date_of_service in the desired result.

Servus
 Manfred

Re: rewrite in to exists?

From
LN Cisneros
Date:
On Thu, 18 Sep 2003 13:23:37 +0800, "Christopher Kings-Lynne"
<snip>

>To the original poster:  You did not provide a lot of information, but
>the following suggestions might give you an idea ...
<snip>
>

Yes, sorry about that. But in my query  for a set of dates returned from the subquery I would then like to get all
recordsthat match this set of dates (ordered). 

I believe this query will work and hopefully speed it up (the "IN" query is extremely slow)...I give this one a try:

>SELECT t1.code, t1.id, t1.date_of_service
> FROM tbl t1 INNER JOIN
>       (SELECT DISTINCT date_of_service
>         FROM tbl
>         WHERE xxx >= '29800' AND xxx <= '29909'
>          AND code = 'XX'
>      ) AS t2 ON (t1.date_of_service = t2.date_of_service)
> WHERE t1.client_code = 'XX'
> ORDER BY id, date_of_service;

A question I have is is the "DISTINCT" really going to help or is it just going to throw another sort into the mix
makingit slower? 

Thanks for the help!

Laurette



Re: Many fields in one table or many tables?

From
Josh Berkus
Date:
Alexander,

> I am in the process of creating a database design in which LOTS of data
> need to be modelled.
>
> For instance, I need to store data about products. Every product has LOTS
> of properties, well over a hundred.
<snip>
> Do any of you know if and how PostgreSQL would prefer one approach over the
> other?

Queston 1:  Do all products have all of these properties, or do some/many/most
not have some properties?   If the answer is the former, then a single table,
however broad, is the logical construct.  If the latter, than several tables
makes more sense: why create NULL columns for stuff you could just leave out?

Question 2: Is it true that some properties will be updated *much* (100x) more
frequently than others?   If so, it would make sense from a
performance/postgresql standpoint to isolate those properties to related
table(s).  Keep in mind that this recommendation is strictly performance
related, and is not necessarily the best relational design.

Suggestion 3: There was an issue in 7.3 with table rows which are overly broad
-- some problems with PSQL, I believe.   It would be worth searching for, as
I cannot remember what the limit is where problems occurred.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: rewrite in to exists?

From
Josh Berkus
Date:
Laurette,

> >SELECT t1.code, t1.id, t1.date_of_service
> > FROM tbl t1 INNER JOIN
> >       (SELECT DISTINCT date_of_service
> >         FROM tbl
> >         WHERE xxx >= '29800' AND xxx <= '29909'
> >          AND code = 'XX'
> >      ) AS t2 ON (t1.date_of_service = t2.date_of_service)
> > WHERE t1.client_code = 'XX'
> > ORDER BY id, date_of_service;
>
> A question I have is is the "DISTINCT" really going to help or is it just
> going to throw another sort into the mix making it slower?

It's required if you expect the subquery to return multiple rows for each
date_of_service match.  Of course, you can also put the DISTINCT in the main
query instead; it depends on how many results you expect the subquery to
have.

Still, I'd suggest trying the EXISTS version first .... under most
circumstances, DISTINCT is pretty slow.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Find one record

From
Joseph Bove
Date:
Dear list,

I hope this to be a simple question. I have need to simply read the first
row in a given table. Right now,  I have some legacy code that selects all
rows in a table just to see if the first row has a certain value.

The code is seeking to see if an update has been run or not. A hypothetical
scenario would be: has an update been run to populate data into a new
column in a table. Neither the data nor any of the rows are consistently
known. So the test selects all rows, tests the first row and then ends if
the column has a value.

Does anyone have a better way to do this?

Regards


Re: Find one record

From
Josh Berkus
Date:
Joseph,

> I hope this to be a simple question. I have need to simply read the first
> row in a given table. Right now,  I have some legacy code that selects all
> rows in a table just to see if the first row has a certain value.

Your problem is conceptual: in SQL, there is no "first" row.

If you want to just pick a single row at random, do
SELECT * FROM table LIMIT 1;

Or if you have a primary key id, you could for example return the row with the
lowest id:

SELECT * FROM table ORDER BY id LIMIT 1;

> The code is seeking to see if an update has been run or not. A hypothetical
> scenario would be: has an update been run to populate data into a new
> column in a table. Neither the data nor any of the rows are consistently
> known. So the test selects all rows, tests the first row and then ends if
> the column has a value.

I'd write an ON UPDATE trigger, personally, to fire and write data somewhere
else whenever the table is updated.  Much more reliable ....

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Many fields in one table or many tables?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Suggestion 3: There was an issue in 7.3 with table rows which are overly broad
> -- some problems with PSQL, I believe.

Not sure about PSQL, but I think there still are some performance issues
in the backend with SELECTs involving more than a couple hundred
targetlist entries.  These are probably fixable at not-very-large effort
but we haven't made any consistent push to find and fix the trouble
spots.  The issues that I recall are O(N^2) problems (doubly nested
loops) so the performance with ~100 entries is no problem but it gets
rapidly worse above that.  You could hit this even with ~100-column
tables if you try to select all columns from a join of two or more.

            regards, tom lane

Re: rewrite in to exists?

From
LN Cisneros
Date:
Thanks Josh!

But, the EXISTS version doesn't really give me what I want...all rows in tbl that match the date of the subquery.

But, using the DISTINCT does make sense.

Thanks again to all who helped!

-----Original Message-----
From: Josh Berkus <josh@agliodbs.com>
Sent: Sep 18, 2003 10:34 AM
To: LN Cisneros <chulat@mail.com>, LN Cisneros <lnsea@earthlink.net>,
    Manfred Koizar <mkoi-pg@aon.at>,
    Christopher Kings-Lynne <chriskl@familyhealth.com.au>
Cc: LN Cisneros <chulat@mail.com>, pgsql-performance@postgresql.org
Subject: Re: [PERFORM] rewrite in to exists?

Laurette,

> >SELECT t1.code, t1.id, t1.date_of_service
> > FROM tbl t1 INNER JOIN
> >       (SELECT DISTINCT date_of_service
> >         FROM tbl
> >         WHERE xxx >= '29800' AND xxx <= '29909'
> >          AND code = 'XX'
> >      ) AS t2 ON (t1.date_of_service = t2.date_of_service)
> > WHERE t1.client_code = 'XX'
> > ORDER BY id, date_of_service;
>
> A question I have is is the "DISTINCT" really going to help or is it just
> going to throw another sort into the mix making it slower?

It's required if you expect the subquery to return multiple rows for each
date_of_service match.  Of course, you can also put the DISTINCT in the main
query instead; it depends on how many results you expect the subquery to
have.

Still, I'd suggest trying the EXISTS version first .... under most
circumstances, DISTINCT is pretty slow.

--
Josh Berkus
Aglio Database Solutions
San Francisco




Re: rewrite in to exists?

From
Manfred Koizar
Date:
On Thu, 18 Sep 2003 12:27:23 -0700 (GMT-07:00), LN Cisneros
<lnsea@earthlink.net> wrote:
>But, the EXISTS version doesn't

Laurette,
looking at that SELECT statement again I can't see what's wrong with
it.  One of us is missing something ;-)

> really give me what I want...

Can you elaborate?

SELECT code, id, date_of_service
  FROM tbl
 WHERE EXISTS (SELECT *
                 FROM tbl t2
                WHERE t2.xxx >= '29800' AND t2.xxx <= '29909'
                  AND t2.code = 'XX'
                  AND tbl.date_of_service = t2.date_of_service) -- (!)
   AND client_code = 'XX'
 ORDER BY id, date_of_service;

>all rows in tbl that
 ^^^
Well, all that have client_code = 'XX', as in your original query.

> match the date of the subquery.

The matching is done by the line with the (!) comment.

Servus
 Manfred