Thread: rewrite in to exists?
I'm on 7.3.4 and this query gets horrible performance. Is there a way to rewrite it with an exists or some way to get betterperformance? select code, id, name, date_of_service from tbl where date_of_service in (select date_of_service from tbl where xxx >= '29800' and xxx <= '29909' and code = 'XX') and client_code = 'XX' order by id, date_of_service; Thanks!
> I'm on 7.3.4 and this query gets horrible performance. Is there a way to rewrite it with an exists or some way to get better performance? > > select code, id, name, date_of_service > from tbl > where date_of_service in > (select date_of_service > from tbl > where xxx >= '29800' > and xxx <= '29909' > and code = 'XX') > and client_code = 'XX' > order by id, date_of_service; ???? Why can't you just go: select code, id, name, date_of_service from tbl where xxx <= 29800 and xx >= 29909 and code='XX' and client_code='XX' order by id, date_of_service; Or use a between clause is nice: select code, id, name, date_of_service from tbl where xxx between 29800 and 29909 and code='XX' and client_code='XX' order by id, date_of_service; But seriously - your query above is referencing 'tbl' twice - is that correct, or is the tbl in the subselect supposed to be something different? Chris
Hi guys, I am in the process of creating a database design in which LOTS of data need to be modelled. For instance, I need to store data about products. Every product has LOTS of properties, well over a hundred. So I'm wondering. What's the best approach here, performance wise? Just create one Product table with well over a hundred columns? Or would it be better to divide this over more tables and link them together via ID's? I could for instance create tables Product, PriceInfo, Logistics, StorageInfo, PackagingInfo and link them together via the same ID. This would be easier to document (try to visualize a 100+ column table in a document!), but would it impact performance? I tihnk maybe it would impact Select performance, but Updating of products would maybe speed up a little... All info about a product is unique for this product so records in PriceInfo, Logistics, StorageInfo, PackagingInfo tables would map one to one to records in the Product table. Do any of you know if and how PostgreSQL would prefer one approach over the other? Thanks in advance, Alexander Priem.
On Thu, 18 Sep 2003 13:23:37 +0800, "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> wrote: >Why can't you just go: > >select code, id, name, date_of_service from tbl where xxx <= 29800 and xx >= >29909 and code='XX' and client_code='XX' order by id, date_of_service; Because (ignoring conditions on code and client_code for a moment) if for a given date there is at least one row satisfying the condition on xxx, the original query returns *all* rows having this date, regardless of their xxx value. For example: id | date | xxx ----+------------+------- 1 | 2003-01-01 | 10000 * 2 | 2003-01-01 | 29800 * * 3 | 2003-01-01 | 30000 * 4 | 2003-02-02 | 20000 5 | 2003-03-03 | 29900 * * >> select code, id, name, date_of_service >> from tbl >> where date_of_service in >> (select date_of_service >> from tbl >> where xxx >= '29800' >> and xxx <= '29909' >> and code = 'XX') >> and client_code = 'XX' >> order by id, date_of_service; To the original poster: You did not provide a lot of information, but the following suggestions might give you an idea ... SELECT code, id, date_of_service FROM tbl WHERE EXISTS (SELECT * FROM tbl t2 WHERE t2.xxx >= '29800' AND t2.xxx <= '29909' AND t2.code = 'XX' AND tbl.date_of_service = t2.date_of_service) AND client_code = 'XX' ORDER BY id, date_of_service; SELECT t1.code, t1.id, t1.date_of_service FROM tbl t1 INNER JOIN (SELECT DISTINCT date_of_service FROM tbl WHERE xxx >= '29800' AND xxx <= '29909' AND code = 'XX' ) AS t2 ON (t1.date_of_service = t2.date_of_service) WHERE t1.client_code = 'XX' ORDER BY id, date_of_service; SELECT DISTINCT t1.code, t1.id, t1.date_of_service FROM tbl AS t1 INNER JOIN tbl AS t2 ON (t1.date_of_service = t2.date_of_service AND t2.xxx >= '29800' AND t2.xxx <= '29909' AND t2.code = 'XX') WHERE t1.client_code = 'XX' -- might as well put this -- condition into the ON clause ORDER BY id, date_of_service; The last one assumes that there are no duplicates on code, id, date_of_service in the desired result. Servus Manfred
On Thu, 18 Sep 2003 13:23:37 +0800, "Christopher Kings-Lynne" <snip> >To the original poster: You did not provide a lot of information, but >the following suggestions might give you an idea ... <snip> > Yes, sorry about that. But in my query for a set of dates returned from the subquery I would then like to get all recordsthat match this set of dates (ordered). I believe this query will work and hopefully speed it up (the "IN" query is extremely slow)...I give this one a try: >SELECT t1.code, t1.id, t1.date_of_service > FROM tbl t1 INNER JOIN > (SELECT DISTINCT date_of_service > FROM tbl > WHERE xxx >= '29800' AND xxx <= '29909' > AND code = 'XX' > ) AS t2 ON (t1.date_of_service = t2.date_of_service) > WHERE t1.client_code = 'XX' > ORDER BY id, date_of_service; A question I have is is the "DISTINCT" really going to help or is it just going to throw another sort into the mix makingit slower? Thanks for the help! Laurette
Alexander, > I am in the process of creating a database design in which LOTS of data > need to be modelled. > > For instance, I need to store data about products. Every product has LOTS > of properties, well over a hundred. <snip> > Do any of you know if and how PostgreSQL would prefer one approach over the > other? Queston 1: Do all products have all of these properties, or do some/many/most not have some properties? If the answer is the former, then a single table, however broad, is the logical construct. If the latter, than several tables makes more sense: why create NULL columns for stuff you could just leave out? Question 2: Is it true that some properties will be updated *much* (100x) more frequently than others? If so, it would make sense from a performance/postgresql standpoint to isolate those properties to related table(s). Keep in mind that this recommendation is strictly performance related, and is not necessarily the best relational design. Suggestion 3: There was an issue in 7.3 with table rows which are overly broad -- some problems with PSQL, I believe. It would be worth searching for, as I cannot remember what the limit is where problems occurred. -- Josh Berkus Aglio Database Solutions San Francisco
Laurette, > >SELECT t1.code, t1.id, t1.date_of_service > > FROM tbl t1 INNER JOIN > > (SELECT DISTINCT date_of_service > > FROM tbl > > WHERE xxx >= '29800' AND xxx <= '29909' > > AND code = 'XX' > > ) AS t2 ON (t1.date_of_service = t2.date_of_service) > > WHERE t1.client_code = 'XX' > > ORDER BY id, date_of_service; > > A question I have is is the "DISTINCT" really going to help or is it just > going to throw another sort into the mix making it slower? It's required if you expect the subquery to return multiple rows for each date_of_service match. Of course, you can also put the DISTINCT in the main query instead; it depends on how many results you expect the subquery to have. Still, I'd suggest trying the EXISTS version first .... under most circumstances, DISTINCT is pretty slow. -- Josh Berkus Aglio Database Solutions San Francisco
Dear list, I hope this to be a simple question. I have need to simply read the first row in a given table. Right now, I have some legacy code that selects all rows in a table just to see if the first row has a certain value. The code is seeking to see if an update has been run or not. A hypothetical scenario would be: has an update been run to populate data into a new column in a table. Neither the data nor any of the rows are consistently known. So the test selects all rows, tests the first row and then ends if the column has a value. Does anyone have a better way to do this? Regards
Joseph, > I hope this to be a simple question. I have need to simply read the first > row in a given table. Right now, I have some legacy code that selects all > rows in a table just to see if the first row has a certain value. Your problem is conceptual: in SQL, there is no "first" row. If you want to just pick a single row at random, do SELECT * FROM table LIMIT 1; Or if you have a primary key id, you could for example return the row with the lowest id: SELECT * FROM table ORDER BY id LIMIT 1; > The code is seeking to see if an update has been run or not. A hypothetical > scenario would be: has an update been run to populate data into a new > column in a table. Neither the data nor any of the rows are consistently > known. So the test selects all rows, tests the first row and then ends if > the column has a value. I'd write an ON UPDATE trigger, personally, to fire and write data somewhere else whenever the table is updated. Much more reliable .... -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Suggestion 3: There was an issue in 7.3 with table rows which are overly broad > -- some problems with PSQL, I believe. Not sure about PSQL, but I think there still are some performance issues in the backend with SELECTs involving more than a couple hundred targetlist entries. These are probably fixable at not-very-large effort but we haven't made any consistent push to find and fix the trouble spots. The issues that I recall are O(N^2) problems (doubly nested loops) so the performance with ~100 entries is no problem but it gets rapidly worse above that. You could hit this even with ~100-column tables if you try to select all columns from a join of two or more. regards, tom lane
Thanks Josh! But, the EXISTS version doesn't really give me what I want...all rows in tbl that match the date of the subquery. But, using the DISTINCT does make sense. Thanks again to all who helped! -----Original Message----- From: Josh Berkus <josh@agliodbs.com> Sent: Sep 18, 2003 10:34 AM To: LN Cisneros <chulat@mail.com>, LN Cisneros <lnsea@earthlink.net>, Manfred Koizar <mkoi-pg@aon.at>, Christopher Kings-Lynne <chriskl@familyhealth.com.au> Cc: LN Cisneros <chulat@mail.com>, pgsql-performance@postgresql.org Subject: Re: [PERFORM] rewrite in to exists? Laurette, > >SELECT t1.code, t1.id, t1.date_of_service > > FROM tbl t1 INNER JOIN > > (SELECT DISTINCT date_of_service > > FROM tbl > > WHERE xxx >= '29800' AND xxx <= '29909' > > AND code = 'XX' > > ) AS t2 ON (t1.date_of_service = t2.date_of_service) > > WHERE t1.client_code = 'XX' > > ORDER BY id, date_of_service; > > A question I have is is the "DISTINCT" really going to help or is it just > going to throw another sort into the mix making it slower? It's required if you expect the subquery to return multiple rows for each date_of_service match. Of course, you can also put the DISTINCT in the main query instead; it depends on how many results you expect the subquery to have. Still, I'd suggest trying the EXISTS version first .... under most circumstances, DISTINCT is pretty slow. -- Josh Berkus Aglio Database Solutions San Francisco
On Thu, 18 Sep 2003 12:27:23 -0700 (GMT-07:00), LN Cisneros <lnsea@earthlink.net> wrote: >But, the EXISTS version doesn't Laurette, looking at that SELECT statement again I can't see what's wrong with it. One of us is missing something ;-) > really give me what I want... Can you elaborate? SELECT code, id, date_of_service FROM tbl WHERE EXISTS (SELECT * FROM tbl t2 WHERE t2.xxx >= '29800' AND t2.xxx <= '29909' AND t2.code = 'XX' AND tbl.date_of_service = t2.date_of_service) -- (!) AND client_code = 'XX' ORDER BY id, date_of_service; >all rows in tbl that ^^^ Well, all that have client_code = 'XX', as in your original query. > match the date of the subquery. The matching is done by the line with the (!) comment. Servus Manfred