Thread: SQL Help
Hello,
I already tried this same basic question with no response.... maybe I was too wordy. So here it is simplified.... what's the best way to write this query? I'm open to using stored procedures, but even then I don't know how I would conditionally populate a resultset (refcursor). Notice that in the first three cases, the expression is the exact same, only the return value is different. This seems inefficient....
select
(case when column1 = column2 then column3 end) as alias1,
(case when column1 = column2 then column4 end) as alias2,
(case when column1 = column2 then column5 end) as alias3,
(case when column6 = column7 then column8 end) as alias4
from
mytable
;
Any ideas?
Thanks!
Do you Yahoo!?
Free online calendar with sync to Outlook(TM).
On Fri, May 30, 2003 at 08:47:03 -0700, C F <tacnaboyz@yahoo.com> wrote: > Hello, > I already tried this same basic question with no response.... maybe I was too wordy. So here it is simplified.... what'sthe best way to write this query? I'm open to using stored procedures, but even then I don't know how I would conditionallypopulate a resultset (refcursor). Notice that in the first three cases, the expression is the exact same, onlythe return value is different. This seems inefficient.... > > > select > (case when column1 = column2 then column3 end) as alias1, > (case when column1 = column2 then column4 end) as alias2, > (case when column1 = column2 then column5 end) as alias3, > (case when column6 = column7 then column8 end) as alias4 > from > mytable > ; That seems pretty efficient. I wouldn't expect the repeated column1 = column2 tests to add much overhead. Unless there is more to the story this looks to be what you want to do.
CF, > select > (case when column1 = column2 then column3 end) as alias1, > (case when column1 = column2 then column4 end) as alias2, > (case when column1 = column2 then column5 end) as alias3, > (case when column6 = column7 then column8 end) as alias4 > from > mytable Given the information you've given us (including your first e-mail), I can't think of a better way to get the output you want than the above. You could work through subselects, etc, but I think that would be even less efficient. One thing you might want to think about is your whole approach to designing this application; it may be that you can simplify your queries by changing your table design. -- Josh Berkus Aglio Database Solutions San Francisco
On Friday 30 May 2003 4:47 pm, C F wrote: > Hello, > I already tried this same basic question with no response.... maybe I was > too wordy. I think it's more a case of nobody seeing a better way. > select > (case when column1 = column2 then column3 end) as alias1, > (case when column1 = column2 then column4 end) as alias2, > (case when column1 = column2 then column5 end) as alias3, > (case when column6 = column7 then column8 end) as alias4 You could write a set returning function, but you'd just end up doing the same thing. Can you explain what it is you're trying to acheive - real fields/schemas etc? -- Richard Huxton
If your concern is speed, the thing here is that you will have as many records as there are in "mytable", most of them (I think) with NULLs for alias1, alias2, alias3 and alias4. This way, there is no condition to filter any record, so postgreSQL will do a sequential scan over the whole table. If you are ok fetching the records that match and assumming that the all the others don't match, then the following will work: --column1=column2 SELECT column3 AS alias1, column4 AS alias2, column5 AS alias3, NULL AS alias4 FROM myTable WHERE column1=column2 UNION --column6=column7 SELECT NULL AS alias1, NULL AS alias2, NULL AS alias3, column8 AS alias4 FROM myTable WHERE column6=column7 Of course, you will need the necesary indexes. If this didn't give you a hint, please post a message with a link to your original message, so I can get a better idea of what you need. On Friday 30 May 2003 12:47, C F wrote: > Hello, > I already tried this same basic question with no response.... maybe I was > too wordy. So here it is simplified.... what's the best way to write this > query? I'm open to using stored procedures, but even then I don't know how > I would conditionally populate a resultset (refcursor). Notice that in the > first three cases, the expression is the exact same, only the return value > is different. This seems inefficient.... > > > select > (case when column1 = column2 then column3 end) as alias1, > (case when column1 = column2 then column4 end) as alias2, > (case when column1 = column2 then column5 end) as alias3, > (case when column6 = column7 then column8 end) as alias4 > from > mytable > ; > > Any ideas? > Thanks! > > > --------------------------------- > Do you Yahoo!? > Free online calendar with sync to Outlook(TM).
Richard Huxton <dev@archonet.com> writes: > On Friday 30 May 2003 4:47 pm, C F wrote: >> select >> (case when column1 = column2 then column3 end) as alias1, >> (case when column1 = column2 then column4 end) as alias2, >> (case when column1 = column2 then column5 end) as alias3, >> (case when column6 = column7 then column8 end) as alias4 > Can you explain what it is you're trying to acheive - real > fields/schemas etc? Yeah. I can't help feeling that this problem really means you've chosen a bad database schema. Given the problem as posed, I can only offer one suggestion: you could avoid writing out the case conditions N times by using a sub-select to factor out common subexpressions: select (case when cond1 then column3 end) as alias1, (case when cond1 then column4 end) as alias2, (case when cond1 then column5 end) as alias3, (case when cond2 then column8 end) as alias4 from (select column1 = column2 as cond1, column6 = column7 as cond2, column3, column4, ...from ... rest of query as in original... ) ss; One should not mistake this for an efficiency improvement, since more than likely the planner will flatten it into the same querytree as the original form. But it might be more readable or easier to code this way. But give us some details about your table setup and why you find yourself needing to do this in the first place. Maybe there's a better design. regards, tom lane
I was afraid someone was going to ask that :)
Okay, I'll do my best at explaining where I'm coming from....
I'm working on a mapping application it is user-configurable. What this means (as it pertains to this disucssion) is that the through a configuration file, the user is able to define the rules that the application will use to determine which geometries *and* attributes to pull from PG at various scales. These 'rules' that the user defines also contain other specifics such as how to symbolize the geometry, how to label the geometry etc. All of these parameters can either be hard coded by the user into the configuration file, or they can define an expression that will be used to dynamically pull it from the database. On top of all of this, we have two more levels of queries.
So... at the very top level, the user can define an expression that will determine that everything queried from this table will match these criteria... this is my WHERE clause.
Then below this level, various rules can be defined... each rule can have another definition that evaluates into a SQL expression. Now, I could make each rule an entirely separate query, but for one, they all share the exact same top level WHERE clause, and two, there could potentially be many many rules which I would think would cause severe performance issues. Let me give you an example...
Let's say we're mapping cities of the United States based on population... In other words, I want to symbolize the cities on the map based on population (larger symbol for larger populations, smaller symbol for smaller populations, etc). I also want to show the city names of the larger cities *only*. So, what the client application (client to PostgreSQL) needs is; the city location, which rules evaluate to true, and the city names of those larger cities (defined by a rule).
We have a table of cities of the world. So the top level filter (that all rules will share) is, "COUNTRY = 'USA'".
Rule 1 says that cities with a population over 1,000,000 will have a large symbol and be labeled with the city name. So the sql could look like this...
select longitude, latitude, city_name from city where country = 'USA' and population > 1000000;
... seems easy enough, but remember we can have an infinite number of rules (not really inifinite, but you get the point). So....
Rule2 says that cities with a population under 1,000,000 will have a small symbol (note, we do not care about the city name here). So, by itself, the SQL could look like this...
select longitude, latitude from city where country = 'USA' and population < 1000000;
Okay, for this simple example, I would have no problem doing two different queries (this example is extremely simplified compared to what is possible/likely). But what if the user wanted to give a different symbol for every population in 100,000 increments? If our range of populations was 100,000 to 5,000,000 that would be 50 queries! Not only would it be 50 queries, but it would be 50 queries using a nearly identical WHERE clause. So I thought it would be more efficient to combine the queries into something like the following...
select
longitude,
latitutde,
(case when population > 1000000 then true else false end) as rule1,
(case when population > 1000000 then city_name end) as label1,
(case when population < 1000000 then true else false end) as rule2
where
country = 'USA'
;
I could just only concern the SQL with the boolean values for the rules, and return all city names, and let the application simply discard them, but that seems like not a good thing to do for very large resultsets (and again, this is overly simplified, we could have many such columns full of uncessary data being returned). And by the way, that query cannot be written as something like...
(case when population > 1000000 then 'rule1' when population < 1000000 then 'rule2' end) as rules
... because the rules are NOT mutually exclusive, there can many positives.
Anyway, hopefully I didn't leave anything important out. It sounds like there's no obvious solution to avoiding multiple evaluations of the test expressions. The rules are relatively static once the config files are read in, so I could conceivably create stored procedures with a bunch of IF statements at that time. However, I'm not sure if in PG there is a way to dynamically populate the resulting recordset on the fly. I can think of 10 different ways accomplish what I'm trying to do, but hopefully someone has some ideas on what would be the best performing.
Sorry if it's information overload, but you tried to answer my questions, so I thought I should at least try to answer yours :)
Any thoughts much appreciated.
You could write a set returning function, but you'd just end up doing the same
thing. Can you explain what it is you're trying to acheive - real
fields/schemas etc?
--
Richard Huxton
Do you Yahoo!?
Free online calendar with sync to Outlook(TM).
On Friday 30 May 2003 7:01 pm, C F wrote: > I was afraid someone was going to ask that :) > Okay, I'll do my best at explaining where I'm coming from.... [snip long but detailed description I asked for -thanks] Right - I've done something similar to this before, and I ended up building my SQL on the fly with something like: rule_where_cond:rs | rulename | tblname | colname | test | test_value ----+----------|---------+---------+------+----------- 1 | us-only | city | country | = | USA Note the "rs" is "rule-set" where all criteria for a specific query have the same rs value. rule_sizing_cond:rs | rulename | tblname | colname | minval | maxval | resultval ----+----------+---------+------------+---------+---------+--------------- 1 | cityicon | city | population | 0| 1000000 | small_dot.jpg 1 | cityicon | city | population | 1000001 |99999999 | large_dot.jpg 1 | showname | city| population | 0 | 1000000 | false 1 | showname | city | population | 1000001 |99999999 | true This one gains you a lot because you just join to it rather than using CASE clauses. rule_select_cols:rs | rulename | tblname | colname | aliasname ----+----------+---------+-----------+----------- 1 | showname | city | city_name | label1 So you're query will now return the city_name as "label1" but you can get a flag from the "rule_sizing_cond" to say whether to show it or not. Now, it depends how complicated your conditions can be as to how complicated the setup of these tables is and also how complicated the query-builder can be. However, I have used something similar to build queries myself and it does provide you with a lot of flexibility. HTH -- Richard Huxton