Thread: query speed joining tables
I have 4 tables that I need to query... 3 of the tables are linked by the
field userid.

Below are table SQL fragments, as well as the query. The subquery causes
very high CPU usage. It typically returns ~3000 matches. Is there another
way to rewrite this?

SELECT user_login.userid
FROM user_login, user_details_p, user_match_details
WHERE user_login.userid = user_details_p.userid AND
      user_details_p.userid = user_match_details.userid AND
      user_details_p.gender = 'W' AND
      user_details_p.seekgender = 'M' AND
      user_details_p.age >= 18 AND
      user_details_p.age <= 50 AND
      user_match_details.min_age <= 30 AND
      user_match_details.max_age >= 30 AND
      user_details_p.ethnictype = 'Caucasian (White)' AND
      strpos(user_match_details.ethnicity, 'Asian') != 0 AND
      user_details_p.zipcode IN (SELECT zips_max.destination FROM zips_max
                                 WHERE zips_max.origin = '90210')
ORDER BY user_login.last_login DESC;

             Table "public.user_login"
   Column   |           Type           | Modifiers
------------+--------------------------+-----------
 userid     | character varying(30)    | not null
 password   | character varying(30)    | not null
 email      | character varying(50)    | not null
 last_login | timestamp with time zone | not null
Indexes: user_login_pkey primary key btree (userid)

           Table "public.user_details_p"
   Column   |           Type           |   Modifiers
------------+--------------------------+---------------
 userid     | character varying(30)    | not null
 gender     | character varying(1)     |
 age        | integer                  |
 height     | character varying(10)    |
 ethnictype | character varying(30)    |
 education  | character varying(30)    |
 createdate | timestamp with time zone | default now()
 zipcode    | character varying(5)     |
 birthdate  | date                     | default now()
 zodiac     | character varying(40)    |
 seekgender | character varying(2)     |
Indexes: user_details_p_pkey primary key btree (userid),
         user_details_p_age_idx btree (age),
         user_details_p_ethnic_idx btree (ethnictype),
         user_details_p_gender_idx btree (gender),
         user_details_p_last_login_idx btree (last_login),
         user_details_p_seekgender_idx btree (seekgender),
         user_details_p_state_idx btree (state)
Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
    user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE

         Table "public.user_match_details"
     Column     |          Type          | Modifiers
----------------+------------------------+-----------
 userid         | character varying(30)  | not null
 soughtmate     | character varying(200) |
 ethnicity      | character varying(200) |
 marital_status | character varying(200) |
 min_age        | integer                |
 max_age        | integer                |
 city           | character varying(50)  |
 state          | character varying(2)   |
 zipcode        | integer                |
 match_distance | integer                |
Indexes: user_match_details_pkey primary key btree (userid)
Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
    user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE

          Table "public.zips_300"
   Column    |         Type         | Modifiers
-------------+----------------------+-----------
 origin      | character varying(5) |
 destination | character varying(5) |
Indexes: zips_300_origin_idx btree (origin)
Chris,

Here are probably your two main query problems:

> strpos(user_match_details.ethnicity,'Asian') != 0 AND

It is impossible for Postgres to use an index for the above as it is
written. I would suggest one of two improvements:
1) Modify the schema so that multiple ethnicity details are kept in a
   sub-table rather than a free-form text field you have to search, or:
2) Create a cachable function for "contains_asian" and index on that.

> user_details_p.zipcode in (select zips_max.destination from zips_max
>   where zips_max.origin='90210')
> order by user_login.last_login desc;

Use a "WHERE EXISTS" clause instead of "IN".

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
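To make suggestion 2 and the EXISTS rewrite concrete, here is a rough,
untested sketch. The contains_asian function and index names are invented
for illustration, and the syntax assumes a PostgreSQL release (7.3 or
later) where SQL functions can be declared IMMUTABLE and used in a
functional index:

-- an immutable ("cachable") wrapper around the strpos() test
CREATE FUNCTION contains_asian(text) RETURNS boolean AS '
    SELECT strpos($1, ''Asian'') <> 0;
' LANGUAGE 'sql' IMMUTABLE;

-- a functional index the planner can consider for the ethnicity test
CREATE INDEX user_match_details_asian_idx
    ON user_match_details (contains_asian(ethnicity));

-- the strpos() condition in the query then becomes:
--     ... AND contains_asian(user_match_details.ethnicity) AND ...

-- and the IN subquery becomes a correlated EXISTS:
--     ... AND EXISTS (SELECT 1 FROM zips_max
--                     WHERE zips_max.origin = '90210'
--                       AND zips_max.destination = user_details_p.zipcode)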
I understand cachable functions, but your proposed application is a
little unclear. Is it possible to see an example?

Thanks in advance.

Josh Berkus <josh@agliodbs.com> wrote:
> It is impossible for Postgres to use an index for the above as it is
> written. I would suggest one of two improvements:
> 1) Modify the schema so that multiple ethnicity details are kept in a
>    sub-table rather than a free-form text field you have to search, or:
> 2) Create a cachable function for "contains_asian" and index on that.
> [...]
> Use a "WHERE EXISTS" clause instead of "IN".
Christopher Smith wrote:
> I have 4 tables that I need to query... 3 of the tables are linked by
> the field userid.
>
> Below are table SQL fragments, as well as the query. The subquery
> causes very high CPU usage. It typically returns ~3000 matches. Is
> there another way to rewrite this?
>
> [query snipped]

Explicit joins express the intent of your query better and help Postgres
choose indexes:

SELECT userid
FROM user_login
JOIN user_details_p USING (userid)
JOIN user_match_details USING (userid)
WHERE user_details_p.gender = 'W' AND
      user_details_p.seekgender = 'M' AND
      user_details_p.age BETWEEN 18 AND 50 AND
      user_match_details.min_age <= 30 AND
      user_match_details.max_age >= 30 AND
      user_details_p.ethnictype = 'Caucasian (White)' AND
      strpos(user_match_details.ethnicity, 'Asian') != 0 AND
      user_details_p.zipcode IN (SELECT zips_max.destination FROM zips_max
                                 WHERE zips_max.origin = '90210')
ORDER BY user_login.last_login DESC;

How can I help you with the subquery if you didn't even include the
zips_max definition?! If origin is a unique value in that table, you can
change the subquery into a join in the FROM list.

> Table "public.user_login"
> userid     | character varying(30)    | not null
> [...]
> Indexes: user_login_pkey primary key btree (userid)

Do you really need userid as varchar? Indexing on int4 or int8 would be
much faster than varchar.

Why do you have 3 tables? It looks like only one table would be enough.
Remember that null values don't take much space.

> Table "public.user_details_p"
> [...]
> Indexes: user_details_p_pkey primary key btree (userid),
>          user_details_p_age_idx btree (age),
>          user_details_p_ethnic_idx btree (ethnictype),
>          user_details_p_gender_idx btree (gender),
>          user_details_p_last_login_idx btree (last_login),
>          user_details_p_seekgender_idx btree (seekgender),
>          user_details_p_state_idx btree (state)

There are too many indexes here: the index on gender (2 possible values)
is useless, and the index on ethnictype (how many values? I think not
too many) is possibly useless.

Consider creating a single index on several fields for queries like this:

SELECT ...
WHERE user_details_p.gender = 'W' AND
      user_details_p.seekgender = 'M' AND
      user_details_p.age BETWEEN 18 AND 50

with an index on (age, seekgender, gender).
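A concrete version of that multi-column index would look something like
this (untested; the index name is arbitrary):

CREATE INDEX user_details_p_match_idx
    ON user_details_p (age, seekgender, gender);

A query filtering on all three columns, like the WHERE clause above, can
then be answered from this one index instead of three single-column ones.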
> Table "public.zips_300"
> origin      | character varying(5) |
> destination | character varying(5) |
> Indexes: zips_300_origin_idx btree (origin)

If you need a more specific answer, you have to add more information:
how many records do you have in your tables, and how many possible
values do you use, for example, for zipcodes, ethnicity, etc.

Regards,
Tomasz Myrta
My mistake: zips_max should be zips_300.

Tomasz Myrta <jasiek@klaster.net> wrote:
> Explicit joins express the intent of your query better and help
> Postgres choose indexes.
> [...]
> How can I help you with the subquery if you didn't even include the
> zips_max definition?! If origin is a unique value in that table, you
> can change the subquery into a join in the FROM list.
Christopher Smith wrote:
> my mistake: zips_max should be zips_300. And in my zip code table
> there are 120 million rows. Example records:
>
> origin  destination
> ===================
> 90210   90222
> 90210   90234
> 90210   96753

1. Try to create an index on both fields of zips_300, origin and
   destination: zips_300_ind(origin, destination). (A concrete version
   appears at the end of this message.)

2. If you have only unique pairs in zips_300, this query should
   noticeably speed up your example:

SELECT userid
FROM user_login UL
JOIN user_details_p UD USING (userid)
JOIN user_match_details UMD USING (userid)
JOIN zips_300 Z ON (Z.destination = UD.zipcode AND Z.origin = '90210')
WHERE UD.gender = 'W' AND
      UD.seekgender = 'M' AND
      UD.age BETWEEN 18 AND 50 AND
      UMD.min_age <= 30 AND
      UMD.max_age >= 30 AND
      UD.ethnictype = 'Caucasian (White)' AND
      strpos(UMD.ethnicity, 'Asian') != 0
ORDER BY UL.last_login DESC;

The next step in speeding up your query is answering this question: how
many rows do I get for each single condition? Example:

gender = 'W'                     - 50% of rows
seekgender = 'M'                 - 50% of rows
ethnictype = 'Caucasian (White)' -  5% of rows

Start indexing your tables on the conditions that match the fewest rows
- in this situation, ethnictype. Consider using multi-column indexes.

Regards,
Tomasz Myrta
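For concreteness, the two-column index from point 1, plus a rough,
untested way to measure the per-condition selectivity described above:

CREATE INDEX zips_300_ind ON zips_300 (origin, destination);

-- how many rows does each candidate condition match?
SELECT ethnictype, count(*) FROM user_details_p GROUP BY ethnictype;
SELECT gender, count(*) FROM user_details_p GROUP BY gender;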
Vernon,

> Regarding decomposing a multivalued field into a separate table, I
> have observed some advantages and disadvantages of the approach. It is
> good for searching, as you have pointed out, and bad for updating
> data, since two operations are needed: deletion and insertion. A query
> may need to join a lot of tables together. In Chris's personals
> application, for example, there are many multivalued fields, such as
> relationship status, other than ethnicity. There will be some very
> long and complex queries.

Hey, it's your database. In my 8-year experience as a professional DBA,
few considerations ever outweigh normalization in a relational database.
You are merely trading the immediate inconvenience of having to
construct complex queries and data-saving functions for the eventual
huge inconvenience (or possibly disaster) of having your data corrupted,
or at least having to modify it by hand, row by row.

(Pardon me if I'm a little strident, but I've spent a good portion of my
career cleaning up others', and sometimes my own, database design
mistakes, and I hate to see a disaster-in-the-making repeated.)

To put it another way: your current strategy is saving a penny now in
order to pay a dollar tomorrow.

For example, you currently store multiple ethnicities in a free-form
text field. What happens when:
1) Your organization decides they need to split "Asian" into "Chinese"
   and "Other Asian"?
2) Someone types "aisan" by mistake?
3) You stop tracking another ethnicity and want to purge it from the
   database?
4) Your administrator decides that ethnicity needs to be ordered as
   "primary ethnicity" and "other ethnicities"?
5) You need to do complex queries like (Asian and/or Caucasian but not
   Hispanic or African)? Your current strategy would require 4 separate
   functional indexes to support that query, or a table scan with 4
   row-by-row fuzzy text matches ... slow and memory-intensive either
   way.

As I said, it's your database, and if it's a low-budget project destined
to be thrown away in 3 months, then go for it. If, however, you expect
this database to be around for a while, you owe it to yourself and your
co-workers to design it right.

If you want an education on database normalization, pick up Fabian
Pascal's "Practical Issues in Database Management".

-Josh Berkus
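As a sketch of the normalized alternative being described (untested; the
table and column names here are invented for illustration), the
free-form ethnicity field becomes a detail table, and the strpos()
search becomes an indexable lookup:

-- hypothetical detail table: one row per sought ethnicity per user
CREATE TABLE user_sought_ethnicity (
    userid    character varying(30) NOT NULL
              REFERENCES user_login (userid) ON DELETE CASCADE,
    ethnicity character varying(30) NOT NULL,
    PRIMARY KEY (userid, ethnicity)
);

-- "seeks Asian" then becomes an indexed join instead of a text scan:
--     ... AND EXISTS (SELECT 1 FROM user_sought_ethnicity SE
--                     WHERE SE.userid = user_match_details.userid
--                       AND SE.ethnicity = 'Asian')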
Hi, Josh,

I appreciate you sharing your experience here. I definitely don't have
that many years of DBA experience behind me, and am learning to get DB
design right in the first place.

What I stated is my observation on my project, with over twenty
multivalued detail tables. I have a selection query of about 200
characters, involving 10 tables and using a subquery. The performance is
not bad after proper indexing: less than 3 seconds (according to the
planner). I will have longer queries later and hope they won't have any
performance problems.

Thank you for recommending another DB book after "Database Design for
Mere Mortals". I will read the book.

Vernon

15/01/2003 9:50:22 AM, "Josh Berkus" <josh@agliodbs.com> wrote:
> Hey, it's your database. In my 8-year experience as a professional
> DBA, few considerations ever outweigh normalization in a relational
> database.
> [...]
> If you want an education on database normalization, pick up Fabian
> Pascal's "Practical Issues in Database Management".
Vernon,

> What I stated is my observation on my project, with over twenty
> multivalued detail tables. I have a selection query of about 200
> characters, involving 10 tables and using a subquery. The performance
> is not bad after proper indexing: less than 3 seconds (according to
> the planner). I will have longer queries later and hope they won't
> have any performance problems.

One trick for you is to create a custom aggregate for string
concatenation for each detail table; that will allow you to list the
values in the detail table as if they were a continuous text string. A
concat aggregate is even fast on PostgreSQL.

CREATE FUNCTION "comma_cat" (text, text) RETURNS text AS '
    SELECT CASE
        WHEN $2 IS NULL OR $2 = '''' THEN $1
        WHEN $1 IS NULL OR $1 = '''' THEN $2
        ELSE $1 || '', '' || $2
    END
' LANGUAGE 'sql';

CREATE FUNCTION "br_cat" (text, text) RETURNS text AS '
    SELECT CASE
        WHEN $2 IS NULL OR $2 = '''' THEN $1
        WHEN $1 IS NULL OR $1 = '''' THEN $2
        ELSE $1 || ''<br>'' || $2
    END
' LANGUAGE 'sql';

-- create aggregate with html <br> breaks between items
CREATE AGGREGATE br_list (
    BASETYPE = text,
    SFUNC = br_cat,
    STYPE = text,
    INITCOND = ''
);

-- create aggregate with commas between items
CREATE AGGREGATE comma_list (
    BASETYPE = text,
    SFUNC = comma_cat,
    STYPE = text,
    INITCOND = ''
);

-Josh Berkus
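A usage sketch for these aggregates (untested; user_sought_ethnicity is
the hypothetical detail table sketched earlier in the thread): grouping
flattens the detail rows back into one comma-separated string per user:

SELECT userid, comma_list(ethnicity) AS sought_ethnicities
FROM user_sought_ethnicity
GROUP BY userid;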
Vernon,

> What I stated is my observation on my project, with over twenty
> multivalued detail tables. [...] I will have longer queries later and
> hope they won't have any performance problems.

Keep in mind that the complexity is all on your end, not the users'. You
can construct VIEWs, FUNCTIONs, TRIGGERs and RULEs which will make the
actual sophistication (i.e., 20 "detail tables") appear to the user
exactly as if there were only one flat-file table.

Frequently in database design, the design which is good for efficiency
and data integrity ... the "normalized" design ... is baffling to users.
Fortunately, both SQL92-99 and PostgreSQL give us a whole toolkit to let
us "abstract" the normalized design into something the users can handle.
In fact, this is job #2 for the DBA on an applications-development team
(#1 is making sure all data is stored and protected from corruption).

> Thank you for recommending another DB book after "Database Design for
> Mere Mortals". I will read the book.

That's a great book, too. Don't start on Pascal until *after* you have
finished "Database Design".

-Josh Berkus
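For instance, a view along these lines (a sketch only; it assumes the
hypothetical user_sought_ethnicity table and the comma_list aggregate
shown earlier) presents the normalized data back to users as one flat
row per user:

CREATE VIEW user_profile_v AS
SELECT UD.userid, UD.gender, UD.age,
       comma_list(SE.ethnicity) AS sought_ethnicities
FROM user_details_p UD
LEFT JOIN user_sought_ethnicity SE USING (userid)
GROUP BY UD.userid, UD.gender, UD.age;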
16/01/2003 9:46:30 AM, "Josh Berkus" <josh@agliodbs.com> wrote:
> Keep in mind that the complexity is all on your end, not the users'.
> You can construct VIEWs, FUNCTIONs, TRIGGERs and RULEs which will make
> the actual sophistication (i.e., 20 "detail tables") appear to the
> user exactly as if there were only one flat-file table.

Well, my current position is DB designer as well as DB user. I'm doing
J2EE development without EJB. I currently have two ways of building a
query. One is to set up a query string as a static string. This method
is similar to a VIEW in the DB, but in the application layer (Data
Access Object). I apply this type of query string to the insertion,
selection, updating, and deletion operations of a DB table.

The other way of building a query string is used for selections where
multiple tables are joined and/or involved. A query string is built
dynamically depending on which fields are actually being searched. The
characteristic of the application is that, among many fields, a user may
only want to search on a few selected ones. I think this approach is
better than having all fields listed and filling in "%" for the fields
the user doesn't want to search on. (Please correct me if I'm wrong.)
But the query-building function is more than one hundred lines long.

> Frequently in database design, the design which is good for efficiency
> and data integrity ... the "normalized" design ... is baffling to
> users. Fortunately, both SQL92-99 and PostgreSQL give us a whole
> toolkit to let us "abstract" the normalized design into something the
> users can handle.

Please elaborate on the above statement. I don't know of any 'toolkit to
let us "abstract" the normalized design into something the users can
handle' other than something like a VIEW.

> That's a great book, too. Don't start on Pascal until *after* you
> have finished "Database Design".

I waited for the book from the local library for more than a month, but
it only took me less than an hour to scan the whole book and grasp the
multivalued table design idea.

Vernon
Vernon,

> A query string is built dynamically depending on which fields are
> actually being searched. The characteristic of the application is
> that, among many fields, a user may only want to search on a few
> selected ones. I think this approach is better than having all fields
> listed and filling in "%" for the fields the user doesn't want to
> search on. (Please correct me if I'm wrong.)

You're correct.

> But the query-building function is more than one hundred lines long.

Sure. It's a question of whether you want to spend your DBA time during
the design phase, or when you're using and administering it in
production. My general experience is that every extra hour well spent on
good DB design saves you 20-40 hours of admin, data rescue, and by-hand
correction when the database is in production.

> Please elaborate on the above statement. I don't know of any 'toolkit
> to let us "abstract" the normalized design into something the users
> can handle' other than something like a VIEW.

VIEWs, TRIGGERs, RULEs and FUNCTIONs. With 7.3.1, SCHEMAs as well.
Using only these structures, I have been able to build entire
applications where my PHP programmer never needs to know the intricacies
of the database. Instead, he is given an API of views and
data-manipulation functions.

> I waited for the book from the local library for more than a month,
> but it only took me less than an hour to scan the whole book and grasp
> the multivalued table design idea.

Hmmm. I'll need to look at it again. If he's suggesting that it's a
good idea to put a delimited list in a field, I'll need to stop
recommending that book.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
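As a sketch of such a data-manipulation function (untested; the function
name is invented, and it assumes the hypothetical user_sought_ethnicity
detail table from earlier), the application calls one function instead
of issuing the delete-and-insert pair itself:

-- replace a user's sought-ethnicity list with a single new value
CREATE FUNCTION set_sought_ethnicity(character varying, character varying)
RETURNS character varying AS '
    DELETE FROM user_sought_ethnicity WHERE userid = $1;
    INSERT INTO user_sought_ethnicity (userid, ethnicity)
        VALUES ($1, $2);
    SELECT $1;
' LANGUAGE 'sql';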