Thread: query speed joining tables

query speed joining tables

From: Christopher Smith
I have 4 tables that I need to query... 3 of the tables are linked by the
field userid.

Below are table SQL fragments as well as the query. The subquery causes
very high CPU usage. It typically returns ~3000 matches. Is there another
way to rewrite this?

SELECT user_login.userid
FROM user_login, user_details_p, user_match_details
WHERE user_login.userid = user_details_p.userid AND
      user_details_p.userid = user_match_details.userid AND
      user_details_p.gender = 'W' AND
      user_details_p.seekgender = 'M' AND
      user_details_p.age >= 18 AND
      user_details_p.age <= 50 AND
      user_match_details.min_age <= 30 AND
      user_match_details.max_age >= 30 AND
      user_details_p.ethnictype = 'Caucasian (White)' AND
      strpos(user_match_details.ethnicity, 'Asian') != 0 AND
      user_details_p.zipcode IN (SELECT zips_max.destination FROM zips_max
                                 WHERE zips_max.origin = '90210')
ORDER BY user_login.last_login DESC;

               Table "public.user_login"
   Column   |           Type           |   Modifiers
------------+--------------------------+---------------
 userid     | character varying(30)    | not null
 password   | character varying(30)    | not null
 email      | character varying(50)    | not null
 last_login | timestamp with time zone | not null
Indexes: user_login_pkey primary key btree (userid)

              Table "public.user_details_p"
   Column   |           Type           |   Modifiers
------------+--------------------------+---------------
 userid     | character varying(30)    | not null
 gender     | character varying(1)     |
 age        | integer                  |
 height     | character varying(10)    |
 ethnictype | character varying(30)    |
 education  | character varying(30)    |
 createdate | timestamp with time zone | default now()
 zipcode    | character varying(5)     |
 birthdate  | date                     | default now()
 zodiac     | character varying(40)    |
 seekgender | character varying(2)     |
Indexes: user_details_p_pkey primary key btree (userid),
         user_details_p_age_idx btree (age),
         user_details_p_ethnic_idx btree (ethnictype),
         user_details_p_gender_idx btree (gender),
         user_details_p_last_login_idx btree (last_login),
         user_details_p_seekgender_idx btree (seekgender),
         user_details_p_state_idx btree (state)
Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE

            Table "public.user_match_details"
     Column     |          Type          | Modifiers
----------------+------------------------+-----------
 userid         | character varying(30)  | not null
 soughtmate     | character varying(200) |
 ethnicity      | character varying(200) |
 marital_status | character varying(200) |
 min_age        | integer                |
 max_age        | integer                |
 city           | character varying(50)  |
 state          | character varying(2)   |
 zipcode        | integer                |
 match_distance | integer                |
Indexes: user_match_details_pkey primary key btree (userid)
Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE

           Table "public.zips_300"
   Column    |         Type         | Modifiers
-------------+----------------------+-----------
 origin      | character varying(5) |
 destination | character varying(5) |
Indexes: zips_300_origin_idx btree (origin)

Re: query speed joining tables

From: Josh Berkus
Chris,

Here are probably your two main query problems:

> strpos(user_match_details.ethnicity,'Asian') !=0 AND

It is impossible for Postgres to use an index for the above as it is written.
I would suggest one of two improvements:
1) Modify the schema so that multiple ethnicity details are kept in a
sub-table rather than a free-form text field you have to search, or
2) Create a cacheable function for "contains_asian" and index on that.
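
For illustration, a minimal sketch of option 2 (the function and index
names here are made up, and IMMUTABLE is how PostgreSQL 7.3 marks a
function as cacheable):

CREATE FUNCTION contains_asian (text) RETURNS boolean AS '
    SELECT strpos($1, ''Asian'') <> 0;
' LANGUAGE 'sql' IMMUTABLE;

CREATE INDEX user_match_details_asian_idx
    ON user_match_details (contains_asian(ethnicity));

The query would then test contains_asian(user_match_details.ethnicity)
instead of calling strpos() directly.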

> user_details_p.zipcode in (select zips_max.destination from zips_max
> where zips_max.origin='90210' )
> order by user_login.last_login desc;

Use a "WHERE EXISTS" clause instead of "IN".
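
For example, the IN clause from the query above would become a correlated
EXISTS (a sketch only, keeping the zips_max name from the original query):

AND EXISTS (SELECT 1 FROM zips_max
            WHERE zips_max.origin = '90210'
              AND zips_max.destination = user_details_p.zipcode)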

--
-Josh Berkus
Aglio Database Solutions
San Francisco



Re: query speed joining tables

From: Christopher Smith
I understand cacheable functions, but your proposed application is a
little unclear. Is it possible to see an example?

Thanks in advance.

Josh Berkus <josh@agliodbs.com> wrote:
> 2) Create a cacheable function for "contains_asian" and index on that.

Re: query speed joining tables

From: Tomasz Myrta
Christopher Smith wrote:

> I have 4 tables that I need to query... 3 of the tables are linked by
> the field userid.
>
> Below are table SQL fragments as well as the query. The subquery causes
> very high CPU usage. It typically returns ~3000 matches. Is there
> another way to rewrite this?
>
> SELECT user_login.userid FROM user_login,user_details_p,user_match_details
>  WHERE user_login.userid = user_details_p.userid AND
> user_details_p.userid = user_match_details.userid AND
> user_details_p.gender ='W' AND
> user_details_p.seekgender ='M' AND
> user_details_p.age >=18 AND
> user_details_p.age <=50 AND
> user_match_details.min_age <= 30 AND
> user_match_details.max_age >= 30 AND
> user_details_p.ethnictype = 'Caucasian (White)' AND
> strpos(user_match_details.ethnicity,'Asian') !=0 AND
> user_details_p.zipcode in (select zips_max.destination from zips_max
> where zips_max.origin='90210' )
> order by user_login.last_login desc;


Explicit joins give a better picture of your query and help Postgres
choose indexes.

select userid
from user_login
     join user_details_p using (userid)
     join user_match_details using (userid)
where user_details_p.gender = 'W' AND
      user_details_p.seekgender = 'M' AND
      user_details_p.age between 18 and 50 AND
      user_match_details.min_age <= 30 AND
      user_match_details.max_age >= 30 AND
      user_details_p.ethnictype = 'Caucasian (White)' AND
      strpos(user_match_details.ethnicity, 'Asian') != 0 AND
      user_details_p.zipcode in (select zips_max.destination from zips_max
                                 where zips_max.origin = '90210')
order by user_login.last_login desc;


How can I help you with the subquery if you didn't even include the
zips_max definition?!?

If origin is a unique value in that table, you can change the subquery
into a join in the "from" list.

>                Table "public.user_login"
>    Column   |           Type           |   Modifiers
> ------------+--------------------------+---------------
>  userid     | character varying(30)    | not null
>  password   | character varying(30)    | not null
>  email      | character varying(50)    | not null
>  last_login | timestamp with time zone | not null
>  Indexes: user_login_pkey primary key btree (userid)

Do you really need userid as a varchar?
Indexing on int4 or int8 would be much faster than on varchar.

Why do you have 3 tables? It looks like only one table would be enough.
Remember that null values don't take much space.

>
>                Table "public.user_details_p"
>      Column      |           Type           |   Modifiers
> -----------------+--------------------------+---------------
>  userid          | character varying(30)    | not null
>  gender          | character varying(1)     |
>  age             | integer                  |
>  height          | character varying(10)    |
>  ethnictype      | character varying(30)    |
>  education       | character varying(30)    |
>  createdate      | timestamp with time zone | default now()
>  zipcode         | character varying(5)     |
>  birthdate       | date                     | default now()
>  zodiac          | character varying(40)    |
>  seekgender      | character varying(2)     |
> Indexes: user_details_p_pkey primary key btree (userid),
>          user_details_p_age_idx btree (age),
>          user_details_p_ethnic_idx btree (ethnictype),
>          user_details_p_gender_idx btree (gender),
>          user_details_p_last_login_idx btree (last_login),
>          user_details_p_seekgender_idx btree (seekgender),
>          user_details_p_state_idx btree (state)

There are too many indexes.
An index on gender (2 possible values) is useless, and an index on
ethnictype (how many values - I think not too many?) is possibly useless.

Consider creating a single index on several fields for queries like this:

select ...
where user_details_p.gender = 'W' AND user_details_p.seekgender = 'M' AND
      user_details_p.age between 18 and 50 AND ...

index on (age, seekgender, gender)
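
Spelled out, that would be something like (the index name is illustrative):

CREATE INDEX user_details_p_multi_idx
    ON user_details_p (age, seekgender, gender);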

>
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE
>
>
>            Table "public.user_match_details"
>       Column      |          Type          | Modifiers
> ------------------+------------------------+-----------
>  userid           | character varying(30)  | not null
>  soughtmate       | character varying(200) |
>  ethnicity        | character varying(200) |
>  marital_status   | character varying(200) |
>  min_age          | integer                |
>  max_age          | integer                |
>  city             | character varying(50)  |
>  state            | character varying(2)   |
>  zipcode          | integer                |
>  match_distance   | integer                |
> Indexes: user_match_details_pkey primary key btree (userid)
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE
>
>             Table "public.zips_300"
>    Column    |         Type         | Modifiers
> -------------+----------------------+-----------
>  origin      | character varying(5) |
>  destination | character varying(5) |
> Indexes: zips_300_origin_idx btree (origin)


If you need a more specific answer, you have to add more information: how
many records you have in your tables, and how many distinct values you use
for fields such as zipcode and ethnicity.

Regards,
Tomasz Myrta



Re: query speed joining tables

From: Christopher Smith
My mistakes, zips_max should be zips_300.

And in my zip code table there are 120 million rows. An example of the
records:

origin           destination
===========================
90210            90222
90210            90234
90210            96753

Re: query speed joining tables

From: Tomasz Myrta
Christopher Smith wrote:
> My mistakes, zips_max should be zips_300.
>
> And in my zip code table there are 120 million rows. An example of the
> records:
>
> origin           destination
> ===========================
> 90210            90222
> 90210            90234
> 90210            96753


1. Try to create an index on both fields of zips_300, origin and
   destination: zips_300_ind(origin, destination).
2. If you have only unique pairs in zips_300, this query should noticeably
   speed up your example:

select userid
from user_login UL
     join user_details_p UD using (userid)
     join user_match_details UM using (userid)
     join zips_300 Z on (Z.destination = UD.zipcode and Z.origin = '90210')
where UD.gender = 'W' AND
      UD.seekgender = 'M' AND
      UD.age between 18 and 50 AND
      UM.min_age <= 30 AND
      UM.max_age >= 30 AND
      UD.ethnictype = 'Caucasian (White)' AND
      strpos(UM.ethnicity, 'Asian') != 0
order by UL.last_login desc;
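
In SQL, the index from point 1 would be:

CREATE INDEX zips_300_ind ON zips_300 (origin, destination);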
 

The next step in speeding up your query is answering this question: how
many rows does each condition match on its own?
Example:
gender = 'W' - 50% of rows
seekgender = 'M' - 50% of rows
ethnictype = 'Caucasian (White)' - 5%

Start indexing your tables on the most selective values - in this
situation, ethnictype. Consider using multi-column indexes.
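
For instance, an illustrative multi-column index led by the most selective
column might be:

CREATE INDEX user_details_p_search_idx
    ON user_details_p (ethnictype, gender, seekgender, age);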

Regards,
Tomasz Myrta



Re: query speed joining tables

From: "Josh Berkus"
Vernon,

> Regarding recomposing a multivalued field as a separate table, I have
> observed some advantages and disadvantages of the approach. Good on
> search, as you have pointed out, and bad on updating data, where two
> operations are needed: deletion and insertion. A query may need to join
> a lot of tables together. In Chris's personal application, for example,
> there are many multivalued fields such as relationship status other
> than ethnicity. There will be some very long and complex queries.

Hey, it's your database. In my 8-year experience as a professional DBA,
few considerations ever outweigh normalization in a relational database.
You are merely trading the immediate inconvenience of having to construct
complex queries and data-saving functions for the eventual huge
inconvenience (or possible disaster) of having your data corrupted, or at
least having to modify it by hand, row by row.

(Pardon me if I'm a little strident, but I've spent a good portion of my
career cleaning up others', and sometimes my own, database design
mistakes, and I hate to see a disaster-in-the-making repeated.)

To put it another way: your current strategy is saving a penny now in
order to pay a dollar tomorrow.

For example, you currently store multiple ethnicities in a free-form text
field. What happens when:
1) Your organization decides they need to split "Asian" into "Chinese"
and "Other Asian"?
2) Someone types "aisan" by mistake?
3) You stop tracking another ethnicity, and want to purge it from the
database?
4) Your administrator decides that ethnicity needs to be ordered as
"primary ethnicity" and "other ethnicities"?
5) You need to do complex queries like (Asian and/or Caucasian but not
Hispanic or African)? Your current strategy would require 4 separate
functional indexes to support that query, or a table scan with 4
row-by-row fuzzy text matches ... slow and memory-intensive either way.

As I said, it's your database, and if it's a low-budget project destined
to be thrown away in 3 months, then go for it. If, however, you expect
this database to be around for a while, you owe it to yourself and your
co-workers to design it right.

If you want an education on database normalization, pick up Fabian
Pascal's "Practical Issues in Database Management".

-Josh Berkus


Re: query speed joining tables

From: Vernon Wu
Hi, Josh,

I appreciate you sharing your experience here. I definitely don't have
that many years of DBA experience behind me, and am learning to get DB
design right the first time.

What I stated is my observation of my project, which has over twenty
multivalued detail tables. I have a selection query containing 200
characters, involving 10 tables, and using a subquery. The performance is
not bad after proper indexing, less than 3 seconds (what the planner
says). I will have longer queries later and hope they won't have any
performance problems.

Thank you for recommending another DB book after "Database Design for
Mere Mortals". I will read the book.

Vernon   







Re: query speed joining tables

From: Josh Berkus
Vernon,

> What I stated is my observation of my project, which has over twenty
> multivalued detail tables. I have a selection query containing 200
> characters, involving 10 tables, and using a subquery. The performance
> is not bad after proper indexing, less than 3 seconds (what the planner
> says). I will have longer queries later and hope they won't have any
> performance problems.

One trick for you is to create a custom aggregate for string concatenation
for each detail table; that will allow you to list the values in the detail
table as if they were a continuous text string.   A concat aggregate is even
fast on PostgreSQL.


CREATE FUNCTION "comma_cat" (text, text) RETURNS text AS '
SELECT CASE WHEN $2 IS NULL OR $2 = '''' THEN $1
WHEN $1 IS NULL OR $1 = '''' THEN $2
ELSE $1 || '', '' || $2
END
' LANGUAGE 'sql';

CREATE FUNCTION "br_cat" (text, text) RETURNS text AS '
SELECT CASE WHEN $2 IS NULL OR $2 = '''' THEN $1
WHEN $1 IS NULL OR $1 = '''' THEN $2
ELSE $1 || ''<br>'' || $2
END
' LANGUAGE 'sql';

--create aggregate with html <breaks> between items
CREATE AGGREGATE br_list ( BASETYPE = text, SFUNC = br_cat, STYPE = text,
INITCOND = '' );

--create aggregate with commas between items
CREATE AGGREGATE comma_list ( BASETYPE = text, SFUNC = comma_cat, STYPE = text,
INITCOND = '' );
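
Usage would then look something like this, where user_ethnicity is a
hypothetical detail table with one ethnicity value per row:

SELECT userid, comma_list(ethnicity) AS ethnicities
FROM user_ethnicity
GROUP BY userid;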

-Josh Berkus


Re: query speed joining tables

From: "Josh Berkus"
Vernon,

> What I stated is my observation of my project, which has over twenty
> multivalued detail tables. I have a selection query containing 200
> characters, involving 10 tables, and using a subquery. The performance
> is not bad after proper indexing, less than 3 seconds (what the planner
> says). I will have longer queries later and hope they won't have any
> performance problems.

Keep in mind that the complexity is all on your end, not the users'. You
can construct VIEWs, FUNCTIONs, TRIGGERs and RULEs which will make the
actual sophistication (i.e., 20 "detail tables") appear to the user
exactly as if there were only one flatfile table.

Frequently in database design, the design which is good for efficiency
and data integrity ... the "normalized" design ... is baffling to
users.   Fortunately, both SQL92-99 and PostgreSQL give us a whole
toolkit to let us "abstract" the normalized design into something the
users can handle.  In fact, this is job #2 for the DBA in an
applications-development team (#1 is making sure all data is stored and
protected from corruption).
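
For instance, a hypothetical sketch of such an abstraction, reusing the
comma_list aggregate from my earlier message (the view and detail-table
names are illustrative):

CREATE VIEW user_profile AS
SELECT u.userid, d.gender, d.age,
       comma_list(e.ethnicity) AS ethnicities
FROM user_login u
     JOIN user_details_p d USING (userid)
     LEFT JOIN user_ethnicity e ON e.userid = u.userid
GROUP BY u.userid, d.gender, d.age;

To the application, user_profile looks like one flat table.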

> Thank you for recommending another DB book after "Database Design for
> Mere Mortals". I will read the book.

That's a great book, too.   Don't start on Pascal until *after* you
have finished "Database Design".

-Josh Berkus


Re: query speed joining tables

From: Vernon Wu
16/01/2003 9:46:30 AM, "Josh Berkus" <josh@agliodbs.com> wrote:

>Vernon,
>
>> What I stated is my observation of my project, which has over twenty
>> multivalued detail tables. I have a selection query containing 200
>> characters, involving 10 tables, and using a subquery. The performance
>> is not bad after proper indexing, less than 3 seconds (what the planner
>> says). I will have longer queries later and hope they won't have any
>> performance problems.
>
> Keep in mind that the complexity is all on your end, not the users'.
> You can construct VIEWs, FUNCTIONs, TRIGGERs and RULEs which will make
> the actual sophistication (i.e., 20 "detail tables") appear to the user
> exactly as if there were only one flatfile table.
>

Well, my current position is as a DB designer as well as a DB user. I'm
doing J2EE development without EJB. I currently have two ways of building
a query. One is to set up a query string as a static string. This method
is similar to a view in the DB, but in the application layer (Data Access
Object). I apply this type of query string to the insertion, selection,
update, and deletion operations of a DB table.

The other way of building a query string is used for selection operations
where multiple tables are joined and/or involved. A query string is built
dynamically depending on whether or not particular fields are examined.
The characteristic of the application is that, among many fields, a user
may only want to search on a few selected fields. I think this approach is
better than having all fields listed and filling in "%" for the fields the
user doesn't want to search on. (Please correct me if I'm wrong.) But the
query-building function is more than one hundred lines long.


>Frequently in database design, the design which is good for efficiency
>and data integrity ... the "normalized" design ... is baffling to
>users.   Fortunately, both SQL92-99 and PostgreSQL give us a whole
>toolkit to let us "abstract" the normalized design into something the
>users can handle.  In fact, this is job #2 for the DBA in an
>applications-development team (#1 is making sure all data is stored and
>protected from corruption).
>

Please elaborate on the above statement. I don't know of any 'toolkit to
let us "abstract" the normalized design into something the users can
handle', other than something like a view.

>> Thank you for recommending another DB book after "Database Design for
>> Mere Mortals". I will read the book.
>
> That's a great book, too.   Don't start on Pascal until *after* you
> have finished "Database Design".

I waited for the book from the local library for more than a month, but it
only took me less than an hour to scan the whole book and grasp the
multivalued table design idea.

>
>-Josh Berkus
>

Vernon




Re: query speed joining tables

From: Josh Berkus
Vernon,

> The other way of building a query string is used for selection
> operations where multiple tables are joined and/or involved. A query
> string is built dynamically depending on whether or not particular
> fields are examined. The characteristic of the application is that,
> among many fields, a user may only want to search on a few selected
> fields. I think this approach is better than having all fields listed
> and filling in "%" for the fields the user doesn't want to search on.
> (Please correct me if I'm wrong.)

You're correct.

> But the query-building function is more than one hundred lines long.

Sure.  It's a question of whether you want to spend your DBA time during the
design phase, or when you're using and administering it in production.  My
general experience is that every extra hour well spent on good DB design
saves you 20-40 hours of admin, data rescue, and by-hand correction when the
database is in production.

> Please elaborate on the above statement. I don't know of any 'toolkit
> to let us "abstract" the normalized design into something the users can
> handle', other than something like a view.

VIEWs, TRIGGERs, RULEs and FUNCTIONs.  With 7.3.1, SCHEMA as well.  Using only
these structures, I have been able to build entire applications where my PHP
programmer never needs to know the intricacies of the database.  Instead, he
is given an API for views and data manipulation functions.

> I waited for the book from the local library for more than a month, but
> it only took me less than an hour to scan the whole book and grasp the
> multivalued table design idea.

Hmmm.  I'll need to look at it again.  If he's suggesting that it's a good
idea to put a delimited list in a field, I'll need to stop recommending that
book.

--
-Josh Berkus
Aglio Database Solutions
San Francisco