Thread: Cost: Big Tables vs. Organized Separation of Data
Hey all,

I've been developing an electronic inventory system for a used clothing
store using Postgres and PHP3, with Netscape as my graphical interface.
As the database has grown, my queries have (as one would logically expect)
been getting slower.  In particular, I have a set of queries that let the
store employees analyze the current inventory and the sales inventory.
Some of these queries have to join several tables, and they also do some
simple calculations (table.price - table.cost).  As I attempt to "fix" the
slowness of the queries, I have come to the following crossroads:

1. If I re-organize the data, I would be able to perform my queries
   without executing joins on multiple tables per query.

2. As I re-organize the data, the database becomes less and less
   intuitive and (seemingly) less "normal".

So, I guess my question is: how costly are joins?  I've heard that
Postgres pretty much "pukes" (in terms of speed) when you try
anything more than a 6-table join in one query.  This leads
me to believe that joins are fairly costly... ????

Does anyone have any words of advice for me as I battle this?

PS... I realize that if I do the calculations in my scripting language and
simply fetch raw data with my queries, I will cut down on the cost of the
queries a little bit.

Thanks,
Bob
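For illustration, a minimal sketch of the kind of query being described -- a
multi-table join that also computes a per-item margin.  The table and column
names here (sales, items, categories) are hypothetical, since the actual
schema isn't shown in the thread:

    -- Margin on recent sales, joining sales back to the item catalog
    -- and its category table (old-style comma joins, as in Postgres 6.4).
    SELECT c.name,
           i.description,
           i.price - i.cost AS margin
    FROM   sales s, items i, categories c
    WHERE  s.item_id     = i.item_id
      AND  i.category_id = c.category_id
      AND  s.sale_date  >= '1999-01-01';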
> 1. If I re-organize the data, I would be able to perform my queries
> without executing joins on multiple tables per query.
>
> 2. As I re-organize the data, the database becomes less and less
> intuitive and (seemingly) less "normal".
>
> So, I guess my question is: how costly are joins?  I've heard that
> Postgres pretty much "pukes" (in terms of speed) when you try
> anything more than a 6-table join in one query.  This leads
> me to believe that joins are fairly costly... ????
>
> Does anyone have any words of advice for me as I battle this?

We are working on speeding up large table joins right now.  Try SET GEQO
with a lower value and see if that helps.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
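In the 6.4-era syntax (the same 'ON=#' form used later in this thread), that
would look something like:

    -- Let the genetic optimizer take over at 4 or more tables,
    -- instead of the default of 8:
    SET GEQO TO 'ON=4';

    -- Or switch it off entirely, to compare against the exhaustive optimizer:
    SET GEQO TO 'OFF';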
Bruce Momjian wrote:
>
> > 1. If I re-organize the data, I would be able to perform my queries
> > without executing joins on multiple tables per query.
> >
> > 2. As I re-organize the data, the database becomes less and less
> > intuitive and (seemingly) less "normal".
> >
> > So, I guess my question is: how costly are joins?  I've heard that
> > Postgres pretty much "pukes" (in terms of speed) when you try
> > anything more than a 6-table join in one query.  This leads
> > me to believe that joins are fairly costly... ????
> >
> > Does anyone have any words of advice for me as I battle this?
>
> We are working on speeding up large table joins right now.  Try SET GEQO
> with a lower value and see if that helps.

I've noticed a pretty drastic slowdown going from a 4-table join
(instantaneous) to a 5-table join (15-20 seconds); I don't think I've tried
6 tables yet with the same database.  Is there a lower limit below which
using GEQO doesn't really make sense in terms of speed (i.e., would setting
GEQO to 5 be a toss-up, and 4 not make sense)?  I'm guessing that the number
of plans the optimizer checks without GEQO goes up factorially, while with
GEQO it goes up fairly linearly (from my limited knowledge of genetic
algorithms: basically you make a series of first guesses, gradually refine
them, and throw away the losers until you end up with the right one).  If my
join is based purely on primary keys, shouldn't just about any plan work
well, or at least well enough that it doesn't pay to make an exhaustive
search of the plan space, making GEQO the best choice?  I guess my question
is: is there a rule of thumb for setting GEQO?  Is there a reason it was set
to 8 by default?  Does GEQO work better in some cases than others (small
tables, joins on non-indexed fields, etc.)?

It seems like I'm always learning something new with Postgres -- I've never
thought about this before; it's just one more thing for me to play around
with...
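As a rough sense of scale for that guess: counting only the order in which
the tables are joined (ignoring join methods and index choices), an
exhaustive optimizer has on the order of n! orderings to weigh, while GEQO
evaluates a fixed-size, gradually refined population of candidate plans:

    4 tables:  4! =     24 orderings
    5 tables:  5! =    120
    6 tables:  6! =    720
    8 tables:  8! = 40,320

which is at least part of why planning cost can jump so sharply as tables
are added.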
Did you try 'vacuum analyze', and did you create indexes on the keys that
are _most important_?  It's very important.  I can't speak for the PSQL
parser/optimizer in detail, but it can't choose an appropriate join order
without knowing the key distribution.  Well-chosen indexes and 'vacuum
analyze' can improve your join dramatically.

Simple rules:

- create indexes on all (nearly) unique attributes you use for joining;
- don't create an index on an attribute that has many duplicate rows for
  every attribute value.

For example:

CREATE TABLE employee (enum int, branch int);
CREATE TABLE empinfo  (enum int, surname text, sex int);

Create an index on employee.branch (if you select employees by branch) and
on empinfo.enum.  The worst thing you can do is create an index on
empinfo.sex, because the misled system may think there are 10,000 different
sexes when there are only two of them.

And use the 'explain' statement to check that the system's behaviour is
right (see the sketch after this message).

On Wed, 3 Feb 1999, Jeff Hoffmann wrote:

> Date: Wed, 03 Feb 1999 09:14:37 -0600
> From: Jeff Hoffmann <jeff@remapcorp.com>
> To: Bruce Momjian <maillist@candle.pha.pa.us>
> Cc: pgsql-general@postgreSQL.org
> Subject: Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data
>
> Bruce Momjian wrote:
> >
> > > 1. If I re-organize the data, I would be able to perform my queries
> > > without executing joins on multiple tables per query.
> > >
> > > 2. As I re-organize the data, the database becomes less and less
> > > intuitive and (seemingly) less "normal".
> > >
> > > So, I guess my question is: how costly are joins?  I've heard that
> > > Postgres pretty much "pukes" (in terms of speed) when you try
> > > anything more than a 6-table join in one query.  This leads
> > > me to believe that joins are fairly costly... ????
> > >
> > > Does anyone have any words of advice for me as I battle this?
> >
> > We are working on speeding up large table joins right now.  Try SET GEQO
> > with a lower value and see if that helps.
>
> I've noticed a pretty drastic slowdown going from a 4-table join
> (instantaneous) to a 5-table join (15-20 seconds); I don't think I've
> tried 6 tables yet with the same database.  Is there a lower limit below
> which using GEQO doesn't really make sense in terms of speed (i.e.,
> would setting GEQO to 5 be a toss-up, and 4 not make sense)?  I'm
> guessing that the number of plans the optimizer checks without GEQO goes
> up factorially, while with GEQO it goes up fairly linearly (from my
> limited knowledge of genetic algorithms: basically you make a series of
> first guesses, gradually refine them, and throw away the losers until
> you end up with the right one).  If my join is based purely on primary
> keys, shouldn't just about any plan work well, or at least well enough
> that it doesn't pay to make an exhaustive search of the plan space,
> making GEQO the best choice?  I guess my question is: is there a rule of
> thumb for setting GEQO?  Is there a reason it was set to 8 by default?
> Does GEQO work better in some cases than others (small tables, joins on
> non-indexed fields, etc.)?
>
> It seems like I'm always learning something new with Postgres -- I've
> never thought about this before; it's just one more thing for me to play
> around with...

Aleksei Roudnev, Network Operations Center, Relcom, Moscow
(+7 095) 194-19-95 (Network Operations Center Hot Line),
(+7 095) 230-41-41, N 13729 (pager)
(+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)
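Following those rules with the example tables above (the index names are
just illustrative), a sketch might be:

    -- Index the join/selection keys; skip the low-cardinality column (sex).
    CREATE INDEX employee_branch_idx ON employee (branch);
    CREATE INDEX empinfo_enum_idx    ON empinfo (enum);

    -- Refresh the planner's statistics, then check which plan it actually picks.
    VACUUM ANALYZE;
    EXPLAIN SELECT e.enum, i.surname
            FROM   employee e, empinfo i
            WHERE  e.enum = i.enum
              AND  e.branch = 42;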
> I've noticed a pretty drastic slowdown going from a 4-table join
> (instantaneous) to a 5-table join (15-20 seconds); I don't think I've
> tried 6 tables yet with the same database.  Is there a lower limit below
> which using GEQO doesn't really make sense in terms of speed (i.e.,
> would setting GEQO to 5 be a toss-up, and 4 not make sense)?  I'm
> guessing that the number of plans the optimizer checks without GEQO goes
> up factorially, while with GEQO it goes up fairly linearly (from my
> limited knowledge of genetic algorithms: basically you make a series of
> first guesses, gradually refine them, and throw away the losers until
> you end up with the right one).  If my join is based purely on primary
> keys, shouldn't just about any plan work well, or at least well enough
> that it doesn't pay to make an exhaustive search of the plan space,
> making GEQO the best choice?  I guess my question is: is there a rule of
> thumb for setting GEQO?  Is there a reason it was set to 8 by default?
> Does GEQO work better in some cases than others (small tables, joins on
> non-indexed fields, etc.)?

The bottom line is that GEQO in 6.4.* and previous releases used only the
number of tables as the trigger.  The new 6.5, which will be going into
beta soon, will use the number of tables _and_ indexes, making the point at
which GEQO kicks in much more reliable.  The current default GEQO of 8 is
too high for some queries and too low for others, because it does not take
the number of indexes involved into account.

So, count the number of indexes and tables, and try a proper GEQO setting
for that sum.  Let me know what you find to be a good GEQO number.  This
testing will require you to enable/disable GEQO with SET GEQO TO 'ON=1' and
SET GEQO TO 'ON=9999' just before each query.  What sum of tables and
indexes seems to make GEQO a win?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
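A sketch of that test, reusing the two example tables from earlier in the
thread as stand-ins (in practice you would substitute one of the real
five- or six-table joins, since GEQO only comes into play once several
tables are involved):

    SET GEQO TO 'ON=1';        -- threshold of 1 table: GEQO used for everything
    EXPLAIN SELECT e.enum, i.surname
            FROM   employee e, empinfo i
            WHERE  e.enum = i.enum;

    SET GEQO TO 'ON=9999';     -- threshold no query will reach: GEQO never used
    EXPLAIN SELECT e.enum, i.surname
            FROM   employee e, empinfo i
            WHERE  e.enum = i.enum;

Compare the plans and the wall-clock time of the real queries under each
setting, and note the sum of tables plus indexes at which the GEQO run
starts to win.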