Thread: query execution question
We are making some performance measurements and are trying to determine query execution behavior. Let's say we have 4 tables T1, T2, T3 and T4, and the query has the form:

select * from T1, T2, T3, T4 where (T1.a = T2.b and T2.c = T3.d and T3.e = T4.f)

where a, b, c, d, e, f are properties of the respective tables. Let's say that the cost-based optimizer determines that the order of the joins should be T1.a = T2.b, followed by T2.c = T3.d, followed by T3.e = T4.f. The question we have is: during query execution, are the joins evaluated completely, one by one, in that order, or is the first join evaluated completely, generating an intermediate table which is then used to perform the next join? The latter would mean that for such a query we need space for all the intermediate tables, which, if they are very large tables as they are in our case, can significantly alter the cost of the operations.

Thanks
-Ashish
2011/2/9 amit sehas <cun23@yahoo.com>:
> Lets say that the cost based optimizer determines that the order of the
> joins should be T1.a=T2.b followed by T2.c = T3.d followed by T3.e = T4.f
>
> the question we have is during query execution are the joins evaluated
> completely one by one in that order, or the first join is evaluated
> completely and generates an intermediate table which is then utilized
> to perform the next join....this means that for such a query we will need
> space for all the intermediate tables, which if they are very large tables
> as they are in our case can significantly alter the cost of the operations...

[ This is a question more appropriate for pgsql-performance. ]

The optimizer doesn't only determine the order (or "tree", actually) in which to perform the joins, but also how to perform them: nested loop, merge, or hash join. Depending on those physical join types, something might need to be materialized (merge: the intermediate sort "tapes", when an input has to be sorted; hash: the full contents of the inner operand, from which the hash table is built) or not (nested loop).

Use the EXPLAIN statement if you want to know how the query would be executed.

Nicolas
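To make the materialization point concrete, here is a toy, in-memory Python sketch of the three physical join methods. It is only an illustration under simplifying assumptions, not PostgreSQL's executor: the function names are hypothetical, rows are plain dicts, and the real hash and merge joins can spill batches or sort runs to disk when they outgrow work_mem. What it shows is which input each method must hold in full before it can emit rows.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def nested_loop_join(outer: Iterable[Row],
                     rescan_inner: Callable[[], Iterable[Row]],
                     match: Callable[[Row, Row], bool]) -> Iterator[Row]:
    # Nothing is materialized: the inner input is rescanned for every outer row
    # (in a real plan that rescan is often an index lookup on the inner table).
    for o in outer:
        for i in rescan_inner():
            if match(o, i):
                yield {**o, **i}


def hash_join(probe: Iterable[Row], build: Iterable[Row],
              probe_key: str, build_key: str) -> Iterator[Row]:
    # Build phase: the entire build input is loaded into a hash table
    # before the first joined row can be produced.
    table: Dict[object, List[Row]] = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    # Probe phase: probe rows stream through one at a time.
    for p in probe:
        for b in table.get(p[probe_key], []):
            yield {**p, **b}


def merge_join(left: Iterable[Row], right: Iterable[Row],
               left_key: str, right_key: str) -> Iterator[Row]:
    # Both inputs must be ordered on the join key. Sorting them here
    # materializes both; an input already ordered (e.g. read through an
    # index) would not need this step.
    ls = sorted(left, key=lambda r: r[left_key])
    rs = sorted(right, key=lambda r: r[right_key])
    i = j = 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][left_key], rs[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(rs) and rs[j_end][right_key] == lk:
                j_end += 1
            while i < len(ls) and ls[i][left_key] == lk:
                for k in range(j, j_end):
                    yield {**ls[i], **rs[k]}
                i += 1
            j = j_end
```

Called with T1 as `probe` and T2 as `build`, `hash_join` has to read all of T2 before yielding anything, whereas `nested_loop_join` can return its first row as soon as one match is found.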
amit sehas <cun23@yahoo.com> wrote:
> We are making some performance measurements, we are trying to
> determine query execution behavior.

I hope you're not doing this on an untuned server or "toy" tables. There are a number of configuration parameters which should be tuned for your particular server, and they will affect the plans used on that server. Also, the plans the optimizer chooses for small tables are generally quite different from those chosen for large tables.

> the question we have is during query execution are the joins
> evaluated completely one by one in that order, or the first join
> is evaluated completely and generates an intermediate table which
> is then utilized to perform the next join....this means that for
> such a query we will need space for all the intermediate tables,
> which if they are very large tables as they are in our case can
> significantly alter the cost of the operations...

The optimizer looks at the estimated cost of various plans and uses the one with the lowest estimated cost. The plan consists of a set of steps, with each step pulling rows from one or more lower steps, so the process is driven from the top down: you try to pull a result row, and if the top-level step needs something from a lower step it tries to pull that, and so on. Some plans involve "materializing" a set of rows; others pull rows through index or heap scans only as they are requested. Depending on the query, the indexes, the table sizes, etc., a SELECT statement with joins like the one you describe might use nested loop joins over indexes and never hold more than a few rows in RAM at a time, without ever writing anything to disk.

By the way, this is the wrong list for this question. The -hackers list is for discussion about work on developing the product. Questions like this about how it works are better posted to the -general or -novice list.

-Kevin
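The top-down, pull-driven execution described above is commonly called the iterator (or "Volcano") model. The Python sketch below illustrates it under stated assumptions: every join is taken to be a nested loop over an index on the inner table, a plain dict stands in for that index, and the class and function names are hypothetical rather than PostgreSQL's. With the join order from the question, asking the top step for one row pulls exactly one row from T1 and builds no intermediate table.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]
Index = Dict[object, List[Row]]


def build_index(rows: List[Row], key: str) -> Index:
    # Hypothetical stand-in for an existing B-tree index: key -> matching rows.
    index: Index = {}
    for r in rows:
        index.setdefault(r[key], []).append(r)
    return index


class SeqScan:
    """Leaf step: hands out one stored row per next() call."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class IndexNestLoop:
    """Join step: pulls one outer row, then looks up its matches via an index."""

    def __init__(self, outer, inner_index: Index, outer_key: str):
        self.outer = outer
        self.inner_index = inner_index
        self.outer_key = outer_key
        self.current: Optional[Row] = None
        self.matches: List[Row] = []
        self.pos = 0

    def next(self) -> Optional[Row]:
        while True:
            if self.pos < len(self.matches):
                inner = self.matches[self.pos]
                self.pos += 1
                return {**self.current, **inner}
            self.current = self.outer.next()  # pull one row from the lower step
            if self.current is None:
                return None                   # lower step exhausted, so are we
            self.matches = self.inner_index.get(self.current[self.outer_key], [])
            self.pos = 0


# ((T1 JOIN T2) JOIN T3) JOIN T4, the join order from the question.
T1 = [{"a": 1}, {"a": 2}]
T2 = [{"b": 1, "c": 10}, {"b": 2, "c": 20}]
T3 = [{"d": 10, "e": 100}, {"d": 20, "e": 200}]
T4 = [{"f": 100}, {"f": 200}]

t1_scan = SeqScan(T1)
plan = IndexNestLoop(
    IndexNestLoop(
        IndexNestLoop(t1_scan, build_index(T2, "b"), "a"),
        build_index(T3, "d"), "c"),
    build_index(T4, "f"), "e")

first = plan.next()  # one pull at the top ripples down to the T1 scan
print(first, t1_scan.pos)
# {'a': 1, 'b': 1, 'c': 10, 'd': 10, 'e': 100, 'f': 100} 1
```

Each step holds at most one current outer row plus a position in that row's matches, which is why a plan like this can run with only a few rows in memory. If the planner instead chose hash or merge joins, some of those steps would have to read an input in full first.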