Thread: about new join syntax performance
Hello, we are working with v7.0.2 but to upgrade to the recent version of postgres regarding to the full join syntax most demandedby our web-application. I'd like to get some tips about how to rewrite our queries to achieve better performance. [ from the manual 2.1.1.2. Subqueries in FROM clause ] " More interesting cases, which can't be reduced to a plain join, arise when the subquery involves grouping or aggregation." Take a look at this query: SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN documentAS D USING(did); My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq): SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...; SELECT * FROM tmp AS V, document AS D WHER V.did = D.did; additional informations: table document: did serial primary key, content text not null total rows = 22.000 - table values : did serial references (document), value text not null, /* other columns */ total rows = 2 or 5 times of document rows, (there are 2 up to 5 references to each document) I'd like to know whether it's worth changing the scripts to use the new join syntax. Papp Gyozo - pgerzson@freestart.hu
"Gyozo Papp" <pgerzson@freestart.hu> writes: > Take a look at this query: > SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN documentAS D USING(did); > My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq): > SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...; > SELECT * FROM tmp AS V, document AS D WHER V.did = D.did; My guess is that you'd get exactly the same component plans, so the savings would just be the overhead involved in creating, filling, and eventually deleting a temp table. Hard to tell whether that's significant or not in the context of this sort of query. (Me, I'd go for it just so I didn't have to worry about remembering to delete the temp table before I could issue another similar query.) You might get a different plan for the join step because it would be determined by the estimated number of rows output from the subselect, whereas in the temp-table case you'd just get a default estimate for the size of the temp table. (I'm assuming you don't stop to VACUUM ANALYZE the temp table...) Depending on how good the estimate is, this could be a better or worse plan. > I'd like to know whether it's worth changing the scripts to use the > new join syntax. Try a few and see. regards, tom lane