Thread: Need help - optimizer trouble
Hi! I am desperate with 7.1beta4; I have a serious performance problem that does not manifest in 7.0.2. This is currently just a feasability test, so I am willing to try every hack. The problematic part of my schema looks as follows: document written_by author author_keyword -------- 1 n ---------- ---------- -------------- *doc_id <---- *doc_id n 1 *name 1 n *keyword doc *author_id ----> *author_id <---- *author_id columns marked with an asterik have an index document contains ~100 000 tuples; each is "written_by" 1-3 out of ~10 000 authors, and every author is associated with ~4 keywords I am trying to retrieve documents written by authors associated with a given keyword. I can get the doc_ids using: SELECT doc_id FROM written_by AS wb JOIN author_keyword AS kw ON kw.author_id=wb.author_id WHERE kw.keyword='foo' 7.1beta4 and 7.0.2 both use the following query plan: Nested Loop (cost=0.00..18.37 rows=8 width=20) -> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4) -> Index Scan using written_by_idx on written_by (cost=0.00..16.29 rows=4 width=16) I can then take the values returned and retrieve the documents in a separate query; the two queries take at most a second. However if I add a join to get the documents directly: SELECT doc FROM document JOIN written_by AS wb ON document.doc_id=wb.doc_id JOIN author_keyword AS kw ON kw.author_id=wb.author_id WHERE kw.keyword='foo' 7.0.2 will use the following query plan, query takes ~1 second: Nested Loop (cost=0.00..27.53 rows=2 width=32) -> Nested Loop (cost=0.00..10.00 rows=4 width=20) -> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4) -> Index Scan using written_by_idx on ist_autor_von (cost=0.00..7.95 rows=2 width=16) -> Index Scan using document_pkey on document (cost=0.00..4.16 rows=1 width=12) 7.1beta4 uses the following query plan, query takes ~150 seconds: Nested Loop (cost=8562.09..39846.62 rows=4 width=32) -> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4) -> Materialize (cost=37969.52..37969.52 rows=100000 width=28) -> Hash Join (cost=8562.09..37969.52 rows=100000 width=28) -> Seq Scan on document (cost=0.00..11932.00 rows=100000 width=12) -> Hash (cost=5129.55..5129.55 rows=203555 width=16) -> Seq Scan on written_by (cost=0.00..5129.55 rows=303555 width=16) The two databases are not identical, but they were created using the same script which fills the database with random data of the above structure. I recreated the databases several times, so it is completely repeatable. I tried playing with the optimizer parameters, but it only got worse because I do not really understand what they are all doing. I fell uncomfortable with the idea of having to split up my query into two separate ones. Can anyone give me any hints how to influence the 7.1 optimizer to behave like 7.0.2 in this case? Is there something wrong with my query? Is there an alternative form which the optimizer can handle better? Thanks for your patience Helge
On Thu, Apr 05, 2001 at 12:01:39AM +0200, Helge Bahmann wrote: > 7.0.2 will use the following query plan, query takes ~1 second: > > Nested Loop (cost=0.00..27.53 rows=2 width=32) > -> Nested Loop (cost=0.00..10.00 rows=4 width=20) > -> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4) > -> Index Scan using written_by_idx on ist_autor_von (cost=0.00..7.95 rows=2 width=16) > -> Index Scan using document_pkey on document (cost=0.00..4.16 rows=1 width=12) > > 7.1beta4 uses the following query plan, query takes ~150 seconds: > > Nested Loop (cost=8562.09..39846.62 rows=4 width=32) > -> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4) > -> Materialize (cost=37969.52..37969.52 rows=100000 width=28) > -> Hash Join (cost=8562.09..37969.52 rows=100000 width=28) > -> Seq Scan on document (cost=0.00..11932.00 rows=100000 width=12) > -> Hash (cost=5129.55..5129.55 rows=203555 width=16) > -> Seq Scan on written_by (cost=0.00..5129.55 rows=303555 width=16) > > The two databases are not identical, but they were created using the > same script which fills the database with random data of the above > structure. I recreated the databases several times, so it is completely > repeatable. If the amount of data in the databases is similar, it looks like the planner's guesses are totally off. Have you VACUUM ANALYZE'd both databases recently? (That's the obvious question. I'll leave it to the PgSQL gurus to help you if that's not the problem.) Cheers, Neil -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Vegetarians do not love animals... they hate plants.
Helge Bahmann <bahmann@math.tu-freiberg.de> writes: > SELECT doc FROM document > JOIN written_by AS wb ON document.doc_id=wb.doc_id > JOIN author_keyword AS kw ON kw.author_id=wb.author_id > WHERE kw.keyword='foo' > 7.0.2 will use the following query plan, query takes ~1 second: > ... > 7.1beta4 uses the following query plan, query takes ~150 seconds: > ... 7.1 treats the JOIN clauses as determining join order, where 7.0 did not. Looks like you should re-order the query to be, say, SELECT doc FROM written_by AS wb JOIN author_keyword AS kw ON kw.author_id=wb.author_id JOIN document ON document.doc_id=wb.doc_id WHERE kw.keyword='foo' This behavior is a bit of a hack, which we'll likely reconsider in future releases, but it does have some redeeming social value too. See http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html regards, tom lane
On Wed, 4 Apr 2001, Tom Lane wrote: > 7.1 treats the JOIN clauses as determining join order, where 7.0 did > not. Looks like you should re-order the query to be, say, > > SELECT doc FROM written_by AS wb > JOIN author_keyword AS kw ON kw.author_id=wb.author_id > JOIN document ON document.doc_id=wb.doc_id > WHERE kw.keyword='foo' It works! I restructured my query SELECT doc FROM document JOIN (SELECT keyword, doc_id FROM written_by AS _w JOIN author_keyword AS _kw ON _w.author_id=_kw.author_id ) AS wb ON document.doc_id=wb.doc_id WHERE wb.keyword='foo' which is easier to handle for my application logic and it is pretty fast now. Many thanks for your help! regards, Helge
Tom Lane wrote: > > 7.1 treats the JOIN clauses as determining join order, where 7.0 did > not. Looks like you should re-order the query to be, say, > > SELECT doc FROM written_by AS wb > JOIN author_keyword AS kw ON kw.author_id=wb.author_id > JOIN document ON document.doc_id=wb.doc_id > WHERE kw.keyword='foo' > > This behavior is a bit of a hack, which we'll likely reconsider in > future releases, but it does have some redeeming social value too. > See > http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html Excuse my audacity in posing such a preposterous question... but shouldn't the SQL syntax be as free of effects on the physical level as possible? I've already posted a similar question at pgsql-general that got no answer, but I will try it again here. As I understand from the relational model there should be three levels: user, logical and physical. The SELECT command should pertain to the user level, and the access paths to the physical. Any alteration in the behaviour of the optimizer should be or (1) set by the administrator at the optimizer itself, or (2) at the logical-physical levels' mapping, or (3) thru hints delivered parallel with, but separate to, the SQL syntax of the query itself. So this change, besides really being a hack, would kind of pollute the environment by adding complexity to the use of SQL. Perhaps it would be cleaner to have somehing like Oracle's hints? Thanks for your attention. -- _ / \ Leandro Guimarães Faria Corsetti Dutra +55 (11) 3040 8913 \ / Amdocs at Bell Canada +1 (514) 786 87 47 X Support Center, São Paulo, Brazil mailto:adbaamd@bell.ca / \ http://terravista.pt./Enseada/1989/ mailto:leandrod@amdocs.com
"ADBAAMD" <adba.amdocs@bell.ca> writes: > Tom Lane wrote: >> This behavior is a bit of a hack, which we'll likely reconsider in >> future releases, but it does have some redeeming social value too. > Perhaps it would be cleaner to have somehing like Oracle's hints? Possibly. Want to enlighten those of us who've never used Oracle about how that works? regards, tom lane
Tom Lane wrote: > "ADBAAMD" <adba.amdocs@bell.ca> writes: > >> Tom Lane wrote: >> >>> This behavior is a bit of a hack, which we'll likely reconsider in >>> future releases, but it does have some redeeming social value too. > >> Perhaps it would be cleaner to have somehing like Oracle's hints? > > Possibly. Want to enlighten those of us who've never used Oracle about > how that works? You can read (almost) all about it at http://technet.oracle.com/doc/server.804/a58246/toc.htm, more to the point http://technet.oracle.com/doc/server.804/a58246/optimiz.htm#2250. For example: SELECT /*+ STAR */ doc_id FROM written_by AS wb JOIN author_keyword AS kw ON kw.author_id = wb.author_id WHERE kw.keyword = 'foo' ; There is also a /*+ ORDERED */ hint, but I dislike it as it also requires query rewriting. Keep in mind I have an opinion against having to rewrite queries for performance reasons, thinking this should be done at the optimizer or preferably at the physical level or the logical-to-physical mapping. Also it does not seems to me very nice to have queries changing behaviour from version to version and then back again. -- _ / \ Leandro Guimarães Faria Corsetti Dutra +55 (11) 3040 8913 \ / Amdocs at Bell Canada +1 (514) 786 87 47 X Support Center, São Paulo, Brazil mailto:adbaamd@bell.ca / \ http://terravista.pt./Enseada/1989/ mailto:leandrod@amdocs.com