Thread: Performance
LS, I'm at a loss and need help.

I have a server (SuSE 7.1) with PostgreSQL 7.0.3 with 2 databases.
The databases are roughly the same size. Basically the setup is as
follows:

tblNumber: project_id   INTEGER
           project_name VARCHAR(30)
           ddi_number   INTEGER
           phone_number VARCHAR(30)

tblData:   start_call   DATE
           project_id   INTEGER
           ddi_number   INTEGER
           ...more fields...

The main index is on tblData on fields (start_call, project_id,
ddi_number). This index is indeed used, as EXPLAIN shows.

The query is something like:

SELECT n.project_name,
       n.phone_number,
       SUM(d.data),
       SUM(d.data...)
FROM tblNumber n,
     tblData d
WHERE d.start_call BETWEEN i AND j
AND d.project_id BETWEEN k AND l
AND d.project_id = n.project_id
AND d.ddi_number = n.ddi_number
GROUP BY n.project_name, n.phone_number
ORDER BY n.project_name, n.phone_number;

VACUUM ANALYZE is run on both databases daily.

The strange part is that a query on the first database takes 2.5 seconds
and on the second one 3 minutes and 7.1 seconds!
EXPLAIN looks the same on both databases.

I've attached the output.
The first query is the quick one, the second the slow one.

What can be the cause of this huge difference in time?

Regards,
Cees.
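The setup described above corresponds to roughly the following DDL. This is
a reconstruction from the post, not the actual schema; the index name is
invented:

    CREATE TABLE tblNumber (
        project_id   INTEGER,
        project_name VARCHAR(30),
        ddi_number   INTEGER,
        phone_number VARCHAR(30)
    );

    CREATE TABLE tblData (
        start_call INTEGER,  -- actually DATE in the post; see below
        project_id INTEGER,
        ddi_number INTEGER
        -- ...more fields...
    );

    -- The "main index" on tblData mentioned in the post:
    CREATE INDEX idx_data_main
        ON tblData (start_call, project_id, ddi_number);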
The EXPLAINs don't look the same to me. The first EXPLAIN indicates use of
a HASH JOIN, whereas the second is using a NESTED LOOP.

I'm guessing some of the other indexes on the two tables are not the same
between the two databases. One of these other indexes is being used in the
first EXPLAIN, but not the second.

Frank

At 05:56 PM 2/2/02 +0100, Cees van de Griend wrote:
>I have a server (SuSE 7.1) with PostgreSQL 7.0.3 with 2 databases.
>The databases are roughly the same size. Basically the setup is as
>follows:
>
>tblNumber: project_id   INTEGER
>           project_name VARCHAR(30)
>           ddi_number   INTEGER
>           phone_number VARCHAR(30)
>
>tblData:   start_call   DATE
>           project_id   INTEGER
>           ddi_number   INTEGER
>           ...more fields...
>
>The main index is on tblData on fields (start_call, project_id,
>ddi_number). This index is indeed used, as EXPLAIN shows.
>
>The query is something like:
>SELECT n.project_name,
>       n.phone_number,
>       SUM(d.data),
>       SUM(d.data...)
>FROM tblNumber n,
>     tblData d
>WHERE d.start_call BETWEEN i AND j
>AND d.project_id BETWEEN k AND l
>AND d.project_id = n.project_id
>AND d.ddi_number = n.ddi_number
>GROUP BY n.project_name, n.phone_number
>ORDER BY n.project_name, n.phone_number;
>
>VACUUM ANALYZE is run on both databases daily.
>
>The strange part is that a query on the first database takes 2.5 seconds
>and on the second one 3 minutes and 7.1 seconds!
>EXPLAIN looks the same on both databases.
>
>I've attached the output.
>The first query is the quick one, the second the slow one.
>
>What can be the cause of this huge difference in time?
>
>Regards,
>Cees.
>
>Attachment Converted: "m:\BELL_HSE\ATTACH\t"
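A quick way to test Frank's guess is to compare the index definitions on
both sides. A sketch, using the table names from the original post
(substitute the real names in each database):

    -- In psql, connected to each database in turn:
    \d tblNumber
    \d tblData

If the pg_indexes system view is available in this release, the same
information can be pulled with plain SQL; note that unquoted identifiers
fold to lower case, hence the lower-case names here:

    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('tblnumber', 'tbldata');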
Cees van de Griend <cees-list@griend.xs4all.nl> writes:
> The strange part is that a query on the first database takes 2.5 seconds
> and on the second one 3 minutes and 7.1 seconds!
> EXPLAIN looks the same on both databases.

No, it doesn't look the same at all: you're getting hash joins in one
case and nestloop joins in the other.

I'd also observe that you haven't given us anything close to an accurate
version of the query, as the EXPLAINs show four input tables, not two.
Moreover, the two EXPLAINs are clearly not for the same query (the table
names aren't the same).

>  ->  Hash Join  (cost=20.38..1145.68 rows=1161 width=40)
>        ->  Seq Scan on tblcitsdialinday dd  (cost=0.00..1006.03 rows=2485 width=32)
>        ->  Hash  (cost=18.10..18.10 rows=910 width=8)
>              ->  Seq Scan on tblcitsddinumber dn  (cost=0.00..18.10 rows=910 width=8)

vs.

>  ->  Nested Loop  (cost=0.00..1119.65 rows=4 width=40)
>        ->  Seq Scan on tblatnnumber dn  (cost=0.00..10.03 rows=503 width=8)
>        ->  Index Scan using idxatndialinday04 on tblatndialinday dd  (cost=0.00..2.19 rows=1 width=32)

The problem is presumably that the planner is drastically underestimating
the number of joinable rows in "dn" in the second case, and so choosing a
plan that works well if that number is small but not well when it's large.

There's not a lot you can do about this in 7.0.3, but you are overdue for
an update anyway. I'd suggest trying it in 7.2, due out Monday.

			regards, tom lane
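Tom's diagnosis, a drastic underestimate of the joinable rows, can be
checked by hand. This release predates EXPLAIN ANALYZE, so the actual row
count has to be fetched with a separate query; a sketch, with i, j, k, l
standing in for the real bounds as in the original post:

    -- Actual number of joinable rows under the query's predicates.
    -- Compare this against the rows=4 estimate in the Nested Loop line.
    SELECT COUNT(*)
    FROM tblNumber n, tblData d
    WHERE d.start_call BETWEEN i AND j
      AND d.project_id BETWEEN k AND l
      AND d.project_id = n.project_id
      AND d.ddi_number = n.ddi_number;

A count that comes back orders of magnitude above the estimate would
confirm that the nestloop was chosen on bad numbers.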
Cees van de Griend <cees@griend.xs4all.nl> writes:
>> The problem is presumably that the planner is drastically
>> underestimating the number of joinable rows in "dn" in the
>> second case, and so choosing a plan that works well if that
>> number is small but not well when it's large.

> This is the strange part. The sizes of the 2 databases are not that
> great and the 'fast' one has a bigger Number table:

Nothing strange about that. The larger table is probably enough larger
to persuade the planner to take the hash-join plan; remember the problem
in the other case is that the planner thinks there are few enough rows
to make nestloop appropriate, when there really are too many for that
plan to be a good choice.

> Someone has altered the table and added a column which is never used,
> there is no data in it and it is never used in a query.
> Can this be the reason for the huge performance loss?

No.

			regards, tom lane
Tom, thanks for the effort of answering my questions. I'm a long-time
reader of this list and you are always very helpful.

On Sat, Feb 02, 2002 at 03:57:19PM -0500, Tom Lane wrote:
> Cees van de Griend <cees@griend.xs4all.nl> writes:
> >> The problem is presumably that the planner is drastically
> >> underestimating the number of joinable rows in "dn" in the
> >> second case, and so choosing a plan that works well if that
> >> number is small but not well when it's large.
>
> > This is the strange part. The sizes of the 2 databases are not that
> > great and the 'fast' one has a bigger Number table:
>
> Nothing strange about that. The larger table is probably enough
> larger to persuade the planner to take the hash-join plan; remember
> the problem in the other case is that the planner thinks there are few
> enough rows to make nestloop appropriate, when there really are too
> many for that plan to be a good choice.

OK, I can understand this, but...

> > Someone has altered the table and added a column which is never used,
> > there is no data in it and it is never used in a query.
> > Can this be the reason for the huge performance loss?
>
> No.

Sigh. Then what can be?

A few days back, the 'same' query on the 'slow' database took
approximately the same time as on the 'fast' one; it took seconds.
I've never taken the time to study the results of EXPLAIN, because I
was happy with the default actions.

Now, on the 'fast' database it still takes seconds, but on the 'slow'
one it takes minutes, and the only change I can think of is the added
column.

The databases are on the same server; they share the same processor,
the same memory and the same disk.

What can possibly be the cause of the difference in performance?
Is it as simple as a wrong choice by the planner, and can a wrong
choice have such a huge effect?

> 			regards, tom lane

Regards,
Cees.
Cees van de Griend <cees-list@griend.xs4all.nl> writes:
> What can possibly be the cause of the difference in performance?

Probably the VACUUM ANALYZE statistics changed just enough to push the
planner into making the wrong choice. You could experiment with doing
"set enable_nestloop to off" and then EXPLAIN to see what the plan and
cost are; I'll bet that the estimated cost of the hash plan is now just
fractionally more than that of the nestloop.

Of course, the *true* costs are very different, which is why I consider
this a planner estimation failure.

> Is it as simple as a wrong choice by the planner, and can a wrong
> choice have such a huge effect?

Yes, and yes.

			regards, tom lane
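Spelled out, Tom's experiment looks roughly like this; the SELECT is the
abbreviated query from the original post, with placeholder bounds:

    -- Disable nested-loop plans for this session only:
    SET enable_nestloop TO off;

    -- Re-plan the slow query; the planner should now report the
    -- hash-join plan and its estimated cost:
    EXPLAIN
    SELECT n.project_name, n.phone_number, SUM(d.data)
    FROM tblNumber n, tblData d
    WHERE d.start_call BETWEEN i AND j
      AND d.project_id BETWEEN k AND l
      AND d.project_id = n.project_id
      AND d.ddi_number = n.ddi_number
    GROUP BY n.project_name, n.phone_number
    ORDER BY n.project_name, n.phone_number;

    -- Restore the default planner behaviour afterwards:
    SET enable_nestloop TO on;

Comparing this cost against the nestloop cost from the earlier EXPLAIN
shows how close the two estimates are, even though the true run times
differ by two orders of magnitude.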
On Sat, Feb 02, 2002 at 05:09:58PM -0500, Tom Lane wrote:
> Cees van de Griend <cees-list@griend.xs4all.nl> writes:
> > What can possibly be the cause of the difference in performance?
>
> Probably the VACUUM ANALYZE statistics changed just enough to push the
> planner into making the wrong choice. You could experiment with doing
> "set enable_nestloop to off" and then EXPLAIN to see what the plan and
> cost are; I'll bet that the estimated cost of the hash plan is now
> just fractionally more than that of the nestloop.
>
> Of course, the *true* costs are very different, which is why I consider
> this a planner estimation failure.

I have dumped the table, dropped the table and put the dump back into
the database. Now the speed is as it should be (for the time being).

If I understand your explanation, a possible hack to circumvent this
problem could be to trick the planner into thinking that it should use
the hash-join method, by inserting dummy data into the tbl[XXX]Number
table.

Another possible fix is to execute 'SET enable_nestloop = OFF;' before
every query.

Is this correct?

> 			regards, tom lane

Regards,
Cees.
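For reference, the dump-and-reload step can be done per table with
pg_dump; a sketch from the shell, with hypothetical database and table
names:

    # Dump only the affected table:
    pg_dump -t tblatnnumber mydb > tblatnnumber.sql

    # Drop it and restore it from the dump:
    psql mydb -c 'DROP TABLE tblatnnumber;'
    psql mydb < tblatnnumber.sql

    # Refresh the planner statistics for the rebuilt table:
    psql mydb -c 'VACUUM ANALYZE tblatnnumber;'

As Cees notes ('for the time being'), this only helps until the
statistics drift back into the bad region.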
Cees van de Griend <cees-list@griend.xs4all.nl> writes:
> ... a possible hack to circumvent this problem ...

Before contorting your data or application to hack around this problem,
I'd suggest updating to 7.2 and seeing if the problem is fixed for real
by the new statistics code. 7.2 should be out tomorrow, if Marc wraps it
on schedule.

			regards, tom lane
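If the misestimate were to survive the upgrade, 7.2's new statistics code
also lets the per-column statistics target be raised, giving ANALYZE a
more detailed picture of the join columns. A sketch, assuming 7.2-era
syntax and borrowing the table/column names from the EXPLAIN output:

    -- Ask for more detailed statistics on a join column (7.2+):
    ALTER TABLE tblatnnumber ALTER COLUMN project_id SET STATISTICS 100;

    -- Re-gather statistics with the new target (standalone ANALYZE
    -- is itself new in 7.2; earlier releases only had VACUUM ANALYZE):
    ANALYZE tblatnnumber;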