Thread: Query very different speeds on seemingly similar data
I’ve got a weird problem that I can’t work out…
A customer was complaining that their system was running slowly. They’ve only been using it for a few days, so we tested it on our copy of their data and it’s running at normal speed.
Uploaded our backup to the live server and it’s still quick.
Restored a new backup of their live database on our local test server and it’s REALLY slow. So the problem seems to be data-related.
I’ve run ANALYZE, VACUUM and REINDEX and still no change.
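For reference, the maintenance run above amounts to something like the following (a minimal sketch, assuming the table names from the stripped-down query below; the real run covered the whole database):

```sql
-- Refresh planner statistics and rebuild indexes on the tables involved
VACUUM sales_invoicedetails;
VACUUM sales_invoicedetails_purchlinks;
REINDEX TABLE sales_invoicedetails;
REINDEX TABLE sales_invoicedetails_purchlinks;
ANALYZE stock;
ANALYZE sales_invoicedetails;
ANALYZE sales_invoicedetails_purchlinks;
```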
I’ve stripped the query down to the bare minimum that causes the speed difference.
The only difference between the “good” data and the “bad” data involved in this query is about 80 extra records in an invoice table (which had 250,000 records to start with).
It’s behaving (in my opinion) like it does if ANALYZE isn’t run after restoring data, or as if the indexes are broken. EXPLAIN ANALYZE shows it is running the query in completely different ways on the two databases.
The query, now I’ve stripped it down to the offending part, is as follows:
SELECT stk_key,
(SELECT SUM(stdp_quantity) FROM sales_invoicedetails_purchlinks
LEFT JOIN sales_invoicedetails ON std_unique = stdp_std_unique
WHERE stdp_loc_key = '__NBI' AND std_stk_key = stock.stk_key
) as level
FROM stock
Table “stock” has about 5000 records, sales_invoicedetails has about 250,000, sales_invoicedetails_purchlinks has about 80
The bit that kills it is the “std_stk_key = stock.stk_key” in the sub-query.
On the “good” data it runs in less than 100 milliseconds, on the “bad” data it takes ten minutes!
Explain files attached (assuming attachments will get through to the group – otherwise what’s the best way to post it?)
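(The text-format plan can be captured directly in psql rather than attached as a file; a sketch of the command, using the stripped-down query above. Note that EXPLAIN ANALYZE actually executes the query, so on the bad data this will take the full ten minutes:)

```sql
-- Capture the plan as plain text, suitable for pasting into the list body
EXPLAIN (ANALYZE, BUFFERS)
SELECT stk_key,
       (SELECT SUM(stdp_quantity)
        FROM sales_invoicedetails_purchlinks
        LEFT JOIN sales_invoicedetails ON std_unique = stdp_std_unique
        WHERE stdp_loc_key = '__NBI'
          AND std_stk_key = stock.stk_key) AS level
FROM stock;
```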
I’m completely stumped – any suggestions most welcome!
Med vänlig hälsning / Best Regards
Rob Northcott
Software Developer (UK Office, formerly TEAM Systems)
Phone +44 1752 712052
Compilator AB
Södergatan 22
SE-211 34 Malmö
Sweden
On Wed, 5 Jun 2019 at 04:55, Rob Northcott <Rob.Northcott@compilator.com> wrote:
> Explain files attached (assuming attachments will get through to the group – otherwise what’s the best way to post it?)

You're best to post the EXPLAIN ANALYZE output to https://explain.depesz.com. What you attached is pretty horrible to read, and details are only gained by hovering the mouse cursor over the node. I for one struggle to work out what's the inner and outer sides of the join with the output you've given, and can only work it out by looking at the nested loop plan to see which side the parameter is on.

It appears that the sub-query plan has changed from a Nested Loop plan to a Merge Join. The Merge Join is a pretty bad plan since the index that provides the pre-sorted input must filter out many non-matching rows. It's not quite clear to me why the planner chooses that index, mostly because I'm too lazy to learn the output you've shown the plans in, but if you did have an index on sales_invoicedetails (std_stk_key, std_unique), then the Merge Join would likely produce a better plan, or at least better than the current Merge Join plan.

You'll likely want to ensure that random_page_cost has not been set to something insane on the Merge Join instance. I'd also check seq_page_cost too and also effective_cache_size.

More recommendations might be easier to give if you show the plans in a better format.

Apart from that, instead of performing a sub-query, you could also consider a LEFT JOIN to a query similar to your subquery but after removing the std_stk_key = stock.stk_key condition and adding a GROUP BY std_stk_key. However, that may perform worse if there are many more std_stk_key groups than there are matching rows in stock.stk_key. Experimentation might be required there.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
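(The suggestions above could be sketched in SQL as follows. Column and table names are taken from the original query and should be checked against the real schema:)

```sql
-- 1. The suggested index to give the planner pre-sorted, pre-filtered input:
CREATE INDEX ON sales_invoicedetails (std_stk_key, std_unique);

-- 2. Check the planner cost settings on the slow instance:
SHOW random_page_cost;
SHOW seq_page_cost;
SHOW effective_cache_size;

-- 3. The LEFT JOIN rewrite: aggregate once per std_stk_key,
--    then join the totals back to stock.
SELECT s.stk_key, t.level
FROM stock s
LEFT JOIN (
    SELECT std_stk_key, SUM(stdp_quantity) AS level
    FROM sales_invoicedetails_purchlinks
    LEFT JOIN sales_invoicedetails ON std_unique = stdp_std_unique
    WHERE stdp_loc_key = '__NBI'
    GROUP BY std_stk_key
) t ON t.std_stk_key = s.stk_key;
```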
>-----Original Message-----
>From: David Rowley <david.rowley@2ndquadrant.com>
>Sent: 05 June 2019 01:48
>To: Rob Northcott <Rob.Northcott@compilator.com>
>Cc: pgsql-general@lists.postgresql.org
>Subject: Re: Query very different speeds on seemingly similar data
>
>[quoted text trimmed]

Thanks for the reply David,

Those output files came from PGAdmin, but I agree they're not the easiest to look at. I've tried writing the query in different ways (which is easy enough with the minimal query here, but not so easy with the original massive query), but it's still way slower on one set of data than on the other.

I'll try changing those settings on the test server and see if it makes any difference... We've got almost 100 databases running the same system (on the same server) and this is the only one that's being a problem so far, but it could of course just be that we've been lucky.

If I can't get anywhere with that I'll try to post the analyse output in a better format.

Thanks again for your reply

Rob