Re: Helping planner to chose sequential scan when it improves performance - Mailing list pgsql-general

From Ruslan Zakirov
Subject Re: Helping planner to chose sequential scan when it improves performance
Date
Msg-id CAMOxC8sJEVujzyToR_wgeAERZ1AJDhU4fs92xY6MOUw0mGPqYA@mail.gmail.com
Whole thread Raw
In response to Helping planner to chose sequential scan when it improves performance  ("Patrick O'Toole" <patrick.otoole@sturdy.ai>)
List pgsql-general


On Tue, Jun 13, 2023 at 10:28 PM Patrick O'Toole <patrick.otoole@sturdy.ai> wrote:
Hi all,

 
Questions:
  1. In Plan A, what factors are causing the planner to select a substantially slower plan despite having recent stats about number of rows?
Estimated overall cost. For Plan A it is ~200k. For plans B/C (haven't noticed any differences in these two) it is ~250k. The planner uses a less expensive plan.

Also, in the plans you can see that Pg estimates the number of rows correctly.
  1. Is there a substantial difference between the on-the-fly hash done in Plan B and Plan C compared to the hash-index used in Plan A? Can I assume they are essentially the same? Perhaps there are there differences in how they're applied?
I don't see any difference in plans B and C, but you report timing changes. To me this looks like just a fluctuation in measurements. So I wouldn't trust any measurements for plan A either.

I'm not a big expert, but can not say that plan A and B are essentially the same.

Plan A: DB scans item_text table and for every record looks into the index of conversation_item table, then looks into the table itself.

Plan B/C: DB scans conversation_item table without looking into its indexes building a hash table on the fly.


  1. Is it common to see values for random_page_cost set as high as 8.0? We would of course need to investigate whether we see a net positive or net negative impact on other queries, to adopt this as a general setting, but is it a proposal we should actually consider?
No idea.
  1. Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?
Recheck your numbers. 
  1. Are there other problems with our schema, query, or plans shown here? Other approaches (or tools/analyses) we should consider?
You can try the following index:
 
CREATE INDEX conversation_item_ruz1 ON conversation_item(item_uuid, conversation_uuid, tenant_id);

I believe this index would allow Pg to use "index only scan" as variation of Plan A and avoid touching the conversation_item table completely.

--
Best regards, Ruslan.

pgsql-general by date:

Previous
From: Ruslan Zakirov
Date:
Subject: Re: Reproducing incorrect order with order by in a subquery
Next
From: Karsten Hilbert
Date:
Subject: Re: Reproducing incorrect order with order by in a subquery