In Plan A, what factors are causing the planner to select a substantially slower plan despite having recent stats about number of rows?
Estimated overall cost. For Plan A it is ~200k. For plans B/C (haven't noticed any differences in these two) it is ~250k. The planner uses a less expensive plan.
Also, in the plans you can see that Pg estimates the number of rows correctly.
Is there a substantial difference between the on-the-fly hash done in Plan B and Plan C compared to the hash-index used in Plan A? Can I assume they are essentially the same? Perhaps there are there differences in how they're applied?
I don't see any difference in plans B and C, but you report timing changes. To me this looks like just a fluctuation in measurements. So I wouldn't trust any measurements for plan A either.
I'm not a big expert, but can not say that plan A and B are essentially the same.
Plan A: DB scans item_text table and for every record looks into the index of conversation_item table, then looks into the table itself.
Plan B/C: DB scans conversation_item table without looking into its indexes building a hash table on the fly.
Is it common to see values for random_page_cost set as high as 8.0? We would of course need to investigate whether we see a net positive or net negative impact on other queries, to adopt this as a general setting, but is it a proposal we should actually consider?
No idea.
Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?
Recheck your numbers.
Are there other problems with our schema, query, or plans shown here? Other approaches (or tools/analyses) we should consider?
You can try the following index:
CREATE INDEX conversation_item_ruz1 ON conversation_item(item_uuid, conversation_uuid, tenant_id);
I believe this index would allow Pg to use "index only scan" as variation of Plan A and avoid touching the conversation_item table completely.