I'm looking for some help managing queries against two large tables.
Context:
We run a relatively large PostgreSQL instance (5 TB, 32 vCPUs, 120 GB RAM) with a hybrid transactional/analytical workload. Data is written in batches every 15 seconds or so, and all queryable tables are append-only (we never update or delete). Our users can run analytical queries on top of these tables.
We recently came across a series of troublesome queries, one of which I'll dive into here. The query in question joins two tables:
- `ethereum.transactions`: 833M rows, partitioned, 171M rows after WHERE
- `uniswap_v2."Pair_evt_Swap"`: 12M rows, not partitioned, 12M rows after WHERE
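A quick way to cross-check counts like these against what the planner believes is to read `pg_class.reltuples` (a sketch; for the partitioned table, the estimates that matter live on the leaf partitions):

```sql
-- Sketch: the planner's row estimates for the two tables above.
-- reltuples is refreshed by ANALYZE / autovacuum; for a partitioned
-- table like ethereum.transactions, the per-partition values are
-- what the planner actually uses.
SELECT n.nspname, c.relname, c.reltuples::bigint AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE (n.nspname, c.relname) IN (('ethereum', 'transactions'),
                                 ('uniswap_v2', 'Pair_evt_Swap'));
```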
The crux of our issue is that the query planner chooses a nested loop join for this query, which makes it (and other queries like it) take a very long time to complete. In contrast, by toggling `enable_nestloop` and `enable_seqscan` off, we can take the total runtime down from 16 minutes to 2 minutes.
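For reference, the workaround we've been testing looks roughly like this. The join key and filter are hypothetical, since the full query isn't reproduced here; `SET LOCAL` keeps the toggles scoped to a single transaction:

```sql
BEGIN;
-- Steer the planner away from the plan shapes it currently prefers,
-- for this transaction only.
SET LOCAL enable_nestloop = off;
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT t.*
FROM ethereum.transactions t
JOIN uniswap_v2."Pair_evt_Swap" s
  ON s.evt_tx_hash = t.hash                        -- hypothetical join key
WHERE t.block_time >= now() - interval '90 days';  -- hypothetical filter
COMMIT;
```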
The estimated cost of the default (vanilla) plan is much lower than that of the third plan, the one produced with nested loops and sequential scans disabled. Since JIT is enabled, it seems the planner selects the plan with the lowest estimated cost and ends up with the plan that takes longer to execute. Can you try running this query with `jit = off` in the session and see whether it then selects the plan with the shortest execution time?
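Something along these lines in the same session as the query:

```sql
-- Disable JIT compilation for this session only, then re-run the
-- problem query and compare the chosen plan and timing.
SET jit = off;
```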
Thank you for your reply. Here's the result using `set jit = off;`: https://explain.depesz.com/s/rpKc. It's essentially the same plan as the initial one.
How can I get Postgres not to loop over 12M rows?
Let me know if there is anything I left out here that would be useful for further debugging.