Re: Slow recursive CTE query questions, with row estimate and n_distinct issues - Mailing list pgsql-performance

From Michael Lewis
Subject Re: Slow recursive CTE query questions, with row estimate and n_distinct issues
Date
Msg-id CAHOFxGo2qtDSbBh6xqQ-DZpCRSR=d0sXdU9xAOyTDxH-iPVo4A@mail.gmail.com
Whole thread Raw
In response to Slow recursive CTE query questions, with row estimate and n_distinct issues  (Christopher Baines <mail@cbaines.net>)
Responses Re: Slow recursive CTE query questions, with row estimate and n_distinct issues  (Christopher Baines <mail@cbaines.net>)
List pgsql-performance
On Mon, Dec 28, 2020 at 7:51 AM Christopher Baines <mail@cbaines.net> wrote:
derivation_inputs:
  COUNT(*):  285422539
  reltuples: 285422528

  derivation_id:
    COUNT(DISTINCT): 7508610
    n_distinct:      4336644 (~57% of the true value)

  derivation_output_id:
    COUNT(DISTINCT): 5539406
    n_distinct:      473762 (~8% of the true value)

If you expect the ratio of distinct of derivation_output_id values to be roughly linear going forward, you can set a custom value for n_distinct on the column (currently seems like -.0194, aka distinct count of derivation_output_id divided by reltuples of the table). You could also do this analysis every month or six and set the custom value as needed.


I am not sure if it will resolve your query problems though.

pgsql-performance by date:

Previous
From: Christopher Baines
Date:
Subject: Slow recursive CTE query questions, with row estimate and n_distinct issues
Next
From: Greg Spiegelberg
Date:
Subject: Re: Slow recursive CTE query questions, with row estimate and n_distinct issues