Hello all,
We’re seeing intermittently very poor performance of a query, which happens when a poor query plan is occasionally chosen. We’re using Postgres 16.9.
One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of the number of rows to be returned from a text[] column queried with ‘&&’.
After playing around with a simple recreate (details below), it seems the result of ANALYZE depends on the number of rows in the table: the `most_common_elems` statistic is [null] whenever the table has 15,873 or more rows at the time it’s analyzed. With fewer rows it’s collected correctly.
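For reference, we’re reading those statistics from the pg_stats view, with a query along these lines against the recreate table below:

SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';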
Is there a good explanation for this behaviour? Ideally we’d like some way to get proper `most_common_elems` statistics collected in our production database, in the hope that this influences a good query plan to always be selected.
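One knob we assume might be relevant is the per-column statistics target, raised before analyzing, e.g.:

ALTER TABLE public.test ALTER COLUMN tags SET STATISTICS 10000;
ANALYZE public.test;

though we haven’t confirmed whether that changes this behaviour.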
In our production system there are ~150,000 rows in a table with a `text[]` column, where each row’s array contains a single ~19-character string that is unique within the table. The full query joins against a couple more tables, and there is a GIN index on the text[] column (sketched after the recreate below). If necessary I can get into the details of the real system, but I hope the simple recreate will be sufficient to understand the problem:
CREATE TABLE IF NOT EXISTS public.test(
id SERIAL PRIMARY KEY,
tags text[]
);
INSERT INTO public.test (tags)
SELECT ARRAY[TO_CHAR(n,'fm00000000')] FROM ( SELECT generate_series(1,15_873) AS n );
ANALYZE public.test;
SELECT * FROM pg_stat_user_tables WHERE relname = 'test';
EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000002'];
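For reference, the GIN index in the real system is roughly of this shape (placeholder table/index names; it isn’t part of the recreate):

CREATE INDEX prod_table_tags_gin ON prod_table USING GIN (tags);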
Results
-------
table with 15_000 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.666667e-05)
table with 15_872 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.300403e-05)
table with 15_873 rows has [null] most_common_elems after ANALYZE
table with 100_000 rows has [null] most_common_elems after ANALYZE
Query plans show an estimate of 1 row when `most_common_elems` statistics are available, or the hardcoded default selectivity of 1/200 of the estimated table size when most_common_elems is null.
Here 79 rows are estimated (15,873 × 0.005 ≈ 79), because the table contained 15,873 rows and the element stats weren’t available.
Query plan
-----------
Seq Scan on public.test (cost=0.00..463.41 rows=79 width=37) (actual time=9.934..17.190 rows=1 loops=1)
Output: id, tags
Filter: (test.tags && '{00000002}'::text[])
Rows Removed by Filter: 15872
Buffers: shared hit=268
Planning:
Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms
Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"
Regards,
Mark Frost
IBM