Hello all,
We’re seeing intermittently very poor performance of a query, which happens when a poor query plan is occasionally chosen. We’re using Postgres 16.9.
One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of the number of rows to be returned from a text[] column queried with ‘&&’.
After playing around with a simple recreate (details below), it seems the result of ANALYZE depends on the number of rows in the table: the `most_common_elems` statistic is [null] whenever the table has 15,873 or more rows at the time it’s analyzed. With fewer rows it’s collected correctly.
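For reference, we’re reading those statistics from the pg_stats view, with a query along these lines against the recreate table below:

SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';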
Is there a good explanation for this behaviour? Ideally we’d like some way to get proper `most_common_elems` statistics collected in our production database, in the hope that this influences a good query plan to always be selected.
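One knob we assume might be relevant is the per-column statistics target, raised before analyzing, e.g.:

ALTER TABLE public.test ALTER COLUMN tags SET STATISTICS 10000;
ANALYZE public.test;

though we haven’t confirmed whether that changes this behaviour.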
In our production system there are ~150,000 rows in a table with a `text[]` column, where each row’s array contains a single ~19-character string that is unique within the table. The full query joins against a couple more tables, and there is a GIN index on the text[] column (sketched after the recreate below). If necessary I can get into the details of the real system, but I hope the simple recreate will be sufficient to understand the problem:
CREATE TABLE IF NOT EXISTS public.test(
id SERIAL PRIMARY KEY,
tags text[]
);
INSERT INTO public.test (tags)
SELECT ARRAY[TO_CHAR(n,'fm00000000')] FROM ( SELECT generate_series(1,15_873) AS n );
ANALYZE public.test;
SELECT * FROM pg_stat_user_tables WHERE relname = 'test';
EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000002'];
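For reference, the GIN index in the real system is roughly of this shape (placeholder table/index names; it isn’t part of the recreate):

CREATE INDEX prod_table_tags_gin ON prod_table USING GIN (tags);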
Results
-------
table with 15_000 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.666667e-05)
table with 15_872 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.300403e-05)
table with 15_873 rows has [null] most_common_elems after ANALYZE
table with 100_000 rows has [null] most_common_elems after ANALYZE
Query plans show an estimate of 1 row when `most_common_elems` statistics are available, or the hardcoded default selectivity of 1/200 of the estimated table size when most_common_elems is null.
Here 79 rows are estimated (15,873 × 0.005 ≈ 79), because the table contained 15,873 rows and the element stats weren’t available.
Query plan
-----------
Seq Scan on public.test (cost=0.00..463.41 rows=79 width=37) (actual time=9.934..17.190 rows=1 loops=1)
Output: id, tags
Filter: (test.tags && '{00000002}'::text[])
Rows Removed by Filter: 15872
Buffers: shared hit=268
Planning:
Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms
Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"
Regards,
Mark Frost
IBM