Home > mailing lists

Statistics on array values - Mailing list pgsql-performance

From	Marco Colli
Subject	Statistics on array values
Date	February 2, 2020 17:18:19
Msg-id	CAFvCgN4d9G6=u-_jJPBZhTSdf1Uqo+-txwWdAp_hCDKXbBQOHw@mail.gmail.com Whole thread Raw
Responses	Re: Statistics on array values (Justin Pryzby <pryzby@telsasoft.com>) Re: Statistics on array values (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

Tree view

Hello!

Let's say that you have a simple query like the following on a large table (for a multi-tenant application):

SELECT "subscribers".* FROM "subscribers" WHERE "subscribers"."project_id" = 123 AND (tags @> ARRAY['de']::varchar[]);

If you run EXPLAIN ANALYZE you can see that stats are completely wrong. For example I get an expected count of 3,500 rows whereas the actual result is 20 rows. This also results in bad query plans...

In a previous discussion someone said that this wrong estimate is because @> uses a fixed selectivity of 0.001, **regardless of actual data**!!

Is that true? Is there any solution or any plan to improve this in future versions of PostgreSQL?

Finally it would be useful to have the ability to CREATE STATISTICS, to show PostgreSQL that there's a correlation between project_id and tag values... but this is a further step. Currently I can create statistics, however it seems to have no positive effect on the estimates for the above case

Marco Colli

pgsql-performance by date:

From: Duncan Whitham
Date: 31 January 2020, 14:01:20
Subject: Re: Specific query taking time to process

From: Justin Pryzby
Date: 02 February 2020, 17:23:21
Subject: Re: Statistics on array values

Statistics on array values - Mailing list pgsql-performance

Previous

Next