Thread: Column changes such as an increase in varchar size, can cause extremely slow queries, and postgres should run analyze automatically.
Column changes such as an increase in varchar size, can cause extremely slow queries, and postgres should run analyze automatically.
From
Tyler
Date:
In our project lemmy, we recently had a production breaking bug causing extremely slow queries to one of our tables.
Finally after a lot of testing, we narrowed it down to a migration that increased the size of a varchar column meant to store URL data.
This increases the url column from 512 -> 2000 characters.
up.sql:
```sql
ALTER TABLE post
ALTER COLUMN url TYPE varchar(2000);
```
This table currently has 1,186,895 rows, and joins are occasionally done to that column.
We finally realized that running this simple query manually, fixed the issue:
`ANALYZE post (url);`
I'm sure we're not the only ones to experience this potentially production-breaking bug, and postgres should probably automatically re-run analyze on columns for tables that have a large number of rows, that are changed.
For more context, see: https://github.com/LemmyNet/lemmy/pull/5148
Thanks everyone,
-- dessalines
Finally after a lot of testing, we narrowed it down to a migration that increased the size of a varchar column meant to store URL data.
This increases the url column from 512 -> 2000 characters.
up.sql:
```sql
ALTER TABLE post
ALTER COLUMN url TYPE varchar(2000);
```
This table currently has 1,186,895 rows, and joins are occasionally done to that column.
We finally realized that running this simple query manually, fixed the issue:
`ANALYZE post (url);`
I'm sure we're not the only ones to experience this potentially production-breaking bug, and postgres should probably automatically re-run analyze on columns for tables that have a large number of rows, that are changed.
For more context, see: https://github.com/LemmyNet/lemmy/pull/5148
Thanks everyone,
-- dessalines
Re: Column changes such as an increase in varchar size, can cause extremely slow queries, and postgres should run analyze automatically.
From
Laurenz Albe
Date:
On Thu, 2024-10-31 at 14:34 -0400, Tyler wrote: > In our project lemmy, we recently had a production breaking bug causing extremely > slow queries to one of our tables. > > Finally after a lot of testing, we narrowed it down to a migration that increased > the size of a varchar column meant to store URL data. > > This increases the url column from 512 -> 2000 characters. > > ALTER TABLE post > ALTER COLUMN url TYPE varchar(2000); > > We finally realized that running this simple query manually, fixed the issue: > > `ANALYZE post (url);` > > I'm sure we're not the only ones to experience this potentially production-breaking bug, and postgres should > probably automatically re-run analyze on columns for tables that have a large number of rows, that are changed. > > For more context, see: https://github.com/LemmyNet/lemmy/pull/5148 I don't know if that should be considered a bug, but I sympathize with the complaint. At the very least we should document on https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS that operations like ALTER TABLE or CREATE INDEX don't trigger autoanalyze. Would it be an option to clear pg_class.reltuples and pg_stat_all_tables.n_mod_since_analyze whenever the statistics for a table are cleared? Then autoanalyze would trigger after 50 modifications. Yours, Laurenz Albe
Re: Column changes such as an increase in varchar size, can cause extremely slow queries, and postgres should run analyze automatically.
From
David Rowley
Date:
On Fri, 1 Nov 2024 at 19:32, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > Would it be an option to clear pg_class.reltuples and pg_stat_all_tables.n_mod_since_analyze > whenever the statistics for a table are cleared? Then autoanalyze would trigger after 50 > modifications. I wondered about that too. I also wondered which cases we could have ATExecAlterColumnType() not call RemoveStatistics(). If the table is going to be rewritten, then we need to, but there must be plenty of cases where we could forego removing the stats when there's no rewrite. David
Re: Column changes such as an increase in varchar size, can cause extremely slow queries, and postgres should run analyze automatically.
From
Noah Misch
Date:
On Fri, Nov 01, 2024 at 10:03:08PM +1300, David Rowley wrote: > On Fri, 1 Nov 2024 at 19:32, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > Would it be an option to clear pg_class.reltuples and pg_stat_all_tables.n_mod_since_analyze > > whenever the statistics for a table are cleared? Then autoanalyze would trigger after 50 > > modifications. > > I wondered about that too. I also wondered which cases we could have > ATExecAlterColumnType() not call RemoveStatistics(). If the table is > going to be rewritten, then we need to, but there must be plenty of > cases where we could forego removing the stats when there's no > rewrite. In the absence of a rewrite, stanullfrac and stawidth always remain valid. The conditions for keeping other stats are like CheckIndexCompatible() deciding to skip an index rebuild. Specifically, I think keeping pg_statistic entries would require (1) staop and stacoll haven't changed and (2) if staop is polymorphic, the argument type hasn't changed. Those conditions would also suffice to keep non-expression pg_statistic_ext_data. That may be enough. If not, type-specific rules would identify more cases where discarding stats has no value. For example, if staop is "<" and the column type is changing between timestamp and timestamptz, none of the "<" comparisons will change. Similarly, staop of "=" works the same for oid and for int4. Beyond those cases where discarding stats has no value, there may be cases where the staop behaves differently before and after, but you could argue that it's better to keep sometimes-incorrect stats than to remove them. For example, <(oid,oid) and <(int4,int4) behave differently due to signed vs. unsigned. Still, the last histogram may be better than no histogram. Keeping pg_statistic_ext_data for expression stats would require expression introspection that CheckIndexCompatible() hasn't bothered with. If we ever implement such expression introspection, I expect it would cover both CheckIndexCompatible() and pg_statistic_ext_data.