Thread: Restore of a reference database kills the auto analyze processing.
Running Postgres 15.3 with PostGIS 3.3
On Windows 10 (yes, I know)
It’s a single node db with no replication, topping out at about 200GB.
We have a schema (A) in the default 'postgres' maintenance database, which our software services connect to, with one set of users (SU)
We have another database, let’s call it LFM, which contains reference data for some COTS software. I don't know what's in it, we just get given updates for it in pg_backup binary files, about 2MB each.
This is accessed by a different postgres user (LFU) supplied to the COTS tool.
To apply an update, we:
stop the applications that use LFM,
set the user (LFU) to NOLOGIN
kill any left-over connections: select pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'lfm' and usename = 'lfu';
drop the existing reference database using the dropDb utility.
reload the new file using pg_restore and the postgres super user.
set the user (LFU) to LOGIN
Other services connecting to the default db, with SU users should keep running with no dropouts.
This works, some of the time.
If I repeat the update process, somewhere around run #4 the auto analyzer stops working, and only analyzes tables in the new db at the point of reload, then shuts off again.
All vacuum and analyze operations on the 'postgres' database just stops, even though there is still data processing into it.
With log_autovacuum_min_duration = 0, we are logging all vacuum & analyze operations, so we can see when the entries shut off in the Postgres log files, e.g.
2024-05-02 14:52:01.597 GMT [6896]: [23-1] db=,user=,app=,client= LOG: automatic analyze of table "lfm.pg_catalog.pg_trigger"
The only way I can find of getting the analyzer back is to restart Postgres.
We've narrowed the cause down to the pg_restore, but have no idea where to go from here.
Can anyone help stand the anaylzer back up please?
Most configs are left at default, (apart from memory settings) but we currently have
autovacuum_max_workers = 10
log_autovacuum_min_duration = 0
thanks,
Phil Horder
Database Mechanic
Thales Land & Air Systems
The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to it by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any other way use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have received this e-mail in error, please inform the originator immediately and delete it and all copies from your system.
Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading, Berks RG2 6GF. Registered Number: 868273
Please consider the environment before printing a hard copy of this e-mail.
On 5/2/24 8:52 AM, HORDER Philip wrote: > Running Postgres 15.3 with PostGIS 3.3 > > On Windows 10 (yes, I know) > > It’s a single node db with no replication, topping out at about 200GB. > > We have a schema (A) in the default 'postgres' maintenance database, > which our software services connect to, with one set of users (SU) > > We have another database, let’s call it LFM, which contains reference > data for some COTS software. I don't know what's in it, we just get > given updates for it in pg_backup binary files, about 2MB each. Do you mean pg_basebackup, pg_dump or something else? -- Adrian Klaver adrian.klaver@aklaver.com
On 5/2/24 08:52, HORDER Philip wrote: > Running Postgres 15.3 with PostGIS 3.3 > > On Windows 10 (yes, I know) > > It’s a single node db with no replication, topping out at about 200GB. > > We have a schema (A) in the default 'postgres' maintenance database, > which our software services connect to, with one set of users (SU) This above is probably not a good idea, The 'postgres' database is generally taken to be a throw away database for establishing an initial connection. Many utilities/tools use it for that purpose, having your data in it exposes that data. > This works, some of the time. > > If I repeat the update process, somewhere around run #4 the auto > analyzer stops working, and only analyzes tables in the new db at the > point of reload, then shuts off again. > > All vacuum and analyze operations on the 'postgres' database just stops, > even though there is still data processing into it. Is there enough data processing? Autovacuum has thresholds for turning on, are you sure those thresholds are just not being met? > > With log_autovacuum_min_duration = 0, we are logging all vacuum & > analyze operations, so we can see when the entries shut off in the > Postgres log files, e.g. > > 2024-05-02 14:52:01.597 GMT [6896]: [23-1] db=,user=,app=,client= LOG: > automatic analyze of table "lfm.pg_catalog.pg_trigger" Except the above shows it working. What is the evidence it is not? > > The only way I can find of getting the analyzer back is to restart Postgres. Did you wait to see if activity after the pg_restore crossed the autovacuum thresholds? -- Adrian Klaver adrian.klaver@aklaver.com
Sorry, pg_dump. Phil Horder Database Mechanic -----Original Message----- From: Adrian Klaver <adrian.klaver@aklaver.com> Sent: 02 May 2024 17:59 To: HORDER Philip <Phil.Horder@uk.thalesgroup.com>; pgsql-general@lists.postgresql.org Subject: [EXTERNAL EMAIL] Re: Restore of a reference database kills the auto analyze processing. On 5/2/24 8:52 AM, HORDER Philip wrote: > Running Postgres 15.3 with PostGIS 3.3 > > On Windows 10 (yes, I know) > > It’s a single node db with no replication, topping out at about 200GB. > > We have a schema (A) in the default 'postgres' maintenance database, > which our software services connect to, with one set of users (SU) > > We have another database, let’s call it LFM, which contains reference > data for some COTS software. I don't know what's in it, we just get > given updates for it in pg_backup binary files, about 2MB each. Do you mean pg_basebackup, pg_dump or something else? -- Adrian Klaver adrian.klaver@aklaver.com The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
Thanks for your time Adrian > Is there enough data processing? Yes, one table is receiving upwards of 20 million rows daily. We noticed the problem when fetch performance on this table degraded after updates. > Autovacuum has thresholds for turning on, are you sure those thresholds are just not being met? Yes we're sure. Our biggest table is set for a fixed number of rows rather than a percentage, this gets an auto analyseabout every 15 minutes. After an update this just stops, and there are no analyse entries in the log file. None at all, for any table. When we restart Postgres the auto analyse restarts and catches up with the backlog. Phil Horder Database Mechanic Thales Land & Air Systems The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/7/24 02:38, HORDER Philip wrote: > Thanks for your time Adrian > > >> Is there enough data processing? > > Yes, one table is receiving upwards of 20 million rows daily. > We noticed the problem when fetch performance on this table degraded after updates. > >> Autovacuum has thresholds for turning on, are you sure those thresholds are just not being met? > > Yes we're sure. Our biggest table is set for a fixed number of rows rather than a percentage, this gets an auto analyseabout every 15 minutes. > > After an update this just stops, and there are no analyse entries in the log file. None at all, for any table. 1) What is the exact pg_restore command you are using? 2) From earlier post: '... only analyzes tables in the new db at the point of reload, then shuts off again.' Provide that sequence of events from the Postgres log. 3) Also statistics from https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW for that table after the reload. > > When we restart Postgres the auto analyse restarts and catches up with the backlog. > > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
On 5/7/24 08:24, Adrian Klaver wrote: > On 5/7/24 02:38, HORDER Philip wrote: >> Thanks for your time Adrian >> >> > > 1) What is the exact pg_restore command you are using? > > 2) From earlier post: '... only analyzes tables in the new db at the > point of reload, then shuts off again.' Provide that sequence of events > from the Postgres log. > > 3) Also statistics from > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW > > for that table after the reload. 4) The autovacuum settings you have in effect. -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} Backups of this db are created with: pg_dump --file=fsm.dmp -Fc --blobs --oids --dbname=lfm --host=localhost --port=nnnn --username=superuser Restore is run with: dropdb --port=nnnn --maintenance-db=postgres --username=superuser --if-exists lfm pg_restore -Fc --create --dbname=postgres --port=nnnn --username=superuser fsm.dmp ------------- > 2) From earlier post: '... only analyzes tables in the new db at the point of reload, then shuts off again.' Provide thatsequence of events from the Postgres log. Log file extract is attached, with object names obfuscated. > 3) Also statistics from https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW for that table after the reload. Well, 'that' table is everything, I'll add an entry for a table that obviously needs stats collection. From yesterday, current stats for table a.accp, from pg_STAT_all_tables: "811486381""airscape""accp"16458538988177871456553503047055581967135016364880294000"2024-05-14 14:51:37.158892+00""2024-05-0908:27:45.328468+00""2024-05-14 13:15:31.999198+00"01815170101653 This table has a low row count, but high content turnover. It usually gets auto-analyzed every minute. For today, this hasn't been auto analysed since the update at 3am. > 4) The autovacuum settings you have in effect: vacuum_cost_limit = 2000 log_autovacuum_min_duration = 0 autovacuum_max_workers = 10 all other vacuum settings are defaults. ------------ Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
Attachment
On 5/15/24 01:08, HORDER Philip wrote: > Classified as: {OPEN} > > Backups of this db are created with: > > pg_dump --file=fsm.dmp -Fc --blobs --oids --dbname=lfm --host=localhost --port=nnnn --username=superuser From your original post: "Running Postgres 15.3 ..." --oids have not been supported with pg_dump since v11 When I try it: pg_dump -V pg_dump (PostgreSQL) 15.7 (Ubuntu 15.7-1.pgdg22.04+1) pg_dump -d test -U postgres --oids -f test.sql /usr/lib/postgresql/15/bin/pg_dump: unrecognized option '--oids' So what version of pg_dump are you using? Or you not working on a v15 instance of Postgres? FYI, --blobs is only needed when dumping a specific table or schema, otherwise it happens by default. > > Restore is run with: > > dropdb --port=nnnn --maintenance-db=postgres --username=superuser --if-exists lfm > pg_restore -Fc --create --dbname=postgres --port=nnnn --username=superuser > fsm.dmp > > ------------- > >> 2) From earlier post: '... only analyzes tables in the new db at the point of reload, then shuts off again.' Providethat sequence of events from the Postgres log. > > Log file extract is attached, with object names obfuscated. Still working my way through that. > > From yesterday, current stats for table a.accp, from pg_STAT_all_tables: > "811486381""airscape""accp"16458538988177871456553503047055581967135016364880294000"2024-05-14 14:51:37.158892+00""2024-05-0908:27:45.328468+00""2024-05-14 13:15:31.999198+00"01815170101653 I can't parse this out well enough to really say anything. Though the timestamps seem to indicate recent activity. Using psql do \x select * from pg_stat_all_tables where relname = 'a.accp'; to get an extended output that is easier to read. > > This table has a low row count, but high content turnover. > It usually gets auto-analyzed every minute. > > For today, this hasn't been auto analysed since the update at 3am. > > >> 4) The autovacuum settings you have in effect: > > vacuum_cost_limit = 2000 > log_autovacuum_min_duration = 0 > autovacuum_max_workers = 10 > > all other vacuum settings are defaults. > > ------------ > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > > {OPEN} > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > --oids have not been supported with pg_dump since v11 You're absolutely correct, this command came from my notes, which are obviously out of date. We're running Postgres 15 pg_dump, and I've updated my notes. I'm double-checking the command options with the guy who creates these files, but he's on holiday. I expect we just droppedthe OID option. > FYI, --blobs is only needed when dumping a specific table or schema, otherwise it happens by default. Thanks, I didn’t know that. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/16/24 03:08, HORDER Philip wrote: > Classified as: {OPEN} > >> --oids have not been supported with pg_dump since v11 > > You're absolutely correct, this command came from my notes, which are obviously out of date. > > We're running Postgres 15 pg_dump, and I've updated my notes. > I'm double-checking the command options with the guy who creates these files, but he's on holiday. I expect we just droppedthe OID option. > >> FYI, --blobs is only needed when dumping a specific table or schema, otherwise it happens by default. > > Thanks, I didn’t know that. Did you have chance to do below? > From yesterday, current stats for table a.accp, from pg_STAT_all_tables: >"811486381""airscape""accp"16458538988177871456553503047055581967135016364880294000"2024-05-14 14:51:37.158892+00""2024-05-0908:27:45.328468+00""2024-05-14 13:15:31.999198+00"01815170101653 I can't parse this out well enough to really say anything. Though the timestamps seem to indicate recent activity. Using psql do \x select * from pg_stat_all_tables where relname = 'a.accp'; to get an extended output that is easier to read. > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > > {OPEN} > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > Did you have chance to do below? > Using psql do > \x > select * from pg_stat_all_tables where relname = 'a.accp'; Sorry, missed that bit. From this output you can see that no stats have been collected since the last two overnight updates. postgres=# select * from pg_stat_all_tables where relname='accp'; -[ RECORD 1 ]-------+-------------------------------- relid | 811486381 schemaname | a relname | accp seq_scan | 1654767 seq_tup_read | 901811880 idx_scan | 146070383 idx_tup_fetch | 305949969 n_tup_ins | 2056948 n_tup_upd | 0 n_tup_del | 1662699 n_tup_hot_upd | 0 n_live_tup | 294 n_dead_tup | 75162 n_mod_since_analyze | 0 n_ins_since_vacuum | 75162 last_vacuum | last_autovacuum | 2024-05-15 03:25:16.007386+00 last_analyze | last_autoanalyze | 2024-05-15 03:25:16.008873+00 vacuum_count | 0 autovacuum_count | 1888 analyze_count | 19836 autoanalyze_count | 1715 Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/16/24 07:38, HORDER Philip wrote: > Classified as: {OPEN} > >> Did you have chance to do below? >> Using psql do > >> \x >> select * from pg_stat_all_tables where relname = 'a.accp'; > > Sorry, missed that bit. > From this output you can see that no stats have been collected since the last two overnight updates. Still your contention was that autovacuum quit running after the initial restore and that is not the case. Assuming this is the instance you restored on May 2 2024 then approximate counts are: autovacuum_count 1888/13 days = 145 autovacuums/day = 6/hour autoanalyze_count 1715/13 = 132 autoanalyze/day = 5.5/hr > > postgres=# select * from pg_stat_all_tables where relname='accp'; > -[ RECORD 1 ]-------+-------------------------------- > relid | 811486381 > schemaname | a > relname | accp > seq_scan | 1654767 > seq_tup_read | 901811880 > idx_scan | 146070383 > idx_tup_fetch | 305949969 > n_tup_ins | 2056948 > n_tup_upd | 0 > n_tup_del | 1662699 > n_tup_hot_upd | 0 > n_live_tup | 294 > n_dead_tup | 75162 > n_mod_since_analyze | 0 > n_ins_since_vacuum | 75162 > last_vacuum | > last_autovacuum | 2024-05-15 03:25:16.007386+00 > last_analyze | > last_autoanalyze | 2024-05-15 03:25:16.008873+00 > vacuum_count | 0 > autovacuum_count | 1888 > analyze_count | 19836 > autoanalyze_count | 1715 > > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > > {OPEN} > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} Adrian, > Still your contention was that autovacuum quit running after the initial restore and that is not the case.... This Postgres server has been restarted a few times since 2nd May most recently on Tuesday 14th, hence the more recent analyzestatus. We've had some problems with our data feeds on this integration system, but these are now running again. I'm planning to leave it all alone until I'm back in the office on Tuesday, and run this query again for a few tables andsend you an update. I'm expecting no further stats analysis, (and the performance to be appalling). Thanks for your time. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/16/24 08:59, HORDER Philip wrote: > Classified as: {OPEN} > > Adrian, > >> Still your contention was that autovacuum quit running after the initial restore and that is not the case.... > > This Postgres server has been restarted a few times since 2nd May most recently on Tuesday 14th, hence the more recentanalyze status. Assuming clean shutdowns the statistics will survive restarts. They would be wiped when you drop a database and start over, have an unclean shutdown or you use one of the reset functions from here: https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-FUNCTIONS 28.2.25. Statistics Functions > > We've had some problems with our data feeds on this integration system, but these are now running again. > I'm planning to leave it all alone until I'm back in the office on Tuesday, and run this query again for a few tables andsend you an update. > I'm expecting no further stats analysis, (and the performance to be appalling). From here: https://www.postgresql.org/docs/current/runtime-config-logging.html log_autovacuum_min_duration In addition, when this parameter is set to any value other than -1, a message will be logged if an autovacuum action is skipped due to a conflicting lock or a concurrently dropped relation. > > Thanks for your time. > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > Assuming clean shutdowns the statistics will survive restarts. They would be wiped when you drop a database and start over,have an unclean shutdown or you use one of the reset functions... Yes, stats are permanent, but are not being updated. We don't use any of the pg_stat_reset functions. ------------------------------------- I've left the system alone over the weekend. Here's the timeline: 14th May: Postgres working ok, 1767 log entries for "automatic analyze", mostly in database postgres. 03:30 Jenkins deployed an update, resulting in reload of lfm database. 15th May: Postgres working ok, with 257 logged "automatic analyze" events, up until 03:30 03:30 Jenkins deployed an update, resulting in reload of lfm database. Log of the auto analyse around that update is attached. No further auto analyse logged after 03.30 16th May: 03:30 Jenkins deployed an update, resulting in reload of lfm database. Only 3 logged "automatic analyze" in the whole file, timed at 03:30, for lfm.public and lfm.pg_catalog tables. Test data feed restarts at 2024-05-16 14:54 Daily partitions are created for this data, and each partition from here is showing no vacuum or analyze timestamps 17th May: 03:30 Jenkins deployed an update, resulting in reload of lfm database. Only the reloaded database shows log entries for "automatic analyze", at 03:30, then nothing more. 18th - 21st may: As per 17th Stats output for a sample of tables is attached. You can see that the partitions were auto analysed on the day they were created, and not since. And that new partitions haven’t been analysed at all. (accp does get a manual analyze occasionally, from an SQL function somewhere, but not enough to stop auto analyze from runningas well) Summary: since the reload of lfm database on 15th May, the only "automatic analyze" events logged have been for the lfm database,at the point of reload. No other stats analyze seems to have taken place, on any database on the server since that point, even partitions with overa million rows. Apart from that, Postgres appears to be working normally. I'm sure that another restart of Postgres will restore the auto analyze, until the next pg_restore of lfm. So what's going on? How is it that we're breaking this important function that we shouldn't be able to have any effect on? Thanks for looking, Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
Attachment
On 5/21/24 06:00, HORDER Philip wrote: > Classified as: {OPEN} > >> Assuming clean shutdowns the statistics will survive restarts. They would be wiped when you drop a database and startover, have an unclean shutdown or you use one of the reset functions... > > Yes, stats are permanent, but are not being updated. > We don't use any of the pg_stat_reset functions. > > ------------------------------------- > I've left the system alone over the weekend. > Here's the timeline: > > 14th May: > Postgres working ok, 1767 log entries for "automatic analyze", mostly in database postgres. > 03:30 Jenkins deployed an update, resulting in reload of lfm database. This is where I am getting confused. In your original post you had: To apply an update, we: stop the applications that use LFM, set the user (LFU) to NOLOGIN kill any left-over connections: select pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'lfm' and usename = 'lfu'; drop the existing reference database using the dropDb utility. reload the new file using pg_restore and the postgres super user. set the user (LFU) to LOGIN In other words DROP DATABASE then CREATE DATABASE and reload the schema objects and associated data. Yet your int stats output.txt file has things like the following: -[ RECORD 3 ]-------+-------------------------------------------------- relid | 923130055 schemaname | a relname | cr_pt_e_202405020000_202405030000 seq_scan | 1264 seq_tup_read | 8800722491 idx_scan | 4601405 idx_tup_fetch | 4415621 n_tup_ins | 3851400 n_tup_upd | 15790 n_tup_del | 0 n_tup_hot_upd | 0 n_live_tup | 7166325 n_dead_tup | 0 n_mod_since_analyze | 0 n_ins_since_vacuum | 0 last_vacuum | last_autovacuum | 2024-05-03 09:03:44.810654+00 last_analyze | 2024-05-09 08:44:37.725674+00 last_autoanalyze | 2024-05-03 09:03:58.838664+00 vacuum_count | 0 autovacuum_count | 1 analyze_count | 1 autoanalyze_count | 1 I am having a hard time figuring out how both of the above can be true. Dropping and recreating the database would wipe out the statistics. > Summary: since the reload of lfm database on 15th May, the only "automatic analyze" events logged have been for the lfmdatabase, at the point of reload. > No other stats analyze seems to have taken place, on any database on the server since that point, even partitions withover a million rows. The above is confusing also. In your original post you only referred to the postgres and lfm databases. What other databases are you referring to? As to partitions are you referring to partitions of tables in the lfm database or something else? > Apart from that, Postgres appears to be working normally. > > I'm sure that another restart of Postgres will restore the auto analyze, until the next pg_restore of lfm. > So what's going on? How is it that we're breaking this important function that we shouldn't be able to have any effecton? > > Thanks for looking, > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > I am having a hard time figuring out how both of the above can be true. > Dropping and recreating the database would wipe out the statistics. We have multiple databases on the one Postgres server. The 'postgres' database contains our main application, with tables in schema 'a'. We have two other databases, one of which is 'lfm'. In PgAdmin, there is Databases (3), then those database names. As the superuser, I can connect to any of these databases. When we drop database lfm, and recreate it using pg_restore, the stats collection dies, for all databases on the server:'postgres', 'lfm', 'lfm2' I haven't dropped database 'postgres' So the existing stats on tables in the 'postgres' database remain, but no further stats are collected. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/21/24 9:29 AM, HORDER Philip wrote: > Classified as: {OPEN} > >> I am having a hard time figuring out how both of the above can be true. >> Dropping and recreating the database would wipe out the statistics. > > We have multiple databases on the one Postgres server. > The 'postgres' database contains our main application, with tables in schema 'a'. > We have two other databases, one of which is 'lfm'. My mistake, I was over concentrating on the log messages and did not pay attention to the psql prompt which was showing postgres as the database. > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > > {OPEN} > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
On 5/21/24 9:29 AM, HORDER Philip wrote: > Classified as: {OPEN} > >> I am having a hard time figuring out how both of the above can be true. >> Dropping and recreating the database would wipe out the statistics. > > We have multiple databases on the one Postgres server. > The 'postgres' database contains our main application, with tables in schema 'a'. > We have two other databases, one of which is 'lfm'. > > In PgAdmin, there is Databases (3), then those database names. > As the superuser, I can connect to any of these databases. > > When we drop database lfm, and recreate it using pg_restore, the stats collection dies, for all databases on the server:'postgres', 'lfm', 'lfm2' > I haven't dropped database 'postgres' > > So the existing stats on tables in the 'postgres' database remain, but no further stats are collected. From int-postgres-log-21-05-2024.txt: 2024-05-15 03:31:31.290 GMT [4556]: [3-1] db=lfm,user=superuser,app=[unknown],client=::1 LOG: connection authorized: user=superuser database=lfm application_name=pg_restore That would be the lfm database being restored. What does the log show after that as pertains to autovacuum? > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > > {OPEN} > The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access toit by any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. > > Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 > > Please consider the environment before printing a hard copy of this e-mail. -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} 2024-05-15 03:31:31.290 GMT [4556]: [3-1] db=lfm,user=superuser,app=[unknown],client=::1 LOG: connection authorized: user=superuser database=lfm application_name=pg_restore > That would be the lfm database being restored. > What does the log show after that as pertains to autovacuum? Yep, pg_restore recreates the dropped lfm database. And after that.... nothing. The log just holds connection requests, and a checkpoint every hour. That's it. No "automatic vacuum", or "automatic analyze" anywhere. And nothing any day since then, for a week. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/21/24 13:44, HORDER Philip wrote: > Classified as: {OPEN} > > 2024-05-15 03:31:31.290 GMT [4556]: [3-1] > db=lfm,user=superuser,app=[unknown],client=::1 LOG: connection > authorized: user=superuser database=lfm application_name=pg_restore > >> That would be the lfm database being restored. >> What does the log show after that as pertains to autovacuum? > > Yep, pg_restore recreates the dropped lfm database. > And after that.... nothing. > The log just holds connection requests, and a checkpoint every hour. > That's it. > No "automatic vacuum", or "automatic analyze" anywhere. > And nothing any day since then, for a week. Just for confirmation your settings are still?: autovacuum_max_workers = 10 log_autovacuum_min_duration = 0 You said previously: "The only way I can find of getting the analyzer back is to restart Postgres." To be clear this means: 1) The lfm database is dropped/created. 2) There is a round of autovacuum immediately after the lfm is restored. 3) autovacuum then goes silent. 4) Before the next drop/create lfm you restart the Postgres server and autovacuum starts again. What is in the logs when you do the restart? Is there some process that runs shortly after the drop/create lfm cycle? > > Phil Horder > Database Mechanic > -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > Just for confirmation your settings are still?: > autovacuum_max_workers = 10 > log_autovacuum_min_duration = 0 Yes. > You said previously: > "The only way I can find of getting the analyzer back is to restart Postgres." > > To be clear this means: > 1) The lfm database is dropped/created. Yes, using dropdb and pg_restore > 2) There is a round of autovacuum immediately after the lfm is restored. Yes, some tables in the lfm database, but not all, an apparently random selection, anywhere between 2 and 21 tables, acrossthe lfm schemas, public & pg_catalog. > 3) autovacuum then goes silent. Yes. Dead in a ditch. But with no errors. > 4) Before the next drop/create lfm you restart the Postgres server and autovacuum starts again. I haven't restarted in a week, and the pattern remains, with a bit of analyze at each reload of lfm, and then nothing. > What is in the logs when you do the restart? Nothing notable: 1) denied connections, while restarting 2) authorized connections 3) auto analyze going into overdrive: See below > Is there some process that runs shortly after the drop/create lfm cycle? Not that I can see. Extract of postgres log for a typical restart : 2024-05-13 05:56:19.151 GMT [4688]: [99-1] db=,user=,app=,client= LOG: shutting down 2024-05-13 05:56:19.162 GMT [4688]: [100-1] db=,user=,app=,client= LOG: checkpoint starting: shutdown immediate 2024-05-13 05:56:19.751 GMT [344]: [1-1] db=[unknown],user=[unknown],app=[unknown],client=11.22.33.44 LOG: connection received:host=11.22.33.44 port=54730 2024-05-13 05:56:19.752 GMT [344]: [2-1] db=postgres,user=a_a,app=[unknown],client=11.22.33.44 FATAL: the database systemis shutting down 2024-05-13 05:56:19.843 GMT [4324]: [1-1] db=[unknown],user=[unknown],app=[unknown],client=11.22.33.44 LOG: connection received:host=11.22.33.44 port=54731 2024-05-13 05:56:19.845 GMT [4324]: [2-1] db=lfm2,user=fs_admin,app=[unknown],client=11.22.33.44 FATAL: the database systemis shutting down ... 2024-05-13 05:56:20.319 GMT [4688]: [101-1] db=,user=,app=,client= LOG: checkpoint complete: wrote 326 buffers (0.1%); 0WAL file(s) added, 0 removed, 0 recycled; write=0.073 s, sync=1.006 s, total=1.168 s; sync files=33, longest=0.281 s, average=0.031s; distance=588 kB, estimate=26425 kB 2024-05-13 05:57:13.889 GMT [4220]: [1-1] db=,user=,app=,client= LOG: database system was shut down at 2024-05-13 05:56:20GMT 2024-05-13 05:57:13.921 GMT [4236]: [1-1] db=[unknown],user=[unknown],app=[unknown],client=11.22.33.44 LOG: connection received:host=11.22.33.44 port=54863 2024-05-13 05:57:13.922 GMT [4236]: [2-1] db=postgres,user=a_a,app=[unknown],client=11.22.33.44 FATAL: the database systemis starting up ... 2024-05-13 05:57:14.572 GMT [4868]: [1-1] db=[unknown],user=[unknown],app=[unknown],client=11.22.33.44 LOG: connection received:host=11.22.33.44 port=54869 2024-05-13 05:57:14.608 GMT [4868]: [2-1] db=lfm2,user=fs_admin,app=[unknown],client=11.22.33.44 LOG: connection authenticated:identity="fs_admin" method=md5 (E:/PostgreSQL/15/data/pg_hba.conf:108) 2024-05-13 05:57:14.608 GMT [4868]: [3-1] db=lfm2,user=fs_admin,app=[unknown],client=11.22.33.44 LOG: connection authorized:user=fs_admin database=lfm2 ... 2024-05-13 05:57:29.305 GMT [6048]: [1-1] db=,user=,app=,client= LOG: automatic analyze of table "lfm.fs.ij" avg read rate: 7.813 MB/s, avg write rate: 0.000 MB/s buffer usage: 350 hits, 5 misses, 0 dirtied system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.00 s 2024-05-13 05:57:44.209 GMT [6140]: [1-1] db=,user=,app=,client= LOG: automatic vacuum of table "lfm2.fs.ij": index scans:0 pages: 0 removed, 2 remain, 2 scanned (100.00% of total) tuples: 1 removed, 2 remain, 0 are dead but not yet removable removable cutoff: 597176608, which was 0 XIDs old when operation ended new relfrozenxid: 597176605, which is 372 XIDs ahead of previous value index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 7.481 MB/s, avg write rate: 5.611 MB/s buffer usage: 37 hits, 4 misses, 3 dirtied WAL usage: 3 records, 2 full page images, 15749 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2024-05-13 05:57:59.312 GMT [6064]: [1-1] db=,user=,app=,client= LOG: automatic vacuum of table "postgres.a.acap": indexscans: 1 pages: 0 removed, 1 remain, 1 scanned (100.00% of total) tuples: 56 removed, 24 remain, 0 are dead but not yet removable removable cutoff: 597176633, which was 0 XIDs old when operation ended new relfrozenxid: 597176559, which is 14186 XIDs ahead of previous value index scan needed: 1 pages from table (100.00% of total) had 56 dead item identifiers removed index "app_ctxt_area_pk": pages: 6 in total, 0 newly deleted, 3 currently deleted, 3 reusable index "acap_area_hix": pages: 27 in total, 0 newly deleted, 0 currently deleted, 0 reusable index "acap_process_id_ix": pages: 2 in total, 0 newly deleted, 0 currently deleted, 0 reusable avg read rate: 18.881 MB/s, avg write rate: 6.743 MB/s buffer usage: 104 hits, 14 misses, 5 dirtied WAL usage: 1 records, 1 full page images, 7577 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/22/24 01:33, HORDER Philip wrote: > Classified as: {OPEN} >> 2) There is a round of autovacuum immediately after the lfm is restored. > Yes, some tables in the lfm database, but not all, an apparently random selection, anywhere between 2 and 21 tables, acrossthe lfm schemas, public & pg_catalog. > >> 3) autovacuum then goes silent. > Yes. Dead in a ditch. But with no errors. > >> 4) Before the next drop/create lfm you restart the Postgres server and autovacuum starts again. > I haven't restarted in a week, and the pattern remains, with a bit of analyze at each reload of lfm, and then nothing. > >> What is in the logs when you do the restart? > Nothing notable: > 1) denied connections, while restarting > 2) authorized connections > 3) auto analyze going into overdrive: > See below > >> Is there some process that runs shortly after the drop/create lfm cycle? > Not that I can see. I was hoping more coffee would lead to enlightenment, it did not. It did lead me to do what I should have done at the start which is look at the release notes for 15.x. You are on Postgres 15.3 and current is 15.7. On the path from .5 --> .7 is: https://www.postgresql.org/docs/15/release-15-5.html#id-1.11.6.7.4 Fix race condition in database dropping that could lead to the autovacuum launcher getting stuck (Andres Freund, Will Mortensen, Jacob Speidel) The race could lead to a statistics entry for the removed database remaining present, confusing the launcher's selection of which database to process. > Phil Horder > Database Mechanic -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} > https://www.postgresql.org/docs/15/release-15-5.html#id-1.11.6.7.4 > Fix race condition in database dropping that could lead to the autovacuum launcher getting stuck Wow, that sounds like our problem! I will investigate. Maybe try and find the orphaned stats entry to prove it. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/22/24 08:55, HORDER Philip wrote: > Classified as: {OPEN} > >> https://www.postgresql.org/docs/15/release-15-5.html#id-1.11.6.7.4 > >> Fix race condition in database dropping that could lead to the autovacuum launcher getting stuck > > Wow, that sounds like our problem! Another thought I had is that since you are on Windows maybe there is an AV program seeing the autovacuum as virus-like and killing it. Though I would expect there to be more errors in that case. Anyway it might be worth investigating should an upgrade not fix it. > > I will investigate. Maybe try and find the orphaned stats entry to prove it. It would be a good idea to upgrade to 15.7 in any case to get all the other bug/security fixes. > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > > -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} I actually hadn't thought to look for Postgres bugs, the system is so reliable for us, it just doesn't go wrong! I can trigger the fault be running a reload of the lfm database, and we've been running Postgres 10 & 13 for several yearsnow without seeing this problem. Our symptoms match that bug description exactly, so I'm pretty confident this is the problem. I'll have to push through the right paperwork to get a Postgres update, and that will take a while Thank you so much for your time. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 5/23/24 06:01, HORDER Philip wrote: > Classified as: {OPEN} > > I actually hadn't thought to look for Postgres bugs, the system is so reliable for us, it just doesn't go wrong! > > I can trigger the fault be running a reload of the lfm database, and we've been running Postgres 10 & 13 for several yearsnow without seeing this problem. > > Our symptoms match that bug description exactly, so I'm pretty confident this is the problem. > I'll have to push through the right paperwork to get a Postgres update, and that will take a while While you are doing the paperwork make sure it is for upgrade to 15.7. Also might mention minor upgrades (x in X.x) are a good idea as Postgres, like any software, does get bugs and the minor releases fix them. Point the folks in charge at: https://www.postgresql.org/docs/15/release.html Also minor releases as a rule do not require dump/restore or pg_upgrade or other heavy lifting. There have been exceptions, but they are rare. > > Thank you so much for your time. > > Phil Horder > Database Mechanic > > Thales > Land & Air Systems > -- Adrian Klaver adrian.klaver@aklaver.com
Classified as: {OPEN} Installing 15.7 has indeed fixed the problem. Phil Horder Database Mechanic Thales Land & Air Systems {OPEN} The information contained in this e-mail is confidential. It is intended only for the stated addressee(s) and access to itby any other person is unauthorised. If you are not an addressee, you must not disclose, copy, circulate or in any otherway use or rely on the information contained in this e-mail. Such unauthorised use may be unlawful. If you have receivedthis e-mail in error, please inform the originator immediately and delete it and all copies from your system. Thales UK Limited. A company registered in England and Wales. Registered Office: 350 Longwater Avenue, Green Park, Reading,Berks RG2 6GF. Registered Number: 868273 Please consider the environment before printing a hard copy of this e-mail.
On 2024-06-18 14:59:16 +0000, HORDER Philip wrote: > Classified as: {OPEN} [...] > {OPEN} > The information contained in this e-mail is confidential. It is > intended only for the stated addressee(s) and access to it by any > other person is unauthorised. [...] This is an interesting definition of "OPEN". hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"