'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2) - Mailing list pgsql-bugs
From | Marco Boeringa |
---|---|
Subject | 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2) |
Date | |
Msg-id | ed211b60-2b76-4d98-4284-328916068036@boeringa.demon.nl Whole thread Raw |
Responses |
Fwd: 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2)
Re: 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2) |
List | pgsql-bugs |
Hi all, I am running into a weird issue I haven't encountered before. This is PostgreSQL 15.2 (Ubuntu 15.2-1.pgdg22.04+1) running on Ubuntu 22.04.2 LTS, with PostGIS (POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="150" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"). Now the PostgreSQL database cluster contains multiple databases. In one of these, an OpenStreetMap 'Planet' size database, a CLUSTER operation is running against a very large table of buildings (> 400M records). This runs fine, and I can see the actual progress in pgAdmin by inspecting the 'pg_stat_progress_cluster' view. So far everything seems OK. However, as part of two secondary Python scripts running that do all kinds of batch operations against spatial tables (generalization and such) in two other databases on the same database cluster, at the very end of the processing, these initialize 'CLUSTER' operations as well. Both of these now fail with a: "missing chunk number 0 for toast value X in pg_toast_Y" type error. After these errors, I see the CLUSTER command listed as well in the PostgreSQL main log, but the ERROR line mentioned above is listed just above the actual CLUSTER statement in the log. No other anomaly / error is visible there. The CLUSTER command just fails though on the two secondary databases. The time displayed in the log is exactly the same for these two log lines one after another, also the number between the square brackets listed immediately after the "CET" of the time are the same for the same database. The pg_toast_Y table names mentioned in the errors differ by the way between the two databases. If I inspect the table that is mentioned in the CLUSTER command in the log, the very first one that is going to be CLUSTERed in the secondary databases, the table can be displayed fine in DBeaver, even spatially. Checking the geometries with PostGIS's 'ST_IsValid' also shows them to be fine, no errors in the geometries. Everything else seems OK, database cluster is still up&running, all tables in all databases can be viewed and accessed in DBeaver and pgAdmin. In all other rare cases where I have seen the "missing chunk number 0 for toast value X in pg_toast_Y" type error, the generalization processing already failed somewhere half-way, never at the end when CLUSTER is initiated. The error therefore seems weird to occur at that point. I have now been able to reproduce this twice, with two tiny database's filled with just Liechtenstein data, while the Planet database is still in the stage of CLUSTERing the huge building table. Obviously, it will be interesting to see what happens if the CLUSTER operation on the Planet database is finished, but this will still take many hours, especially since other large tables need to be CLUSTERed in the same database, and I do not want to break off this process right now after already gotten this far. I will report back once it is finished, but does anyone have a clue why this might happen? The system has plenty of free space by the way (4x2TB NVMe), and the database runs on a professional workstation with ECC RAM, and used RAM also doesn't seem an issue. Marco
pgsql-bugs by date: