Thread: Less available diskspace after crashed CLUSTER

Less available diskspace after crashed CLUSTER

From
"Sander Steffann"
Date:
Hi,

I was running a CLUSTER on a big database (approx. 8G) and I didn't
anticipate the diskspace usage, so I ran out of diskspace. This is on
PostgreSQL 7.4.6. The backend crashed with the following log messages:

2005-01-15 15:17:46 [30605] PANIC:  PANIC:  could not write to file
"/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op apparaat
STATEMENT:  CLUSTER;
2005-01-15 15:17:46 [11245] LOG:  server process (PID 30605) was terminated
by signal 6
2005-01-15 15:17:46 [11245] LOG:  terminating any other active server
processes
2005-01-15 15:17:46 [3378] WARNING:  terminating connection because of crash
of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
<... a lot more of the same messages from other backends ...>
2005-01-15 15:17:46 [11245] LOG:  all server processes terminated;
reinitializing
2005-01-15 15:17:46 [3389] LOG:  database system was interrupted at
2005-01-15 15:17:44 CET
2005-01-15 15:17:46 [3389] LOG:  checkpoint record is at 12/9E2976B0
2005-01-15 15:17:46 [3389] LOG:  redo record is at 12/9E00A790; undo record
is at 0/0; shutdown FALSE
2005-01-15 15:17:46 [3389] LOG:  next transaction ID: 13482165; next OID:
1694415
2005-01-15 15:17:46 [3389] LOG:  database system was not properly shut down;
automatic recovery in progress
2005-01-15 15:17:46 [3389] LOG:  redo starts at 12/9E00A790
2005-01-15 15:18:03 [3389] LOG:  could not open file
"/var/lib/pgsql/data/pg_xlog/00000012000000BF" (log file 18, segment 191):
Onbekend bestand of map
2005-01-15 15:18:03 [3389] LOG:  redo done at 12/BEFFD810
2005-01-15 15:18:13 [3389] LOG:  recycled transaction log file
"000000120000009B"
2005-01-15 15:18:13 [3389] LOG:  recycled transaction log file
"000000120000009C"
2005-01-15 15:18:13 [3389] LOG:  recycled transaction log file
"000000120000009D"
2005-01-15 15:18:13 [3389] LOG:  database system is ready

Translations:
 Geen ruimte over op apparaat -> No space left on device
 Onbekend bestand of map -> Unknown file or directory

The database is running nice again, but I have the feeling I lost some
diskspace... I suspected a temp file, so I stopped the postmaster, looked in
/var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it. But
now I still have 1G less available diskspace than before the CLUSTER.

Where should I look next? I would like the diskspace back :-)

Thanks,
Sander.



Re: Less available diskspace after crashed CLUSTER

From
Tom Lane
Date:
"Sander Steffann" <sander@steffann.nl> writes:
> 2005-01-15 15:17:46 [30605] PANIC:  PANIC:  could not write to file
> "/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op apparaat
> STATEMENT:  CLUSTER;

Hmm.  I wonder why we have XLogFileInit forcing a PANIC for this.  At
one time it was called only from critical sections and so the error
would become a panic anyway, but that's not true anymore...

> The database is running nice again, but I have the feeling I lost some
> diskspace... I suspected a temp file, so I stopped the postmaster, looked in
> /var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it. But
> now I still have 1G less available diskspace than before the CLUSTER.

> Where should I look next? I would like the diskspace back :-)

I'm guessing that the new table/index files created during the CLUSTER
didn't get released after the PANIC.  Look for files that are not
referenced by any relfilenode value in pg_class.  Don't forget that
there may be multiple sections (filenode.1, etc).  See
http://developer.postgresql.org/docs/postgres/storage.html
(which is for 8.0, but everything except the material on tablespaces
applies to 7.4).

Also, did you get rid of the xlog temp file mentioned in the message?

            regards, tom lane

Re: Less available diskspace after crashed CLUSTER

From
"Sander Steffann"
Date:
Hi Tom,

>> 2005-01-15 15:17:46 [30605] PANIC:  PANIC:  could not write to file
>> "/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op
>> apparaat
>> STATEMENT:  CLUSTER;
>
> Hmm.  I wonder why we have XLogFileInit forcing a PANIC for this.  At
> one time it was called only from critical sections and so the error
> would become a panic anyway, but that's not true anymore...

I'll leave that part to you :-)  I don't know much about PostgreSQL
internals...

>> The database is running nice again, but I have the feeling I lost some
>> diskspace... I suspected a temp file, so I stopped the postmaster, looked
>> in
>> /var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it.
>> But
>> now I still have 1G less available diskspace than before the CLUSTER.
>
>> Where should I look next? I would like the diskspace back :-)
>
> I'm guessing that the new table/index files created during the CLUSTER
> didn't get released after the PANIC.  Look for files that are not
> referenced by any relfilenode value in pg_class.  Don't forget that
> there may be multiple sections (filenode.1, etc).  See
> http://developer.postgresql.org/docs/postgres/storage.html
> (which is for 8.0, but everything except the material on tablespaces
> applies to 7.4).

Using the oid2name tool I get this:
# for x in *; do echo -n "$x: "; oid2name -q -d abacus -o $x; done
1247: 1247   = pg_type
1249: 1249   = pg_attribute
1255: 1255   = pg_proc
1259: 1259   = pg_class
16384: 16384  = pg_attrdef
16386: 16386  = pg_constraint
16388: 16388  = pg_inherits
16390: 16390  = pg_index
16392: 16392  = pg_operator
16394: 16394  = pg_opclass
16396: 16396  = pg_am
16398: 16398  = pg_amop
16400: 16400  = pg_amproc
16402: 16402  = pg_language
16404: 16404  = pg_largeobject
16406: 16406  = pg_aggregate
16408: 16408  = pg_statistic
16410: 16410  = pg_rewrite
16412: 16412  = pg_trigger
16414: 16414  = pg_listener
16416: 16416  = pg_description
16418: 16418  = pg_cast
16595: 16595  = pg_namespace
16597: 16597  = pg_conversion
16599: 16599  = pg_depend
16601: 16601  = pg_aggregate_fnoid_index
16602: 16602  = pg_am_name_index
16603: 16603  = pg_am_oid_index
16604: 16604  = pg_amop_opr_opc_index
16605: 16605  = pg_amop_opc_strategy_index
16606: 16606  = pg_amproc_opc_procnum_index
16607: 16607  = pg_attrdef_adrelid_adnum_index
16608: 16608  = pg_attrdef_oid_index
16609: 16609  = pg_attribute_relid_attnam_index
16610: 16610  = pg_attribute_relid_attnum_index
16611: 16611  = pg_cast_oid_index
16612: 16612  = pg_cast_source_target_index
16613: 16613  = pg_class_oid_index
16614: 16614  = pg_class_relname_nsp_index
16615: 16615  = pg_constraint_conname_nsp_index
16616: 16616  = pg_constraint_conrelid_index
16617: 16617  = pg_constraint_contypid_index
16618: 16618  = pg_constraint_oid_index
16619: 16619  = pg_conversion_default_index
16620: 16620  = pg_conversion_name_nsp_index
16621: 16621  = pg_conversion_oid_index
16624: 16624  = pg_depend_depender_index
16625: 16625  = pg_depend_reference_index
16626: 16626  = pg_description_o_c_o_index
16629: 16629  = pg_index_indrelid_index
16630: 16630  = pg_index_indexrelid_index
16631: 16631  = pg_inherits_relid_seqno_index
16632: 16632  = pg_language_name_index
16633: 16633  = pg_language_oid_index
16634: 16634  = pg_largeobject_loid_pn_index
16635: 16635  = pg_namespace_nspname_index
16636: 16636  = pg_namespace_oid_index
16637: 16637  = pg_opclass_am_name_nsp_index
16638: 16638  = pg_opclass_oid_index
16639: 16639  = pg_operator_oid_index
16640: 16640  = pg_operator_oprname_l_r_n_index
16641: 16641  = pg_proc_oid_index
16642: 16642  = pg_proc_proname_args_nsp_index
16643: 16643  = pg_rewrite_oid_index
16644: 16644  = pg_rewrite_rel_rulename_index
16647: 16647  = pg_statistic_relid_att_index
16648: 16648  = pg_trigger_tgconstrname_index
16649: 16649  = pg_trigger_tgconstrrelid_index
16650: 16650  = pg_trigger_tgrelid_tgname_index
16651: 16651  = pg_trigger_oid_index
16652: 16652  = pg_type_oid_index
16653: 16653  = pg_type_typname_nsp_index
16656: 16656  = pg_toast_16384
16658: 16658  = pg_toast_16384_index
16659: 16659  = pg_toast_16386
16661: 16661  = pg_toast_16386_index
16665: 16665  = pg_toast_16416
16667: 16667  = pg_toast_16416_index
16671: 16671  = pg_toast_1255
16673: 16673  = pg_toast_1255_index
16674: 16674  = pg_toast_16410
16676: 16676  = pg_toast_16410_index
16680: 16680  = pg_toast_16408
16682: 16682  = pg_toast_16408_index
1688060: No tables with that oid found
1688062: 1688062 = pg_toast_1688060
1688064: 1688064 = pg_toast_1688060_index
1688065: No tables with that oid found
1688066: No tables with that oid found
1688068: 1688068 = pg_toast_1688066
1688070: 1688070 = pg_toast_1688066_index
1688071: No tables with that oid found
1688072: No tables with that oid found
1688074: 1688074 = pg_toast_1688072
1688076: 1688076 = pg_toast_1688072_index
1688078: No tables with that oid found
1688079: No tables with that oid found
1688085: No tables with that oid found
1688086: No tables with that oid found
1688088: No tables with that oid found
1688089: No tables with that oid found
1688091: 1688091 = pg_toast_1688089
1688093: 1688093 = pg_toast_1688089_index
1688094: No tables with that oid found
1688095: No tables with that oid found
1688097: No tables with that oid found
1688098: No tables with that oid found
1688101: No tables with that oid found
1688102: No tables with that oid found
1688104: 1688104 = pg_toast_1688102
1688106: 1688106 = pg_toast_1688102_index
1688107: No tables with that oid found
1688111: No tables with that oid found
1688114: No tables with that oid found
1688116: 1688116 = pg_toast_1688114
1688118: 1688118 = pg_toast_1688114_index
1688119: No tables with that oid found
1688120: No tables with that oid found
1688122: 1688122 = pg_toast_1688120
1688124: 1688124 = pg_toast_1688120_index
1688125: No tables with that oid found
1688133: No tables with that oid found
1688135: 1688135 = pg_toast_1688133
1688137: 1688137 = pg_toast_1688133_index
1688138: No tables with that oid found
1688140: No tables with that oid found
1688142: 1688142 = pg_toast_1688140
1688144: 1688144 = pg_toast_1688140_index
1688145: No tables with that oid found
1688146: No tables with that oid found
1688147: No tables with that oid found
1688147.1: No tables with that oid found
1688150: No tables with that oid found
1688152: No tables with that oid found
1688155: No tables with that oid found
17070: 17070  = sql_features
17072: 17072  = pg_toast_17070
17074: 17074  = pg_toast_17070_index
17075: 17075  = sql_implementation_info
17077: 17077  = pg_toast_17075
17079: 17079  = pg_toast_17075_index
17080: 17080  = sql_languages
17082: 17082  = pg_toast_17080
17084: 17084  = pg_toast_17080_index
17085: 17085  = sql_packages
17087: 17087  = pg_toast_17085
17089: 17089  = pg_toast_17085_index
17090: 17090  = sql_sizing
17092: 17092  = pg_toast_17090
17094: 17094  = pg_toast_17090_index
17095: 17095  = sql_sizing_profiles
17097: 17097  = pg_toast_17095
17099: 17099  = pg_toast_17095_index
17246: 17246  = logfile_progress
17248: 17248  = pg_toast_17246
17250: 17250  = pg_toast_17246_index
17251: 17251  = servernames_id_seq
17253: 17253  = servernames
17257: 17257  = pg_toast_17253
17259: 17259  = pg_toast_17253_index
17260: 17260  = serveraliassen_id_seq
17262: 17262  = serveraliassen
17265: 17265  = pg_toast_17262
17267: 17267  = pg_toast_17262_index
17299: 17299  = bezoekers
17299.1: 17299  = bezoekers
17302: 17302  = pg_toast_17299
17304: 17304  = pg_toast_17299_index
17306: 17306  = providers_id_seq
17308: 17308  = providers
17311: 17311  = pg_toast_17308
17313: 17313  = pg_toast_17308_index
17314: 17314  = provider_domains_id_seq
17316: 17316  = provider_domains
17319: 17319  = pg_toast_17316
17321: 17321  = pg_toast_17316_index
17326: 17326  = asn_names
17328: 17328  = pg_toast_17326
17330: 17330  = pg_toast_17326_index
17334: 17334  = crawlers_id_seq
17336: 17336  = crawlers
17339: 17339  = pg_toast_17336
17341: 17341  = pg_toast_17336_index
17348: 17348  = browsers_id_seq
17350: 17350  = browsers
17353: 17353  = pg_toast_17350
17355: 17355  = pg_toast_17350_index
17359: 17359  = operating_systems_id_seq
17361: 17361  = operating_systems
17364: 17364  = pg_toast_17361
17366: 17366  = pg_toast_17361_index
17376: 17376  = landcodes
17378: 17378  = pg_toast_17376
17380: 17380  = pg_toast_17376_index
17393: 17393  = route_asn
17395: 17395  = ip_range_country
17398: 17398  = pg_toast_17395
17400: 17400  = pg_toast_17395_index
17401: 17401  = unknown_user_agents
17404: 17404  = pg_toast_17401
17406: 17406  = pg_toast_17401_index
17429: 17429  = route_asn_nexthop
17430: 17430  = route_asn_asn
17431: 17431  = ip_range_start_end
17434: 17434  = server_session
17435: 17435  = un_logfile_progress
17437: 17437  = servernames_pkey
17439: 17439  = servernames_name_key
17441: 17441  = serveraliassen_pkey
17443: 17443  = serveraliassen_alias_key
17445: 17445  = un_serveraliassen
17457: 17457  = un_session
17459: 17459  = providers_pkey
17461: 17461  = providers_name_key
17463: 17463  = provider_domains_pkey
17465: 17465  = provider_domains_domain_key
17469: 17469  = asn_names_pkey
17473: 17473  = crawlers_pkey
17475: 17475  = un_crawlers
17479: 17479  = browsers_pkey
17481: 17481  = un_browsers
17485: 17485  = operating_systems_pkey
17487: 17487  = un_operating_systems
17493: 17493  = landcodes_pkey
17499: 17499  = route_asn_pkey
17501: 17501  = ip_range_country_pkey

I suspect all the files with "No tables with that oid found" can be removed,
but I don't know exactly how oid2name works, so I'll check them against
pg_class before removing them.

> Also, did you get rid of the xlog temp file mentioned in the message?

It didn't exist anymore after the crash.

Thanks for the help!
Sander.



Re: Less available diskspace after crashed CLUSTER

From
"Sander Steffann"
Date:
Hi,

Sorry for replying to myself, but I had to point out that:

>> I'm guessing that the new table/index files created during the CLUSTER
>> didn't get released after the PANIC.  Look for files that are not
>> referenced by any relfilenode value in pg_class.
>
> Using the oid2name tool I get this:
> [...]
> I suspect all the files with "No tables with that oid found" can be
> removed, but I don't know exactly how oid2name works, so I'll check them
> against pg_class before removing them.

This is ofcourse not a good idea. There is quite a difference between the
oid and the relfilenode. Comparing the output of:
 SELECT DISTINCT relfilenode FROM pg_class ORDER BY relfilenode;
to the directory listing gave a much better result.

The files I found in this way also had an atime which was the same as the
time of the crash.

Just to prevent someone else from making the mistake I made in my previous
message...

Thanks!
Sander.