Thread: Postgres 10.3 delete on partition table cannot execute, out of memory.
Bug
Partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.
mark-thinkpad-x230
description: Notebook
product: 2306CTO (LENOVO_MT_2306)
vendor: LENOVO
version: ThinkPad X230
serial: PK12NFA
width: 64 bits
capabilities: smbios-2.7 dmi-2.7 vsyscall32
configuration: administrator_password= disabled chassis=notebook family=ThinkPad X230 power-on_password=disabled sku=LENOVO_MT_2306 uuid=814592E2-9652-CB11-A65F- 83069E2A672A
*-core
description: Motherboard
product: 2306CTO
vendor: LENOVO
physical id: 0
version: Not Defined
serial: 1ZST039410C
slot: Not Available
*-cpu
description: CPU
product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
serial: None
slot: CPU Socket - U3E1
size: 2208MHz
capacity: 2600MHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq
configuration: cores=2 enabledcores=2 threads=4
P.S. Please feel free to contact me for more information.
RegardsPartitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.
Default Postgres Configuration with exception
max_locks_per_transaction = 1024
Postgres Logs
2018-03-15 14:26:40.340 AEDT [7120] LOG: server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL: Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG: terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG: all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG: database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG: database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG: invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG: database system is ready to accept connections
2018-03-15 14:26:40.340 AEDT [7120] LOG: server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL: Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG: terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG: all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG: database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG: database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG: invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG: database system is ready to accept connections
System
ubuntu-16.4 LTS
mark-thinkpad-x230
description: Notebook
product: 2306CTO (LENOVO_MT_2306)
vendor: LENOVO
version: ThinkPad X230
serial: PK12NFA
width: 64 bits
capabilities: smbios-2.7 dmi-2.7 vsyscall32
configuration: administrator_password=
*-core
description: Motherboard
product: 2306CTO
vendor: LENOVO
physical id: 0
version: Not Defined
serial: 1ZST039410C
slot: Not Available
*-cpu
description: CPU
product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
serial: None
slot: CPU Socket - U3E1
size: 2208MHz
capacity: 2600MHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq
configuration: cores=2 enabledcores=2 threads=4
P.S. Please feel free to contact me for more information.
Mark Magnus
Software Developer
Impact Data Pty Ltd
Data Driven. Direct. Digital.
a: Level 1/428 Little Burke Street, Melbourne, VIC 3000
Hi Mark, To me it looks like OOM was triggered. Could you post to us a bit more info? like how much RAM on the machine, Postgres memory settings (like work_mem shared buffers maintenance_work_mem , max_connections),what was happening on the DB at the time of the incident and linux logs (if OOM was triggered, then is likelyyou will find something like that), please? Regards, Fabio Pardi On 03/15/2018 05:15 AM, Mark Magnus wrote: > Bug > > Partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key.Any delete operation i.e. > "delete from contacts where id = ?" or > "delete from contacts where id = ? and account_id = ?" or > "delete from contacts where account_id = ?" > results in out of memory condition. > > Default Postgres Configuration with exception > max_locks_per_transaction = 1024 > > Postgres Logs > > 2018-03-15 14:26:40.340 AEDT [7120] LOG: server process (PID 8177) was terminated by signal 9: Killed > 2018-03-15 14:26:40.340 AEDT [7120] DETAIL: Failed process was running: delete from contacts where id = 82398 and account_id= 9000 > 2018-03-15 14:26:40.354 AEDT [7120] LOG: terminating any other active server processes > 2018-03-15 14:26:40.367 AEDT [3821] WARNING: terminating connection because of crash of another server process > 2018-03-15 14:26:40.367 AEDT [3821] DETAIL: The postmaster has commanded this server process to roll back the currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. > 2018-03-15 14:26:40.367 AEDT [3821] HINT: In a moment you should be able to reconnect to the database and repeat yourcommand. > 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING: terminating connection because of crash of another server process > 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL: The postmaster has commanded this server process to roll backthe current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. > 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT: In a moment you should be able to reconnect to the database andrepeat your command. > 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING: terminating connection because of crash ofanother server process > 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL: The postmaster has commanded this server processto roll back the current transaction and exit, because another server process exited abnormally and possibly corruptedshared memory. > 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT: In a moment you should be able to reconnect tothe database and repeat your command. > 2018-03-15 14:26:40.569 AEDT [7120] LOG: all server processes terminated; reinitializing > 2018-03-15 14:26:40.639 AEDT [9244] LOG: database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT > 2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL: the database system is in recovery mode > 2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL: the database system is in recovery mode > 2018-03-15 14:26:44.778 AEDT [9244] LOG: database system was not properly shut down; automatic recovery in progress > 2018-03-15 14:26:44.798 AEDT [9244] LOG: redo starts at 0/56782CE0 > 2018-03-15 14:26:44.798 AEDT [9244] LOG: invalid record length at 0/56782D18: wanted 24, got 0 > 2018-03-15 14:26:44.798 AEDT [9244] LOG: redo done at 0/56782CE0 > 2018-03-15 14:26:44.870 AEDT [7120] LOG: database system is ready to accept connections > > > System > > ubuntu-16.4 LTS > > mark-thinkpad-x230 > description: Notebook > product: 2306CTO (LENOVO_MT_2306) > vendor: LENOVO > version: ThinkPad X230 > serial: PK12NFA > width: 64 bits > capabilities: smbios-2.7 dmi-2.7 vsyscall32 > configuration: administrator_password=disabled chassis=notebook family=ThinkPad X230 power-on_password=disabled sku=LENOVO_MT_2306uuid=814592E2-9652-CB11-A65F-83069E2A672A > *-core > description: Motherboard > product: 2306CTO > vendor: LENOVO > physical id: 0 > version: Not Defined > serial: 1ZST039410C > slot: Not Available > *-cpu > description: CPU > product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz > vendor: Intel Corp. > physical id: 1 > bus info: cpu@0 > version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz > serial: None > slot: CPU Socket - U3E1 > size: 2208MHz > capacity: 2600MHz > width: 64 bits > clock: 100MHz > capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopologynonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority eptvpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq > configuration: cores=2 enabledcores=2 threads=4 > > P.S. Please feel free to contact me for more information. > > Regards > > Mark Magnus > Software Developer > > Impact Data Pty Ltd > Data Driven. Direct. Digital. > > a: Level 1/428 Little Burke Street, Melbourne, VIC 3000 > e: mark.magnus@impactdata.com.au <mailto:jay@impactdata.com.au> > w: www.impactdata.com.au <http://www.impactdata.com.au/>
Hi. On 2018/03/15 13:15, Mark Magnus wrote: > Bug > > Partitioned table has 7202 partitions. No partition contains more than 50 > records. Partitioning is done on a foreign key. Any delete operation i.e. > "delete from contacts where id = ?" or > "delete from contacts where id = ? and account_id = ?" or > "delete from contacts where account_id = ?" > results in out of memory condition. > > Default Postgres Configuration with exception > max_locks_per_transaction = 1024 As Fabio says, it seems most likely that OOM was triggered while planning the delete operation. I can reproduce OOM being triggered on my modest development machine, so perhaps that's what's happening in your case too. This is unfortunately expected, given that the underlying planning mechanism cannot cope beyond a few hundred partitions. :-( See a relevant note in the documentation; last line of the page at this link: https://www.postgresql.org/docs/devel/static/ddl-partitioning.html. Until things improve in that area, one workaround might be to perform the delete operation directly on the partition, as it's possible to do that. Or redesign your schema to use less number of partitions. Thanks, Amit