Thread: BUG #13985: Segmentation fault on PREPARE TRANSACTION
The following bug has been logged on the website:

Bug reference:      13985
Logged by:          Chris Tessels
Email address:      chris.tessels@inergy.nl
PostgreSQL version: 9.4.6
Operating system:   CentOS release 6.6 (Final)
Description:

The first time we saw this segmentation fault was on an instance running
PostgreSQL 9.4.4. To make sure the problem wasn't already fixed, we
upgraded another instance on identical hardware to PostgreSQL 9.4.6. This
would also exclude potential hardware related problems like bad memory.
We also updated our JDBC driver to the latest version:
https://jdbc.postgresql.org/download/postgresql-9.4.1208.jar

Although we were able to make both systems crash, we were not able to
systematically reproduce the problem.

Here is the postgresql.log when the segfault occurred.

/data/postgres/pg_log/postgresql.log

2016-02-23 14:45:12.901 CET LOG:  server process (PID 519) was terminated by signal 11: Segmentation fault
2016-02-23 14:45:12.901 CET DETAIL:  Failed process was running: PREPARE TRANSACTION '131077_AAAAAAAAAAAAAP//CjIGBaCG30NWzFRpAdtPATE=_AAAAAAAAAAAAAP//CjIGBaCG30NWzFRpAdtP4AAAAAIAAAAA'
2016-02-23 14:45:12.901 CET LOG:  terminating any other active server processes
2016-02-23 14:45:12.902 CET LOG:  archiver process (PID 114131) exited with exit code 1
2016-02-23 14:45:14.226 CET replication_user FATAL:  the database system is in recovery mode
2016-02-23 14:45:14.564 CET LOG:  all server processes terminated; reinitializing
2016-02-23 14:45:17.404 CET LOG:  database system was interrupted; last known up at 2016-02-23 14:31:36 CET
2016-02-23 14:53:06.753 CET mailinfo_ow FATAL:  the database system is in recovery mode
2016-02-23 14:53:06.760 CET mailinfo_ow FATAL:  the database system is in recovery mode
2016-02-23 14:53:08.830 CET replication_user FATAL:  the database system is in recovery mode
2016-02-23 14:53:10.203 CET LOG:  record with zero length at 5/1409A540
2016-02-23 14:53:10.203 CET LOG:  redo done at 5/1409A3C0
2016-02-23 14:53:10.203 CET LOG:  last completed transaction was at log time 2016-02-23 14:44:32.012791+01
2016-02-23 14:53:11.708 CET LOG:  MultiXact member wraparound protections are now enabled
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609585
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609594
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609596
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609601
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609591
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609588
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609593
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609584
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609595
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609586
2016-02-23 14:53:11.710 CET LOG:  recovering prepared transaction 12609590
2016-02-23 14:53:11.756 CET LOG:  autovacuum launcher started
2016-02-23 14:53:11.757 CET LOG:  database system is ready to accept connections

/var/log/messages

Feb 23 14:44:32 [hostname] kernel: postmaster[519]: segfault at 7fc2d8e6b634 ip 000000000066a95b sp 00007fff5ca887d0 error 4 in postgres[400000+566000]

Analyzing the coredump gave us the following information:

sudo -u postgres gdb -q -c /var/spool/abrt/ccpp-2016-02-23-16\:27\:50-77566/coredump /usr/pgsql-9.4/bin/postgres

Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
Program terminated with signal 11, Segmentation fault.
#0  MinimumActiveBackends (min=50) at procarray.c:2472
2472            if (pgxact->xid == InvalidTransactionId)

Postgresql version:

PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

Operating system and version:

[root@[hostname] ccpp-2016-02-23-16:27:50-77566]# cat /etc/redhat-release
CentOS release 6.6 (Final)
[root@[hostname] ccpp-2016-02-23-16:27:50-77566]# uname -a
Linux [hostname] 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

We installed Postgres using the following command:
yum install from http://yum.postgresql.org/9.4/redhat/rhel-6.6-x86_64/

Our configuration settings are:

 name | current_setting | source
 archive_command | rsync -a %p /data/postgres_archive/%f | configuration file
 archive_mode | on | configuration file
 bgwriter_delay | 100ms | configuration file
 bgwriter_lru_maxpages | 1000 | configuration file
 checkpoint_completion_target | 0 | configuration file
 checkpoint_segments | 192 | configuration file
 checkpoint_timeout | 1h | configuration file
 client_encoding | UTF8 | client
 commit_delay | 50 | configuration file
 commit_siblings | 50 | configuration file
 DateStyle | ISO, MDY | client
 default_text_search_config | pg_catalog.english | configuration file
 dynamic_shared_memory_type | posix | configuration file
 effective_cache_size | 16GB | configuration file
 effective_io_concurrency | 1 | configuration file
 extra_float_digits | 3 | session
 fsync | on | configuration file
 full_page_writes | off | configuration file
 hot_standby | on | configuration file
 hot_standby_feedback | on | configuration file
 lc_messages | en_US.UTF-8 | configuration file
 lc_monetary | en_US.UTF-8 | configuration file
 lc_numeric | en_US.UTF-8 | configuration file
 lc_time | en_US.UTF-8 | configuration file
 listen_addresses | * | configuration file
 log_destination | stderr | configuration file
 log_line_prefix | %m %u | configuration file
 log_min_messages | log | configuration file
 log_rotation_age | 0 | configuration file
 log_rotation_size | 1000MB | configuration file
 log_statement | none | configuration file
 log_timezone | CET | configuration file
 log_truncate_on_rotation | on | configuration file
 logging_collector | on | configuration file
 maintenance_work_mem | 2GB | configuration file
 max_connections | 400 | configuration file
 max_prepared_transactions | 100 | configuration file
 max_stack_depth | 2MB | environment variable
 max_standby_archive_delay | 2min | configuration file
 max_standby_streaming_delay | 2min | configuration file
 max_wal_senders | 3 | configuration file
 port | 5432 | configuration file
 shared_buffers | 32GB | configuration file
 synchronous_commit | off | configuration file
 temp_buffers | 8MB | configuration file
 TimeZone | Europe/Amsterdam | client
 wal_level | hot_standby | configuration file
 work_mem | 64MB | configuration file

The program used to connect to PostgreSQL: Wildfly 8.2.0 XA datasource with driver
https://jdbc.postgresql.org/download/postgresql-9.4.1208.jar

wildfly config:

<xa-datasource jndi-name="java:jboss/datasources/MailInfoXADS" pool-name="MailInfoXADS" enabled="true" use-java-context="true">
    <xa-datasource-property name="ServerName">[hostname]</xa-datasource-property>
    <xa-datasource-property name="PortNumber">5432</xa-datasource-property>
    <xa-datasource-property name="DatabaseName">mailinfo_ods</xa-datasource-property>
    <driver>postgresql-jdbc4</driver>
    <xa-pool>
        <min-pool-size>5</min-pool-size>
        <initial-pool-size>5</initial-pool-size>
        <max-pool-size>40</max-pool-size>
        <prefill>true</prefill>
    </xa-pool>
    <security>
        <user-name>mailinfo_ow</user-name>
        <password>xxxxxx</password>
    </security>
    <validation>
        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLValidConnectionChecker"/>
        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter"/>
    </validation>
</xa-datasource>

<drivers>
    <driver name="postgresql-jdbc4" module="org.postgresql">
        <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
    </driver>
</drivers>
chris.tessels@inergy.nl wrote:

> Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> Program terminated with signal 11, Segmentation fault.
>
> #0  MinimumActiveBackends (min=50) at procarray.c:2472
> 2472            if (pgxact->xid == InvalidTransactionId)

It's not surprising that you're not able to make this crash consistently,
because it looks like the problem might be in concurrent modifications to
the PGXACT array.  This routine, MinimumActiveBackends, walks the PGPROC
array explicitly without locks.  There are comments indicating that this
is safe, but evidently something has slipped in there.

Apparently this code is trying to dereference an invalid pgxact, but it's
not clear to me how this happens.  Those structs are allocated in advance,
and they are referenced in the code via array indexes, so even if the
pgxact doesn't actually hold data about a valid transaction, dereferencing
the XID shouldn't cause a crash.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote:
> chris.tessels@inergy.nl wrote:
>
> > Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> > Program terminated with signal 11, Segmentation fault.
> >
> > #0  MinimumActiveBackends (min=50) at procarray.c:2472
> > 2472            if (pgxact->xid == InvalidTransactionId)
>
> It's not surprising that you're not able to make this crash
> consistently, because it looks like the problem might be in concurrent
> modifications to the PGXACT array.  This routine, MinimumActiveBackends,
> walks the PGPROC array explicitly without locks.  There are comments
> indicating that this is safe, but evidently something has slipped in
> there.
>
> Apparently this code is trying to dereference an invalid pgxact, but
> it's not clear to me how this happens.  Those structs are allocated in
> advance, and they are referenced in the code via array indexes, so even
> if the pgxact doesn't actually hold data about a valid transaction,
> dereferencing the XID shouldn't cause a crash.

Well, that code is pretty, uh, questionable. E.g. for

		int			pgprocno = arrayP->pgprocnos[index];
		volatile PGPROC *proc = &allProcs[pgprocno];
		volatile PGXACT *pgxact = &allPgXact[pgprocno];

there's no guarantee that pgprocno is actually the same index for both
lookups and the following

		if (pgprocno == -1)
			continue;			/* do not count deleted entries */

check. It's perfectly reasonable for a compiler to reload pgprocno from
memory, or just always reference it via memory.

I presume what happened here is that initially arrayP->pgprocnos[index]
was -1, but by the time if (pgprocno == -1) is reached, it changed to a
different value.

It's also really crummy that we're doing the PGPROC/PGXACT lookups
before checking whether pgprocno is -1.

At the very least ISTM that we have to make pgprocno volatile (or use a
memory barrier - but we don't have sufficient support for those in the
older branches), and move the PGPROC/PGXACT lookups after the == -1
check.

Andres
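To make that last suggestion concrete, here is a minimal sketch of the loop
body in MinimumActiveBackends with pgprocno read exactly once through a
volatile lvalue and the == -1 test moved ahead of the PGPROC/PGXACT
lookups.  This is an illustration of the proposal, not a committed patch:

	for (index = 0; index < arrayP->numProcs; index++)
	{
		/* read the slot exactly once, before doing anything with it */
		int			pgprocno = ((volatile int *) arrayP->pgprocnos)[index];
		volatile PGPROC *proc;
		volatile PGXACT *pgxact;

		if (pgprocno == -1)
			continue;			/* do not count deleted entries */

		/* only now index into the per-backend arrays */
		proc = &allProcs[pgprocno];
		pgxact = &allPgXact[pgprocno];

		if (pgxact->xid == InvalidTransactionId)
			continue;			/* do not count if no XID assigned */

		/* ... remaining checks unchanged ... */
	}

The volatile cast keeps the compiler from re-reading
arrayP->pgprocnos[index] at each later use, and deferring the lookups means
a stale -1 can no longer be used as an array index.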
On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote:
> > chris.tessels@inergy.nl wrote:
> >
> > > Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> > > Program terminated with signal 11, Segmentation fault.
> > >
> > > #0  MinimumActiveBackends (min=50) at procarray.c:2472
> > > 2472            if (pgxact->xid == InvalidTransactionId)
> >
> > It's not surprising that you're not able to make this crash
> > consistently, because it looks like the problem might be in concurrent
> > modifications to the PGXACT array.  This routine, MinimumActiveBackends,
> > walks the PGPROC array explicitly without locks.  There are comments
> > indicating that this is safe, but evidently something has slipped in
> > there.
> >
> > Apparently this code is trying to dereference an invalid pgxact, but
> > it's not clear to me how this happens.  Those structs are allocated in
> > advance, and they are referenced in the code via array indexes, so even
> > if the pgxact doesn't actually hold data about a valid transaction,
> > dereferencing the XID shouldn't cause a crash.
>
> Well, that code is pretty, uh, questionable. E.g. for
>
> 		int			pgprocno = arrayP->pgprocnos[index];
> 		volatile PGPROC *proc = &allProcs[pgprocno];
> 		volatile PGXACT *pgxact = &allPgXact[pgprocno];
>
> there's no guarantee that pgprocno is actually the same index for both
> lookups and the following
>
> 		if (pgprocno == -1)
> 			continue;			/* do not count deleted entries */
>
> check. It's perfectly reasonable for a compiler to reload pgprocno from
> memory, or just always reference it via memory.

Hm... this is against my understanding of what a compiler could (or
should) do.  Do you have a documentation reference or other evidence?

A naive disassembly dump on-my-box(tm) suggests that pgprocno is stored on
the stack (with offset -0x1c) and is referenced from there:

...
   0x00000000007f8011 <+53>:	mov    -0x18(%rbp),%rax
   0x00000000007f8015 <+57>:	mov    -0x20(%rbp),%edx
   0x00000000007f8018 <+60>:	movslq %edx,%rdx
   0x00000000007f801b <+63>:	add    $0x8,%rdx
   0x00000000007f801f <+67>:	mov    0x8(%rax,%rdx,4),%eax
   0x00000000007f8023 <+71>:	mov    %eax,-0x1c(%rbp)
   0x00000000007f8026 <+74>:	mov    0x67b15b(%rip),%rdx        # 0xe73188 <allProcs>
   0x00000000007f802d <+81>:	mov    -0x1c(%rbp),%eax           # <== here
   0x00000000007f8030 <+84>:	cltq
   0x00000000007f8032 <+86>:	imul   $0x338,%rax,%rax
   0x00000000007f8039 <+93>:	add    %rdx,%rax
   0x00000000007f803c <+96>:	mov    %rax,-0x10(%rbp)
   0x00000000007f8040 <+100>:	mov    0x67b149(%rip),%rcx        # 0xe73190 <allPgXact>
   0x00000000007f8047 <+107>:	mov    -0x1c(%rbp),%eax           # <== here
   0x00000000007f804a <+110>:	movslq %eax,%rdx
   0x00000000007f804d <+113>:	mov    %rdx,%rax
   0x00000000007f8050 <+116>:	add    %rax,%rax
   0x00000000007f8053 <+119>:	add    %rdx,%rax
   0x00000000007f8056 <+122>:	shl    $0x2,%rax
   0x00000000007f805a <+126>:	add    %rcx,%rax
   0x00000000007f805d <+129>:	mov    %rax,-0x8(%rbp)
   0x00000000007f8061 <+133>:	cmpl   $0xffffffff,-0x1c(%rbp)    # <== and here
   0x00000000007f8065 <+137>:	je     0x7f80a4 <MinimumActiveBackends+200>
...

> I presume what happened here is that initially arrayP->pgprocnos[index]
> was -1, but by the time if (pgprocno == -1) is reached, it changed to a
> different value.
>
> It's also really crummy that we're doing the PGPROC/PGXACT lookups
> before checking whether pgprocno is -1.

No doubt here.
> At the very least ISTM that we have to make pgprocno volatile (or use a
> memory barrier - but we don't have sufficient support for those in the
> older branches), and move the PGPROC/PGXACT lookups after the == -1
> check.

Use of volatile doesn't change the resulting code dramatically for me.

The above is when compiling without optimizations.  With -Og I'm getting
similar code for the variant with volatile; without volatile, pgprocno is
just loaded into a register, as one would expect.

--
Alex
"Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes: > On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote: > At the very least ISTM that we have to make pgprocno volatile (or use a >> memory barrier - but we don't have sufficient support for those in the >> older branches), and move the PGPROC/PGXACT lookups after the == -1 >> check. > Use of volatile doesn't change the resulting code dramatically for me. Marking pgprocno volatile is silly. What *is* missing is this: - ProcArrayStruct *arrayP = procArray; + volatile ProcArrayStruct *arrayP = procArray; which corresponds directly to what the problem is: the storage arrayP is pointing at may change asynchronously. regards, tom lane
On Thu, Feb 25, 2016 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
> > On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:
> > At the very least ISTM that we have to make pgprocno volatile (or use a
> >> memory barrier - but we don't have sufficient support for those in the
> >> older branches), and move the PGPROC/PGXACT lookups after the == -1
> >> check.
>
> > Use of volatile doesn't change the resulting code dramatically for me.
>
> Marking pgprocno volatile is silly.  What *is* missing is this:
>
> -	ProcArrayStruct *arrayP = procArray;
> +	volatile ProcArrayStruct *arrayP = procArray;
>
> which corresponds directly to what the problem is: the storage arrayP
> is pointing at may change asynchronously.

Right, this makes a lot more sense.

--
Alex
Hi,

On 2016-02-24 12:58:24 +0000, chris.tessels@inergy.nl wrote:
> Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> Program terminated with signal 11, Segmentation fault.
>
> #0  MinimumActiveBackends (min=50) at procarray.c:2472
> 2472            if (pgxact->xid == InvalidTransactionId)

Could you get us a full backtrace here?  It's interesting that you're in
an XLogFlush()...

> Postgresql version:
> commit_delay 50 configuration file
> commit_siblings 50 configuration file

FWIW, these look unlikely to be beneficial.  How did you end up with
those parameters?

Greetings,

Andres Freund
On 2016-02-25 09:51:49 -0500, Tom Lane wrote:
> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
> > On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:
> > At the very least ISTM that we have to make pgprocno volatile (or use a
> >> memory barrier - but we don't have sufficient support for those in the
> >> older branches), and move the PGPROC/PGXACT lookups after the == -1
> >> check.
>
> > Use of volatile doesn't change the resulting code dramatically for me.
>
> Marking pgprocno volatile is silly.  What *is* missing is this:
>
> -	ProcArrayStruct *arrayP = procArray;
> +	volatile ProcArrayStruct *arrayP = procArray;

Well, that'll also force arrayP->numProcs to be loaded from memory every
loop iteration.  Not sure if we really want that.

Otherwise, yes, that's pretty much what I'm suggesting, although I think a
temporary volatile cast when loading pgprocno is better here.  The
important part is to force that pgprocno is loaded *once*, *before* the
checks.

What bothers me about this right now is that we currently write the
pgprocno array with:

	memmove(&arrayP->pgprocnos[index + 1], &arrayP->pgprocnos[index],
			(arrayP->numProcs - index) * sizeof(int));
	arrayP->pgprocnos[index] = proc->pgprocno;
	arrayP->numProcs++;

afaics there's absolutely no guarantee that memmove() will only use
aligned sizeof(int) writes.  Sure, efficiency reasons make that a sane
choice, but a memmove() doing sizeof(char)-wide moves would still be
correct.  Which afaics can mean we end up with completely bogus pgprocno
values.  Consider e.g. 0x0000ff00 being moved in where previously
0x000000ff was set - we could temporarily end up with 0x0000ffff, which
might be above MaxBackends.

Regards,

Andres
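A standalone illustration of that torn value (not PostgreSQL code; it
assumes a little-endian machine and a byte-at-a-time copy, which an
implementation of memmove() is allowed to use):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	int32_t		slot = 0x000000ff;		/* old pgprocno-like value in the array */
	int32_t		newval = 0x0000ff00;	/* value being moved in */
	unsigned char *dst = (unsigned char *) &slot;
	unsigned char *src = (unsigned char *) &newval;
	int			b;

	/* copy one byte at a time, high byte first, and show what a reader could see */
	for (b = (int) sizeof(slot) - 1; b >= 0; b--)
	{
		dst[b] = src[b];
		printf("after byte %d: slot = 0x%08x\n", b, (unsigned) slot);
	}
	/* the value passes through 0x0000ffff, which exists in neither the old nor the new data */
	return 0;
}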
On 2016-02-25 09:02:07 +0100, Shulgin, Oleksandr wrote:
> On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote:
> > > chris.tessels@inergy.nl wrote:
> > >
> > > > Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > >
> > > > #0  MinimumActiveBackends (min=50) at procarray.c:2472
> > > > 2472            if (pgxact->xid == InvalidTransactionId)
> > >
> > > It's not surprising that you're not able to make this crash
> > > consistently, because it looks like the problem might be in concurrent
> > > modifications to the PGXACT array.  This routine, MinimumActiveBackends,
> > > walks the PGPROC array explicitly without locks.  There are comments
> > > indicating that this is safe, but evidently something has slipped in
> > > there.
> > >
> > > Apparently this code is trying to dereference an invalid pgxact, but
> > > it's not clear to me how this happens.  Those structs are allocated in
> > > advance, and they are referenced in the code via array indexes, so even
> > > if the pgxact doesn't actually hold data about a valid transaction,
> > > dereferencing the XID shouldn't cause a crash.
> >
> > Well, that code is pretty, uh, questionable. E.g. for
> >
> > 		int			pgprocno = arrayP->pgprocnos[index];
> > 		volatile PGPROC *proc = &allProcs[pgprocno];
> > 		volatile PGXACT *pgxact = &allPgXact[pgprocno];
> >
> > there's no guarantee that pgprocno is actually the same index for both
> > lookups and the following
> >
> > 		if (pgprocno == -1)
> > 			continue;			/* do not count deleted entries */
> >
> > check. It's perfectly reasonable for a compiler to reload pgprocno from
> > memory, or just always reference it via memory.
>
> Hm... this is against my understanding of what a compiler could (or
> should) do.  Do you have a documentation reference or other evidence?

Which part does not conform to your expectations?  Moving stores/loads
around from where they're apparently happening in the C program?
Repeatedly reading from memory instead of storing something on the stack?

All of those are side-effect free for single-threaded programs, which
pretty much is the gold standard for doing optimizations in C (at least
pre-C11, but even there the above all would be possible).

Regards,

Andres
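A minimal standalone example of the point (illustration only, not
PostgreSQL code): nothing in this function writes to *slot, so under the
as-if rule a conforming compiler may compile each use of pgprocno as a
fresh load of *slot (or of a stack spill) instead of keeping it in a
register.  With a concurrent writer, the value tested and the value used
as an index can then differ.

#include <stdio.h>

static int
count_entry(int *slot, int *xids)
{
	int			pgprocno = *slot;	/* written as one read... */

	if (pgprocno == -1)				/* ...but may compile to a re-read of *slot */
		return 0;

	return xids[pgprocno];			/* ...and possibly another re-read here */
}

int
main(void)
{
	int			xids[] = {0, 10, 20, 30};
	int			slot = 2;

	printf("%d\n", count_entry(&slot, xids));
	return 0;
}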
Andres Freund <andres@anarazel.de> writes:
> On 2016-02-25 09:51:49 -0500, Tom Lane wrote:
>> Marking pgprocno volatile is silly.  What *is* missing is this:
>>
>> -	ProcArrayStruct *arrayP = procArray;
>> +	volatile ProcArrayStruct *arrayP = procArray;

> Well, that'll also force arrayP->numProcs to be loaded from memory every
> loop iteration. Not sure if we really want that.

I think we do.  The entire point here is not to assume that that storage
isn't changing.

> What bothers me about this right now is that we currently write the
> pgprocno array with:
> 	memmove(&arrayP->pgprocnos[index + 1], &arrayP->pgprocnos[index],
> 			(arrayP->numProcs - index) * sizeof(int));
> 	arrayP->pgprocnos[index] = proc->pgprocno;
> 	arrayP->numProcs++;
> afaics there's absolutely no guarantee that memmove() will only use
> aligned sizeof(int) writes.

Ugh.  That's a separate problem, but yes, it's a problem.

Seems like we can either (1) get rid of that memmove in favor of
a handwritten loop, or (2) give up on unlocked access to the
pgprocnos array.  Which performance hit would you rather take?

			regards, tom lane
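A sketch of option (1), assuming the insertion site in ProcArrayAdd()
quoted above (illustration only, not a committed patch): shift the tail of
pgprocnos[] with individual int-sized stores, so a lockless reader can only
ever observe the old or the new value of a slot, never a torn mixture.

	{
		int			i;

		/* shift entries right one slot, last to first, one aligned int store each */
		for (i = arrayP->numProcs; i > index; i--)
			arrayP->pgprocnos[i] = arrayP->pgprocnos[i - 1];
		arrayP->pgprocnos[index] = proc->pgprocno;
		arrayP->numProcs++;
	}

The matching removal path would shift the tail left the same way instead
of calling memmove().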
Hi,

On 2016-02-25 12:20:06 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-02-25 09:51:49 -0500, Tom Lane wrote:
> >> Marking pgprocno volatile is silly.  What *is* missing is this:
> >>
> >> -	ProcArrayStruct *arrayP = procArray;
> >> +	volatile ProcArrayStruct *arrayP = procArray;
>
> > Well, that'll also force arrayP->numProcs to be loaded from memory every
> > loop iteration. Not sure if we really want that.
>
> I think we do.  The entire point here is not to assume that that storage
> isn't changing.

Well, but what does it buy us?  Given there's no locks involved, the
index < arrayP->numProcs check fundamentally cannot be anything other than
an optimization, preventing us from looking at all PGPROC/PGXACT entries,
no?  Reloading it at every loop iteration doesn't seem to give us anything
other than some sense of false security.  It's a) perfectly possible that
numProcs is outdated because we're hitting a local cacheline whereas
another backend has it modified locally (store buffers), and b) possible
that it's decremented after we entered the loop.

> > What bothers me about this right now is that we currently write the
> > pgprocno array with:
> > 	memmove(&arrayP->pgprocnos[index + 1], &arrayP->pgprocnos[index],
> > 			(arrayP->numProcs - index) * sizeof(int));
> > 	arrayP->pgprocnos[index] = proc->pgprocno;
> > 	arrayP->numProcs++;
> > afaics there's absolutely no guarantee that memmove() will only use
> > aligned sizeof(int) writes.
>
> Ugh.  That's a separate problem, but yes, it's a problem.

While it's a separate bug, it seems to me that this could just as well be
the reason for the crashes triggering this report :/.  If pgprocno points
past the end of the allPgXact array, it's quite possible that we're
reading zeroes :(

> Seems like we can either (1) get rid of that memmove in favor of
> a handwritten loop, or (2) give up on unlocked access to the
> pgprocnos array.  Which performance hit would you rather take?

I think 1), while it'll be noticeable, it's probably going to degrade
much more gracefully.

I think we additionally also should add a security check to
MinimumActiveBackends that prevents a pgprocno above the max number of
backends to be seen as valid.

- Andres
On Thu, Feb 25, 2016 at 6:06 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2016-02-25 09:02:07 +0100, Shulgin, Oleksandr wrote:
> > On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hm... this is against my understanding of what a compiler could (or
> > should) do.  Do you have a documentation reference or other evidence?
>
> Which part does not conform to your expectations?  Moving stores/loads
> around from where they're apparently happening in the C program?
>
> Repeatedly reading from memory instead of storing something on the
> stack?

This one.  But now that I think about it, this can buy you a free
register, so yeah, it can be an optimization in some contexts.

How does the compiler know this is not a multi-threaded program?  Only
because we don't specify any -pthread flags (assuming gcc)?  Or does it
not know or care at all, so we would have to specifically turn off
thread-unsafe optimizations?

--
Alex
Andres Freund wrote
> Hi,
>
> On 2016-02-24 12:58:24 +0000, chris.tessels@ wrote:
>
>> Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
>> Program terminated with signal 11, Segmentation fault.
>>
>> #0  MinimumActiveBackends (min=50) at procarray.c:2472
>> 2472            if (pgxact->xid == InvalidTransactionId)
>
> Could you get us a full backtrace here?  It's interesting that you're in
> an XLogFlush()...
>
>> Postgresql version:
>> commit_delay 50 configuration file
>> commit_siblings 50 configuration file
>
> FWIW, these look unlikely to be beneficial.  How did you end up with
> those parameters?
>
> Greetings,
>
> Andres Freund

Hi Andres,

The full backtrace:

Core was generated by `postgres: mailinfo_subscription_ow mailinfo_subsc'.
Program terminated with signal 11, Segmentation fault.
#0  MinimumActiveBackends (min=50) at procarray.c:2472
2472            if (pgxact->xid == InvalidTransactionId)
(gdb) bt
#0  MinimumActiveBackends (min=50) at procarray.c:2472
#1  0x00000000004c5bfb in XLogFlush (record=79323083800) at xlog.c:2856
#2  0x00000000004bea27 in EndPrepare (gxact=0x7fc2ccf22a00) at twophase.c:1125
#3  0x00000000004b6b02 in PrepareTransaction () at xact.c:2222
#4  0x00000000004b6e07 in CommitTransactionCommand () at xact.c:2738
#5  0x0000000000685719 in finish_xact_command () at postgres.c:2437
#6  0x00000000006899b0 in exec_execute_message (argc=<value optimized out>, argv=<value optimized out>, dbname=0x24e2c30 "mailinfo_subscription_ods", username=<value optimized out>) at postgres.c:1974
#7  PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x24e2c30 "mailinfo_subscription_ods", username=<value optimized out>) at postgres.c:4142
#8  0x0000000000634929 in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4285
#9  BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3948
#10 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1679
#11 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1287
#12 0x00000000005cc248 in main (argc=3, argv=0x24b87e0) at main.c:228

I'm not sure why we deviated from the default settings for 'commit_delay'
and 'commit_siblings'.  I went through our documentation and could not
find an entry explaining why these settings were changed.  We are running
a test with the default settings to see if this has any impact.

Regards,
Chris
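For context on why the backtrace runs through XLogFlush(): with
commit_delay and commit_siblings set, the group-commit delay is only taken
when enough other backends look active, and that test is the lockless
MinimumActiveBackends() call.  Roughly (paraphrased from 9.4's xlog.c;
check the actual source for the exact form):

	/* in XLogFlush(), before flushing WAL for the commit/prepare record */
	if (CommitDelay > 0 && enableFsync &&
		MinimumActiveBackends(CommitSiblings))
		pg_usleep(CommitDelay);

With commit_siblings = 50 (note the min=50 in the backtrace), every
PREPARE TRANSACTION walks the procarray this way, which is why the crash
shows up under this workload.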
Andres Freund <andres@anarazel.de> writes:
> On 2016-02-25 12:20:06 -0500, Tom Lane wrote:
>> Seems like we can either (1) get rid of that memmove in favor of
>> a handwritten loop, or (2) give up on unlocked access to the
>> pgprocnos array.  Which performance hit would you rather take?

> I think 1), while it'll be noticeable, it's probably going to degrade
> much more gracefully.

> I think we additionally also should add a security check to
> MinimumActiveBackends that prevents a pgprocno above the max number of
> backends to be seen as valid.

So we have three proposals on the table:

* volatile-ize arrayP
* range-check the pgprocno after we fetch it
* replace memmove(s) with int-at-a-time move loop(s)

I wonder whether we couldn't leave the memmoves alone if we do the other
two things.  It would mean that MinimumActiveBackends would sometimes miss
examining some backends, but that's going to happen anyway given its use
of unlocked access.

			regards, tom lane
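Putting the first two proposals together, MinimumActiveBackends might end
up looking roughly like this.  This is a sketch against the 9.4 code, not
a committed patch; it assumes ProcGlobal->allProcCount as the upper bound
for a sane pgprocno, and the trailing per-backend tests are approximately
those of the existing function.

bool
MinimumActiveBackends(int min)
{
	volatile ProcArrayStruct *arrayP = procArray;	/* proposal 1: volatile-ize arrayP */
	int			count = 0;
	int			index;

	/* Quick short-circuit if no minimum is specified */
	if (min == 0)
		return true;

	/* Note: for speed, we still don't acquire ProcArrayLock. */
	for (index = 0; index < arrayP->numProcs; index++)
	{
		int			pgprocno = arrayP->pgprocnos[index];
		volatile PGPROC *proc;
		volatile PGXACT *pgxact;

		/*
		 * Proposal 2: range-check the fetched value.  A deleted slot (-1) or
		 * a torn/garbage value must not be used as an array index.
		 */
		if (pgprocno < 0 || pgprocno >= (int) ProcGlobal->allProcCount)
			continue;

		proc = &allProcs[pgprocno];
		pgxact = &allPgXact[pgprocno];

		if (pgxact->xid == InvalidTransactionId)
			continue;			/* do not count if no XID assigned */
		if (proc->pid == 0)
			continue;			/* do not count prepared xacts */
		if (proc->waitLock != NULL)
			continue;			/* do not count if blocked on a lock */

		count++;
		if (count >= min)
			break;
	}

	return count >= min;
}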