Thread: Fault injection framework
Hello
Fault injection was discussed a few months ago at PGCon in Ottawa. At least a few folks showed interest and so I would like to present what we have been using in Greenplum.
The attached patch set contains the fault injector framework ported to PostgreSQL master. It provides ability to define points of interest in backend code and then inject faults at those points from SQL. Also included is an isolation test to simulate a speculative insert conflict scenario that was found to be rather cumbersome to implement using advisory locks, see [1]. The alternative isolation spec using fault injectors seems much simpler to understand.
Asim
[1] CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com
Fault injection was discussed a few months ago at PGCon in Ottawa. At least a few folks showed interest and so I would like to present what we have been using in Greenplum.
The attached patch set contains the fault injector framework ported to PostgreSQL master. It provides ability to define points of interest in backend code and then inject faults at those points from SQL. Also included is an isolation test to simulate a speculative insert conflict scenario that was found to be rather cumbersome to implement using advisory locks, see [1]. The alternative isolation spec using fault injectors seems much simpler to understand.
Asim
[1] CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com
Attachment
- 0001-Framework-to-inject-faults-from-SQL-tests.patch
- 0003-Speculative-insert-isolation-test-spec-using-fault-i.patch
- 0004-Run-tests-with-faults-if-faultinjector-was-compiled-.patch
- 0002-Add-syntax-to-declare-a-step-that-is-expected-to-blo.patch
- 0005-Isolation-schedule-for-tests-that-inject-faults.patch
On Thu, Aug 22, 2019 at 07:45:09PM +0530, Asim R P wrote: > Fault injection was discussed a few months ago at PGCon in Ottawa. At > least a few folks showed interest and so I would like to present what we > have been using in Greenplum. > > The attached patch set contains the fault injector framework ported to > PostgreSQL master. It provides ability to define points of interest in > backend code and then inject faults at those points from SQL. Also > included is an isolation test to simulate a speculative insert conflict > scenario that was found to be rather cumbersome to implement using advisory > locks, see [1]. The alternative isolation spec using fault injectors seems > much simpler to understand. You may want to double-check whitespaces in your patch set, and 0002 does not apply because of conflicts in isolationtester.h (my fault!). 0002 is an independent feature, so I would keep it out of the fault framework for integration. There has been a argument from Alvaro more convincing than mine about the use of a separate keyword, hence removing a dependency with steps: https://www.postgresql.org/message-id/20190823153825.GA11405@alvherre.pgsql It would be good also to have a test case which exercises it, without the need of the fault framework or its dedicated schedule. Patches 0003, 0004 and 0005 could just be grouped together, they deal about the same thing. My first impressions about this patch is that it is very intrusive. Could you explain the purpose of am_faultinjector? That's a specific connection string parameter which can be used similarly to replication for WAL senders? Couldn't there be an equivalent with a SUSET GUC? It may be interesting to see which parts of this framework could be moved into an extension loaded with shared_preload_libraries, one thing being the shared memory initialization part. At the end it would be interesting to not have a dependency with a compile-time flag. Things like exec_fault_injector_command() need to be more documented. It is hard to guess what it is being used for. -- Michael
Attachment
On Tue, Aug 27, 2019 at 12:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> You may want to double-check whitespaces in your patch set, and 0002
> does not apply because of conflicts in isolationtester.h (my fault!).
> 0002 is an independent feature, so I would keep it out of the fault
> framework for integration. There has been a argument from Alvaro
> more convincing than mine about the use of a separate keyword, hence
> removing a dependency with steps:
> https://www.postgresql.org/message-id/20190823153825.GA11405@alvherre.pgsql
> It would be good also to have a test case which exercises it, without
> the need of the fault framework or its dedicated schedule.
>
> My first impressions about this patch is that it is very intrusive.
> Could you explain the purpose of am_faultinjector? That's a specific
> connection string parameter which can be used similarly to replication
> It may be interesting to see which parts of this framework could be
> moved into an extension loaded with shared_preload_libraries, one
> thing being the shared memory initialization part. At the end it
> would be interesting to not have a dependency with a compile-time
>
> Things like exec_fault_injector_command() need to be more documented.
>
> You may want to double-check whitespaces in your patch set, and 0002
> does not apply because of conflicts in isolationtester.h (my fault!).
>
I've rebased the patch set against the latest master and tried to resolve whitespace issues. Apologies for the whitespace conflicts, I tried resolving them but there is some trailing whitespace in the answer file of the regress test in v1-0001 that cannot be removed, else the test will fail.
> framework for integration. There has been a argument from Alvaro
> more convincing than mine about the use of a separate keyword, hence
> removing a dependency with steps:
> https://www.postgresql.org/message-id/20190823153825.GA11405@alvherre.pgsql
>
That is a valid point, thank you Alvaro for the feedback. I've changed 0002 so that a step within a permutation can be declared as blocking, revised patch set is attached.
> the need of the fault framework or its dedicated schedule.
>
It is for this reason that I have not separated patch 0002 out from faultinjector patch set because the test to demonstrate the blocking feature uses faults. I need to give more thought to find a test having a session that needs to block for reasons other than locking. Any pointers will be very helpful.
> My first impressions about this patch is that it is very intrusive.
> Could you explain the purpose of am_faultinjector? That's a specific
> connection string parameter which can be used similarly to replication
> for WAL senders? Couldn't there be an equivalent with a SUSET GUC?
Thank you for the review. Admittedly, the patch set doesn't include a test to demonstrate am_faultinjector. That is used when a fault needs to be injected into a remote server, say a standby. And that standby may be accepting connections or not, depending on if it's operating in hot-standby mode. Therefore, the am_faultinjector and the connection parameter is used to identify fault injection requests and allow those to be handled even when normal user connections are not allowed. Also, such connections do not need to be associated with a database, they simply need to set the fault in the shared memory hash table. In that sense, fault injection connections are treated similar to replication connections.
I was looking into tests under src/test/recovery/t/. Let me write a test to demonstrate what I'm trying to explain above.
> moved into an extension loaded with shared_preload_libraries, one
> thing being the shared memory initialization part. At the end it
> would be interesting to not have a dependency with a compile-time
> flag.
Patch 0001 includes an extension that provides a SQL UDF as a wrapper over the fault injection interface in backend. Moving the backend part of the patch also into an extension seems difficult to me. Getting rid of the compile time dependency is a strong enough advantage to spend more efforts on this.
> Things like exec_fault_injector_command() need to be more documented.
> It is hard to guess what it is being used for.
Added a comment to explain things a bit. Hope that helps. And as mentioned above, I'm working on a test case to demonstrate this feature.
Asim
Attachment
- v1-0002-Add-syntax-to-declare-a-step-that-is-expected-to-.patch
- v1-0001-Framework-to-inject-faults-from-SQL-tests.patch
- v1-0003-Speculative-insert-isolation-test-spec-using-faul.patch
- v1-0005-Isolation-schedule-for-tests-that-inject-faults.patch
- v1-0004-Run-tests-with-faults-if-faultinjector-was-compil.patch
On Tue, Aug 27, 2019 at 6:57 PM Asim R P <apraveen@pivotal.io> wrote:
>
> On Tue, Aug 27, 2019 at 12:35 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > Things like exec_fault_injector_command() need to be more documented.
> > It is hard to guess what it is being used for.
>
> Added a comment to explain things a bit. Hope that helps. And as mentioned above, I'm working on a test case to demonstrate this feature.
>
After learning a bit of Perl, I have a TAP test to share. The test validates that a commit on master waits until a synchronous standby has flushed WAL up to or greater than the commit LSN. The test demonstrates remote faultinjector interface to inject a fault on standby. That's where exec_fault_injector_command() and related code is exercised.
Patch summary:
0001 - the original fault injector patch up thread with remote fault injection capability
0006 - TAP test that makes use of the remote fault injector API
Patches 0002-0005 are not included because they are not changed.
Asim