Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.
Date
Msg-id 16320.1442587639@sss.pgh.pa.us
Whole thread Raw
In response to Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.  (Jan Wieck <jan@wi3ck.info>)
Responses Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.  (Jan Wieck <jan@wi3ck.info>)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Jan Wieck <jan@wi3ck.info> writes:
> Attached is a complete rework of the fix from scratch, based on Tom's
> suggestion.

> The code now maintains a double linked list as suggested, but only uses
> it to mark all currently valid entries as invalid when hashvalue == 0.
> If a specific entry is invalidated it performs a hash lookup for maximum
> efficiency in that case.

That does not work; you're ignoring the possibility of hashvalue
collisions, and for that matter not considering that the hash value
is not equal to the hash key.  I don't think your code is invalidating
the right entry at all during a foreign key constraint update, and
it certainly cannot do the right thing if there's a hash value collision.

Attached is something closer to what I was envisioning; can you do
performance testing on it?

            regards, tom lane

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..db1787a 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
***************
*** 40,45 ****
--- 40,46 ----
  #include "commands/trigger.h"
  #include "executor/executor.h"
  #include "executor/spi.h"
+ #include "lib/ilist.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_relation.h"
  #include "miscadmin.h"
*************** typedef struct RI_ConstraintInfo
*** 125,130 ****
--- 126,132 ----
                                                   * PK) */
      Oid            ff_eq_oprs[RI_MAX_NUMKEYS];        /* equality operators (FK =
                                                   * FK) */
+     dlist_node    valid_link;        /* Link in list of valid entries */
  } RI_ConstraintInfo;


*************** typedef struct RI_CompareHashEntry
*** 185,190 ****
--- 187,194 ----
  static HTAB *ri_constraint_cache = NULL;
  static HTAB *ri_query_cache = NULL;
  static HTAB *ri_compare_cache = NULL;
+ static dlist_head ri_constraint_cache_valid_list;
+ static int    ri_constraint_cache_valid_count = 0;


  /* ----------
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2924,2929 ****
--- 2928,2940 ----

      ReleaseSysCache(tup);

+     /*
+      * For efficient processing of invalidation messages below, we keep a
+      * doubly-linked list, and a count, of all currently valid entries.
+      */
+     dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
+     ri_constraint_cache_valid_count++;
+
      riinfo->valid = true;

      return riinfo;
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2936,2956 ****
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
!     HASH_SEQ_STATUS status;
!     RI_ConstraintInfo *hentry;

      Assert(ri_constraint_cache != NULL);

!     hash_seq_init(&status, ri_constraint_cache);
!     while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
      {
!         if (hentry->valid &&
!             (hashvalue == 0 || hentry->oidHashValue == hashvalue))
!             hentry->valid = false;
      }
  }

--- 2947,2987 ----
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
+  *
+  * Note: at the time a cache invalidation message is processed there may be
+  * active references to the cache.  Because of this we never remove entries
+  * from the cache, but only mark them invalid, which is harmless to active
+  * uses.  (Any query using an entry should hold a lock sufficient to keep that
+  * data from changing under it --- but we may get cache flushes anyway.)
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
!     dlist_mutable_iter iter;

      Assert(ri_constraint_cache != NULL);

!     /*
!      * If the list of currently valid entries gets excessively large, we mark
!      * them all invalid so we can empty the list.  This arrangement avoids
!      * O(N^2) behavior in situations where a session touches many foreign keys
!      * and also does many ALTER TABLEs, such as a restore from pg_dump.
!      */
!     if (ri_constraint_cache_valid_count > 1000)
!         hashvalue = 0;            /* pretend it's a cache reset */
!
!     dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
      {
!         RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
!                                                     valid_link, iter.cur);
!
!         if (hashvalue == 0 || riinfo->oidHashValue == hashvalue)
!         {
!             riinfo->valid = false;
!             /* Remove invalidated entries from the list, too */
!             dlist_delete(iter.cur);
!             ri_constraint_cache_valid_count--;
!         }
      }
  }


pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: tsvector work with citext
Next
From: Nikolay Shaplov
Date:
Subject: Re: pageinspect patch, for showing tuple data