Re: proposal: make NOTIFY list de-duplication optional - Mailing list pgsql-hackers

From Tom Lane
Subject Re: proposal: make NOTIFY list de-duplication optional
Date
Msg-id 17822.1564186806@sss.pgh.pa.us
Whole thread Raw
In response to Re: Re: proposal: make NOTIFY list de-duplication optional  (Filip Rembiałkowski <filip.rembialkowski@gmail.com>)
Responses Re: proposal: make NOTIFY list de-duplication optional
List pgsql-hackers
=?UTF-8?Q?Filip_Rembia=C5=82kowski?= <filip.rembialkowski@gmail.com> writes:
> Still no hash table fallback is implemented, so this is *not* a
> performance improvement. Only a little more flexibility.

I think that we'd probably be better off fixing the root performance issue
than adding semantic complexity to bypass it.  Especially since, if you
don't de-duplicate, that's going to cost you when it comes time to
actually send the notifications, receive them, forward them to clients,
and process them in the clients.

Admittedly, if the app *knows* that it's generating non-duplicate events,
maybe it'd be all right to skip checking that.  But if we can make the
check cheap, I'd just as soon keep it.

Accordingly, I looked into making a hash table when there are more than
a small number of notifications pending, and attached is a lightly-tested
version of that.  This seems to be more or less similar speed to the
existing code for up to 100 or so distinct notifies, but it soon pulls
away above that.

A point that needs discussion is that this patch, unlike the existing
code, *does* de-duplicate fully: any events generated by a subtransaction
that duplicate events already emitted by a parent will get removed when
the subxact is merged to its parent.  I did this partly because we have
to expend O(N) work to merge N subtransaction notifies in any case,
now that we have to make new hashtable entries in the parent xact;
so the old excuse that subxact-end processing is really cheap no
longer applies.  Also because the Assert(!found) assertions in the
attached hash coding fall over if we cheat on this.  If we really want
to maintain exactly the old semantics here, we could relax the hashtable
code to just ignore duplicate entries.  But, per the argument above,
de-duplication is a good thing so I'm inclined to keep it like this.

I also noticed that as things stand, it costs us two or three pallocs to
construct a Notification event.  It wouldn't be terribly hard to reduce
that to one palloc, and maybe it'd be worthwhile if we're thinking that
transactions with many many notifies are a case worth optimizing.
But I didn't do that here; it seems like a separable patch.

I also thought for awhile about not having the hashtable as an auxiliary
data structure, but making it the main data structure.  We could preserve
the required notification ordering information by threading the live
hashtable entries into an slist, say.  However, this would greatly
increase the overhead for transactions with just one or a few distinct
NOTIFY events, since we'd have to set up the hashtable at the first one.
I think that's a common enough case that we shouldn't de-optimize it.
A smaller objection is that such a data structure would absolutely commit
us to de-duplication semantics, whereas the list plus separate hashtable
can cope with not de-duping if someone persuades us that's sane.

Thoughts?

            regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 6e9c580..c21daa5 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -135,6 +135,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/hashutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */

 /*
  * State for outbound notifies consists of a list of all channels+payloads
- * NOTIFYed in the current transaction. We do not actually perform a NOTIFY
- * until and unless the transaction commits.  pendingNotifies is NIL if no
- * NOTIFYs have been done in the current transaction.
+ * NOTIFYed in the current transaction.  We do not actually perform a NOTIFY
+ * until and unless the transaction commits.  pendingNotifies is NULL if no
+ * NOTIFYs have been done in the current (sub) transaction.
+ *
+ * We discard duplicate notify events issued in the same transaction.
+ * Hence, in addition to the list proper (which we need to track the order
+ * of the events, since we guarantee to deliver them in order), we build a
+ * hash table which we can probe to detect duplicates.  Since building the
+ * hash table is somewhat expensive, we do so only once we have at least
+ * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction;
+ * before that we just scan the events linearly.
  *
  * The list is kept in CurTransactionContext.  In subtransactions, each
  * subtransaction has its own list in its own CurTransactionContext, but
- * successful subtransactions attach their lists to their parent's list.
- * Failed subtransactions simply discard their lists.
+ * successful subtransactions add their entries to their parent's list.
+ * Failed subtransactions simply discard their lists.  Since these lists
+ * are independent, there may be notify events in a subtransaction's list
+ * that duplicate events in some ancestor (sub) transaction; we get rid of
+ * the dups when merging the subtransaction's list into its parent's.
  *
  * Note: the action and notify lists do not interact within a transaction.
  * In particular, if a transaction does NOTIFY and then LISTEN on the same
@@ -343,7 +355,20 @@ typedef struct Notification
     char       *payload;        /* payload string (can be empty) */
 } Notification;

-static List *pendingNotifies = NIL; /* list of Notifications */
+typedef struct NotificationList
+{
+    List       *events;            /* list of Notification structs */
+    HTAB       *hashtab;        /* hash of NotificationHash structs, or NULL */
+} NotificationList;
+
+#define MIN_HASHABLE_NOTIFIES 16    /* threshold to build hashtab */
+
+typedef struct NotificationHash
+{
+    Notification *event;        /* => the actual Notification struct */
+} NotificationHash;
+
+static NotificationList *pendingNotifies = NULL;    /* current list, if any */

 static List *upperPendingNotifies = NIL;    /* list of upper-xact lists */

@@ -393,6 +418,9 @@ static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
 static void asyncQueueAdvanceTail(void);
 static void ProcessIncomingNotify(void);
 static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
+static void AddEventToPendingNotifies(Notification *n);
+static uint32 notification_hash(const void *key, Size keysize);
+static int    notification_match(const void *key1, const void *key2, Size keysize);
 static void ClearPendingActionsAndNotifies(void);

 /*
@@ -586,11 +614,19 @@ Async_Notify(const char *channel, const char *payload)
     else
         n->payload = "";

-    /*
-     * We want to preserve the order so we need to append every notification.
-     * See comments at AsyncExistsPendingNotify().
-     */
-    pendingNotifies = lappend(pendingNotifies, n);
+    if (pendingNotifies == NULL)
+    {
+        /* First notify event in current (sub)xact */
+        pendingNotifies = (NotificationList *) palloc(sizeof(NotificationList));
+        pendingNotifies->events = list_make1(n);
+        /* We certainly don't need a hashtable yet */
+        pendingNotifies->hashtab = NULL;
+    }
+    else
+    {
+        /* Append more events to existing list */
+        AddEventToPendingNotifies(n);
+    }

     MemoryContextSwitchTo(oldcontext);
 }
@@ -761,7 +797,7 @@ PreCommit_Notify(void)
 {
     ListCell   *p;

-    if (pendingActions == NIL && pendingNotifies == NIL)
+    if (!pendingActions && !pendingNotifies)
         return;                    /* no relevant statements in this xact */

     if (Trace_notify)
@@ -821,7 +857,7 @@ PreCommit_Notify(void)
         /* Now push the notifications into the queue */
         backendHasSentNotifications = true;

-        nextNotify = list_head(pendingNotifies);
+        nextNotify = list_head(pendingNotifies->events);
         while (nextNotify != NULL)
         {
             /*
@@ -1294,7 +1330,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * database OID in order to fill the page. So every page is always used up to
  * the last byte which simplifies reading the page later.
  *
- * We are passed the list cell (in pendingNotifies) containing the next
+ * We are passed the list cell (in pendingNotifies->events) containing the next
  * notification to write and return the first still-unwritten cell back.
  * Eventually we will return NULL indicating all is done.
  *
@@ -1345,7 +1381,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
         if (offset + qe.length <= QUEUE_PAGESIZE)
         {
             /* OK, so advance nextNotify past this item */
-            nextNotify = lnext(pendingNotifies, nextNotify);
+            nextNotify = lnext(pendingNotifies->events, nextNotify);
         }
         else
         {
@@ -1607,7 +1643,7 @@ AtSubStart_Notify(void)
     Assert(list_length(upperPendingNotifies) ==
            GetCurrentTransactionNestLevel() - 1);

-    pendingNotifies = NIL;
+    pendingNotifies = NULL;

     MemoryContextSwitchTo(old_cxt);
 }
@@ -1621,7 +1657,7 @@ void
 AtSubCommit_Notify(void)
 {
     List       *parentPendingActions;
-    List       *parentPendingNotifies;
+    NotificationList *parentPendingNotifies;

     parentPendingActions = linitial_node(List, upperPendingActions);
     upperPendingActions = list_delete_first(upperPendingActions);
@@ -1634,16 +1670,41 @@ AtSubCommit_Notify(void)
      */
     pendingActions = list_concat(parentPendingActions, pendingActions);

-    parentPendingNotifies = linitial_node(List, upperPendingNotifies);
+    parentPendingNotifies = (NotificationList *) linitial(upperPendingNotifies);
     upperPendingNotifies = list_delete_first(upperPendingNotifies);

     Assert(list_length(upperPendingNotifies) ==
            GetCurrentTransactionNestLevel() - 2);

-    /*
-     * We could try to eliminate duplicates here, but it seems not worthwhile.
-     */
-    pendingNotifies = list_concat(parentPendingNotifies, pendingNotifies);
+    if (pendingNotifies == NULL)
+    {
+        /* easy, no notify events happened in current subxact */
+        pendingNotifies = parentPendingNotifies;
+    }
+    else if (parentPendingNotifies == NULL)
+    {
+        /* easy, subxact's list becomes parent's */
+    }
+    else
+    {
+        /*
+         * Formerly, we didn't need to eliminate duplicates here, but now we
+         * must, else we fall foul of "Assert(!found)", either here or during
+         * a later attempt to build the parent-level hashtable.
+         */
+        NotificationList *childPendingNotifies = pendingNotifies;
+        ListCell   *l;
+
+        pendingNotifies = parentPendingNotifies;
+        /* Insert all the subxact's events into parent, except for dups */
+        foreach(l, childPendingNotifies->events)
+        {
+            Notification *childn = (Notification *) lfirst(l);
+
+            if (!AsyncExistsPendingNotify(childn->channel, childn->payload))
+                AddEventToPendingNotifies(childn);
+        }
+    }
 }

 /*
@@ -1672,7 +1733,7 @@ AtSubAbort_Notify(void)

     while (list_length(upperPendingNotifies) > my_level - 2)
     {
-        pendingNotifies = linitial_node(List, upperPendingNotifies);
+        pendingNotifies = (NotificationList *) linitial(upperPendingNotifies);
         upperPendingNotifies = list_delete_first(upperPendingNotifies);
     }
 }
@@ -2102,50 +2163,149 @@ NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid)
 static bool
 AsyncExistsPendingNotify(const char *channel, const char *payload)
 {
-    ListCell   *p;
-    Notification *n;
-
-    if (pendingNotifies == NIL)
+    if (pendingNotifies == NULL)
         return false;

     if (payload == NULL)
         payload = "";

-    /*----------
-     * We need to append new elements to the end of the list in order to keep
-     * the order. However, on the other hand we'd like to check the list
-     * backwards in order to make duplicate-elimination a tad faster when the
-     * same condition is signaled many times in a row. So as a compromise we
-     * check the tail element first which we can access directly. If this
-     * doesn't match, we check the whole list.
-     *
-     * As we are not checking our parents' lists, we can still get duplicates
-     * in combination with subtransactions, like in:
-     *
-     * begin;
-     * notify foo '1';
-     * savepoint foo;
-     * notify foo '1';
-     * commit;
-     *----------
-     */
-    n = (Notification *) llast(pendingNotifies);
-    if (strcmp(n->channel, channel) == 0 &&
-        strcmp(n->payload, payload) == 0)
-        return true;
-
-    foreach(p, pendingNotifies)
+    if (pendingNotifies->hashtab != NULL)
     {
-        n = (Notification *) lfirst(p);
-
-        if (strcmp(n->channel, channel) == 0 &&
-            strcmp(n->payload, payload) == 0)
+        /* Use the hash table to probe for a match */
+        Notification n;
+        Notification *k;
+
+        /* set up a dummy Notification struct */
+        n.channel = unconstify(char *, channel);
+        n.payload = unconstify(char *, payload);
+        k = &n;
+        /* ... and probe */
+        if (hash_search(pendingNotifies->hashtab,
+                        &k,
+                        HASH_FIND,
+                        NULL))
             return true;
     }
+    else
+    {
+        /* Must scan the event list */
+        ListCell   *l;
+
+        foreach(l, pendingNotifies->events)
+        {
+            Notification *n = (Notification *) lfirst(l);
+
+            if (strcmp(n->channel, channel) == 0 &&
+                strcmp(n->payload, payload) == 0)
+                return true;
+        }
+    }

     return false;
 }

+/*
+ * Add a notification event to a pre-existing pendingNotifies list.
+ *
+ * Because pendingNotifies->events is already nonempty, this works
+ * correctly no matter what CurrentMemoryContext is.
+ */
+static void
+AddEventToPendingNotifies(Notification *n)
+{
+    Assert(pendingNotifies->events != NIL);
+
+    /* Create the hash table if it's time to */
+    if (list_length(pendingNotifies->events) >= MIN_HASHABLE_NOTIFIES &&
+        pendingNotifies->hashtab == NULL)
+    {
+        HASHCTL        hash_ctl;
+        ListCell   *l;
+
+        /* Create the hash table */
+        MemSet(&hash_ctl, 0, sizeof(hash_ctl));
+        hash_ctl.keysize = sizeof(Notification *);
+        hash_ctl.entrysize = sizeof(NotificationHash);
+        hash_ctl.hash = notification_hash;
+        hash_ctl.match = notification_match;
+        hash_ctl.hcxt = CurTransactionContext;
+        pendingNotifies->hashtab =
+            hash_create("Pending Notifies",
+                        256L,
+                        &hash_ctl,
+                        HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
+
+        /* Insert all the already-existing events */
+        foreach(l, pendingNotifies->events)
+        {
+            Notification *oldn = (Notification *) lfirst(l);
+            NotificationHash *hentry;
+            bool        found;
+
+            hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab,
+                                                      &oldn,
+                                                      HASH_ENTER,
+                                                      &found);
+            Assert(!found);
+            hentry->event = oldn;
+        }
+    }
+
+    /* Add new event to the list, in order */
+    pendingNotifies->events = lappend(pendingNotifies->events, n);
+
+    /* Add event to the hash table if needed */
+    if (pendingNotifies->hashtab != NULL)
+    {
+        NotificationHash *hentry;
+        bool        found;
+
+        hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab,
+                                                  &n,
+                                                  HASH_ENTER,
+                                                  &found);
+        Assert(!found);
+        hentry->event = n;
+    }
+}
+
+/*
+ * notification_hash: hash function for notification hash table
+ *
+ * The hash "keys" are pointers to Notification structs.
+ */
+static uint32
+notification_hash(const void *key, Size keysize)
+{
+    const Notification *k = *(const Notification *const *) key;
+    uint32        hashc;
+    uint32        hashp;
+
+    Assert(keysize == sizeof(Notification *));
+    /* We just XOR the hashes for the two strings */
+    hashc = DatumGetUInt32(hash_any((const unsigned char *) k->channel,
+                                    (int) strlen((const char *) k->channel)));
+    hashp = DatumGetUInt32(hash_any((const unsigned char *) k->payload,
+                                    (int) strlen((const char *) k->payload)));
+    return hashc ^ hashp;
+}
+
+/*
+ * notification_match: match function to use with notification_hash
+ */
+static int
+notification_match(const void *key1, const void *key2, Size keysize)
+{
+    const Notification *k1 = *(const Notification *const *) key1;
+    const Notification *k2 = *(const Notification *const *) key2;
+
+    Assert(keysize == sizeof(Notification *));
+    if (strcmp(k1->channel, k2->channel) == 0 &&
+        strcmp(k1->payload, k2->payload) == 0)
+        return 0;                /* equal */
+    return 1;                    /* not equal */
+}
+
 /* Clear the pendingActions and pendingNotifies lists. */
 static void
 ClearPendingActionsAndNotifies(void)
@@ -2158,5 +2318,5 @@ ClearPendingActionsAndNotifies(void)
      * pointers.
      */
     pendingActions = NIL;
-    pendingNotifies = NIL;
+    pendingNotifies = NULL;
 }

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: TopoSort() fix
Next
From: Tom Lane
Date:
Subject: Re: TopoSort() fix