[PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA - Mailing list pgsql-hackers

From CharSyam
Subject [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
Date
Msg-id CAMrLSE7bZ70eGaTMH3-g=TA_sfp3kTDOrHVNYw1=1XL9krapqQ@mail.gmail.com
Whole thread
Responses Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
List pgsql-hackers
Hi hackers,

  While reading aclchk.c I noticed that objectsInSchemaToOids(), used
  by GRANT/REVOKE ... ON ALL TABLES IN SCHEMA, calls
  getRelationsInNamespace() five times for the OBJECT_TABLE case --
  once per relkind (RELATION, VIEW, MATVIEW, FOREIGN_TABLE,
  PARTITIONED_TABLE):

      case OBJECT_TABLE:
          objs = getRelationsInNamespace(namespaceId, RELKIND_RELATION);
          objects = list_concat(objects, objs);
          objs = getRelationsInNamespace(namespaceId, RELKIND_VIEW);
          ...
          objs = getRelationsInNamespace(namespaceId, RELKIND_PARTITIONED_TABLE);
          objects = list_concat(objects, objs);
          break;

  pg_class does have an index on (relname, relnamespace), but there
  is no index matching (relnamespace, relkind), so each of those
  per-relkind calls falls back to a full heap scan via
  table_beginscan_catalog().  The work is just repeated five times.

  The attached patch introduces a small helper
  getRelationsInNamespaceMulti() that performs a single heap scan
  filtered by relnamespace and distributes matching tuples into
  per-relkind buckets supplied by the caller.  Relkind filtering is
  done in C after each tuple is read, which is trivially cheap.  The
  OBJECT_TABLE case uses the helper; OBJECT_SEQUENCE and
  OBJECT_PROPGRAPH are left on the original getRelationsInNamespace()
  helper because they only need a single relkind and benefit from the
  second ScanKey.

  Correctness / order preservation
  --------------------------------
   * Result order is identical.  The underlying pg_class heap (and
     thus its physical scan order) is the same regardless of how we
     filter, so each bucket ends up holding the same OIDs in the same
     relative order as a separate per-relkind heap scan would have
     produced.  Concatenating the buckets in the original relkind
     order reproduces the previous list tuple-for-tuple.

     I verified this empirically.  On a schema with interleaved
     relkinds (tables, views, matviews, partitioned tables) I ran two
     equivalent SQL formulations while forcing seq scans on pg_class:

       OLD-path model: UNION ALL of five
         "SELECT oid FROM pg_class
            WHERE relnamespace = X AND relkind = Y
            ORDER BY ctid"
         queries, one per relkind, in the same group order the code
         uses.

       NEW-path model: a single
         "SELECT oid FROM pg_class
            WHERE relnamespace = X
            ORDER BY ctid"
         bucketed by relkind and concatenated in the same group
         order.

     The two formulations produced identical OID sequences, element
     by element.  A positional FULL JOIN between them returned zero
     rows.

   * MVCC semantics are, if anything, a bit stricter.  The old code
     took five separate catalog scans, so in principle concurrent DDL
     could commit between scan N and scan N+1 and be visible to one
     but not another.  With a single scan everything is collected
     under one catalog snapshot.

   * Locking is unchanged in kind: AccessShareLock on pg_class is
     still taken, just once instead of five times.

  Benchmark
  ---------
  This is a targeted micro-optimization, not a dramatic speedup.
  With 10,000 tables in a single schema (pg_class ~10,452 rows),
  running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
  (6 iterations, first dropped as warmup), I measured a consistent
  ~15% reduction in end-to-end time:

                       baseline    patched     delta
      GRANT  (avg)     88.2 ms     75.9 ms    -14%
      REVOKE (avg)    134.9 ms    115.7 ms    -14%

  Per-iteration numbers (ms):

      baseline GRANT : 92, 87, 87, 85, 89
      patched  GRANT : 77, 72, 72, 76, 79, 79
      baseline REVOKE: 145, 144, 132, 128, 130, 128
      patched  REVOKE: 114, 117, 112, 120, 112, 119

  The absolute savings are small because most of the time in these
  commands is spent updating per-relation ACL tuples, not scanning
  pg_class.  For schemas with only a handful of relations the effect
  is not measurable.  The change is aimed at multi-tenant /
  partition-heavy installations that regularly issue
  "... ON ALL TABLES IN SCHEMA ..." statements over large catalogs.

  Testing
  -------
  Both `make check` and `make check-world` pass cleanly with the
  patch applied on top of current master (all suites green, no new
  failures).  TAP tests were not exercised (tree configured without
  --enable-tap-tests); I can rerun with TAP enabled if that is useful.

  The patch is attached (against master).  Feedback and review
  welcome -- in particular I'd like to know if anyone sees a
  correctness concern I missed, or prefers a different shape for the
  helper (e.g. returning a single flat list rather than per-relkind
  buckets).

  Thanks,
  charsyam
Attachment

pgsql-hackers by date:

Previous
From: Lakshmi N
Date:
Subject: Fix typo in vacuumparallel.c
Next
From: Henson Choi
Date:
Subject: Re: Row pattern recognition