Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Experimental patch for inter-page delay in VACUUM
Date
Msg-id 3FA436E5.8040506@Yahoo.com
Whole thread Raw
In response to Re: Experimental patch for inter-page delay in VACUUM  ("Stephen" <jleelim@xxxxxx.com>)
Responses Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Stephen wrote:

> I tried the Tom Lane's patch on PostgreSQL 7.4-BETA-5 and it works
> fantastically! Running a few short tests show a significant improvement in
>
> responsiveness on my RedHat 9 Linux 2.4-20-8 (IDE 120GB 7200RPM UDMA5).

I am currently looking at implementing ARC as a replacement strategy. I
don't have anything that works yet, so I can't really tell what the
result would be and it might turn out that we want both features.

All I can say is that the theory looks like an extremely smart and
generalized version of the crude hack I had done. And that one is able
to lower the impact of VACUUM on the foreground clients while increasing
the VACUUM speed. The 7.3.4 version of my crude hack is attached.


Jan



>
> I didn't feel any noticeable delay when vacuum_page_delay is set to 5ms, 10
> ms. Vacuum takes 15 to 24 times longer to complete (as expected)
>
> but I don't mind at all. Vmstat BI/BO load is reduced by 5 times when
> vacuum_page_delay = 1ms. Load average reduced significantly
>
> also as there are less processes waiting to complete. I find a value of 1ms
> to 5ms is quite good and will keep system responsive. Going from 10ms to 1ms
> didn't seem to reduce the total vacuum time by much and I'm not sure why.
>
> Any chance we can get this patched into 7.4 permanently?
>
> I cannot say how well it would work on a heavy load, but on a light load
> this patch is highly recommended for 24/7 large DB systems. The
>
> database is mostly read-only. There are 133,000 rows and each row is about
> 2.5kB in size (mostly due to the bytea column holding a binary
>
> image). The long row causes system to TOAST the table. I repeatedly ran the
> following tests while system is idling:
>
>
> Normal operation with no VACUUM
> ===============================
>
> tsdb=# explain analyze select * from table1 where id =
> '0078997ac809877c1a0d1f76af753608';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=19.030..19.036 rows=1 loops=1)
>    Index Cond: ((id)::text = '0078997ac809877c1a0d1f76af753608'::text)
>  Total runtime: 19.206 ms
> (3 rows)
>
>
>
> VACUUM at vacuum_page_delay = 0
> ===============================
>
> -bash-2.05b$ vmstat 1
>    procs                      memory      swap          io     system
> cpu
>  r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
> id
>  0  1  0 176844   3960  17748 146704    0    0  1408     0  296   556  0  1
> 99
>  0  1  0 176844   3960  17748 146264    0    0  1536     0  285   546  0  2
> 98
>
>
> tsdb=# explain analyze select * from table1 where id =
> '00e5ae5f4fddab371f7847f7da65eebb';
>                                                            QUERY PLAN
> ----------------------------------------------------------------------------
> ----------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=298.028..298.047 rows=1 loops=1)
>    Index Cond: ((id)::text = '0036edc4a92b6afd41304c6c8b76bc3c'::text)
>  Total runtime: 298.275 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '0046751ac3ec290b9f66ea1d66431923';
>                                                              QUERY PLAN
> ----------------------------------------------------------------------------
> --------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=454.727..454.746 rows=1 loops=1)
>    Index Cond: ((id)::text = '0046751ac3ec290b9f66ea1d66431923'::text)
>  Total runtime: 454.970 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00a74e6885579a2d50487f5a1dceba22';
>                                                              QUERY PLAN
> ----------------------------------------------------------------------------
> --------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=344.483..344.501 rows=1 loops=1)
>    Index Cond: ((id)::text = '00a74e6885579a2d50487f5a1dceba22'::text)
>  Total runtime: 344.700 ms
> (3 rows)
>
>
> VACUUM at vacuum_page_delay = 1
> ===============================
>
>    procs                      memory      swap          io     system
> cpu
>  r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
> id
>  0  0  0 176840   4292  23700 137416    0    0   384     0  127   302  0  0
> 100
>  0  0  0 176840   4220  23700 137116    0    0   512     0  118   286  0  0
> 100
>  1  0  0 176840   4220  23700 136656    0    0   384     0  132   303  0  1
> 99
>
>
> tsdb=# explain analyze select * from table1 where id =
> '003d5966f8b9a06e4b0fff9fa8e93be0';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=74.575..74.584 rows=1 loops=1)
>    Index Cond: ((id)::text = '003d5966f8b9a06e4b0fff9fa8e93be0'::text)
>  Total runtime: 74.761 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00677fe46cd0af3d98564068f34db1cf';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=31.779..31.785 rows=1 loops=1)
>    Index Cond: ((id)::text = '00677fe46cd0af3d98564068f34db1cf'::text)
>  Total runtime: 31.954 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00b7c3e2fffdf39ff4ac50add04336b7';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=78.974..78.989 rows=1 loops=1)
>    Index Cond: ((id)::text = '00b7c3e2fffdf39ff4ac50add04336b7'::text)
>  Total runtime: 79.172 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '008d49c007f711d5f5ec48b67a8e58f0';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=30.143..30.148 rows=1 loops=1)
>    Index Cond: ((id)::text = '008d49c007f711d5f5ec48b67a8e58f0'::text)
>  Total runtime: 30.315 ms
> (3 rows)
>
>
> VACUUM at vacuum_page_delay = 5
> ===============================
>
> -bash-2.05b$ vmstat 1
>    procs                      memory      swap          io     system
> cpu
>  r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
> id
>  0  0  0 176840   4228  22668 138212    0    0   512     0  117   276  0  0
> 100
>  0  0  0 176840   4220  22668 138212    0    0   384     0  132   296  0  1
> 99
>  0  0  0 176840   4220  22668 137764    0    0   384     0  114   276  0  0
> 100
>
>
> tsdb=# explain analyze select * from table1 where id =
> '000aa16ffe019fa327b68b7e610e5ac0';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=14.089..14.094 rows=1 loops=1)
>    Index Cond: ((id)::text = '000aa16ffe019fa327b68b7e610e5ac0'::text)
>  Total runtime: 14.252 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00aacc4684577737498df0536be1fac8';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=16.747..16.752 rows=1 loops=1)
>    Index Cond: ((id)::text = '00aacc4684577737498df0536be1fac8'::text)
>  Total runtime: 16.910 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00e295f5644d4cb77a5ebc4efbbaa770';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=16.684..16.690 rows=1 loops=1)
>    Index Cond: ((id)::text = '00e295f5644d4cb77a5ebc4efbbaa770'::text)
>  Total runtime: 16.886 ms
> (3 rows)
>
> VACUUM at vacuum_page_delay = 10
> ================================
>
> -bash-2.05b$ vmstat 1
>    procs                      memory      swap          io     system
> cpu
>  r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
> id
>  0  0  0 176840   4336  20968 139780    0    0   384   108  121   294  0  0
> 100
>  0  0  0 176840   4336  20968 140164    0    0   384     0  130   281  0  1
> 99
>
>
> tsdb=# explain analyze select * from table1 where id =
> '007841017b9f7c80394f2bb4314ba8c1';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=19.576..19.587 rows=1 loops=1)
>    Index Cond: ((id)::text = '007841017b9f7c80394f2bb4314ba8c1'::text)
>  Total runtime: 19.854 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '0070724846c4d0d0dbb8f3e939fd1da4';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=10.616..10.624 rows=1 loops=1)
>    Index Cond: ((id)::text = '0070724846c4d0d0dbb8f3e939fd1da4'::text)
>  Total runtime: 10.795 ms
> (3 rows)
>
> tsdb=# explain analyze select * from table1 where id =
> '00fc92bf0f5048d7680bd8fa2d4c6f3a';
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------
> ------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=2 width=344)
> (actual time=28.007..28.014 rows=1 loops=1)
>    Index Cond: ((id)::text = '00fc92bf0f5048d7680bd8fa2d4c6f3a'::text)
>  Total runtime: 28.183 ms
> (3 rows)
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
diff -crN ../src.orig/backend/commands/vacuum.c ./backend/commands/vacuum.c
*** ../src.orig/backend/commands/vacuum.c    Thu Oct 30 10:17:36 2003
--- ./backend/commands/vacuum.c    Thu Oct 30 10:27:57 2003
***************
*** 98,103 ****
--- 98,107 ----
  } VRelStats;


+ bool vacuum_buffer_hack = true;
+ bool vacuum_buffer_hack_active = false;
+
+
  static MemoryContext vac_context = NULL;

  static int    elevel = -1;
***************
*** 279,284 ****
--- 283,295 ----
      }

      /*
+      * Change the buffer cache policy to avoid vacuum causing a
+      * full cache eviction.
+      */
+     if (vacuum_buffer_hack)
+         vacuum_buffer_hack_active = true;
+
+     /*
       * Loop to process each selected relation.
       */
      foreach(cur, vrl)
***************
*** 320,325 ****
--- 331,341 ----
              }
          }
      }
+
+     /*
+      * Change back to normal buffer cache policy.
+      */
+     vacuum_buffer_hack_active = false;

      /*
       * Finish up processing.
diff -crN ../src.orig/backend/storage/buffer/bufmgr.c ./backend/storage/buffer/bufmgr.c
*** ../src.orig/backend/storage/buffer/bufmgr.c    Thu Oct 30 10:17:38 2003
--- ./backend/storage/buffer/bufmgr.c    Thu Oct 30 10:44:10 2003
***************
*** 64,69 ****
--- 64,72 ----
  bool    zero_damaged_pages = false;


+ extern bool vacuum_buffer_hack_active;
+
+
  static void WaitIO(BufferDesc *buf);
  static void StartBufferIO(BufferDesc *buf, bool forInput);
  static void TerminateBufferIO(BufferDesc *buf);
***************
*** 205,210 ****
--- 208,222 ----
              StartBufferIO(bufHdr, false);
              LWLockRelease(BufMgrLock);
          }
+     }
+     else
+     {
+         /* The block was not found in the buffer cache. If we are in
+          * vacuum buffer cache strategy mode, mark that this buffers
+          * content was caused by vacuum.
+          */
+         if (vacuum_buffer_hack_active)
+             bufHdr->flags |= BM_READ_BY_VACUUM;
      }

      /*
diff -crN ../src.orig/backend/storage/buffer/freelist.c ./backend/storage/buffer/freelist.c
*** ../src.orig/backend/storage/buffer/freelist.c    Thu Oct 30 10:17:38 2003
--- ./backend/storage/buffer/freelist.c    Thu Oct 30 12:38:42 2003
***************
*** 71,79 ****
  #endif   /* BMTRACE */
      IsNotInQueue(bf);

!     /* change bf so it points to inFrontOfNew and its successor */
!     bf->freePrev = SharedFreeList->freePrev;
!     bf->freeNext = Free_List_Descriptor;

      /* insert new into chain */
      BufferDescriptors[bf->freeNext].freePrev = bf->buf_id;
--- 71,93 ----
  #endif   /* BMTRACE */
      IsNotInQueue(bf);

!     if ((bf->flags & BM_READ_BY_VACUUM) != 0)
!     {
!         /* in the case the buffer was read or created by vacuum,
!          * add it to the end of the queue so that GetFreeBuffer()
!          * will return it next again.
!          */
!         bf->freePrev = Free_List_Descriptor;
!         bf->freeNext = SharedFreeList->freeNext;
!
!         bf->flags &= ~BM_READ_BY_VACUUM;
!     }
!     else
!     {
!         /* change bf so it points to inFrontOfNew and its successor */
!         bf->freePrev = SharedFreeList->freePrev;
!         bf->freeNext = Free_List_Descriptor;
!     }

      /* insert new into chain */
      BufferDescriptors[bf->freeNext].freePrev = bf->buf_id;
diff -crN ../src.orig/backend/tcop/postgres.c ./backend/tcop/postgres.c
*** ../src.orig/backend/tcop/postgres.c    Thu Oct 30 10:17:39 2003
--- ./backend/tcop/postgres.c    Thu Oct 30 10:25:07 2003
***************
*** 69,74 ****
--- 69,77 ----
  extern int    optind;
  extern char *optarg;

+ extern bool vacuum_buffer_hack;
+ extern bool vacuum_buffer_hack_active;
+
  char       *debug_query_string; /* for pgmonitor and
                                   * log_min_error_statement */

***************
*** 1874,1879 ****
--- 1877,1888 ----

      for (;;)
      {
+         /*
+          * Reset the vacuum buffer hack active flag, could be left
+          * true if a vacuum got aborted.
+          */
+         vacuum_buffer_hack_active = false;
+
          /*
           * Release storage left over from prior query cycle, and create a
           * new query input buffer in the cleared QueryContext.
diff -crN ../src.orig/include/storage/buf_internals.h ./include/storage/buf_internals.h
*** ../src.orig/include/storage/buf_internals.h    Thu Oct 30 10:17:43 2003
--- ./include/storage/buf_internals.h    Thu Oct 30 10:29:56 2003
***************
*** 41,46 ****
--- 41,47 ----
  #define BM_IO_ERROR                (1 << 6)
  #define BM_JUST_DIRTIED            (1 << 7)
  #define BM_PIN_COUNT_WAITER        (1 << 8)
+ #define BM_READ_BY_VACUUM        (1 << 9)

  typedef bits16 BufFlags;


pgsql-hackers by date:

Previous
From: Manfred Spraul
Date:
Subject: Re: Avoiding SIGPIPE (was Re: OSDL DBT-2 w/ PostgreSQL
Next
From: Bruce Momjian
Date:
Subject: Heading to Germany