Re: Minmax indexes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Minmax indexes
Date
Msg-id 20140820225133.GB6343@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: Minmax indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Minmax indexes
List pgsql-hackers
Heikki Linnakangas wrote:

> So, the other design change I've been advocating is to store the
> revmap in the first N blocks, instead of having the two-level
> structure with array pages and revmap pages.
>
> Attached is a patch for that, to be applied after v15. When the
> revmap needs to be expanded, all the tuples on it are moved
> elsewhere one-by-one. That adds some latency to the unfortunate guy
> who needs to do that, but as the patch stands, the revmap is only
> ever extended by VACUUM or CREATE INDEX, so I think that's fine.
> Like with my previous patch, the point is to demonstrate how much
> simpler the code becomes this way; I'm sure there are bugs and
> cleanup still necessary.

Thanks for the prodding.  I didn't like this too much initially, but
after going over it a few times I agree that having less code and a less
complex physical representation is better.  Your proposed approach is to
just call the update routine on every tuple in the page we're
evacuating.  There are optimizations possible (such as doing bulk
updates; and instead of updating the revmap, keep a redirection pointer
in the page we just evacuated, so that the revmap can be updated lazily
later), but I have spent way too long on this already that I am fine
with keeping what we have here.  If somebody later wants to contribute
improvements to this, it'd be welcome.  But on the other hand the
operation is not that frequent and as you say it's not executed by
user-facing queries, so perhaps it's okay.

I cleaned it up some: mainly I created a separate file (mmpageops.c)
that now hosts the routines related to page operations: mm_doinsert,
mm_doupdate, mm_start_evacuating_page, mm_evacuate_page.  There are
other rather very minor changes here and there; also added
CHECK_FOR_INTERRUPTS in all relevant loops.

This bit in mm_doupdate I just couldn't understand:

   /* If both tuples are in fact equal, there is nothing to do */
   if (!minmax_tuples_equal(oldtup, oldsz, origtup, origsz))
   {
       LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
       return false;
   }

Isn't the test exactly reversed?  I don't see how this would work.
I updated it to

   /*
    * If both tuples are identical, there is nothing to do; except that if we
    * were requested to move the tuple across pages, we do it even if they are
    * equal.
    */
   if (samepage && minmax_tuples_equal(oldtup, oldsz, origtup, origsz))
   {
       LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
       return false;
   }


> PS. Spotted one oversight in patch v15: callers of mm_doupdate must
> check the return value, and retry the operation if it returns false.

Right, thanks.  Fixed.

So here's v16, rebased on top of 9bac66020.  As far as I am concerned,
this is the last version before I start renaming everything to BRIN and
then commit.


 contrib/pageinspect/Makefile             |   2 +-
 contrib/pageinspect/mmfuncs.c            | 407 +++++++++++++
 contrib/pageinspect/pageinspect--1.2.sql |  36 ++
 contrib/pg_xlogdump/rmgrdesc.c           |   1 +
 doc/src/sgml/brin.sgml                   | 248 ++++++++
 doc/src/sgml/filelist.sgml               |   1 +
 doc/src/sgml/indices.sgml                |  36 +-
 doc/src/sgml/postgres.sgml               |   1 +
 minmax-proposal                          | 306 ++++++++++
 src/backend/access/Makefile              |   2 +-
 src/backend/access/common/reloptions.c   |   7 +
 src/backend/access/heap/heapam.c         |  22 +-
 src/backend/access/minmax/Makefile       |  17 +
 src/backend/access/minmax/minmax.c       | 942 +++++++++++++++++++++++++++++++
 src/backend/access/minmax/mmpageops.c    | 638 +++++++++++++++++++++
 src/backend/access/minmax/mmrevmap.c     | 451 +++++++++++++++
 src/backend/access/minmax/mmsortable.c   | 287 ++++++++++
 src/backend/access/minmax/mmtuple.c      | 478 ++++++++++++++++
 src/backend/access/minmax/mmxlog.c       | 323 +++++++++++
 src/backend/access/rmgrdesc/Makefile     |   3 +-
 src/backend/access/rmgrdesc/minmaxdesc.c |  89 +++
 src/backend/access/transam/rmgr.c        |   1 +
 src/backend/catalog/index.c              |  24 +
 src/backend/replication/logical/decode.c |   1 +
 src/backend/storage/page/bufpage.c       | 179 +++++-
 src/backend/utils/adt/selfuncs.c         |  24 +
 src/include/access/heapam.h              |   2 +
 src/include/access/minmax.h              |  52 ++
 src/include/access/minmax_internal.h     |  86 +++
 src/include/access/minmax_page.h         |  70 +++
 src/include/access/minmax_pageops.h      |  29 +
 src/include/access/minmax_revmap.h       |  36 ++
 src/include/access/minmax_tuple.h        |  90 +++
 src/include/access/minmax_xlog.h         | 106 ++++
 src/include/access/reloptions.h          |   3 +-
 src/include/access/relscan.h             |   4 +-
 src/include/access/rmgrlist.h            |   1 +
 src/include/catalog/index.h              |   8 +
 src/include/catalog/pg_am.h              |   2 +
 src/include/catalog/pg_amop.h            |  81 +++
 src/include/catalog/pg_amproc.h          |  73 +++
 src/include/catalog/pg_opclass.h         |   9 +
 src/include/catalog/pg_opfamily.h        |  10 +
 src/include/catalog/pg_proc.h            |  52 ++
 src/include/storage/bufpage.h            |   2 +
 src/include/utils/selfuncs.h             |   1 +
 src/test/regress/expected/opr_sanity.out |  14 +-
 src/test/regress/sql/opr_sanity.sql      |   7 +-
 48 files changed, 5248 insertions(+), 16 deletions(-)


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Arthur Silva
Date:
Subject: Re: jsonb format is pessimal for toast compression
Next
From: Bruce Momjian
Date:
Subject: Re: [patch] pg_copy - a command for reliable WAL archiving