Re: Minmax indexes - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Minmax indexes |
Date | |
Msg-id | 20140820225133.GB6343@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: Minmax indexes (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: Minmax indexes
|
List | pgsql-hackers |
Heikki Linnakangas wrote: > So, the other design change I've been advocating is to store the > revmap in the first N blocks, instead of having the two-level > structure with array pages and revmap pages. > > Attached is a patch for that, to be applied after v15. When the > revmap needs to be expanded, all the tuples on it are moved > elsewhere one-by-one. That adds some latency to the unfortunate guy > who needs to do that, but as the patch stands, the revmap is only > ever extended by VACUUM or CREATE INDEX, so I think that's fine. > Like with my previous patch, the point is to demonstrate how much > simpler the code becomes this way; I'm sure there are bugs and > cleanup still necessary. Thanks for the prodding. I didn't like this too much initially, but after going over it a few times I agree that having less code and a less complex physical representation is better. Your proposed approach is to just call the update routine on every tuple in the page we're evacuating. There are optimizations possible (such as doing bulk updates; and instead of updating the revmap, keep a redirection pointer in the page we just evacuated, so that the revmap can be updated lazily later), but I have spent way too long on this already that I am fine with keeping what we have here. If somebody later wants to contribute improvements to this, it'd be welcome. But on the other hand the operation is not that frequent and as you say it's not executed by user-facing queries, so perhaps it's okay. I cleaned it up some: mainly I created a separate file (mmpageops.c) that now hosts the routines related to page operations: mm_doinsert, mm_doupdate, mm_start_evacuating_page, mm_evacuate_page. There are other rather very minor changes here and there; also added CHECK_FOR_INTERRUPTS in all relevant loops. This bit in mm_doupdate I just couldn't understand: /* If both tuples are in fact equal, there is nothing to do */ if (!minmax_tuples_equal(oldtup, oldsz, origtup, origsz)) { LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); return false; } Isn't the test exactly reversed? I don't see how this would work. I updated it to /* * If both tuples are identical, there is nothing to do; except that if we * were requested to move the tuple across pages, we do it even if they are * equal. */ if (samepage && minmax_tuples_equal(oldtup, oldsz, origtup, origsz)) { LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); return false; } > PS. Spotted one oversight in patch v15: callers of mm_doupdate must > check the return value, and retry the operation if it returns false. Right, thanks. Fixed. So here's v16, rebased on top of 9bac66020. As far as I am concerned, this is the last version before I start renaming everything to BRIN and then commit. contrib/pageinspect/Makefile | 2 +- contrib/pageinspect/mmfuncs.c | 407 +++++++++++++ contrib/pageinspect/pageinspect--1.2.sql | 36 ++ contrib/pg_xlogdump/rmgrdesc.c | 1 + doc/src/sgml/brin.sgml | 248 ++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/indices.sgml | 36 +- doc/src/sgml/postgres.sgml | 1 + minmax-proposal | 306 ++++++++++ src/backend/access/Makefile | 2 +- src/backend/access/common/reloptions.c | 7 + src/backend/access/heap/heapam.c | 22 +- src/backend/access/minmax/Makefile | 17 + src/backend/access/minmax/minmax.c | 942 +++++++++++++++++++++++++++++++ src/backend/access/minmax/mmpageops.c | 638 +++++++++++++++++++++ src/backend/access/minmax/mmrevmap.c | 451 +++++++++++++++ src/backend/access/minmax/mmsortable.c | 287 ++++++++++ src/backend/access/minmax/mmtuple.c | 478 ++++++++++++++++ src/backend/access/minmax/mmxlog.c | 323 +++++++++++ src/backend/access/rmgrdesc/Makefile | 3 +- src/backend/access/rmgrdesc/minmaxdesc.c | 89 +++ src/backend/access/transam/rmgr.c | 1 + src/backend/catalog/index.c | 24 + src/backend/replication/logical/decode.c | 1 + src/backend/storage/page/bufpage.c | 179 +++++- src/backend/utils/adt/selfuncs.c | 24 + src/include/access/heapam.h | 2 + src/include/access/minmax.h | 52 ++ src/include/access/minmax_internal.h | 86 +++ src/include/access/minmax_page.h | 70 +++ src/include/access/minmax_pageops.h | 29 + src/include/access/minmax_revmap.h | 36 ++ src/include/access/minmax_tuple.h | 90 +++ src/include/access/minmax_xlog.h | 106 ++++ src/include/access/reloptions.h | 3 +- src/include/access/relscan.h | 4 +- src/include/access/rmgrlist.h | 1 + src/include/catalog/index.h | 8 + src/include/catalog/pg_am.h | 2 + src/include/catalog/pg_amop.h | 81 +++ src/include/catalog/pg_amproc.h | 73 +++ src/include/catalog/pg_opclass.h | 9 + src/include/catalog/pg_opfamily.h | 10 + src/include/catalog/pg_proc.h | 52 ++ src/include/storage/bufpage.h | 2 + src/include/utils/selfuncs.h | 1 + src/test/regress/expected/opr_sanity.out | 14 +- src/test/regress/sql/opr_sanity.sql | 7 +- 48 files changed, 5248 insertions(+), 16 deletions(-) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: