Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers
From | Jeff Davis
---|---
Subject | Re: [PATCHES] GIN improvements
Date |
Msg-id | 1229910966.2285.76.camel@jdavis
In response to | Re: [PATCHES] GIN improvements (Teodor Sigaev &lt;teodor@sigaev.ru&gt;)
Responses | Re: [PATCHES] GIN improvements
List | pgsql-hackers
On Fri, 2008-12-12 at 20:36 +0300, Teodor Sigaev wrote:
> Changes:
> - added vacuum_delay_point() in gininsertcleanup
> - add trigger to run vacuum by number of inserted tuples from
>   last vacuum. Number of inserted tuples represents number
>   of really inserted tuples in index and it is calculated as
>   tuples_inserted + tuples_updated - tuples_hot_updated.
>   Trigger fires on instuples > vac_base_thresh because search time
>   is linear on number of pending pages (tuples)

Hi,

Comments:

1. You use something like the following in a few places:

    START_CRIT_SECTION();
    ...
    l = PageAddItem(...);
    if (l == InvalidOffsetNumber)
        elog(ERROR, "failed to add item to index page in \"%s\"",
             RelationGetRelationName(index));

It's no use using ERROR, because it will turn into PANIC, which is
obviously unacceptable. It looks to me like those conditions can't
happen anyway, so it's probably better to add a comment explaining why
it can't happen, and Assert().

2. It appears to be properly triggering autovacuum when only inserts
happen, so I think that issue is solved.

3. Simple performance result with autovacuum off:

    create table random(i int[]);
    insert into random
        select ARRAY[(random() * 10)::int, (random() * 10)::int,
                     (random() * 10)::int, (random() * 10)::int,
                     (random() * 10)::int, (random() * 10)::int,
                     (random() * 10)::int, (random() * 10)::int,
                     (random() * 10)::int, (random() * 10)::int]
        from generate_series(1, 1000000);

    \timing on
    drop table foo;
    create table foo(i int[]);
    create index foogin on foo using gin (i);
    insert into foo select i from random;
    vacuum foo;

Without patch:
    INSERT: 71s
    VACUUM:  2s
    total:  73s

With patch:
    INSERT: 33s
    VACUUM: 12s
    total:  45s

So, there is a performance advantage. This was just a quick test to
make sure the numbers matched my expectations.

4. Heikki mentioned:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php
"To make things worse, a query will fail if all the matching
fast-inserted tuples don't fit in the non-lossy tid bitmap."

That issue still remains, correct? Is there a resolution to that?

5. I attached a newer version merged with HEAD.

6. You defined:

    #define GinPageHasFullRow(page) ( GinPageGetOpaque(page)->flags & GIN_LIST_FULLROW )

But in many places you still do the same check without using that
macro. The macro has only one call site, so I suggest either removing
the macro entirely, or using it every place you check that flag.

7. I don't understand this chunk of code:

    ItemPointerData item = pos->item;

    if (scanGetCandidate(scan, pos) == false ||
        !ItemPointerEquals(&pos->item, &item))
        elog(ERROR, "Could not process tuple");  /* XXX should not be here ! */

How can (!ItemPointerEquals(&pos->item, &item)) ever happen? And how
can (scanGetCandidate(scan, pos) == false) ever happen? Should that be
an Assert() instead? If those can happen during normal operation, then
we need a better error message there.

Regards,
    Jeff Davis
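
A minimal sketch of the comment-plus-Assert pattern suggested in point 1.
The call and its elided arguments are placeholders taken from the quoted
fragment, not from the actual patch; the premise (that free space is
verified before entering the critical section) is an assumption stated in
the comment:

    START_CRIT_SECTION();
    ...
    /*
     * PageAddItem() cannot fail here: the caller verified, before entering
     * the critical section, that the tuple fits on the page.  An
     * elog(ERROR) at this point would be promoted to PANIC, so just
     * document the invariant and Assert() it.
     */
    l = PageAddItem(...);
    Assert(l != InvalidOffsetNumber);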
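
For point 6, a sketch of using the macro at every call site. The
open-coded form shown first is the pattern the review describes; the
surrounding conditionals are hypothetical context:

    /* open-coded check, as it currently appears in several places */
    if (GinPageGetOpaque(page)->flags & GIN_LIST_FULLROW)
        ...

    /* the same check through the macro keeps every call site consistent */
    if (GinPageHasFullRow(page))
        ...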
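
And for point 7, one hedged way the quoted chunk could read if those
conditions really cannot occur in normal operation: keep the
scanGetCandidate() call for its side effects, report its failure with a
clearer message (the wording here is hypothetical), and demote the
positional check to an Assert(). This is only a sketch of the suggestion,
not code from the patch:

    ItemPointerData item = pos->item;

    /*
     * scanGetCandidate() must refind the tuple we are already positioned
     * on; if it cannot, that indicates a bug in the pending-list scan
     * logic, not a condition a user can reach.
     */
    if (!scanGetCandidate(scan, pos))
        elog(ERROR, "could not refind pending list tuple");  /* hypothetical message */
    Assert(ItemPointerEquals(&pos->item, &item));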