_hash_alloc_buckets() safety - Mailing list pgsql-hackers

From Amit Kapila
Subject _hash_alloc_buckets() safety
Msg-id CAA4eK1LMYM8cmdOCgUdz+UzdRs5pe4pq-2BsQpR_BjdN57Mm=Q@mail.gmail.com
Responses Re: _hash_alloc_buckets() safety  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
While working on write-ahead logging of hash indexes, I noticed that
this function allocates buckets in batches.  The mechanism it uses is
to initialize only the last page of the batch with zeroes and to expect
that the filesystem will ensure the intervening pages read as zeroes
too.
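
To make the mechanism concrete, here is a minimal standalone sketch of
the "write only the last page" idea using plain POSIX calls.  It is not
the PostgreSQL code: alloc_batch_by_last_page() and
block_reads_as_zeroes() are hypothetical helper names, and BLCKSZ is
assumed to be the default 8192 bytes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* for pread()/pwrite()/mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed: PostgreSQL's default block size */

/*
 * Hypothetical helper: extend the file by a batch of nblocks pages
 * starting at firstblock, physically writing only the last page (all
 * zeroes).  Returns 0 on success, -1 on failure.
 */
static int
alloc_batch_by_last_page(int fd, unsigned firstblock, unsigned nblocks)
{
    char    zerobuf[BLCKSZ];
    off_t   lastoff;

    assert(nblocks > 0);
    memset(zerobuf, 0, sizeof(zerobuf));
    lastoff = (off_t) (firstblock + nblocks - 1) * BLCKSZ;

    return pwrite(fd, zerobuf, BLCKSZ, lastoff) == BLCKSZ ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if the block reads back as all zero
 * bytes, 0 if any byte is nonzero, -1 on a short read or error (for
 * example, a block past end of file).
 */
static int
block_reads_as_zeroes(int fd, unsigned blkno)
{
    char    buf[BLCKSZ];
    size_t  i;

    if (pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
        return -1;
    for (i = 0; i < BLCKSZ; i++)
    {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling alloc_batch_by_last_page(fd, 0, 4) on an empty temporary file
(for example, one created with mkstemp()) issues a single page write,
after which blocks 0 through 3 all read back as zeroes.  The blocks
that were never written are a gap in the file, which POSIX specifies
reads as zero bytes.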

I think that to make it WAL-enabled, we need to initialize the page
header (using PageInit() or equivalent) instead of initializing the
page with zeroes, because some parts of our WAL replay machinery expect
that the page is not new, as I indicated in another thread [1].  I
think the WAL consistency check tool [2] also uses the same replay
functions and will report this as a problem if we don't initialize the
page header.
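
The difference between a zeroed page and an initialized one can be
sketched with a toy model.  This is only an illustration under stated
assumptions: ToyPageHeader is a simplified stand-in and not the real
page header layout, and toy_page_init() and toy_page_is_new() are
hypothetical names.  The one detail borrowed from PostgreSQL is that a
page is treated as new when its pd_upper field is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* assumed: PostgreSQL's default block size */

/* Simplified stand-in for a page header; NOT the real layout. */
typedef struct ToyPageHeader
{
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} ToyPageHeader;

/* A page counts as "new" when pd_upper is still zero. */
static int
toy_page_is_new(const char *page)
{
    ToyPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.pd_upper == 0;
}

/* Zero the page, then fill in the header so it no longer looks new. */
static void
toy_page_init(char *page, size_t special_size)
{
    ToyPageHeader hdr;

    assert(special_size < BLCKSZ);
    memset(page, 0, BLCKSZ);
    hdr.pd_lower = (uint16_t) sizeof(ToyPageHeader);
    hdr.pd_upper = (uint16_t) (BLCKSZ - special_size);
    hdr.pd_special = hdr.pd_upper;
    memcpy(page, &hdr, sizeof(hdr));
}
```

In this model an all-zero buffer reports as new, while the same buffer
after toy_page_init() does not.  That is the distinction a replay
routine would trip over if it found a zeroed page where it expected an
initialized one.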

What is not clear to me is whether this is okay as-is, or whether we
should initialize each page of the batch during _hash_alloc_buckets()
now that we are trying to make hash indexes WAL-enabled.  Offhand, I
don't see any problem with initializing just the last page and writing
WAL for it with log_newpage(); however, if we initialize all the pages,
there could be some performance penalty on split operations.
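
As a rough illustration of where that penalty would come from, the
sketch below counts the page writes issued by each strategy.  It is a
standalone model under stated assumptions, not PostgreSQL code:
extend_batch() is a hypothetical function, the single marker byte
merely stands in for an initialized page header, and no WAL is
written.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* for pwrite()/mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed: PostgreSQL's default block size */

/*
 * Hypothetical: extend the file by nblocks pages starting at
 * firstblock.  With init_all == 0 only the last page is written; with
 * init_all != 0 every page in the batch is written.  Returns the
 * number of page writes issued, or -1 on error.
 */
static long
extend_batch(int fd, unsigned firstblock, unsigned nblocks, int init_all)
{
    char        page[BLCKSZ];
    unsigned    end = firstblock + nblocks;
    unsigned    blk;
    long        writes = 0;

    assert(nblocks > 0);
    memset(page, 0, sizeof(page));
    page[0] = 1;                /* stand-in for an initialized header */

    /* Start at the first block or jump straight to the last one. */
    blk = init_all ? firstblock : end - 1;
    for (; blk < end; blk++)
    {
        if (pwrite(fd, page, BLCKSZ, (off_t) blk * BLCKSZ) != BLCKSZ)
            return -1;
        writes++;
    }
    return writes;
}
```

For a batch of 8 pages this issues 1 write in the last-page-only mode
and 8 writes when every page is initialized, so the extra cost grows
linearly with the batch size.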

Thoughts?


[1] - https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com
[2] - https://commitfest.postgresql.org/10/741/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com