Re: [HACKERS] mdnblocks is an amazing time sink in huge relations - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] mdnblocks is an amazing time sink in huge relations
Date
Msg-id 20878.940204380@sss.pgh.pa.us
In response to RE: [HACKERS] mdnblocks is an amazing time sink in huge relations  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
Responses RE: [HACKERS] mdnblocks is an amazing time sink in huge relations  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
(Sorry for slow response, I've been off chasing psort problems...)

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I have been suspicious about the current implementation of md.c.
> It relies so much on information about existing physical files.

Yes, but on the other hand we rely completely on those same physical
files to hold our data ;-).  I don't see anything fundamentally
wrong with using the existence and size of a data file as useful
information.  It's not a substitute for a lock, of course, and there
may be places where we need cross-backend interlocks that we haven't
got now.

> What do you think about the following?
>
> 1. Partial blocks (as you know, I have changed the handling of this
>     kind of block recently).

Yes.  I think your fix was good.

> 2. If a backend was killed or crashed in the middle of executing
>     mdunlink()/mdtruncate(), half of the segments wouldn't be
>     unlinked/truncated.

That's bothered me too.  A possible answer would be to do the unlinking
back-to-front (zap the last file first); that'd require a few more lines
of code in md.c, but a crash midway through would then leave a legal
file configuration that another backend could still do something with.
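
A minimal sketch of the back-to-front idea, not the actual md.c code. It
assumes segment 0 is stored as "base" and segment N as "base.N"; the names
segment_path() and unlink_segments_back_to_front() are hypothetical and
exist only for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build the path of a segment: "base" for 0, "base.N" for N > 0 (assumed layout). */
static void
segment_path(char *buf, size_t buflen, const char *base, int segno)
{
    if (segno == 0)
        snprintf(buf, buflen, "%s", base);
    else
        snprintf(buf, buflen, "%s.%d", base, segno);
}

/*
 * Remove every segment of a relation, highest-numbered first.
 * Returns 0 on success, -1 if any unlink() fails.
 */
int
unlink_segments_back_to_front(const char *base)
{
    char        path[1024];
    struct stat st;
    int         nsegs = 0;
    int         segno;

    /* Count the contiguous run of segments starting at segment 0. */
    for (;;)
    {
        segment_path(path, sizeof(path), base, nsegs);
        if (stat(path, &st) != 0)
            break;
        nsegs++;
    }

    /*
     * Zap the last file first.  If we die partway through, the files that
     * remain are still segments 0..k with no gap in the middle.
     */
    for (segno = nsegs - 1; segno >= 0; segno--)
    {
        segment_path(path, sizeof(path), base, segno);
        if (unlink(path) != 0)
            return -1;
    }
    return 0;
}
```

Unlinking front-to-back and crashing midway would instead leave "base.1",
"base.2", ... with no "base", which is the configuration a later backend
cannot make sense of.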

> 3. In the cygwin port, mdunlink()/mdtruncate() may leave segments of 0
>     length.

I don't understand what causes this.  Can you explain?

BTW, I think that having the last segment be 0 length is OK and indeed
expected --- mdnblocks will create the next segment as soon as it
notices the currently last segment has reached RELSEG_SIZE, even if
there's not yet a disk page to put in the next segment.  This seems
OK to me, although it's not really necessary.
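
To illustrate why a zero-length last segment is harmless, here is a rough
sketch of counting blocks by walking the segment files. It is not the real
mdnblocks: count_blocks() is a hypothetical name, the "base" / "base.N"
file naming is assumed, and unlike the behavior described above it only
stops at a missing segment rather than creating it.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Count the blocks of a relation stored as "base", "base.1", ...
 * blcksz is the block size in bytes; relseg_size is the number of
 * blocks in a full segment.
 */
long
count_blocks(const char *base, long blcksz, long relseg_size)
{
    char        path[1024];
    struct stat st;
    long        total = 0;
    int         segno;

    for (segno = 0;; segno++)
    {
        long        nblocks;

        if (segno == 0)
            snprintf(path, sizeof(path), "%s", base);
        else
            snprintf(path, sizeof(path), "%s.%d", base, segno);

        if (stat(path, &st) != 0)
            return total;               /* no further segment exists */

        /* Integer division: a trailing partial block is not counted. */
        nblocks = (long) (st.st_size / blcksz);

        if (nblocks < relseg_size)
            return total + nblocks;     /* last segment, possibly empty */

        total += relseg_size;           /* full segment, look at the next */
    }
}
```

An empty trailing segment simply contributes zero blocks, so the count is
the same whether or not that file exists. The cost is one stat() per
segment on every call, which is what makes this expensive for huge
relations.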

> 4. We couldn't mdcreate() existent files and couldn't mdopen()/
>     mdunlink() non-existent files.  So there are some cases that we
>     could neither CREATE TABLE nor DROP TABLE.

True, but I think this is probably the best thing for safety's sake.
It seems to me there is too much risk of losing or overwriting valid
data if md.c bulls ahead when it finds an unexpected file configuration.
I'd rather rely on manual cleanup if things have gotten that seriously
out of whack... (but that's just my opinion, perhaps I'm in the
minority?)
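
The refuse-to-overwrite behavior can be sketched with the standard POSIX
O_EXCL flag. This is only an illustration of the safety argument, not a
claim about how mdcreate() is written; create_new_file_only() is a
hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Create a file only if nothing already exists at that path.
 * Returns a file descriptor, or -1 (errno set to EEXIST when the
 * file is already there) so existing data is never clobbered.
 */
int
create_new_file_only(const char *path)
{
    return open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
}
```

The failure is deliberate: an unexpected leftover file stops the operation
and leaves the decision to a person instead of being silently reused.
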
        regards, tom lane

