RE: [HACKERS] mdnblocks is an amazing time sink in huge relations - Mailing list pgsql-hackers
From | Hiroshi Inoue |
---|---|
Subject | RE: [HACKERS] mdnblocks is an amazing time sink in huge relations |
Date | |
Msg-id | 000c01bf192b$5437e2a0$2801007e@cadzone.tpf.co.jp |
In response to | Re: [HACKERS] mdnblocks is an amazing time sink in huge relations (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [HACKERS] mdnblocks is an amazing time sink in huge relations |
List | pgsql-hackers |
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I have been suspicious about the current implementation of md.c.
> > It relies so much on information about existing physical files.
>
> Yes, but on the other hand we rely completely on those same physical
> files to hold our data ;-). I don't see anything fundamentally
> wrong with using the existence and size of a data file as useful
> information. It's not a substitute for a lock, of course, and there
> may be places where we need cross-backend interlocks that we haven't
> got now.
>

But we have to lseek() every time we want to know the number of blocks in a table file. Isn't that an overhead?

> > What do you think about the following?
> >
> > 2. If a backend was killed or crashed in the middle of executing
> > mdunlink()/mdtruncate(), half of the segments wouldn't be unlinked/
> > truncated.
>
> That's bothered me too. A possible answer would be to do the unlinking
> back-to-front (zap the last file first); that'd require a few more lines
> of code in md.c, but a crash midway through would then leave a legal
> file configuration that another backend could still do something with.

Oops, it's more serious than I thought. With back-to-front unlinking, a crash in mdunlink() may leave the table file merely truncated rather than removed. With front-to-back unlinking, a crash may leave segments that were never unlinked, and they would suddenly reappear as segments of the recreated table. There seems to be no easy fix.

> > 3. In the cygwin port, mdunlink()/mdtruncate() may leave segments of 0
> > length.
>
> I don't understand what causes this. Can you explain?
>

md.c calls FileUnlink() after FileTruncate() when unlinking. If FileUnlink() fails, a segment of 0 length remains. But that doesn't seem critical to this issue.

> > 4. We couldn't mdcreate() existing files and couldn't mdopen()/
> > mdunlink() non-existent files. So there are some cases where we
> > could neither CREATE TABLE nor DROP TABLE.
>
> True, but I think this is probably the best thing for safety's sake.
> It seems to me there is too much risk of losing or overwriting valid
> data if md.c bulls ahead when it finds an unexpected file configuration.
> I'd rather rely on manual cleanup if things have gotten that seriously
> out of whack... (but that's just my opinion, perhaps I'm in the
> minority?)
>

There is another risk: we may remove other tables' files by mistake during manual cleanup. And if I were a newcomer, I would not consider PostgreSQL a real DBMS (fortunately, I have never seen this mentioned). However, I don't object, because I share the same anxiety and can offer no easy solution.

Fixing this correctly would probably require a lot of work: postponing the real unlink/truncate until commit, creating table files that correspond to their OIDs, etc. It's the same as what "DROP TABLE inside transactions" requires. Hmm, is it worth the work?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp