Thread: Is mdextend really safe?
Earlier we saw some bug reports from someone who had a buffer flush fail due to ENOSPC. We asserted then that this should never happen: when we extend the relation we write out the new blocks, so any ENOSPC errors ought to happen at that point, not when a buffer is flushed.

However, looking at mdextend, it only writes out the requested block. Any blocks between the end of the table and the requested block are *not* written out. We count on the OS to implicitly fill those blocks with zeros.

On Unix that creates a sparse file where the intervening blocks are not allocated. When we later write out those blocks the filesystem then has to allocate space for them. IIRC the bug reports were from Windows. I'm not sure what NTFS's behaviour with sparse files is.

Now this only matters if we ever call mdextend on a block which isn't the block immediately following the end of file. Is that true?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
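
A minimal sketch of the sparse-file effect described above, assuming a POSIX system and a filesystem that supports holes. This is not PostgreSQL code: `demo_extend_with_hole` and `DEMO_BLCKSZ` are hypothetical names, and 8192 is only an assumed block size. It writes one block past the end of an empty file and reports the apparent size next to the space actually allocated.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192        /* assumed block size, illustration only */

/*
 * Write a single zero-filled block at block number blkno of a fresh
 * temporary file, without writing any of the blocks before it.
 * On success returns 0 and reports the apparent file size and the
 * number of bytes the filesystem says it has allocated.
 * Returns -1 on any error.
 */
static int
demo_extend_with_hole(unsigned blkno, off_t *apparent, off_t *allocated)
{
    char        path[] = "/tmp/sparse_demo_XXXXXX";
    char        block[DEMO_BLCKSZ];
    struct stat st;
    int         rc = -1;
    int         fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    memset(block, 0, sizeof(block));

    /* Seek past EOF and write only the requested block. */
    if (lseek(fd, (off_t) blkno * DEMO_BLCKSZ, SEEK_SET) != (off_t) -1 &&
        write(fd, block, sizeof(block)) == (ssize_t) sizeof(block) &&
        fstat(fd, &st) == 0)
    {
        *apparent = st.st_size;
        /* st_blocks is typically counted in 512-byte units */
        *allocated = (off_t) st.st_blocks * 512;
        rc = 0;
    }

    close(fd);
    return rc;
}
```

Calling `demo_extend_with_hole(100, &apparent, &allocated)` reports an apparent size of 101 blocks. On a filesystem with hole support the allocated figure is usually far smaller, and the difference is space that still has to be found when the skipped blocks are eventually written, which is where a late ENOSPC could come from.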

* Gregory Stark:

> On Unix that creates a sparse file where the intervening blocks are
> not allocated. When we later write out those blocks the filesystem
> then has to allocate space for them.

This seems to happen relatively rarely. Creating temporary holes like this usually results in heavily fragmented files on the file systems I use, and I don't see this with PostgreSQL. (It's one of my gripes with Berkeley DB.)

However, I looked at the code recently and couldn't figure out *why* PostgreSQL's observed behavior is this way. 8-(

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

Gregory Stark wrote:

> On Unix that creates a sparse file where the intervening blocks are not
> allocated. When we later write out those blocks the filesystem then has to
> allocate space for them. IIRC the bug reports were from Windows. I'm not sure
> what NTFS's behaviour with sparse files is.

NTFS has a sparse file feature, but how it works ...

> Now this only matters if we ever call mdextend on a block which isn't the
> block immediately following the end of file. Is that true?

I think that could happen only during WAL replay, but in the end everything should be OK. Look into ReadBuffer_common; there is the following code:

    /* Substitute proper block number if caller asked for P_NEW */
    if (isExtend)
        blockNum = smgrnblocks(smgr, forkNum);

Zdenek

Gregory Stark wrote:

> Now this only matters if we ever call mdextend on a block which isn't the
> block immediately following the end of file. Is that true?

I don't think so.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Gregory Stark <stark@enterprisedb.com> writes:

> Now this only matters if we ever call mdextend on a block which isn't the
> block immediately following the end of file. Is that true?

Only in hash indexes.

regards, tom lane