Help with cache flushing -- trying to implement TRUNCATE - Mailing list pgsql-hackers
From | Marcus Mascari |
---|---|
Subject | Help with cache flushing -- trying to implement TRUNCATE |
Date | |
Msg-id | 19990116101333.8802.rocketmail@send1e.yahoomail.com Whole thread Raw |
List | pgsql-hackers |
Hello. I was hoping someone could shed some light on the following problem. I am currently working on an intranet web site (who isn't?) and am using PostgreSQL 6.4 as the database. On a daily basis, the database needs to integrate several megs of data into tables. Since the webserver needs to be up 24 hours a day, 7 days a week, this integration process needs to occur as quickly as possible. Currently, the code which integrates the data which is ftp'd to the server from a mainframe, uses the DELETE statement to empty existing tables. It does DROP indexes before importing and recreates them after the import step. The delete takes a significant amount of time more than a DROP/CREATE TABLE pair, but makes it easier to change the table constraints, attributes, etc without changing code, and, in the future, the tables in question will be accessed from multiple users with varying permissions. As a result, the code to DROP/CREATE the tables would also have to reissue all the GRANTs necessary to maintain the current state of user permissions. I was hoping to increase the speed of the operation by adding support to PostgreSQL for the Oracle TRUNCATE command. With version 7, Oracle introduced the TRUNCATE command to quickly eliminate all the rows in a table. According to the Oracle docs, the TRUNCATE command possesses the following qualities: 1. It is a DDL statement and therefore cannot be rolled back 2. It does not fire a table's DELETE triggers This is where I am in what is my first attempt to modify PostgreSQL in any meaningful way: 1. I've added the grammer to gram.y, keywords.c, parse.h, pgindent. 2. I've added the code to be called in tcop/utility.c 3. I've added a RelationTruncate() to relcache.c (although it should probably be in creatinh.c) The RelationTruncate() command calls heap_openr() to open the relation specified in the TRUNCATE TABLE statement, calls heap_truncate(), and then calls TruncateBuffers(). Here's where I have problem #1: heap_truncate() does the following: A. elog's an error if the relation is either a System relation or a Temporary relation B. Calls ReleaseRelationBuffers() on the relation C. Calls smgrtruncate() to truncate the relation to 0 blocks in size D. Calls smgrclose() to actually close the file descriptor E. Calls smgrcommit() to force the truncate to take place F. Calls heap_close() to close the relation G. And finally calls RelationForgetRelation() to eliminate the relation from the relation cache. Okay. When I first implemented it, I basically mirrored the DROP TABLE code until the point of heap_truncate() where I truncated the relation to 0 bytes. This appeared to work. Obviously I still would have to implement a RelationTruncateIndexes() to truncate all the indexes as well, but I haven't gotten that far yet. Here's what happens: > psql emptoris emptoris=> CREATE TABLE employees ( employee VARCHAR(10) not null ); CREATE emptoris=> INSERT INTO employees values ('hello'); INSERT 17481 1 emptoris=> SELECT * FROM employees; employee -------- hello (1 row) emptoris=> TRUNCATE TABLE employees; TRUNCATE emptoris=> SELECT * FROM employees; employee -------- (0 rows) /* It took me about two hours of coding to get to */ /* that point. And I've spent two days trying to */ /* solve what happens next: */ emptoris=> INSERT INTO employees VALUES ('one'); INSERT 17505 1 emptoris=> SELECT * from employees; employee -------- hello one (2 rows) This is where I get lost! Obviously something is sitting around somewhere either in the relation cache (which I explicitly cleared using RelationFogetRelation()) or in the Shared Memory Segment's Buffer Cache. After the TRUNCATE, the 'employees' file on the filesystem is truncated to 0 bytes in size. If I close the front-end and reissue the INSERT/SELECT the truncated data remains as above. However, if I close the backend after the TRUNCATE, and restart it, the data is correctly truncated to an empty table. I can't find where this in-memory copy of the table is to delete. I've tried marking all buffers in the buffer cache whose BufferDesc's have the same dbase and relid as the relation being truncated as having been deleted: TruncateBuffers(Relation rel) { BufferDesc *buf; int i; SpinAcquire(BufMgrLock); for (i = 1; i <= NBuffers; i++) { buf = &BufferDescriptors[i - 1]; if ((buf->tag.relId.dbId== MyDatabaseId) &&(buf->tag.relId.relId == RelationGetRelid(rel))) { buf->tag.relId.relId= InvalidOid; buf->flags |= BM_DELETED; } } SpinRelease(BufMgrLock); As previously stated, I've also blown away the relation cache by calling RelationForgetRelation(). And the only reason I called smgrclose() was in a failed attempt to force a flushing of wherever this magical data is stored. I've also tried various CommitTransactions() calls, etc. All with no success. If I only knew WHERE this data is being cached, despite my room temp. I.Q., I feel I could figure out how to correctly purge it. I don't know where this data is continuing to live to purge it from the backend. I even took the liberty of perusing Stefan's (sp?) Master's Thesis. Any help would be appreciated. Even if its to let me know to not bother trying to modify a ferrari with a sledge hammer. Marcus Mascari (mascarim@yahoo.com) _________________________________________________________ DO YOU YAHOO!? Get your free @yahoo.com address at http://mail.yahoo.com
pgsql-hackers by date: