Thread: Storage space lost during an UPDATE
I ran the following query on my database:

---------------
update bv_descriptions
  set description = REPLACE(description, '\\n', '\n');
commit;
---------------

It finished fine, but to my surprise, when I ran "df" on the server the drive usage jumped from 44% to 60%, which is an additional 650MB. I tried restarting the server, but it didn't do any good; eventually I ran a VACUUM FULL on that table, and that did the trick and reclaimed the lost space.

I am still confused about what happened... What took these additional 650MB? I could understand this phenomenon if it happened before I did COMMIT... But why did it stay after COMMIT too?

Thanks
On Saturday 03 July 2004 6:37 am, Vitaly Belman wrote:
> I ran the following query on my database:
>
> ---------------
> update bv_descriptions set description = REPLACE(description,
> '\\n', '\n'); commit;
> ---------------
>
> It finished fine but to my surprise when I ran "df" on the server
> the drive usage jumped from 44% to 60% which is an additional 650MB.
>
> I tried to restart the server but it didn't do any good, eventually
> I tried a VACUUM FULL on that table and it did the trick and
> cleaned the lost space.
>
> I am still confused about what happened... What took these
> additional 650MB?
>
> I could understand this phenomenon if it happened before I did
> COMMIT... But why did it stay after COMMIT too?

This is expected. It has to do with MVCC, not COMMIT. PostgreSQL creates a duplicate record for each record that is updated; this is how MVCC hangs on to "old" records that may still be in use by other transactions. A regular VACUUM only marks space within the file that can be reused by new records (non-current records that are no longer needed by any transaction). VACUUM FULL actually reclaims physical space on the disk.

By way of example:

vacuum full sometable
    Space used by table = x
update all records in sometable
    Space used by table = 2x (well, not exactly 2x, but probably somewhere around that)
vacuum
    Space used by sometable still = 2x, but there is reusable space within the table
update all records
    Space used by sometable still somewhere around 2x - the updated records filled the unused space in the file
vacuum full sometable
    Space used by table = x

Cheers,
Steve
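Steve's x / 2x sequence can be sketched as a toy model (plain Python, purely illustrative -- this is not how PostgreSQL's storage actually works, just the bookkeeping his example implies):

```python
# Toy model of MVCC table bloat: an UPDATE writes a new row version and
# marks the old one dead; VACUUM makes dead slots reusable without
# shrinking the file; VACUUM FULL rewrites the table with only live rows.

class ToyTable:
    def __init__(self, rows):
        # Each slot holds a live row, a dead row version, or is free.
        self.slots = [("live", r) for r in rows]

    def update_all(self):
        # Mark every live row dead and write a new version for it,
        # reusing free slots first, growing the file otherwise.
        new_versions = []
        for i, (state, r) in enumerate(self.slots):
            if state == "live":
                self.slots[i] = ("dead", r)
                new_versions.append(("live", r))
        for v in new_versions:
            for i, (state, _) in enumerate(self.slots):
                if state == "free":
                    self.slots[i] = v
                    break
            else:
                self.slots.append(v)  # no free slot: the file grows

    def vacuum(self):
        # Plain VACUUM: dead slots become free; file size is unchanged.
        self.slots = [("free", None) if s == "dead" else (s, r)
                      for s, r in self.slots]

    def vacuum_full(self):
        # VACUUM FULL: keep only live rows, returning space to the OS.
        self.slots = [(s, r) for s, r in self.slots if s == "live"]

    def size(self):
        return len(self.slots)

t = ToyTable(range(100))
assert t.size() == 100   # x
t.update_all()
assert t.size() == 200   # ~2x after updating every row
t.vacuum()
assert t.size() == 200   # still 2x, but half of it is reusable
t.update_all()
assert t.size() == 200   # the new versions reused the freed space
t.vacuum_full()
assert t.size() == 100   # back to x
```

The asserts trace exactly the sequence in the example above.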
Hi,

Is there anyone who knows about the extended query language? Nobody answered my question yesterday. Could anyone put me in touch with the person who programmed that part, in the worst case?

I send a "parse command." If there's a syntax error, I get an error back as expected; otherwise I don't get anything back, even though the docs say I should receive a "parse complete" message (and hopefully a "ready for query" too, even though that's not mentioned).

Is the extended query language operational yet? (PostgreSQL 7.4.2)

Thanks,
Marc
I'm sending this parse command, after which the backend completely stops responding (just the session, not the whole server):

Byte1('P'): "P"
Int32: 41
String: "mystatement" + null byte
String: "select * from mytable;" + null byte
Int16: 0

I have to close and re-open the frontend's TCP/IP connection to be able to work again. PostgreSQL 7.4.2.

Marc
Ah, I understand. Thanks :).

Steve Crawford wrote:
> This is expected. It has to do with MVCC, not commit. PostgreSQL
> creates a duplicate record for each record that was updated. This is
> how MVCC hangs on to "old" records that may still be in use by other
> transactions.
> <SNIP>
>
> Cheers,
> Steve
"M. Bastin" <marcbastin@mindspring.com> writes: > I'm sending this parse command after which the backend completely > stops responding (just the session, not the whole server): > Byte1('P'): "P" > Int32: 41 > String: "mystatement" + null byte > String: "select * from mytable;" + null byte > Int16: 0 You haven't done anything to ensure a response message (viz, Flush or Sync). See nearby discussion about brokenness of Tcl interface... regards, tom lane
Hi,

Is there a query on some system table that will list the names of all prepared statements?

(FYI: To those wondering about the extended query language, I've got it working now and it is impressive. E.g. for an uncomplicated query like

SELECT * FROM population WHERE age > 12 AND age < 20;

the total execution time is cut in half compared to the simple query language. If you look only at PostgreSQL's reaction time, it is more than 10 times faster, while the time for the reception and parsing of the data by the frontend remains the same, of course. I haven't tried heavy, complicated queries yet, but I can only imagine they will give even more spectacular results.

The fact that you can work with parameters opens up a whole new dimension as well, e.g.:

SELECT * FROM population WHERE age > $1 AND age < $2;

I only wish there were more documentation about the extended query language. All I've found on the whole internet would fit on 2 to 3 printed pages. Did I miss something?)

Thanks,
Marc
"M. Bastin" <marcbastin@mindspring.com> writes: > Is there a query on some system table that will list all names of > prepared statements? No, since they are never stored on disk. We could think of providing some operation in the FE/BE protocol to list the available statements, perhaps. (In retrospect I guess it was a mistake to have the Describe message raise a hard error for no-such-statement. If it just sent back a message without aborting your transaction, you could use Describe to probe whether you'd prepared a given name yet.) Same comments apply to Portals I suppose. If you'd like to lobby to get this changed, pgsql-novice is not the place; pgsql-interfaces or pgsql-hackers would be more appropriate. regards, tom lane
At 10:37 AM -0400 7/14/04, Tom Lane wrote:
>"M. Bastin" <marcbastin@mindspring.com> writes:
>> Is there a query on some system table that will list all names of
>> prepared statements?
>
>No, since they are never stored on disk.
>Same comments apply to Portals I suppose.
>
>If you'd like to lobby to get this changed

I don't believe this is a critical issue. People can keep the list in their client app just the same, since prepared statements only live for the duration of a session.

Thank you very much, Tom, for all your help so far during my first steps with the extended query protocol!

Marc
M. Bastin wrote:
> [...]
> The fact that you can work with parameters opens up a whole new
> dimension as well, e.g.:
>
> SELECT * FROM population WHERE age > $1 AND age < $2;
>
> I only wish there were more documentation about the extended query
> language. All I've found on the whole internet would hold on 2 to 3
> printed pages. Did I miss something?)

I'm not sure what exactly you are missing, but the page documenting the PREPARE statement (in the official Postgres docs) pretty much says it all. It's also easy to find (Docs -> Reference -> SQL Commands -> PREPARE):

http://www.postgresql.org/docs/7.4/static/sql-prepare.html

However, maybe we're actually talking about completely different things -- in that case I would like to know what you are talking about. :-)

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München

Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

> Can the denizens of this group enlighten me about what the
> advantages of Python are, versus Perl ?
"python" is more likely to pass unharmed through your spelling
checker than "perl".
        -- An unknown poster and Fredrik Lundh
Hi Oliver,

> > I only wish there were more documentation about the extended query
> > language. All I've found on the whole internet would hold on 2 to 3
> > printed pages. Did I miss something?)
>
>I'm not sure what exactly you are missing, but the page
>documenting the PREPARE statement (in the official Postgres
>docs) pretty much says it all. Also, it's easy to find
>(Docs -> Reference -> SQL Commands -> PREPARE).
>
>http://www.postgresql.org/docs/7.4/static/sql-prepare.html
>
>However, maybe we're actually talking about completely
>different things

Yes and no. I'm looking at the subject from the frontend-backend protocol 3.0 point of view, which is separated into the Extended Query language and the Simple Query language.

With the PREPARE and EXECUTE statements you're accessing the Extended Query language through the Simple Query language, which requires more CPU time, since your EXECUTE query itself is going to be parsed as a Simple Query first, before PostgreSQL realizes it must execute a prepared statement. When you send the commands directly through the Extended Query language, you skip that parsing step.

Plus, the Extended Query language is far richer than just preparing and executing, even though those 2 operations will probably cover most needs.

The Extended Query language is also much more complex to use. E.g. you would do the EXECUTE by issuing about 5 commands--but it's still much faster. That's where I had problems and wished for more documentation and/or examples.

I hope this clears up the confusion.

Cheers,
Marc
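The "about 5 commands" Marc mentions would roughly be Parse, Bind, Describe, Execute and Sync. A sketch of the raw bytes for one such round trip (plain Python, no socket handling; "stmt" and the query are placeholder names, and the layout follows the v3 protocol docs):

```python
import struct

def msg(type_byte, body=b""):
    # Generic v3 frontend message: one type byte, then an Int32 length
    # that counts itself plus the body, then the body.
    return type_byte + struct.pack("!I", len(body) + 4) + body

def cstr(s):
    # NUL-terminated string, as the protocol requires.
    return s.encode() + b"\x00"

query = "SELECT * FROM population WHERE age > $1 AND age < $2;"

# Parse: statement name, query text, count of pre-specified param types.
parse = msg(b"P", cstr("stmt") + cstr(query) + struct.pack("!H", 0))

# Bind: portal name, statement name, 0 parameter format codes (text),
# 2 parameter values (each Int32 length + bytes), 0 result format codes.
params = b"".join(struct.pack("!I", len(v)) + v for v in (b"12", b"20"))
bind = msg(b"B", cstr("") + cstr("stmt")
           + struct.pack("!H", 0)
           + struct.pack("!H", 2) + params
           + struct.pack("!H", 0))

describe = msg(b"D", b"P" + cstr(""))                 # describe the portal
execute = msg(b"E", cstr("") + struct.pack("!I", 0))  # 0 = no row limit
sync = msg(b"S")                                      # ask for the replies

wire = parse + bind + describe + execute + sync       # bytes to send
```

All five messages can be written in one batch; the backend answers everything after the Sync.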
Hi Marc,

M. Bastin wrote:
> Yes and no. I'm looking at the subject from the frontend-backend
> protocol 3.0 point of view which is separated in the Extended Query
> language and the Simple Query language.
>
> With the PREPARE and EXECUTE statements you're accessing the Extended
> Query language through the Simple Query language, which requires more
> CPU time since your EXECUTE query itself is going to be parsed as a
> Simple Query first before PostgreSQL realizes it must execute a
> prepared statement, while when you send the commands directly through
> the Extended Query language you skip that parsing step.

Thanks for the explanation. I've only been using Postgres via the psql monitor and client applications (most of them written in Python, some in Perl), so I'm not familiar with the underlying client-server protocol.

You are right that the EXECUTE statement still has to be parsed. However, I think the parsing overhead is small, because the EXECUTE command has a very simple structure ("EXECUTE <plan> <arguments>"), as opposed to, say, the very complicated synopsis of a SELECT command. Apart from that, the overhead of the query planner is probably much bigger, so using PREPARE + EXECUTE is probably still a great win, I think.

I just wonder ... I'm currently writing a client app (in Python) which has to insert quite a lot of things. This is for a network traffic accounting system. In particular, the program reads accumulated accounting data from a file, pre-processes it and creates appropriate INSERT statements (up to several hundred or even thousands per session). I wonder if it will be worth it to PREPARE those inserts and then EXECUTE them. But I guess it won't make much of a difference, because the INSERT statements are very simple anyway.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München

Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

With Perl you can manipulate text, interact with programs, talk over
networks, drive Web pages, perform arbitrary precision arithmetic,
and write programs that look like Snoopy swearing.
At 10:45 PM +0200 7/15/04, Oliver Fromme wrote:
>Hi Marc,
>
>M. Bastin wrote:
> > Yes and no. I'm looking at the subject from the frontend-backend
> > protocol 3.0 point of view which is separated in the Extended Query
> > language and the Simple Query language.
> >
> > With the PREPARE and EXECUTE statements you're accessing the Extended
> > Query language through the Simple Query language, which requires more
> > CPU time since your EXECUTE query itself is going to be parsed as a
> > Simple Query first before PostgreSQL realizes it must execute a
> > prepared statement, while when you send the commands directly through
> > the Extended Query language you skip that parsing step.
>
>Thanks for the explanation. I've only been using Postgres
>via the psql monitor and client applications (most of them
>written in Python, some in Perl), so I'm not familiar with
>the underlying client-server protocol.
>
>You are right that the EXECUTE statement still has to be
>parsed. However, I think the parsing overhead is small,

It's 16,000 vs 1,500 microseconds on my system, plus the time for the client to receive and parse the data, which is the same in both cases.

>because the EXECUTE command has a very simple structure
>("EXECUTE <plan> <arguments>"), as opposed to, say, the
>very complicated synopsis of a SELECT command. Apart
>from that, the overhead of the query planner is probably
>much bigger, so using PREPARE + EXECUTE is probably still
>a great win, I think.

Certainly for complex queries, but not for simple ones like "SELECT * FROM mytable WHERE numcolumn > $1;". Parsing a simple SELECT like this seems to take about the same time as parsing an EXECUTE query. I don't think you'd win anything by replacing INSERT with EXECUTE either.

>I just wonder ... I'm currently writing a client app (in
>Python) which has to insert quite a lot of things. This
>is for a network traffic accounting system. In particular,
>the program reads accumulated accounting data from a file,
>pre-processes it and creates appropriate INSERT statements
>(up to several hundreds or even thousands per session).
>I wonder if it will be worth to PREPARE those inserts and
>the EXECUTE them. But I guess it won't make much of a
>difference, because the INSERT statements are very simple
>anyway.

I'm not familiar with Python, but if it allows you to do a "COPY mytable FROM STDIN;" then I would strongly recommend you use that instead of INSERT. You build your data in a file and then you COPY that file into your table in one step. You can import millions of records in a couple of minutes that way. (You need to drop your indexes on that table first and recreate them afterwards.)

Best,
Marc
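For the record, the data stream for COPY ... FROM STDIN uses PostgreSQL's text format: one line per row, tab-separated columns, \N for NULL, with backslash escaping. A sketch of building such a buffer (plain Python; the column layout is made up for illustration, and actually streaming it to the server via a driver is left out):

```python
import io

def to_copy_text(rows):
    # Format rows for PostgreSQL's text COPY format: tab-separated
    # columns, one row per line, \N for NULL, and backslash-escaped
    # backslash/tab/newline characters inside values.
    def field(v):
        if v is None:
            return r"\N"
        s = str(v)
        return (s.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(field(v) for v in row) + "\n")
    return buf.getvalue()

# Two hypothetical accounting records: (ip_address, byte_count).
data = to_copy_text([("10.0.0.1", 1024), ("10.0.0.2", None)])
# This string would then be streamed to: COPY mytable FROM STDIN;
```

Building the whole buffer first and handing it to a single COPY is what makes the bulk load so much faster than per-row INSERTs.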