Storage space lost during an UPDATE

From
Vitaly Belman
Date:
I ran the following query on my database:

---------------
update bv_descriptions set description = REPLACE(description, '\\n', '\n');
commit;
---------------

It finished fine, but to my surprise, when I ran "df" on the server, the
drive usage had jumped from 44% to 60%, an additional 650 MB.

I tried restarting the server, but that didn't do any good. Eventually I
ran a VACUUM FULL on that table, which did the trick and reclaimed the
lost space.

I am still confused about what happened... What took up the additional 650 MB?

I could understand this phenomenon if it happened before I did
COMMIT... But why did it stay after COMMIT too?

Thanks

Re: Storage space lost during an UPDATE

From
Steve Crawford
Date:
On Saturday 03 July 2004 6:37 am, Vitaly Belman wrote:
> I ran the following query on my database:
>
> ---------------
> update bv_descriptions set description = REPLACE(description,
> '\\n', '\n'); commit;
> ---------------
>
> It finished fine, but to my surprise, when I ran "df" on the server,
> the drive usage had jumped from 44% to 60%, an additional 650 MB.
>
> I tried restarting the server, but that didn't do any good. Eventually
> I ran a VACUUM FULL on that table, which did the trick and reclaimed
> the lost space.
>
> I am still confused about what happened... What took up the
> additional 650 MB?
>
> I could understand this phenomenon if it happened before I did
> COMMIT... But why did it stay after COMMIT too?

This is expected. It has to do with MVCC, not COMMIT. For every row it
updates, PostgreSQL writes a new row version and keeps the old one in
place. This is how MVCC hangs on to "old" rows that may still be
visible to other transactions.

A regular VACUUM only marks space within the file as reusable by new
rows (space held by non-current row versions that no transaction can
see any more). VACUUM FULL actually reclaims physical space on the
disk. By way of example:

VACUUM FULL sometable

Space used by table = x

UPDATE all records in sometable

Space used by table = 2x (well, not exactly 2x, but probably somewhere
around that)

VACUUM

Space used by table still = 2x, but there is now reusable space within
the file

UPDATE all records again

Space used by table still around 2x; the new row versions filled the
reusable space in the file

VACUUM FULL sometable

Space used by table = x
Cheers,
Steve

Extended query language: does it work?

From
"M. Bastin"
Date:
Hi,

Is there anyone who knows about the extended query language?  Nobody
answered my question yesterday.  Failing that, could anyone put me in
touch with the person who programmed that part?

I send a "parse command."  If there's a syntax error, I get an error
back as expected, but otherwise I don't get anything back, even though
the docs say I should receive a "parse complete" message (and
hopefully a "ready for query" too, even though that's not mentioned).

Is the extended query language operational yet?  (PostgreSQL 7.4.2)

Thanks,

Marc

Extended query: parse command freezes backend

From
"M. Bastin"
Date:
I'm sending this parse command after which the backend completely
stops responding (just the session, not the whole server):

   Byte1('P'): "P"
   Int32:       41
   String:     "mystatement" + null byte
   String:     "select * from mytable;" + null byte
   Int16:       0

I have to close and re-open the frontend's TCP/IP connection to be
able to work again.  PostgreSQL 7.4.2.

Marc

Re: Storage space lost during an UPDATE

From
"Vitaly Belman"
Date:
Ah, I understand. Thanks :).

Steve Crawford wrote:
> This is expected. It has to do with MVCC, not COMMIT. For every row
> it updates, PostgreSQL writes a new row version and keeps the old one
> in place. This is how MVCC hangs on to "old" rows that may still be
> visible to other transactions.
>
<SNIP>
>
> Cheers,
> Steve


Re: Extended query: parse command freezes backend

From
Tom Lane
Date:
"M. Bastin" <marcbastin@mindspring.com> writes:
> I'm sending this parse command after which the backend completely
> stops responding (just the session, not the whole server):

>    Byte1('P'): "P"
>    Int32:       41
>    String:     "mystatement" + null byte
>    String:     "select * from mytable;" + null byte
>    Int16:       0

You haven't done anything to ensure a response message (viz, Flush or
Sync).  See nearby discussion about brokenness of Tcl interface...
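
For instance, a minimal sketch of the framing in Python (the msg/cstr
helpers are illustrative; this assumes an already-authenticated
protocol 3.0 connection):

---------------
import struct

def msg(msg_type, payload):
    # The length word counts itself (4 bytes) plus the payload,
    # but not the one-byte message type.
    return msg_type + struct.pack("!i", 4 + len(payload)) + payload

def cstr(s):
    return s.encode("ascii") + b"\x00"

# Parse: statement name, query text, number of parameter-type OIDs (0).
# The payload is 12 + 23 + 2 = 37 bytes, so the length word is 41,
# exactly as in the message quoted above.
parse = msg(b"P", cstr("mystatement")
                  + cstr("select * from mytable;")
                  + struct.pack("!h", 0))

# Sync (empty payload) ends the batch; the backend then answers with
# ParseComplete followed by ReadyForQuery.  Without it, it just waits.
sync = msg(b"S", b"")

# sock.sendall(parse + sync)
---------------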

            regards, tom lane

Extended query: prepared statements list?

From
"M. Bastin"
Date:
Hi,

Is there a query on some system table that will list all names of
prepared statements?

(FYI: To those wondering about the extended query language, I've got
it working now and it is impressive.  E.g. for an uncomplicated query
like

    SELECT * FROM population WHERE age > 12 AND age < 20;

the total execution time is about _half_ of what it is with the simple
query language.  When you look at only the PostgreSQL reaction time,
it is more than 10 times faster, while the time for the reception and
parsing of the data by the frontend remains the same, of course.

I haven't tried heavy, complicated queries yet but I can only imagine
that will give even more spectacular results.

The fact that you can work with parameters opens up a whole new
dimension as well, e.g.:

    SELECT * FROM population WHERE age > $1 AND age < $2;

I only wish there were more documentation about the extended query
language.  All I've found on the whole internet would fit on 2 or 3
printed pages.  Did I miss something?)

Thanks,

Marc

Re: Extended query: prepared statements list?

From
Tom Lane
Date:
"M. Bastin" <marcbastin@mindspring.com> writes:
> Is there a query on some system table that will list all names of
> prepared statements?

No, since they are never stored on disk.  We could think of providing
some operation in the FE/BE protocol to list the available statements,
perhaps.  (In retrospect I guess it was a mistake to have the Describe
message raise a hard error for no-such-statement.  If it just sent
back a message without aborting your transaction, you could use Describe
to probe whether you'd prepared a given name yet.)
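
For reference, such a probe would just be a Describe message naming
the statement.  A sketch of the framing (same helper idea as earlier
in the thread):

---------------
import struct

def msg(msg_type, payload):
    # length word counts itself plus the payload, not the type byte
    return msg_type + struct.pack("!i", 4 + len(payload)) + payload

# Describe ('D') with 'S' asks about a prepared statement by name;
# as of 7.4 an unknown name draws a hard ERROR that aborts the
# transaction, which is what makes it useless as a probe today.
describe = msg(b"D", b"S" + b"mystatement\x00")
---------------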

Same comments apply to Portals I suppose.

If you'd like to lobby to get this changed, pgsql-novice is not the
place; pgsql-interfaces or pgsql-hackers would be more appropriate.

            regards, tom lane

Re: Extended query: prepared statements list?

From
"M. Bastin"
Date:
At 10:37 AM -0400 7/14/04, Tom Lane wrote:
>"M. Bastin" <marcbastin@mindspring.com> writes:
>> Is there a query on some system table that will list all names of
>> prepared statements?
>
>No, since they are never stored on disk.
>Same comments apply to Portals I suppose.
>
>If you'd like to lobby to get this changed

I don't believe this is a critical issue.  People can keep the list
in their client app just the same since prepared statements only live
for the duration of a session.

Thank you very much Tom for all your help so far during my first
steps with the extended query protocol!

Marc

Re: Extended query: prepared statements list?

From
Oliver Fromme
Date:
M. Bastin wrote:
 > [...]
 > The fact that you can work with parameters opens up a whole new
 > dimension as well, e.g.:
 >
 >     SELECT * FROM population WHERE age > $1 AND age < $2;
 >
 > I only wish there were more documentation about the extended query
 > language.  All I've found on the whole internet would fit on 2 or 3
 > printed pages.  Did I miss something?)

I'm not sure what exactly you are missing, but the page
documenting the PREPARE statement (in the official Postgres
docs) pretty much says it all.  Also, it's easy to find
(Docs -> Reference -> SQL Commands -> PREPARE).

http://www.postgresql.org/docs/7.4/static/sql-prepare.html
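
For example, preparing and executing through plain SQL looks like this
(a sketch reusing the population table from earlier in the thread; I'm
assuming a psycopg2-style Python driver, but any client that can send
SQL will do):

---------------
import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# PREPARE stores a server-side plan; it lives until the session ends.
cur.execute("PREPARE teens (int, int) AS "
            "SELECT * FROM population WHERE age > $1 AND age < $2")

# EXECUTE runs the stored plan with concrete parameter values.
cur.execute("EXECUTE teens (12, 20)")
print(cur.fetchall())

cur.execute("DEALLOCATE teens")   # optional: drop the plan early
---------------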

However, maybe we're actually talking about completely
different things -- in that case I would like to know what
you are talking about.  :-)

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

 > Can the denizens of this group enlighten me about what the
 > advantages of Python are, versus Perl ?
"python" is more likely to pass unharmed through your spelling
checker than "perl".
        -- An unknown poster and Fredrik Lundh

Re: Extended query: prepared statements list?

From
"M. Bastin"
Date:
Hi Oliver,

>  > I only wish there were more documentation about the extended query
>  > language.  All I've found on the whole internet would fit on 2 or 3
>  > printed pages.  Did I miss something?)
>
>I'm not sure what exactly you are missing, but the page
>documenting the PREPARE statement (in the official Postgres
>docs) pretty much says it all.  Also, it's easy to find
>(Docs -> Reference -> SQL Commands -> PREPARE).
>
>http://www.postgresql.org/docs/7.4/static/sql-prepare.html
>
>However, maybe we're actually talking about completely
>different things

Yes and no.  I'm looking at the subject from the frontend-backend
protocol 3.0 point of view, which is split into the Extended Query
language and the Simple Query language.

With the PREPARE and EXECUTE statements you're accessing the Extended
Query language through the Simple Query language, which requires more
CPU time: your EXECUTE query itself is parsed as a Simple Query first,
before PostgreSQL realizes it must execute a prepared statement.  When
you send the commands directly through the Extended Query language,
you skip that parsing step.

Plus, the Extended Query language is far richer than just preparing
and executing, even though these two operations will probably cover
most needs.  It is also much more complex to use: e.g. you would do
the EXECUTE by issuing about five commands (see the sketch below), but
it's still much faster.  That's where I had problems and wished for
more documentation and/or examples.
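
To make that concrete, here is a rough sketch of the messages that
execute an already-prepared statement (helper names are mine; this
assumes "mystatement" was created by an earlier Parse on an
authenticated protocol 3.0 connection):

---------------
import struct

def msg(msg_type, payload):
    # length word counts itself plus the payload, not the type byte
    return msg_type + struct.pack("!i", 4 + len(payload)) + payload

def cstr(s):
    return s.encode("ascii") + b"\x00"

# Bind the unnamed portal to "mystatement": no parameter format
# codes, no parameter values, no result format codes.
bind = msg(b"B", cstr("") + cstr("mystatement")
                 + struct.pack("!hhh", 0, 0, 0))

# Describe the portal so the backend sends back a RowDescription.
describe = msg(b"D", b"P" + cstr(""))

# Execute the unnamed portal; a row limit of 0 means "no limit".
execute = msg(b"E", cstr("") + struct.pack("!i", 0))

# Sync ends the batch and yields a ReadyForQuery.
sync = msg(b"S", b"")

# sock.sendall(bind + describe + execute + sync)
---------------

Counting the earlier Parse, that's the five commands.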

I hope this clears up the confusion.

Cheers,

Marc

Re: Extended query: prepared statements list?

From
Oliver Fromme
Date:
Hi Marc,

M. Bastin wrote:
 > Yes and no.  I'm looking at the subject from the frontend-backend
 > protocol 3.0 point of view, which is split into the Extended Query
 > language and the Simple Query language.
 >
 > With the PREPARE and EXECUTE statements you're accessing the Extended
 > Query language through the Simple Query language, which requires more
 > CPU time: your EXECUTE query itself is parsed as a Simple Query first,
 > before PostgreSQL realizes it must execute a prepared statement.  When
 > you send the commands directly through the Extended Query language,
 > you skip that parsing step.

Thanks for the explanation.  I've only been using Postgres
via the psql monitor and client applications (most of them
written in Python, some in Perl), so I'm not familiar with
the underlying client-server protocol.

You are right that the EXECUTE statement still has to be
parsed.  However, I think the parsing overhead is small,
because the EXECUTE command has a very simple structure
("EXECUTE <plan> <arguments>"), as opposed to, say, the
very complicated synopsis of a SELECT command.  Apart
from that, the overhead of the query planner is probably
much bigger, so using PREPARE + EXECUTE should still be
a great win.

I just wonder ...  I'm currently writing a client app (in
Python) which has to insert quite a lot of things.  This
is for a network traffic accounting system.  In particular,
the program reads accumulated accounting data from a file,
pre-processes it and creates appropriate INSERT statements
(up to several hundred or even thousands per session).
I wonder whether it will be worth it to PREPARE those inserts
and then EXECUTE them.  But I guess it won't make much of a
difference, because the INSERT statements are very simple
anyway.
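
What I have in mind is roughly this (a sketch only; the traffic table,
its columns, and the psycopg2-style driver are placeholders for
illustration):

---------------
import psycopg2

conn = psycopg2.connect("dbname=accounting")
cur = conn.cursor()

# Plan the INSERT once per session ...
cur.execute("PREPARE addrec (timestamp, inet, bigint) AS "
            "INSERT INTO traffic VALUES ($1, $2, $3)")

# ... then run the stored plan for every pre-processed record.
records = [("2004-07-15 12:00:00", "10.0.0.1", 17742),
           ("2004-07-15 12:05:00", "10.0.0.2", 993)]
for when, host, octets in records:
    cur.execute("EXECUTE addrec (%s, %s, %s)", (when, host, octets))
conn.commit()
---------------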

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

With Perl you can manipulate text, interact with programs, talk over
networks, drive Web pages, perform arbitrary precision arithmetic,
and write programs that look like Snoopy swearing.

Re: Extended query: prepared statements list?

From
"M. Bastin"
Date:
At 10:45 PM +0200 7/15/04, Oliver Fromme wrote:
>Hi Marc,
>
>M. Bastin wrote:
> > Yes and no.  I'm looking at the subject from the frontend-backend
> > protocol 3.0 point of view, which is split into the Extended Query
> > language and the Simple Query language.
> >
> > With the PREPARE and EXECUTE statements you're accessing the Extended
> > Query language through the Simple Query language, which requires more
> > CPU time: your EXECUTE query itself is parsed as a Simple Query first,
> > before PostgreSQL realizes it must execute a prepared statement.  When
> > you send the commands directly through the Extended Query language,
> > you skip that parsing step.
>
>Thanks for the explanation.  I've only been using Postgres
>via the psql monitor and client applications (most of them
>written in Python, some in Perl), so I'm not familiar with
>the underlying client-server protocol.
>
>You are right that the EXECUTE statement still has to be
>parsed.  However, I think the parsing overhead is small,

It's 16,000 vs. 1,500 microseconds on my system (simple vs. extended),
plus the time for the client to receive and parse the data, which is
the same in both cases.

>because the EXECUTE command has a very simple structure
>("EXECUTE <plan> <arguments>"), as opposed to, say, the
>very complicated synopsis of a SELECT command.  Apart
>from that, the overhead of the query planner is probably
>much bigger, so using PREPARE + EXECUTE should still be
>a great win.

Certainly for complex queries, but not for simple ones like "SELECT *
FROM mytable WHERE numcolumn > $1;"
Parsing a simple SELECT like this seems to take about the same time
as parsing an EXECUTE query.

I don't think you'd win anything by replacing INSERT with EXECUTE either.

>I just wonder ...  I'm currently writing a client app (in
>Python) which has to insert quite a lot of things.  This
>is for a network traffic accounting system.  In particular,
>the program reads accumulated accounting data from a file,
>pre-processes it and creates appropriate INSERT statements
>(up to several hundred or even thousands per session).
>I wonder whether it will be worth it to PREPARE those inserts
>and then EXECUTE them.  But I guess it won't make much of a
>difference, because the INSERT statements are very simple
>anyway.

I'm not familiar with Python, but if it allows you to do a "COPY
mytable FROM STDIN;" then I would strongly recommend you use that
instead of INSERT.  You build your data in a file and then COPY that
file into your table in one step.  You can import millions of records
in a couple of minutes that way.  (You need to drop your indexes on
that table first and recreate them afterwards.)  Something along the
lines of the sketch below.
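
(A sketch only; I don't write Python, so the psycopg2-style driver,
its copy_from call, and the traffic table are assumptions for
illustration:)

---------------
import io
import psycopg2

conn = psycopg2.connect("dbname=accounting")
cur = conn.cursor()

# Build the whole batch as tab-separated lines in memory
# (or in a real file, for millions of records).
buf = io.StringIO()
for when, host, octets in [("2004-07-15 12:00:00", "10.0.0.1", 17742),
                           ("2004-07-15 12:05:00", "10.0.0.2", 993)]:
    buf.write("%s\t%s\t%d\n" % (when, host, octets))
buf.seek(0)

# COPY ... FROM STDIN loads the whole batch in a single round trip.
cur.copy_from(buf, "traffic")
conn.commit()
---------------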

Best,

Marc