Thread: Deleting bytea, autovacuum, and 8.2/8.4 differences
Hi all,

my posting on 2010-01-14 about the performance when writing bytea to disk caused a longer discussion. While it still holds that overall PostgreSQL write performance is roughly 25% of the serial disk I/O rate, in my particular use case this was compensated for by doing other, non-PostgreSQL-related work in parallel.

Now I cannot optimize my processes any further; however, I am facing another quite unexpected performance issue: deleting rows from my simple table (with the bytea column) holding 16 MB of data each takes roughly as long as writing them!

A little more detail:

* The table has just 5 unused int columns, a timestamp, OIDs, and the bytea column, and no indices; the bytea storage type is 'extended', and the 16 MB compress to approximately half that.

* All the usual optimizations to improve write throughput are done (pg_xlog on another disk, many tweaks to the server config, etc.). However, that does not matter here, since it is not the absolute performance that is of interest but the fact that deleting takes roughly 100% of the writing time.

* I need to write 15 rows of 16 MB each to disk within a maximum of 15 s, which currently takes roughly 10 seconds. However, keeping the database tidy (deleting rows) takes another 5-15 s (10 s on average), so my process exceeds the 15 s limit by about 5 s.

* Right now I am deleting like this:

  DELETE FROM table WHERE (CURRENT_TIMESTAMP - my_timestamp_column) > interval '2 minutes';

  while the final version is planned to use an interval of 6 hours (thus creating a FIFO buffer of the latest 6 hours of inserted data; the FIFO will keep approx. 10,000 rows spanning 160-200 GB of data).

* This deletion SQL command was simply executed repeatedly by pgAdmin while my app kept adding the 16 MB rows.

* Autovacuum is on; I believe I need to keep it on, otherwise I will not free the disk space, right? If I switch it off, the deletion time drops from the average 10 s down to 4 s.

* I am using server + libpq version 8.2.4, currently on WinXP. Will an upgrade to 8.4 help here?

Do you have any other ideas to help me out?

Oh, please...

Thank You
Felix
Hi there
I'm not an expert on PG's "toast" system, but a couple of thoughts inline below.
Cheers
Dave
On Sat, Mar 13, 2010 at 3:17 PM, fkater@googlemail.com <fkater@googlemail.com> wrote:
> Hi all,
> my posting on 2010-01-14 about the performance when writing
> bytea to disk caused a longer discussion. While the fact
> still holds that the overall postgresql write performance is
> roughly 25% of the serial I/O disk performance this was
> compensated for my special use case here by doing some other
> non-postgresql related things in parallel.
> Now I cannot optimize my processes any further, however, now
> I am facing another quite unexpected performance issue:
> Deleting rows from my simple table (with the bytea column)
> having 16 MB data each, takes roughly as long as writing
> them!
> Little more detail:
> * The table just has 5 unused int columns, a timestamp,
> OIDs, and the bytea column, no indices; the bytea storage
> type is 'extended', the 16 MB are compressed to approx. the
> half.
Why no indices?
> * All the usual optimizations are done to reach better
> write through (pg_xlog on another disk, much tweaks to the
> server conf etc), however, this does not matter here, since
> not the absolute performance is of interest here but the
> fact that deleting roughly takes 100% of the writing time.
> * I need to write 15 rows of 16 MB each to disk in a maximum
> time of 15 s, which is performed here in roughly 10 seconds,
> however, now I am facing the problem that keeping my
> database tidy (deleting rows) takes another 5-15 s (10s on
> average), so my process exceeds the maximum time of 15s for
> about 5s.
> * Right now I am deleting like this:
> DELETE FROM table WHERE (CURRENT_TIMESTAMP -
> my_timestamp_column) > interval '2 minutes';
You need an index on my_timestamp_column
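For the index to actually be usable, the predicate also has to be rewritten so the bare column is compared against a constant; the original form computes an expression for every row, which forces a sequential scan even with an index in place. A sketch, using a hypothetical table name (the thread only shows "table" and my_timestamp_column):

```sql
-- Hypothetical names, for illustration only.
CREATE INDEX my_table_ts_idx ON my_table (my_timestamp_column);

-- Original, non-indexable form: the expression on the left must be
-- evaluated for every row, so the index cannot be used:
--   DELETE FROM my_table
--   WHERE (CURRENT_TIMESTAMP - my_timestamp_column) > interval '2 minutes';

-- Equivalent, indexable form: the indexed column stands alone and is
-- compared against a value computed once, so the planner can use an
-- index range scan to find just the expired rows:
DELETE FROM my_table
WHERE my_timestamp_column < CURRENT_TIMESTAMP - interval '2 minutes';
```

Run frequently with the 6-hour cutoff, this keeps each individual delete small instead of letting expired rows pile up.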
> while it is planned to have the interval set to 6 hours in
> the final version (thus creating a FIFO buffer for the
> latest 6 hours of inserted data; so the FIFO will keep
> approx. 10.000 rows spanning 160-200 GB data).
That's not the way to keep a 6 hour rolling buffer ... what you need to do is run the delete frequently, with interval '6 hours' in the SQL acting as the cutoff.
If you really do want to drop the entire table contents before refilling it, do a DROP TABLE and recreate it.
> * This deletion SQL command was simply repeatedly executed
> by pgAdmin while my app kept adding the 16 MB rows.
Are you sure you are timing the delete, and not pgAdmin re-populating some kind of buffer?
> * Autovacuum is on; I believe I need to keep it on,
> otherwise I do not free the disk space, right? If I switch
> it off, the deletion time reduces from the average 10s down
> to 4s.
You may be running autovacuum too aggressively; it may be interfering with I/O to the tables.
Postgres vacuuming does not free disk space (in the sense of returning it to the OS); it removes old versions of rows that have been UPDATEd or DELETEd and makes that space in the table file available for new writes.
> * I am using server + libpq version 8.2.4, currently on
> WinXP. Will an upgrade to 8.4 help here?
8.4 has a lot of performance improvements. It's definitely worth a shot. I'd also consider switching to another OS where you can use a 64-bit version of PG and a much bigger buffer cache.
> Do you have any other ideas to help me out?
> Oh, please...
> Thank You
> Felix
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
Hi Dave,

thank you for your answers! Here some comments:

Dave Crooke:
>> * The table just has 5 unused int columns, a timestamp,
>> OIDs, and the bytea column, *no indices*; the bytea storage
>> type is 'extended', the 16 MB are compressed to approx. the
>> half.
>
> Why no indices?

Simply because the test case had just < 50 rows (deleting all rows older than 2 minutes). Later on I would use indices.

>> while it is planned to have the interval set to 6 hours in
>> the final version (thus creating a FIFO buffer for the
>> latest 6 hours of inserted data; so the FIFO will keep
>> approx. 10.000 rows spanning 160-200 GB data).
>
> That's not the way to keep a 6 hour rolling buffer ... what you need to do
> is run the delete frequently, with *interval '6 hours'* in the SQL acting
> as the cutoff.

In fact the delete was run frequently, to cut everything older than 6 hours *immediately*.

> If you really do want to drop the entire table contents before refilling it,
> do a *DROP TABLE* and recreate it.

No, I do not want to drop the whole table.

>> * This deletion SQL command was simply repeatedly executed
>> by pgAdmin while my app kept adding the 16 MB rows.
>
> Are you sure you are timing the delete, and not pgAdmin re-populating some
> kind of buffer?

Quite sure, yes, because I launched just the delete command in pgAdmin, while the rest was executed by my application outside pgAdmin, of course.

>> * Autovacuum is on; I believe I need to keep it on,
>> otherwise I do not free the disk space, right? If I switch
>> it off, the deletion time reduces from the average 10s down
>> to 4s.
>
> You may be running autovacuum too aggressively, it may be interfering with
> I/O to the tables.

Hm, so what should I change then? I wonder if it helps to run autovacuum less aggressively, given that there will not be a situation where the whole process is stopped for a while. But I'd like to understand what to change here.

> 8.4 has a lot of performance improvements. It's definitely worth a shot. I'd
> also consider switching to another OS where you can use a 64-bit version of
> PG and a much bigger buffer cache.

O.k., I'll give it a try.

Thank You.

Felix
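[For readers wondering which knobs "less aggressively" refers to: autovacuum throttling is configured in postgresql.conf. The settings below exist in 8.2; the values shown are illustrative guesses, not recommendations.]

```
# postgresql.conf -- autovacuum throttling (8.2-era names)
autovacuum = on
autovacuum_naptime = 60               # seconds between checks per database
autovacuum_vacuum_scale_factor = 0.2  # fraction of a table updated/deleted
                                      # before a vacuum of it is triggered
autovacuum_vacuum_cost_delay = 20     # ms slept per accumulated cost unit;
                                      # throttles vacuum I/O so it competes
                                      # less with foreground writes
```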
A quick test:
-----
1. create table x1(x int, y bytea);
2. Load some data, say with Python:
cp /opt/java/src.zip ~/tmp/a.dat (19MB)
##
import psycopg2
conn = psycopg2.connect("dbname='test' user='*****' password='****' host='127.0.0.1'")
# open the file in binary mode so the bytea payload is passed through unchanged
conn.cursor().execute("INSERT INTO x1 VALUES (1, %s)", (psycopg2.Binary(open("a.dat", "rb").read()),))
conn.commit()
##
3. create table x2(x int, y bytea);
4. Copy table x1 100 times to x2 (1.9GB) and monitor/measure IO:
insert into x2 select a x, y from generate_series(1,100) a, x1;
Results:
-----------
On Linux 2.6.32 with an ext3 file system on one 15K rpm disk, we saw with SystemTap that the source 1.9GB (19MB x 100) resulted in 5GB of actual disk IO and took 61 seconds (52 CPU + 9 sleep/wait for IO)
Deletion (delete from x2) took 32 seconds with 12 seconds CPU and 20 sec sleep + wait for IO. Actual disk IO was about 4GB.
Since Pg does not use the concept of rollback segments, it is unclear why deletion produces so much disk IO (4GB).
VJ
"fkater@googlemail.com" <fkater@googlemail.com> wrote: > Simply because the test case had just < 50 rows (deleting > all rows older than 2 minues). Later on I would use indices. Running a performance test with 50 rows without indexes and extrapolating to a much larger data set with indexes won't tell you much. The plans chosen by the PostgreSQL optimizer will probably be completely different, and the behavior of the caches (at all levels) will be very different. >> > while it is planned to have the interval set to 6 hours in >> > the final version (thus creating a FIFO buffer for the >> > latest 6 hours of inserted data; so the FIFO will keep >> > approx. 10.000 rows spanning 160-200 GB data). This might lend itself to partitioning. Dropping a partition containing data older than six hours would be very fast. Without knowing what kinds of queries you want to run on the data, it's hard to predict the performance impact on your other operations, though. >> > * Autovacuum is on; I believe I need to keep it on, >> > otherwise I do not free the disk space, right? If I switch >> > it off, the deletion time reduces from the average 10s down >> > to 4s. >> > >> >> You may be running autovaccum too aggressively, it may be >> interfering with I/O to the tables. > > Hm, so would should I change then? I wonder if it helps to > run autovacuum less aggressive if there will not be a > situation were the whole process is stopped for a while. But > I'd like to understand what to change here. I'd be very careful about this, I've seen performance problems more often (and more dramatic) from not running it aggressively enough. Monitor performance and bloat closely when you adjust this, and make sure the data and load are modeling what you expect in production, or you'll tune for the wrong environment and likely make matters worse for the environment that really matters. -Kevin
VJK <vjkmail@gmail.com> wrote:

> the source 1.9GB (19MB x 100) resulted in 5GB of actual disk IO
>
> Deletion (delete from x2) took 32 seconds with 12 seconds CPU and
> 20 sec sleep + wait for IO. Actual disk IO was about 4GB.
>
> Since Pg does not use the concept of rollback segments, it is
> unclear why deletion produces so much disk IO (4GB).

One delete would mark the xmax of the tuple, so that transactions without that transaction ID in their visible set would ignore it. The next table scan would set hint bits, which would store information within the tuple to indicate that the deleting transaction successfully committed; then the vacuum would later wake up and rewrite the page with the deleted tuples removed.

If you have enough battery-backed cache space on a hardware RAID controller card, and that cache is configured in write-back mode, many of these writes might be combined -- the original delete, the hint bit write, and the vacuum might all combine into one physical write to disk.

What does your disk system look like, exactly?

-Kevin
VJK wrote:
> Since Pg does not use the concept of rollback segments, it is unclear
> why deletion produces so much disk IO (4GB).

With PostgreSQL's write-ahead log, MVCC and related commit log, and transactional DDL features, there's actually even more overhead that can be involved than a simple rollback segment design when you delete things:

http://www.postgresql.org/docs/current/static/wal.html
http://www.postgresql.org/docs/current/static/mvcc-intro.html
http://wiki.postgresql.org/wiki/Hint_Bits
http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis

One fun thing to try here is to increase shared_buffers and checkpoint_segments, then see if the total number of writes goes down. The defaults for both are really low, which makes buffer page writes that might otherwise get combined as local memory changes instead get pushed constantly to disk.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
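[For reference, the two settings Greg mentions live in postgresql.conf; something like the following. The values are workload-dependent guesses, not recommendations, and on the original poster's 8.2/WinXP setup shared_buffers should stay comparatively small.]

```
# postgresql.conf
shared_buffers = 512MB       # the 8.x default is tiny (tens of MB)
checkpoint_segments = 32     # default 3; more WAL segments between
                             # checkpoints means repeated changes to the
                             # same buffer pages can be combined in memory
                             # instead of being flushed to disk repeatedly
```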
Inline:
On Mon, Mar 15, 2010 at 10:12 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> VJK <vjkmail@gmail.com> wrote:
>> the source 1.9GB (19MB x 100) resulted in 5GB of actual disk IO
>> Deletion (delete from x2) took 32 seconds with 12 seconds CPU and
>> 20 sec sleep + wait for IO. Actual disk IO was about 4GB.
>>
>> Since Pg does not use the concept of rollback segments, it is
>> unclear why deletion produces so much disk IO (4GB).
>
> One delete would mark the xmax of the tuple, so that transactions
> without that transaction ID in their visible set would ignore it.
> The next table scan would set hint bits, which would store
> information within the tuple to indicate that the deleting
> transaction successfully committed, then the vacuum would later wake
> up and rewrite the page with the deleted tuples removed.

I did not observe any vacuuming activity during the deletion process. However, even with vacuuming, 4GB of disk IO is rather excessive for deleting 1.9GB of data.

> If you have enough battery backed cache space on a hardware RAID
> controller card, and that cache is configured in write-back mode,
> many of these writes might be combined -- the original delete, the
> hint bit write, and the vacuum might all combine into one physical
> write to disk.

They are combined alright: I see 170-200 MB/s IO spikes on the iotop screen, which means writes to the cache -- the disk itself is capable of only 110(ic)-160(oc) MB/s with sequential 1 MB block size writes.

> What does your disk system look like, exactly?

As I wrote before, it's actually a single 15K rpm mirrored pair that you can look at as a single disk for performance purposes. It is connected through a PERC6i controller to a Dell 2950.

The disk subsystem is not really important here. What is really interesting is why so much IO is generated during the deletion process.

> -Kevin
Greg Smith <greg@2ndquadrant.com> writes:
> VJK wrote:
>> Since Pg does not use the concept of rollback segments, it is unclear
>> why deletion produces so much disk IO (4GB).
>
> With PostgreSQL's write-ahead log, MVCC and related commit log, and
> transactional DDL features, there's actually even more overhead that can
> be involved than a simple rollback segment design when you delete things:

For an example like this one, you have to keep in mind that the toast-table rows for the large bytea value have to be marked deleted, too. Also, since I/O happens in units of pages, the I/O volume to delete a tuple is just as much as the I/O to create it. (The WAL entry for deletion might be smaller, but that's all.) So it is entirely unsurprising that "DELETE FROM foo" is about as expensive as filling the table initially.

If deleting a whole table is significant for you performance-wise, you might look into using TRUNCATE instead.

regards, tom lane
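[The difference in a sketch, using foo as in Tom's example:]

```sql
-- Row-by-row: every heap tuple and every toast tuple is marked dead,
-- each touched page is written (plus WAL), and VACUUM must later
-- rewrite the pages -- I/O roughly proportional to table size:
DELETE FROM foo;

-- Whole-table: the relation files of foo and of its toast table are
-- simply truncated, so the cost is essentially independent of size.
-- Caveat: TRUNCATE takes an exclusive lock and removes ALL rows, so it
-- only fits the "drop everything and refill" pattern, not a rolling
-- time-based purge:
TRUNCATE TABLE foo;
```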
Matthew Wakeling <matthew@flymine.org> writes:
> On Mon, 15 Mar 2010, Tom Lane wrote:
>> If deleting a whole table is significant for you performance-wise,
>> you might look into using TRUNCATE instead.
>
> Might you still end up with a normal delete operation
> on the TOAST table when performing a TRUNCATE on the owner table?

No, you get a TRUNCATE on its toast table too.

regards, tom lane
Inline:
On Mon, Mar 15, 2010 at 10:42 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> VJK wrote:
>> Since Pg does not use the concept of rollback segments, it is unclear
>> why deletion produces so much disk IO (4GB).
>
> With PostgreSQL's write-ahead log, MVCC and related commit log, and
> transactional DDL features, there's actually even more overhead that can
> be involved than a simple rollback segment design when you delete things:
There does not appear to be much WAL activity. Here's the insertion of 100 rows as seen by iotop:
4.39 G 0.00 % 9.78 % postgres: writer process
5.34 G 0.00 % 5.93 % postgres: postgr~0.5.93(1212) idle
27.84 M 0.00 % 1.77 % postgres: wal writer process
144.00 K 0.00 % 0.00 % postgres: stats collector process
0.00 B 0.00 % 0.00 % postgres: autova~ launcher process
0.00 B 0.00 % 0.00 % postgres: postgr~0.5.93(4632) idle
.. and the deletion:
288.18 M 0.00 % 37.80 % postgres: writer process
3.41 G 0.00 % 19.76 % postgres: postgr~0.5.93(1212) DELETE
27.27 M 0.00 % 3.18 % postgres: wal writer process
72.00 K 0.00 % 0.03 % postgres: stats collector process
0.00 B 0.00 % 0.00 % postgres: autova~ launcher process
0.00 B 0.00 % 0.00 % postgres: postgr~0.5.93(4632) idle
So, the original 1.9 GB of useful data generates about 10 GB of IO, 5 GB of which ends up being written to disk. The deletion generates about 3.8 GB of IO, all of which results in disk IO. WAL activity is about 27 MB in both cases.
> http://www.postgresql.org/docs/current/static/wal.html
> http://www.postgresql.org/docs/current/static/mvcc-intro.html
> http://wiki.postgresql.org/wiki/Hint_Bits
> http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
I read all of the above, but it does not really clarify why deletion generates so much IO.
> One fun thing to try here is to increase shared_buffers and checkpoint_segments, then see if the total number of writes go down. The defaults for both are really low, which makes buffer page writes that might otherwise get combined as local memory changes instead get pushed constantly to disk.
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com www.2ndQuadrant.us
On Mon, 15 Mar 2010, Tom Lane wrote: > For an example like this one, you have to keep in mind that the > toast-table rows for the large bytea value have to be marked deleted, > too. Also, since I/O happens in units of pages, the I/O volume to > delete a tuple is just as much as the I/O to create it. (The WAL > entry for deletion might be smaller, but that's all.) So it is entirely > unsurprising that "DELETE FROM foo" is about as expensive as filling the > table initially. > > If deleting a whole table is significant for you performance-wise, > you might look into using TRUNCATE instead. What are the implications of using TRUNCATE on a table that has TOASTed data? Is TOAST all stored in one single table, or is it split up by owner table/column name? Might you still end up with a normal delete operation on the TOAST table when performing a TRUNCATE on the owner table? Matthew -- sed -e '/^[when][coders]/!d;/^...[discover].$/d;/^..[real].[code]$/!d ' <`locate dict/words`
On Mon, Mar 15, 2010 at 10:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <greg@2ndquadrant.com> writes:
>> VJK wrote:
>>> Since Pg does not use the concept of rollback segments, it is unclear
>>> why deletion produces so much disk IO (4GB).
>
> For an example like this one, you have to keep in mind that the
> toast-table rows for the large bytea value have to be marked deleted,
> too. Also, since I/O happens in units of pages, the I/O volume to
> delete a tuple is just as much as the I/O to create it.

That makes sense.

> regards, tom lane