Thread: Speed of lo_unlink vs. DELETE on BYTEA
Hi, everyone. I'm working with someone who has a database application currently running under PostgreSQL 8.3. Among other things, there is a main table that is referenced by a number of other tables via a foreign key. One of those tables has a field of type "oid," which points to a large object.

When a record in the main table is deleted, there is a rule (yes, a rule -- not a trigger) on the referencing table that performs an lo_unlink on the associated object. This means that for every DELETE we perform on the main table, we're doing an lo_unlink on the associated large object. It also means that if we do a mass delete from that main table, we're executing lo_unlink once for every deleted row in the main table, which is taking a heckuva long time. I ran EXPLAIN ANALYZE, and a good 40-50 percent of our time spent deleting is in the execution of this rule.

I just want to check that my intuition is correct: Wouldn't it be way faster and more efficient for us to use BYTEA columns to store the data (which can get into the 20-50 MB range), and to just depend on ON DELETE CASCADE, rather than a rule? Or are we going to encounter performance issues regardless of which technique we use, and do we need to find a way to delete these large pieces of data in the background? Or should we be using large objects, and then find a way other than a rule to deal with deleting them on this sort of scale? Or (of course) am I missing another good option?

Thanks for any and all advice, as usual!

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
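[For readers following along, here is a rough sketch of the two approaches being compared. All table and column names below are invented, since the actual schema isn't shown in the thread, and the real rule may well look different.]

-- Current approach (sketch): an oid column pointing at a large object,
-- plus a rule that unlinks the object whenever a row is deleted.
CREATE TABLE attachment (
    id       SERIAL  PRIMARY KEY,
    main_id  INTEGER NOT NULL REFERENCES main_table,
    content  OID
);

CREATE RULE attachment_unlink AS
    ON DELETE TO attachment
    DO ALSO SELECT lo_unlink(OLD.content);

-- Proposed alternative (sketch): store the data inline as BYTEA and let
-- ON DELETE CASCADE clean up, with no rule or trigger at all.
CREATE TABLE attachment_bytea (
    id       SERIAL  PRIMARY KEY,
    main_id  INTEGER NOT NULL REFERENCES main_table ON DELETE CASCADE,
    content  BYTEA
);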
Hi again, everyone. I'm replying to my own posting, to add some information: I decided to do some of my own benchmarking. And if my benchmarks are at all accurate, then I'm left wondering why people use large objects at all, given their clunky API and their extremely slow speed. I'm posting my benchmarks as a sanity test, because I'm blown away by the results.

I basically tried three different scenarios, each with 1,000 and 10,000 records. In each scenario, there was a table named MasterTable that contained a SERIAL "id" column and a "one_value" integer column, containing a number from generate_series, and a second table named SecondaryTable containing its own SERIAL "id" column, a "one_value" column (from generate_series, identical to the "id" column), and a "master_value" column that's a foreign key back to the main table. That is, here's the definition of the tables in the 10,000-record benchmark:

CREATE TABLE MasterTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
      PRIMARY KEY(id)
);

INSERT INTO MasterTable (one_value) VALUES (generate_series(1,10000));

CREATE TABLE SecondaryTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
   master_value INTEGER  NOT NULL
                REFERENCES MasterTable ON DELETE CASCADE,
      PRIMARY KEY(id)
);

INSERT INTO SecondaryTable (master_value, one_value)
  (SELECT s.a, s.a FROM generate_series(1,10000) AS s(a));

I also had two other versions of SecondaryTable: In one scenario, there is a my_blob column, of type BYTEA, containing 5 million 'x' characters. A final version had a 5-million-'x'-character document loaded into a large object referenced from SecondaryTable.

The idea was simple: I wanted to see how much faster or slower it was to delete (not truncate) all of the records in MasterTable, given these different data types. Would bytea be significantly faster than large objects? How would the cascading delete affect things? And how long does it take to pg_dump with large objects around?

Here are the results, which were pretty dramatic. Basically, large objects seem to always be far, far slower than BYTEA columns, for both deleting and dumping. Again, I'm wondering whether I'm doing something wrong here, or if this explains why in my many years of using PostgreSQL, I've neither used nor been tempted to use large objects before.

1.1 1,000 records
=================

                 Delete    Dump
 ---------------+---------+--------
  Empty content  0.172s    0.057s
  bytea          0.488s    0.066s
  large object   30.833s   9.275s

1.2 10,000 records
==================

                 Delete     Dump
 ---------------+-----------+-----------
  Empty content  8.162s     0.064s
  bytea          1m0.417s   0.157s
  large object   4m44.501s  1m38.454s

Any ideas? If this is true, should we be warning people away from large objects in the documentation, and toward bytea?

Reuven
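[The two other versions of SecondaryTable aren't shown above; the actual benchmark code is linked later in the thread. As a rough sketch of what they presumably looked like, assuming the same shape as the table above and a "my_blob" column name:]

-- Variant 2 (sketch): the 5-million-character document stored inline as BYTEA.
CREATE TABLE SecondaryTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
   master_value INTEGER  NOT NULL
                REFERENCES MasterTable ON DELETE CASCADE,
   my_blob      BYTEA,
      PRIMARY KEY(id)
);

INSERT INTO SecondaryTable (master_value, one_value, my_blob)
  (SELECT s.a, s.a, decode(repeat('x', 5000000), 'escape')
     FROM generate_series(1,10000) AS s(a));

-- Variant 3 (sketch): the document stored as a large object, with only its
-- OID kept in the row. The large objects themselves would be created
-- separately (for example with lo_import() from the client) and their OIDs
-- stored in my_blob.
CREATE TABLE SecondaryTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
   master_value INTEGER  NOT NULL
                REFERENCES MasterTable ON DELETE CASCADE,
   my_blob      OID,
      PRIMARY KEY(id)
);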
Reuven M. Lerner wrote:
> When a record in the main table is deleted, there is a rule (yes a rule -- not a trigger) in the
> referencing table that performs a lo_unlink on the associated object.
> I just want to check that my intuition is correct: Wouldn't it be way faster and more efficient
> for us to use BYTEA columns to store the data (which can get into the 20-50 MB range),
> and for us to just depend on ON DELETE CASCADE, rather than a rule?

[followed by dramatic performance numbers]

Could you try with a trigger instead of a rule and see if the performance is better?

Yours,
Laurenz Albe
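[A minimal sketch of what such a trigger might look like, assuming the SecondaryTable/my_blob names used in the benchmark; this is not Laurenz's or Reuven's actual code.]

-- On 8.3, plpgsql may first need to be installed with: CREATE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION unlink_blob() RETURNS trigger AS $$
BEGIN
    -- Unlink the large object referenced by the row being deleted.
    IF OLD.my_blob IS NOT NULL THEN
        PERFORM lo_unlink(OLD.my_blob);
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER unlink_blob_trigger
    BEFORE DELETE ON SecondaryTable
    FOR EACH ROW
    EXECUTE PROCEDURE unlink_blob();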
Hi, everyone. Laurenz Albe wrote:

> Could you try with a trigger instead of a rule and see if the
> performance is better?

Great idea. I did that, and here are the results for 10,000 records:

|                            | Delete    | Dump      |
|----------------------------+-----------+-----------|
| Empty content              | 8.162s    | 0.064s    |
| bytea                      | 1m0.417s  | 0.157s    |
| large object with rule     | 4m44.501s | 1m38.454s |
| large object with trigger  | 7m42.208s | 1m48.369s |

Ideas, anyone?

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
Reuven M. Lerner wrote:

> 1.1 1,000 records
> =================
>
>                  Delete    Dump
>  ---------------+---------+--------
>   Empty content  0.172s    0.057s
>   bytea          0.488s    0.066s
>   large object   30.833s   9.275s

How much bytea are you dumping for it to take only 0.066s? The fact that it takes about the same time as dumping the "empty content" looks very suspicious.

On my desktop machine, if I create a table with 1000 blobs containing strings of 5 million 'x', which is what I understood you basically did (perhaps I misunderstood?), then it takes about 200s to dump it with pg_dump -Fc.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
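[Daniel doesn't show his test script; below is one hedged way to build an equivalent table entirely in SQL. The table and column names are invented, 131072 is the INV_WRITE flag for lo_open(), and the lo_open()/lowrite() calls rely on running within a single statement, and therefore a single transaction.]

CREATE TABLE blob_test (
    id      INTEGER PRIMARY KEY,
    my_blob OID
);

-- Create one empty large object per row and store its OID.
INSERT INTO blob_test (id, my_blob)
SELECT s.a, lo_create(0)
  FROM generate_series(1, 1000) AS s(a);

-- Fill each large object with 5 million 'x' characters.
SELECT lowrite(lo_open(my_blob, 131072),
               decode(repeat('x', 5000000), 'escape'))
  FROM blob_test;

[Timing pg_dump -Fc against the resulting database would then be the comparison Daniel describes.]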
Hi, everyone. Daniel Verite <daniel@manitou-mail.org> wrote:

> How much bytea are you dumping for it to take only 0.066s? The fact that
> it takes about the same time as dumping the "empty content" looks very
> suspicious.
>
> On my desktop machine, if I create a table with 1000 blobs containing
> strings of 5 million 'x', which is what I understood you basically did
> (perhaps I misunderstood?), then it takes about 200s to dump it with
> pg_dump -Fc

OK, this is an egg-on-my-face moment with my benchmarks: I added the pg_dump timing after the "delete" timing, and so I was actually dumping the database when it was empty! Not very effective, to say the least. I've updated my benchmark, and updated the results, as well:

** 1,000 records

|                            | Delete    | Dump      | Database size | Dump size |
|----------------------------+-----------+-----------+---------------+-----------|
| Empty content              | 0m0.151s  | 0m38.875s | 88 kB         | 11K       |
| bytea                      | 0m0.505s  | 1m59.565s | 57 MB         | 4.7M      |
| large object with rule     | 0m31.438s | 2m42.079s | 88 kB         | 4.7M      |
| large object with trigger  | 0m28.612s | 2m17.156s | 88 kB         | 4.7M      |

** 10,000 records

|                            | Delete    | Dump       | Database size | Dump size |
|----------------------------+-----------+------------+---------------+-----------|
| Empty content              | 0m7.436s  | 0m0.089s   | 680 kB        | 66K       |
| bytea                      | 1m5.771s  | 20m40.842s | 573 MB        | 47M       |
| large object with rule     | 5m26.254s | 21m7.695s  | 680 kB        | 47M       |
| large object with trigger  | 5m13.718s | 20m56.195s | 680 kB        | 47M       |

It would thus appear that there's a slight edge for dumping bytea, but nothing super-amazing. Deleting, however, is still much faster with bytea than large objects.

I've put my benchmark code up on GitHub for people to run and play with, to see if they can reproduce my results:

https://github.com/reuven/pg-delete-benchmarks

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
At 07:43 25/09/2011, Reuven M. Lerner wrote:

> Hi, everyone. Daniel Verite <daniel@manitou-mail.org> wrote:
>
> It would thus appear that there's a slight edge for dumping bytea, but
> nothing super-amazing. Deleting, however, is still much faster with
> bytea than large objects.

The problem you have is with compression/decompression of the large objects. If you look at the sizes, you get 680 kB for the large objects and 573 MB for bytea. PostgreSQL needs to decompress the large objects before the dump. Even worse, if your dump is compressed, PostgreSQL decompresses each large object, dumps it, and recompresses it. For this test, switch off compression on the large objects/TOAST.

For the long term, perhaps a request to the PostgreSQL hackers to dump the already-compressed large objects directly. TOAST may be more difficult, because it covers not only big columns, but any column whose size is bigger than a threshold (I don't remember exactly -- 1-2 kB or similar).

EFME
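[On the bytea side, per-column TOAST compression can indeed be switched off; a minimal sketch, assuming the SecondaryTable/my_blob names used earlier in the thread. Note that SET STORAGE only affects newly stored values, so the test data would have to be reloaded afterwards. Whether large-object compression can be disabled the same way is not addressed here.]

-- EXTERNAL = store the column out of line in the TOAST table, but do not
-- compress it, so a dump doesn't have to decompress each value first.
ALTER TABLE SecondaryTable
    ALTER COLUMN my_blob SET STORAGE EXTERNAL;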