Re: Speed of lo_unlink vs. DELETE on BYTEA - Mailing list pgsql-general
From | Reuven M. Lerner |
---|---|
Subject | Re: Speed of lo_unlink vs. DELETE on BYTEA |
Date | |
Msg-id | 4E7BB8F3.7020608@lerner.co.il |
In response to | Speed of lo_unlink vs. DELETE on BYTEA ("Reuven M. Lerner" <reuven@lerner.co.il>) |
Responses | Re: Speed of lo_unlink vs. DELETE on BYTEA |
List | pgsql-general |
Hi again, everyone. I'm replying to my own posting, to add some information: I decided to do some of my own benchmarking. And if my benchmarks are at all accurate, then I'm left wondering why people use large objects at all, given their clunky API and their extremely slow speed. I'm posting my benchmarks as a sanity check, because I'm blown away by the results.

I basically tried three different scenarios, each with 1,000 and 10,000 records. In each scenario, there was a table named MasterTable containing a SERIAL "id" column and a "one_value" integer column holding a number from generate_series, and a second table named SecondaryTable containing its own SERIAL "id" column, a "one_value" column (from generate_series, identical to the "id" column), and a "master_value" column that's a foreign key back to the main table. That is, here's the definition of the tables in the 10,000-record benchmark:

CREATE TABLE MasterTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
   PRIMARY KEY(id)
);

INSERT INTO MasterTable (one_value) VALUES (generate_series(1,10000));

CREATE TABLE SecondaryTable (
   id           SERIAL   NOT NULL,
   one_value    INTEGER  NOT NULL,
   master_value INTEGER  NOT NULL REFERENCES MasterTable ON DELETE CASCADE,
   PRIMARY KEY(id)
);

INSERT INTO SecondaryTable (master_value, one_value)
  (SELECT s.a, s.a FROM generate_series(1,10000) AS s(a));

I also had two other versions of SecondaryTable: in one scenario, it had a my_blob column, of type BYTEA, containing 5 million 'x' characters. A final version had a 5-million-character 'x' document loaded into a large object referenced from SecondaryTable.

The idea was simple: I wanted to see how much faster or slower it was to delete (not truncate) all of the records in MasterTable, given these different data types. Would bytea be significantly faster than large objects? How would the cascading delete affect things? And how long does it take to pg_dump with large objects around?

Here are the results, which were pretty dramatic. Basically, large objects seem to always be far, far slower than BYTEA columns, both to delete and to dump. Again, I'm wondering whether I'm doing something wrong here, or if this explains why, in my many years of using PostgreSQL, I've neither used nor been tempted to use large objects before.

1.1 1,000 records
==================

                 Delete    Dump
 ---------------+---------+---------
  Empty content   0.172s    0.057s
  bytea           0.488s    0.066s
  large object   30.833s    9.275s

1.2 10,000 records
===================

                 Delete      Dump
 ---------------+-----------+------------
  Empty content   8.162s      0.064s
  bytea          1m0.417s     0.157s
  large object   4m44.501s   1m38.454s

Any ideas? If this is true, should we be warning people away from large objects in the documentation, and toward bytea?

Reuven
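
P.S. The setup for the BYTEA and large-object variants of SecondaryTable isn't shown above. Here is a minimal sketch of one way such variants could be created; the my_blob and my_blob_oid column names, the lo_import path, and the cleanup trigger are illustrative assumptions, not a transcript of the exact commands used in the benchmark:

-- BYTEA variant: add a 5-million-character document to every row
ALTER TABLE SecondaryTable ADD COLUMN my_blob BYTEA;
UPDATE SecondaryTable SET my_blob = convert_to(repeat('x', 5000000), 'UTF8');

-- Large-object variant: each row stores the OID of a server-side large object
-- imported from a file containing 5 million 'x' characters (one object per row)
ALTER TABLE SecondaryTable ADD COLUMN my_blob_oid OID;
UPDATE SecondaryTable SET my_blob_oid = lo_import('/tmp/five_million_x.txt');

-- Deleting a row would otherwise leave its large object orphaned, so the
-- cascading DELETE is paired with a trigger that lo_unlinks each row's object
CREATE FUNCTION unlink_blob() RETURNS trigger AS $$
BEGIN
    PERFORM lo_unlink(OLD.my_blob_oid);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER secondarytable_unlink_blob
    BEFORE DELETE ON SecondaryTable
    FOR EACH ROW EXECUTE PROCEDURE unlink_blob();

With a setup along these lines, DELETE FROM MasterTable cascades to SecondaryTable and unlinks one large object per row, which is the kind of work the "Delete" column would be timing in the large-object case.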