Thread: problems with large objects dump

problems with large objects dump

From
Sergio Gabriel Rodriguez
Date:
Our production database (Postgres 8.4) is approximately 200 GB, most of which is large objects (174 GB). Until a few months ago we used pg_dump to perform backups, and the whole process took about 3-4 hours. Some time ago the process became interminable, taking one or two days; we noticed that performance degrades considerably once the backup reaches the large objects, so we had to switch to physical backups.
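For context, the kind of commands involved look roughly like this; the thread never shows the exact invocations, so database names and paths below are placeholders:

    # logical backup with pg_dump, custom format (placeholder names)
    pg_dump -Fc -f /backups/production.dump production_db

    # the physical-backup alternative: a base backup via the low-level API
    # available on 8.4, plus WAL archiving for a usable restore (placeholder paths)
    psql -c "SELECT pg_start_backup('nightly');"
    rsync -a "$PGDATA"/ /backups/base/
    psql -c "SELECT pg_stop_backup();"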

We performed various tests on similar servers, both with the same version and with Postgres 9.2, and the behavior is exactly the same. The database has no other problems and shows no performance issues during everyday use.


Could someone suggest a solution? Thanks.

Sergio

Re: problems with large objects dump

From
Tom Lane
Date:
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> Our production database (Postgres 8.4) is approximately 200 GB, most of
> which is large objects (174 GB). Until a few months ago we used pg_dump
> to perform backups, and the whole process took about 3-4 hours. Some time
> ago the process became interminable, taking one or two days; we noticed
> that performance degrades considerably once the backup reaches the large
> objects, so we had to switch to physical backups.

Hm ... there's been some recent work to reduce O(N^2) behaviors in
pg_dump when there are many objects to dump, but I'm not sure that's
relevant to your situation, because before 9.0 pg_dump didn't treat
blobs as full-fledged database objects.  You wouldn't happen to be
trying to use a 9.0 or later pg_dump would you?  Exactly what 8.4.x
release is this, anyway?
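(Not part of Tom's message, just a sketch of how to check which binaries and server release are actually in play; the database name is a placeholder.)

    pg_dump --version
    psql -d production_db -c "SELECT version();"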

            regards, tom lane


Re: problems with large objects dump

From
Sergio Gabriel Rodriguez
Date:
On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You wouldn't happen to be
> trying to use a 9.0 or later pg_dump would you?  Exactly what 8.4.x
> release is this, anyway?



Tom, thanks for replying. Yes, we tried it with Postgres 9.1 and 9.2 and the behavior is exactly the same. The production version is 8.4.9.

Greetings, 

sergio. 

Re: problems with large objects dump

From
Tom Lane
Date:
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You wouldn't happen to be
>> trying to use a 9.0 or later pg_dump would you?  Exactly what 8.4.x
>> release is this, anyway?

> Tom, thanks for replying. Yes, we tried it with Postgres 9.1 and 9.2 and
> the behavior is exactly the same. The production version is 8.4.9.

Well, I see three different fixes for O(N^2) pg_dump performance
problems in the 8.4.x change logs since 8.4.9, so you're a bit behind
the times there.  However, all of those fixes would have been in 9.2.0,
so if you saw no improvement with a 9.2.0 pg_dump then the problem is
something else.  Can you put together a test case for somebody else to
try, or try to locate the bottleneck yourself using oprofile or perf?
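(Not part of Tom's message.) A profile of the running dump could be collected along these lines; the PID lookup is an assumption, and on an older SLES 10 kernel oprofile may be the more practical tool:

    # sample the running pg_dump for five minutes, then summarize by symbol
    perf record -p $(pgrep -n pg_dump) -o pg_dump.perf.data -- sleep 300
    perf report -i pg_dump.perf.data --sort symbol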

            regards, tom lane


Re: problems with large objects dump

From
Sergio Gabriel Rodriguez
Date:
Hi, 
    I tried with PostgreSQL 9.2 and the process that used to take almost a day and a half was reduced significantly, to about six hours; before the problem appeared it used to take only four hours. My question now is: how long should a backup take for a 200 GB database where 80% of the data is large objects?

HP ProLiant G5 (Xeon)
32 GB RAM
OS (SLES 10) + logs --> RAID 6
data --> RAID 6

thanks!

On Thu, Sep 20, 2012 at 12:53 PM, Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> wrote:
> On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You wouldn't happen to be
>> trying to use a 9.0 or later pg_dump would you?  Exactly what 8.4.x
>> release is this, anyway?
>
> Tom, thanks for replying. Yes, we tried it with Postgres 9.1 and 9.2 and the behavior is exactly the same. The production version is 8.4.9.
>
> Greetings,
>
> sergio.


Re: problems with large objects dump

From
Tom Lane
Date:
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> I tried with PostgreSQL 9.2 and the process that used to take almost a day
> and a half was reduced significantly, to about six hours; before the problem
> appeared it used to take only four hours. My question now is: how long should
> a backup take for a 200 GB database where 80% of the data is large objects?

It's pretty hard to say without knowing a lot more info about your system
than you provided.  One thing that would shed some light is if you spent
some time finding out where the time is going --- is the system
constantly I/O busy, or is it CPU-bound, and if so in which process,
pg_dump or the connected backend?
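(A sketch, not part of Tom's message.) One rough way to answer that while the dump is running:

    iostat -x 5    # consistently high %util and await point to an I/O bottleneck
    top -c         # shows whether pg_dump or the connected postgres backend is CPU-bound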

Also, how many large objects is that?  (If you don't know already,
"select count(*) from pg_largeobject_metadata" would tell you.)

            regards, tom lane


Re: problems with large objects dump

From
Marcos Ortiz
Date:

On 10/11/2012 05:46 PM, Sergio Gabriel Rodriguez wrote:
> Hi,
>     I tried with PostgreSQL 9.2 and the process that used to take almost a day and a half was reduced significantly, to about six hours; before the problem appeared it used to take only four hours. My question now is: how long should a backup take for a 200 GB database where 80% of the data is large objects?
> Regards, Sergio.

That depends on several things.


> HP ProLiant G5 (Xeon)
> 32 GB RAM
> OS (SLES 10) + logs --> RAID 6
> data --> RAID 6
Can you share your postgresql.conf here?
Which filesystem are you using for your data directory?
What options are you using to do the backup?
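(A sketch, not part of Marcos's message; $PGDATA is assumed to point at the data directory.) One way to gather that information:

    # settings explicitly set in postgresql.conf
    grep -vE '^[[:space:]]*(#|$)' "$PGDATA"/postgresql.conf
    # filesystem type backing the data directory
    df -T "$PGDATA"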


> thanks!

> On Thu, Sep 20, 2012 at 12:53 PM, Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> wrote:
>> On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You wouldn't happen to be
>>> trying to use a 9.0 or later pg_dump would you?  Exactly what 8.4.x
>>> release is this, anyway?
>>
>> Tom, thanks for replying. Yes, we tried it with Postgres 9.1 and 9.2 and the behavior is exactly the same. The production version is 8.4.9.
>>
>> Greetings,
>>
>> sergio.



--

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz
@marcosluis2186



Re: problems with large objects dump

From
Sergio Gabriel Rodriguez
Date:
On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It's pretty hard to say without knowing a lot more info about your system
> than you provided.  One thing that would shed some light is if you spent
> some time finding out where the time is going --- is the system
> constantly I/O busy, or is it CPU-bound, and if so in which process,
> pg_dump or the connected backend?


The greatest amount of time is lost waiting on I/O; the system is constantly I/O busy.

database_test=# select count(*) from pg_largeobject_metadata;
  count  
---------
 5231973
(1 row)
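(Not part of Sergio's message.) Alongside the count, the on-disk footprint of those blobs could be checked the same way, e.g.:

    psql -d database_test -c "SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));"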

I had never used oprofile before, but a few hours into the process I was able to capture this report:


opreport -l /var/lib/pgsql/bin/pg_dump
Using /var/lib/oprofile/samples/ for samples directory.
CPU: Core 2, speed 2333.42 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
1202449  56.5535  sortDumpableObjects
174626    8.2130  DOTypeNameCompare
81181     3.8181  DeflateCompressorZlib
70640     3.3223  _WriteByte
68020     3.1991  DOCatalogIdCompare
53789     2.5298  WriteInt
39797     1.8717  WriteToc
38252     1.7991  WriteDataToArchive
32947     1.5496  WriteStr
32488     1.5280  pg_qsort
30122     1.4167  dumpTableData_copy
27706     1.3031  dumpDumpableObject
26078     1.2265  dumpBlobs
25591     1.2036  _tocEntryRequired
23030     1.0831  WriteData
21171     0.9957  buildTocEntryArrays
20825     0.9794  _WriteData
18936     0.8906  _WriteBuf
18113     0.8519  BuildArchiveDependencies
12607     0.5929  findComments
11642     0.5475  EndCompressor
10833     0.5095  _CustomWriteFunc
10562     0.4968  WriteDataChunks
10247     0.4819  dumpBlob
5947      0.2797  EndBlob
5824      0.2739  _EndBlob
5047      0.2374  main
5030      0.2366  dumpComment
4959      0.2332  AllocateCompressor
4762      0.2240  dumpSecLabel
4705      0.2213  StartBlob
4052      0.1906  WriteOffset
3285      0.1545  ArchiveEntry
2640      0.1242  _StartBlob
2391      0.1125  pg_calloc
2233      0.1050  findObjectByDumpId
2197      0.1033  SetArchiveRestoreOptions
2149      0.1011  pg_strdup
1760      0.0828  getDumpableObjects
1311      0.0617  ParseCompressionOption
1288      0.0606  med3
1248      0.0587  _WriteExtraToc
944       0.0444  AssignDumpId
916       0.0431  findSecLabels
788       0.0371  pg_malloc
340       0.0160  addObjectDependency
317       0.0149  _ArchiveEntry
144       0.0068  swapfunc
72        0.0034  ScanKeywordLookup
60        0.0028  findObjectByCatalogId
41        0.0019  fmtId
27        0.0013  ExecuteSqlQuery
20       9.4e-04  dumpTable
10       4.7e-04  getTableAttrs
8        3.8e-04  fmtCopyColumnList
6        2.8e-04  shouldPrintColumn
5        2.4e-04  findObjectByOid
3        1.4e-04  dumpFunc
3        1.4e-04  format_function_signature
3        1.4e-04  getTypes
2        9.4e-05  _StartData
2        9.4e-05  buildACLCommands
2        9.4e-05  findLoop
2        9.4e-05  getTables
2        9.4e-05  parseOidArray
2        9.4e-05  selectSourceSchema
1        4.7e-05  TocIDRequired
1        4.7e-05  _EndData
1        4.7e-05  archprintf
1        4.7e-05  dumpACL
1        4.7e-05  dumpCollation
1        4.7e-05  dumpConstraint
1        4.7e-05  dumpOpr
1        4.7e-05  expand_schema_name_patterns
1        4.7e-05  findDumpableDependencies
1        4.7e-05  fmtQualifiedId
1        4.7e-05  getCollations
1        4.7e-05  getExtensions
1        4.7e-05  getFormattedTypeName
1        4.7e-05  getIndexes
1        4.7e-05  makeTableDataInfo
1        4.7e-05  vwrite_msg


thank you very much for your help

regards.

Sergio


Re: problems with large objects dump

From
Tom Lane
Date:
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's pretty hard to say without knowing a lot more info about your system
>> than you provided.  One thing that would shed some light is if you spent
>> some time finding out where the time is going --- is the system
>> constantly I/O busy, or is it CPU-bound, and if so in which process,
>> pg_dump or the connected backend?

>  the greatest amount of time is lost in I/O busy.

In that case there's not going to be a whole lot you can do about it,
probably.  Or at least not that's very practical --- I assume "buy
faster disks" isn't a helpful answer.

If the blobs are relatively static, it's conceivable that clustering
pg_largeobject would help, but you're probably not going to want to take
down your database for as long as that would take --- and the potential
gains are unclear anyway.

> I had never used oprofile before, but a few hours into the process I was
> able to capture this report:

> 1202449  56.5535  sortDumpableObjects

Hm.  I suspect a lot of that has to do with the large objects; and it's
really overkill to treat them as full-fledged objects since they never
have unique dependencies.  This wasn't a problem when commit
c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
might be because of the additional constraints added in commit
a1ef01fe163b304760088e3e30eb22036910a495.  I wonder if it's time to try
to optimize pg_dump's handling of blobs a bit better.  But still, any
such fix probably wouldn't make a huge difference for you.  Most of the
time is going into pushing the blob data around, I think.
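(Not part of Tom's message.) The two commits he mentions can be inspected in a clone of the PostgreSQL git repository:

    git show --stat c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9
    git show --stat a1ef01fe163b304760088e3e30eb22036910a495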

            regards, tom lane


Re: problems with large objects dump

From
Tom Lane
Date:
I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
>> I had never used oprofile before, but a few hours into the process I was
>> able to capture this report:
>> 1202449  56.5535  sortDumpableObjects

> Hm.  I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies.  This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495.  I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better.  But still, any
> such fix probably wouldn't make a huge difference for you.  Most of the
> time is going into pushing the blob data around, I think.

For fun, I tried adding 5 million empty blobs to the standard regression
database, and then did a pg_dump.  It took a bit under 9 minutes on my
workstation.  oprofile showed about 32% of pg_dump's runtime going into
sortDumpableObjects, which might make you think that's worth optimizing
... until you look at the bigger picture system-wide:

  samples|      %|
------------------
   727394 59.4098 kernel
   264874 21.6336 postgres
   136734 11.1677 /lib64/libc-2.14.90.so
    39878  3.2570 pg_dump
    37025  3.0240 libpq.so.5.6
    17964  1.4672 /usr/bin/wc
      354  0.0289 /usr/bin/oprofiled

So actually sortDumpableObjects took only about 1% of the CPU cycles.
And remember this is with empty objects.  If we'd been shoving 200GB of
data through the dump, the data pipeline would surely have swamped all
else.

So I think the original assumption that we didn't need to optimize
pg_dump's object management infrastructure for blobs still holds good.
If there's anything that is worth fixing here, it's the number of server
roundtrips being used ...
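(Not part of Tom's message.) For anyone who wants to repeat a similar experiment, the test database can be populated roughly like this; it is a reconstruction rather than Tom's exact steps, though the wc in his profile suggests the dump output was piped rather than written to disk:

    createdb blobtest
    # count() just keeps psql from printing five million new OIDs
    psql -d blobtest -c "SELECT count(lo_create(0)) FROM generate_series(1, 5000000);"
    time pg_dump -Fc blobtest | wc -c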

            regards, tom lane


Re: problems with large objects dump

From
Sergio Gabriel Rodriguez
Date:


On Fri, Oct 12, 2012 at 10:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think the original assumption that we didn't need to optimize
> pg_dump's object management infrastructure for blobs still holds good.
> If there's anything that is worth fixing here, it's the number of server
> roundtrips being used ...


I found something similar

 samples|      %|
------------------
233391664 60.5655 no-vmlinux
 78789949 20.4461 libz.so.1.2.3
 31984753  8.3001 postgres
 21564413  5.5960 libc-2.4.so
  4086941  1.0606 ld-2.4.so
  2427151  0.6298 bash
  2355895  0.6114 libc-2.4.so
  2173558  0.5640 pg_dump
  1771931  0.4598 oprofiled

Is there anything I can do to improve this?
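(Not advice given in the thread, just an illustration.) Since libz accounts for roughly 20% of the samples, one knob worth experimenting with is pg_dump's compression level; the database name and paths are placeholders:

    pg_dump -Fc -Z 0 -f /backups/prod_nocomp.dump production_db   # custom format, no zlib compression
    pg_dump -Fc -Z 1 -f /backups/prod_fast.dump   production_db   # cheapest zlib level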

Thanks

Sergio