Thread: atrocious update performance

atrocious update performance

From
"Rosser Schwarz"
Date:
We're in the throes of an MS SQL to PostgreSQL migration; our databases
include a number of ~5M row tables.  We decided to take this opportunity
to clean up and slightly re-normalize our schemas, given what we've
learned about the data over its lifetime and such, else we wouldn't be
experiencing any of the following (we could instead just dump and `copy
from`).

We have a temporary table, public.tempprod, containing 4.7M rows, one
for each row in account.cust.  account.cust has, among others, two
columns, prod and subprod, which we're trying to update from tempprod
joined against prod.  The update tends to take unnecessarily long--
rather, we've finally had to kill it after it had been running obscenely
long.

The table:

# \d account.cust
                                      Table "account.cust"
  Column   |            Type             |                      Modifiers
-----------+-----------------------------+------------------------------------------------------
 custid    | bigint                      | not null default nextval('account.custid_seq'::text)
 ownerid   | integer                     | not null
 origid    | text                        | not null
 pname     | text                        |
 fname     | text                        |
 mname     | text                        |
 lname     | text                        |
 suffix    | text                        |
 addr1     | text                        |
 addr2     | text                        |
 addr3     | text                        |
 city      | text                        |
 state     | text                        |
 zip       | text                        |
 zipplus   | text                        |
 homeph    | text                        |
 workph    | text                        |
 otherph   | text                        |
 ssn       | text                        |
 isactive  | boolean                     | default true
 createddt | timestamp without time zone | default now()
 prodid    | bigint                      |
 subprodid | bigint                      |
Indexes:
    "cust_pkey" primary key, btree (custid)
    "ix_addr1" btree (addr1) WHERE (addr1 IS NOT NULL)
    "ix_addr2" btree (addr2) WHERE (addr2 IS NOT NULL)
    "ix_city" btree (city) WHERE (city IS NOT NULL)
    "ix_fname" btree (fname) WHERE (fname IS NOT NULL)
    "ix_homeph" btree (homeph) WHERE (homeph IS NOT NULL)
    "ix_lname" btree (lname) WHERE (lname IS NOT NULL)
    "ix_mname" btree (mname) WHERE (mname IS NOT NULL)
    "ix_origid" btree (origid)
    "ix_ssn" btree (ssn) WHERE (ssn IS NOT NULL)
    "ix_state" btree (state) WHERE (state IS NOT NULL)
    "ix_workph" btree (workph) WHERE (workph IS NOT NULL)
    "ix_zip" btree (zip) WHERE (zip IS NOT NULL)

We're currently running on a dual Xeon 700 (I know, I know; it's what
we've got) with 2.5GB RAM and 4x36GB SCSI in hardware RAID 5 (Dell
Perc3 something-or-other controller).  If we can demonstrate that
PostgreSQL will meet our needs, we'll be going production on a dual
Opteron, maxed memory, with a 12-disk Fibre Channel array.

The query is:

update account.cust set prodid =
(select p.prodid from account.prod p
    join public.tempprod t on t.pool = p.origid
    where custid = t.did)

And then, upon its completion, s/prod/subprod/.
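Spelled out, that second pass would presumably be the following (a sketch
only; account.subprod and its origid column are inferred from the prod
query, and the join column may be t.port rather than t.pool, but the shape
is the same):

update account.cust set subprodid =
(select sp.subprodid from account.subprod sp
    join public.tempprod t on t.pool = sp.origid
    where custid = t.did)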

That shouldn't run overnight, should it, let alone for -days-?

In experimenting with ways of making the updates take less time, we tried
adding product and subproduct columns to tempprod, and updating those.
That seemed to work marginally better:

explain analyze update public.tempprod set prodid =
(select account.prod.prodid::bigint
    from account.prod
    where public.tempprod.pool::text = account.prod.origid::text)

Seq Scan on tempprod (cost=0.00..9637101.35 rows=4731410 width=56) (actual
time=24273.467..16090470.438 rows=4731410 loops=1)
  SubPlan
    ->  Limit (cost=0.00..2.02 rows=2 width=8) (actual time=0.134..0.315
        rows=1 loops=4731410)
          ->  Seq Scan on prod (cost=0.00..2.02 rows=2 width=8) (actual
              time=0.126..0.305 rows=1 loops=4731410)
              Filter: (($0)::text = (origid)::text)
Total runtime: 2284551.962 ms

But then going from public.tempprod to account.cust again takes days.  I
just cancelled an update that's been running since last Thursday.
Alas, given how long the queries take to run, I can't supply an `explain
analyze`.  The `explain` seems reasonable enough:

# explain update account.cust set prodid = tempprod.prodid
    where tempprod.did = origid;

 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".did)::text)
   ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
       rows=4731410 width=244)
   ->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
       rows=4731410 width=18)

The relevant bits from my postgresql.conf (note, we built with a BLCKSZ
of 16K):

shared_buffers = 4096
sort_mem = 32768
vacuum_mem = 32768
wal_buffers = 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 50000
effective_cache_size = 131072

Any advice, suggestions or comments of the "You bleeding idiot, why do
you have frob set to x?!" sort welcome.  Unfortunately, if we can't
improve this significantly, the powers what be will probably pass
on PostgreSQL, even though everything we've done so far--with this
marked exception--performs pretty spectacularly, all told.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Rod Taylor
Date:
> # explain update account.cust set prodid = tempprod.prodid
>     where tempprod.did = origid;
>
>  Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
>    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
>    ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
>        rows=4731410 width=244)
>    ->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
>        rows=4731410 width=18)

I'm going to hazard a guess and say you have a number of foreign keys
that refer to account.cust.prodid? This is probably the time consuming
part -- perhaps even a missing index on one of those keys that refers to
this field.

Going the other way should be just as good for your purposes, and much
faster since you're not updating several foreign key'd fields bound to
account.cust.prodid.

UPDATE tempprod
   SET prodid = cust.prodid
  FROM account.cust
 WHERE tempprod.did = cust.origid;



Re: atrocious update performance

From
"Rosser Schwarz"
Date:
> > # explain update account.cust set prodid = tempprod.prodid
> >     where tempprod.did = origid;

> >  Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
> >    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
> >    ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
> >        rows=4731410 width=244)
> >    ->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
> >        rows=4731410 width=18)

> I'm going to hazard a guess and say you have a number of foreign keys
> that refer to account.cust.prodid? This is probably the time consuming
> part -- perhaps even a missing index on one of those keys
> that refers to
> this field.

Actually, there are no foreign keys to those columns.  Once they're
populated, I'll apply a foreign key constraint and they'll refer to the
appropriate row in the prod and subprod tables, but nothing will
reference account.cust.[sub]prodid.  There are, of course, several foreign
keys referencing account.cust.custid.

> Going the other way should be just as good for your purposes, and much
> faster since you're not updating several foreign key'd fields bound to
> account.cust.prodid.

> UPDATE tempprod
>    SET prodid = cust.prodid
>   FROM account.cust
>  WHERE tempprod.did = cust.origid;

Not quite. Without this update, account.cust.[sub]prodid are null.  The
data was strewn across multiple tables in MS SQL; we're normalizing it
into one, hence the need to populate the two columns independently.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
"Aaron Werman"
Date:
Bulk updates are generally dogs (not just in pg), so I avoid them by doing
faster selects and inserts. You can create a new table using 'create
table as' to produce your target results. This is real fast -- it avoids the
row iteration of insert, lets the select optimizer run, and carries no index
overhead. Then alter/rename, add indexes and whatever else hangs off the
table (or, if you're lazy, do an insert/select into the original target
table). I often see 2 orders of magnitude improvement doing this, and no
need to vacuum.
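
Roughly, something like this sketch (the cust_new name and the abbreviated
column list are just for illustration; the join is taken from the query you
posted):

create table account.cust_new as
select c.custid, c.ownerid, c.origid,  -- ...plus the rest of cust's columns
       t.prodid, t.subprodid
  from account.cust c
  left join public.tempprod t on t.did = c.origid;

-- then rename the old table out of the way, rename cust_new to cust, and
-- recreate the pkey, indexes, and anything else that hangs off the table.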

/Aaron

----- Original Message -----
From: "Rosser Schwarz" <rschwarz@totalcardinc.com>
To: <pgsql-performance@postgresql.org>
Sent: Monday, March 15, 2004 3:28 PM
Subject: [PERFORM] atrocious update performance



Re: atrocious update performance

From
Rod Taylor
Date:
On Mon, 2004-03-15 at 16:15, Rosser Schwarz wrote:
> > > # explain update account.cust set prodid = tempprod.prodid
> > >     where tempprod.did = origid;
>
> > >  Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
> > >    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
> > >    ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
> > >        rows=4731410 width=244)
> > >    ->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
> > >        rows=4731410 width=18)
>
> > I'm going to hazard a guess and say you have a number of foreign keys
> > that refer to account.cust.prodid? This is probably the time consuming
> > part -- perhaps even a missing index on one of those keys
> > that refers to
> > this field.
>
> Actually, there are no foreign keys to those columns.  Once they're
> populated, I'll apply a foreign key constraint and they'll refer to the
> appropriate row in the prod and subprod tables, but nothing will
> reference account.cust.[sub]prodid.  There are, of course, several foreign
> keys referencing account.cust.custid.

If there are no foreign keys to it, I wouldn't expect it to take more than
10 minutes on slow hardware.

Fresh out of ideas here.



Re: atrocious update performance

From
"Rosser Schwarz"
Date:
> You can create a new table using 'create table as' to produce your
> target results. This is real fast ...
> I often see 2 orders of magnitude improvement doing this, and no
> need to vacuum.

Indeed:

"Query returned successfully with no result in 582761 ms."

Though I must say, ten minutes is nominally more than two orders of
magnitude performance improvement, versus several days.

Many thanks, Aaron.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
>> You can create a new table using 'create table as' to produce your
>> target results. This is real fast ...
>> I often see 2 orders of magnitude improvement doing this, and no
>> need to vacuum.

> Indeed:
> "Query returned successfully with no result in 582761 ms."
> Though I must say, ten minutes is nominally more than two orders of
> magnitude performance improvement, versus several days.

Hm.  There is no way that inserting a row is two orders of magnitude
faster than updating a row --- they both require storing a new row and
making whatever index entries are needed.  The only additional cost of
the update is finding the old row (not a very big deal AFAICS in the
examples you gave) and marking it deleted (definitely cheap).  So
there's something awfully fishy going on here.

I'm inclined to suspect an issue with foreign-key checking.  You didn't
give us any details about foreign key relationships your "cust" table is
involved in --- could we see those?  And the schemas of the other tables
involved?

Also, exactly which PG version is this?

            regards, tom lane

Re: atrocious update performance

From
"Aaron Werman"
Date:
The original point was about a very slow update of an entire table with a
plan that looped, and over a dozen conditional indices - vs. a 'create as'
in a CPU starved environment. I stand by my statement about observing the
orders of magnitude difference. In theory I agree that the update should be
in the same order of magnitude as the create as, but in practice I disagree.
I also think something is wrong on the logical side (besides FKs, are there
any triggers?) but was responding to the Gordian knot issue of bailing out
of pg.

Can you post a sample extract, Rosser? Otherwise, I'll try to put together a
sample of a slow mass join update.

/Aaron

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Rosser Schwarz" <rschwarz@totalcardinc.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, March 15, 2004 7:08 PM
Subject: Re: [PERFORM] atrocious update performance


> "Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> >> You can create a new table using 'create table as' to produce your
> >> target results. This is real fast ...
> >> I often see 2 orders of magnitude improvement doing this, and no
> >> need to vacuum.
>
> > Indeed:
> > "Query returned successfully with no result in 582761 ms."
> > Though I must say, ten minutes is nominally more than two orders of
> > mangitude performance improvement, versus several days.
>
> Hm.  There is no way that inserting a row is two orders of magnitude
> faster than updating a row --- they both require storing a new row and
> making whatever index entries are needed.  The only additional cost of
> the update is finding the old row (not a very big deal AFAICS in the
> examples you gave) and marking it deleted (definitely cheap).  So
> there's something awfully fishy going on here.
>
> I'm inclined to suspect an issue with foreign-key checking.  You didn't
> give us any details about foreign key relationships your "cust" table is
> involved in --- could we see those?  And the schemas of the other tables
> involved?
>
> Also, exactly which PG version is this?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

Re: atrocious update performance

From
Greg Stark
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:

> Actually, there are no foreign keys to those columns.  Once they're
> populated, I'll apply a foreign key constraint and they'll refer to the
> appropriate row in the prod and subprod tables, but nothing will
> reference account.cust.[sub]prodid.  There are, of course, several foreign
> keys referencing account.cust.custid.

Just to be clear, the foreign key constraints they're worrying about are not
constraints on the table you're updating. They're constraints on other tables
referring to the table you're updating.

Since you're updating the column here, postgres has to be sure nothing is
referring to the old value you're obliterating, and to do that it has to
select for possible records in the referencing tables that refer to that
value.  If any other tables reference this column, you need an index on the
referencing column in each of those tables to be able to update the
referenced table efficiently.
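
Concretely, with account.acct as the referencing table, each check boils down
to roughly the following, once per modified row (a paraphrase of what the FK
triggers issue, not the literal statement):

select 1 from only account.acct x where x.custid = $1 for update of x;

Without an index on acct.custid, every one of those is a seq scan.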

--
greg

Re: atrocious update performance

From
Shridhar Daithankar
Date:
Rosser Schwarz wrote:

> shared_buffers = 4096
> sort_mem = 32768
> vacuum_mem = 32768
> wal_buffers = 16384
> checkpoint_segments = 64
> checkpoint_timeout = 1800
> checkpoint_warning = 30
> commit_delay = 50000
> effective_cache_size = 131072

You didn't mention the OS, so I'll assume it's either Linux or FreeBSD.

First of all, your shared buffers are low. 4096 is 64MB with 16K block size. I
would say at least push them to 150-200MB.

Secondly your sort mem is too high. Note that it is per sort per query. You
could build a massive swap storm with such a setting.

Similarly pull down vacuum and WAL buffers to around 512-1024 each.

I know that your problem is solved by using inserts rather than updates.  But I
just want to point out that you still need to analyze the table to update the
statistics, or further queries will not plan well.

And lastly, you can bundle the entire thing--creating the duplicate table,
populating it, renaming the original table, etc.--in a single transaction, and
nobody will notice it.  I am almost sure MS SQL can not do that.  Not many
databases have transaction-safe DDL out there.
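
A sketch of that swap, assuming the new table was built as account.cust_new
using the create-table-as approach described earlier:

begin;
alter table account.cust rename to cust_old;
alter table account.cust_new rename to cust;
-- re-add the pkey and indexes here, and repoint the foreign keys on
-- account.acct, account.note and account.origacct at the new cust
analyze account.cust;
commit;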

  HTH

  Shridhar

Re: atrocious update performance

From
Tom Lane
Date:
Shridhar Daithankar <shridhar@frodo.hserus.net> writes:
> Rosser Schwarz wrote:
>> shared_buffers = 4096
>> sort_mem = 32768
>> vacuum_mem = 32768
>> wal_buffers = 16384
>> checkpoint_segments = 64
>> checkpoint_timeout = 1800
>> checkpoint_warning = 30
>> commit_delay = 50000
>> effective_cache_size = 131072

> First of all, your shared buffers are low. 4096 is 64MB with 16K block
> size. I would say at least push them to 150-200MB.

Check.  Much more than that isn't necessarily better though.
shared_buffers = 10000 is frequently mentioned as a "sweet spot".

> Secondly your sort mem is too high. Note that it is per sort per query. You
> could build a massive swap storm with such a setting.

Agreed, but I doubt that has anything to do with the immediate
problem, since he's not testing parallel queries.

> Similarly pull down vacuum and WAL buffers to around 512-1024 each.

The vacuum_mem setting here is 32MB, which seems okay to me, if not on
the low side.  Again though it's not his immediate problem.

I agree that the wal_buffers setting is outlandishly large; I can't see
any plausible reason for it to be more than a few dozen.  I don't know
whether oversized wal_buffers can directly cause any performance issues,
but it's certainly not a well-tested scenario.

The other setting I was going to comment on is checkpoint_warning;
it seems mighty low in comparison to checkpoint_timeout.  If you are
targeting a checkpoint every half hour, I'd think you'd want the system
to complain about checkpoints spaced more closely than several minutes.
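
Pulling those suggestions together, something in this vicinity would be a
saner starting point (illustrative values only, given the 16K BLCKSZ):

shared_buffers = 10000          # ~160MB with 16K blocks
sort_mem = 8192                 # per sort, per backend
vacuum_mem = 32768
wal_buffers = 32                # a few dozen is plenty
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 300        # complain if checkpoints come < 5 min apart
effective_cache_size = 131072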

But with the possible exception of wal_buffers, I can't see anything in
these settings that explains the originally complained-of performance
problem.  I'm still wondering about foreign key checks.

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> But with the possible exception of wal_buffers, I can't see
> anything in
> these settings that explains the originally complained-of performance
> problem.  I'm still wondering about foreign key checks.

Many of the configs I posted were fairly wild values, set to gather
data points for further tweaking.  Unfortunately, with this query
there hasn't been time for many passes, and I've too much else on my
plate to try concocting demonstration cases.  The postmaster's been
hupped with more sane values, but I experienced this same issue with
the defaults.

As for foreign keys, three tables refer to account.cust; all of them
refer to account.cust.custid, the pk.  One of those tables has several
hundred thousand rows, many more to come; the others are empty.  Unless
I've woefully misunderstood, the presence or absence of a foreign key
referring to one column should be moot for updates writing another
column, shouldn't it?

To answer your (and others') question, Tom, 7.4.1 on 2.4.20-18.9smp.
Red Hat, I believe.  I was handed the machine, which is also in use
for lightweight production stuff:  intranet webserver, rinky-dink
MySQL doo-dads, &c.  I'm sure that has an impact, usurping the disk
heads and such--maybe even more than I'd expect--but I can't imagine
that'd cause an update to one 4.7M row table, from another 4.7M row
table, both clustered on a join column that maps one-to-one between
them, to take days.  I'm baffled; everything else is perfectly snappy,
given the hardware.  Anything requiring a sequential scan over one of
the big tables is a slog, but that's to be expected and hence all the
indices.

Watching iostat, I've observed a moderately cyclic read-big, write-
big pattern, wavelengths generally out of phase, interspersed with
smaller, almost epicycles--from the machine's other tasks, I'm sure.
top has postmaster's cpu usage rarely breaking 25% over the course
of the query's execution, and spending most of its time much lower;
memory usage hovers somewhere north of 500MB.

In what little time I had to stare at a disturbingly matrix-esque
array of terminals scrolling sundry metrics, I didn't notice a
correlation between cpu usage spikes and peaks in the IO cycle's
waveforms.  For whatever that's worth.

The other tables involved are:

# \d account.acct
                                      Table "account.acct"
   Column   |            Type             |                      Modifiers
------------+-----------------------------+------------------------------------------------------
 acctid     | bigint                      | not null default nextval('account.acctid_seq'::text)
 custid     | bigint                      |
 acctstatid | integer                     | not null
 acctno     | character varying(50)       |
 bal        | money                       |
 begdt      | timestamp without time zone | not null
 enddt      | timestamp without time zone |
 debtid     | character varying(50)       |
Indexes:
    "acct_pkey" primary key, btree (acctid)
    "ix_acctno" btree (acctno) WHERE (acctno IS NOT NULL)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT
    "$2" FOREIGN KEY (acctstatid) REFERENCES account.acctstat(acctstatid)
         ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.note
                                      Table "account.note"
  Column   |            Type             |                      Modifiers
-----------+-----------------------------+------------------------------------------------------
 noteid    | bigint                      | not null default nextval('account.noteid_seq'::text)
 custid    | bigint                      | not null
 note      | text                        | not null
 createddt | timestamp without time zone | not null default now()
Indexes:
    "note_pkey" primary key, btree (noteid)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.origacct
               Table "account.origacct"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 custid      | bigint                      |
 lender      | character varying(50)       |
 chgoffdt    | timestamp without time zone |
 opendt      | timestamp without time zone |
 offbureaudt | timestamp without time zone |
 princbal    | money                       |
 intbal      | money                       |
 totbal      | money                       |
 lastpayamt  | money                       |
 lastpaydt   | timestamp without time zone |
 debttype    | integer                     |
 debtid      | character varying(10)       |
 acctno      | character varying(50)       |
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT

And the table we were joining to get the new values for prodid and
subprodid:

# \d tempprod
            Table "public.tempprod"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 debtid    | character varying(10) | not null
 pool      | character varying(10) | not null
 port      | character varying(10) | not null
 subprodid | bigint                |
 prodid    | bigint                |
Indexes:
    "ix_debtid" btree (debtid)

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> As for foreign keys, three tables refer to account.cust; all of them
> refer to account.cust.custid, the pk.  One of those tables has several
> hundred thousand rows, many more to come; the others are empty.  Unless
> I've woefully misunderstood, the presence or absence of a foreign key
> referring to one column should be moot for updates writing another
> column, shouldn't it?

Well, that is the crux of the issue, and also why I was asking about
versions.  It's only been since 7.3.4 or so that we skip checking FKs on
update.

Looking at the code, though, the update check is only skipped if the
previous version of the row predates the current transaction.
(Otherwise we can't be sure that the value was ever checked.)  This
means that slow FK checks could be your problem if the application is
set up to issue multiple UPDATEs affecting the same row(s) during a
single transaction.  I'm not clear on whether that applies to you or not.

And anyway the bottom line is: have you got indexes on the columns
*referencing* account.cust.custid?  If not you probably ought to add
them, since without them there will definitely be some slow cases.
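
For instance, if account.acct is one of them, something like:

    create index ix_fk_acct_custid on account.acct (custid);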

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction.  I'm not clear on whether that applies to you
> or not.

It shouldn't.  It's just one large batch update that should be hitting
every row serially.

> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?

No.  I'd've sworn I had one on account.acct.custid, since that table
is populated (currently ~500K rows), but it's not.

$ time psql tci -c "explain analyze select * from account.acct where
custid = 257458"
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on acct  (cost=0.00..7166.68 rows=2 width=71) (actual
 time=1047.122..1047.122 rows=0 loops=1)
   Filter: (custid = 257458)
 Total runtime: 1047.362 ms
(3 rows)


real    0m1.083s
user    0m0.010s
sys     0m0.000s

If it is looking up the custid in account.acct for each row, that's,
say, 1 second per lookup, for 4.7 million lookups--which, if my math
is right (4,731,410 / 3600 / 24), is about 54 days.  I suppose that tracks,
but that doesn't make sense, given what you said about the fk checks,
above.

Of course, if I index the column and amend the query to say "where
custid = 194752::bigint" I get back much saner numbers:

                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using ix_fk_acct_custid on acct  (cost=0.00..3.34 rows=2
 width=71) (actual time=0.126..0.141 rows=2 loops=1)
   Index Cond: (custid = 194752::bigint)
 Total runtime: 0.314 ms
(3 rows)


real    0m0.036s
user    0m0.010s
sys     0m0.000s

Which would still take just under two days.

$ time psql tci -c "explain analyze update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid"

But if I'm not touching the column referenced from account.acct, why
would it be looking there at all?  I've got an explain analyze of the
update running now, but until it finishes, I can't say for certain
what it's doing.  explain, alone, says:

$ time psql tci -c "explain update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid;"
                              QUERY PLAN
---------------------------------------------------------------------
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
       rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26)
(4 rows)


real    0m26.965s
user    0m0.010s
sys     0m0.000s

which shows it not hitting account.acct at all.  (And why did it take
the planner 20-some seconds to come up with that query plan?)

tempprod doesn't have an index either, but then it doesn't reference
account.cust; instead, the update would be done by joining the two on
debtid/origid, which map one-to-one, are both indexed, and with both
tables clustered on those indices--exactly as was the CREATE TABLE AS
Aaron suggested elsethread.

Unfortunately, this isn't the only large update we'll have to do.  We
receive a daily, ~100K rows file that may have new values for any field
of any row in account.cust, .acct or sundry other tables.  The process
of updating from that file is time-critical; it must run in minutes, at
the outside.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> But if I'm not touching the column referenced from account.acct, why
> would it be looking there at all?  I've got an explain analyze of the
> update running now, but until it finishes, I can't say for certain
> what it's doing.  explain, alone, says:

EXPLAIN won't tell you anything about triggers that might get fired
during the UPDATE, so it's not much help for investigating possible
FK performance problems.  EXPLAIN ANALYZE will give you some indirect
evidence: the difference between the total query time and the total time
reported for the topmost plan node represents the time spent running
triggers and physically updating the tuples.  I suspect we are going
to see a big difference.

> which shows it not hitting account.acct at all.  (And why did it take
> the planner 20-some seconds to come up with that query plan?)

It took 20 seconds to EXPLAIN?  That's pretty darn odd in itself.  I'm
starting to think there must be something quite whacked-out about your
installation, but I haven't got any real good ideas about what.

(I'm assuming of course that there weren't a ton of other jobs eating
CPU while you tried to do the EXPLAIN.)

[ thinks for awhile... ]  The only theory that comes to mind for making
the planner so slow is oodles of dead tuples in pg_statistic.  Could I
trouble you to run
    vacuum full verbose pg_statistic;
and send along the output?

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> EXPLAIN won't tell you anything about triggers that might get fired
> during the UPDATE, so it's not much help for investigating possible
> FK performance problems.  EXPLAIN ANALYZE will give you some indirect
> evidence: the difference between the total query time and the total time
> reported for the topmost plan node represents the time spent running
> triggers and physically updating the tuples.  I suspect we are going
> to see a big difference.

It's still running.

> It took 20 seconds to EXPLAIN?  That's pretty darn odd in itself.

It struck me, too.

> I'm starting to think there must be something quite whacked-out about
> your installation, but I haven't got any real good ideas about what.

Built from source.  configure arguments:

./configure --prefix=/var/postgresql --bindir=/usr/bin
--enable-thread-safety --with-perl --with-python --with-openssl
--with-krb5=/usr/kerberos

I can answer more specific questions; otherwise, I'm not sure what to
look for, either.  If we could take the machine out of production (oh,
hell; I think I just volunteered myself for weekend work) long enough
to reinstall everything to get a fair comparison...

So far as I know, though, it's a more or less stock Red Hat.  2.4.20-
something.

> (I'm assuming of course that there weren't a ton of other jobs eating
> CPU while you tried to do the EXPLAIN.)

CPU spiked sporadically, which throttled everything else, but it never
stays high.  top shows the current explain analyze running between 50-
ish% and negligible.  iostat -k 3 shows an average of 3K/sec written, for
a hundred-odd tps.

I can't get any finer-grained than that, unfortunately; the machine was
handed to me with a single, contiguous filesystem, in production use.

> [ thinks for awhile... ]  The only theory that comes to mind
> for making
> the planner so slow is oodles of dead tuples in pg_statistic.  Could I
> trouble you to run
>       vacuum full verbose pg_statistic;
> and send along the output?

INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 215 removable, 349 nonremovable row versions
in 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 72 to 8132 bytes long.
There were 3 unused item pointers.
Total free space (including removable row versions) is 91572 bytes.
0 pages are or will become empty, including 0 at the end of the table.
7 pages containing 91572 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.71 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  215 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_statistic": moved 120 row versions, truncated 7 to 5 pages
DETAIL:  CPU 0.03s/0.01u sec elapsed 0.17 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  120 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16408"
INFO:  "pg_toast_16408": found 12 removable, 12 nonremovable row versions
in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 660 to 8178 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 91576 bytes.
2 pages are or will become empty, including 0 at the end of the table.
5 pages containing 91576 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  12 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "pg_toast_16408": moved 10 row versions, truncated 5 to 3 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Having never more than glanced at the output of "vacuum verbose", I
can't say whether that makes the cut for oodles.  My suspicion is no.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> Having never more than glanced at the output of "vacuum verbose", I
> can't say whether that makes the cut for oodles.  My suspicion is no.

Nope, it sure doesn't.  We occasionally see people who don't know they
need to vacuum regularly and have accumulated hundreds or thousands of
dead tuples for every live one :-(.  That's clearly not the issue here.

I'm fresh out of ideas, and the fact that this is a live server kinda
limits what we can do experimentally ... but clearly, *something* is
very wrong.

Well, when you don't know what to look for, you still have to look.
One possibly useful idea is to trace the kernel calls of the backend
process while it does that ridiculously long EXPLAIN --- think you could
try that?

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

[trace]

Output of `strace -p 21882`, run behind the query and plan below, follows them.

# explain update account.cust set prodid = tempprod.prodid, subprodid =
tempprod.subprodid where origid = tempprod.debtid;
                      QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
       rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26)
(4 rows)

----------

recv(9, "Q\0\0\0}explain update account.cust"..., 8192, 0) = 126
gettimeofday({1079482151, 106228}, NULL) = 0
brk(0)                                  = 0x82d9000
brk(0x82db000)                          = 0x82db000
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 8
_llseek(8, 212402176, [212402176], SEEK_SET) = 0
write(8, "\342\1\0\0\0\314\374\6\24\0\0\0\214\7pG\360\177\1\200\320"...,
32768) = 32768
close(8)                                = 0
open("/var/lib/pgsql/data/base/495616/16635", O_RDWR|O_LARGEFILE) = 8
read(8, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 60817408, [60817408], SEEK_SET) = 0
write(10, "\342\1\0\0`\334\5\7\24\0\0\0t\0010x\360\177\1\200\330\377"...,
32768) = 32768
close(10)                               = 0
read(8, "\334\1\0\0h\217\270n\24\0\0\0H\0H|\360\177\1\200@\376\220"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 130777088, [130777088], SEEK_SET) = 0
write(10, "\342\1\0\0<\341\7\7\24\0\0\0004\t0I\360\177\1\200\330\377"...,
32768) = 32768
close(10)                               = 0
open("/var/lib/pgsql/data/base/495616/16595", O_RDWR|O_LARGEFILE) = 10
read(10, "\334\1\0\0\360\216\270n\24\0\0\0X\0@y\0\200\1\200\320\371"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 11
_llseek(11, 145915904, [145915904], SEEK_SET) = 0
write(11, "\342\1\0\0\300\350\n\7\24\0\0\0\224\6\310Z\360\177\1\200"...,
32768) = 32768
close(11)                               = 0
open("/var/lib/pgsql/data/base/495616/16614", O_RDWR|O_LARGEFILE) = 11
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 148570112, [148570112], SEEK_SET) = 0
write(12, "\342\1\0\0\274\365\22\7\24\0\0\0X\3\234o\360\177\1\200"...,
32768)
= 32768
close(12)                               = 0
_llseek(11, 98304, [98304], SEEK_SET)   = 0
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\34\0\234\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 251789312, [251789312], SEEK_SET) = 0
write(12, "\342\1\0\0l\366\23\7\24\0\0\0\364\10\260J\360\177\1\200"...,
32768)
= 32768
close(12)                               = 0
_llseek(11, 32768, [32768], SEEK_SET)   = 0
read(11, "\340\1\0\0\324\231\273\241\24\0\0\0\234\5\330\26\360\177"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 117309440, [117309440], SEEK_SET) = 0
write(12, "\342\1\0\0d\36)\7\24\0\0\0000\tHI\360\177\1\200\330\377"...,
32768)
= 32768
close(12)                               = 0
open("/var/lib/pgsql/data/base/495616/1259", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 32768, [32768], SEEK_SET)   = 0
read(12, "\334\1\0\0\324v-p\24\0\0\0000\3\304\3\0\200\1\200<\377"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 13
_llseek(13, 247824384, [247824384], SEEK_SET) = 0
write(13, "\342\1\0\0h *\7\24\0\0\0\204\4dm\360\177\1\200\340\377"...,
32768)
= 32768
close(13)                               = 0
open("/var/lib/pgsql/data/base/495616/16613", O_RDWR|O_LARGEFILE) = 13
read(13, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 204472320, [204472320], SEEK_SET) = 0
write(14, "\342\1\0\0\314\272:\7\24\0\0\0\324\t\354K\360\177\1\200"...,
32768)
= 32768
close(14)                               = 0
read(13, "\340\1\0\0X\231\273\241\24\0\0\0\370\6Dk\360\177\1\200"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 152010752, [152010752], SEEK_SET) = 0
write(14, "\342\1\0\0p\277<\7\24\0\0\0\364\n\220I\360\177\1\200\334"...,
32768) = 32768
close(14)                               = 0
open("/var/lib/pgsql/data/base/495616/16610", O_RDWR|O_LARGEFILE) = 14
read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 86441984, [86441984], SEEK_SET) = 0
write(15, "\342\1\0\0\330B?\7\24\0\0\0\370\6 N\360\177\1\200\310\377"...,
32768) = 32768
close(15)                               = 0
_llseek(14, 98304, [98304], SEEK_SET)   = 0
read(14, "\340\1\0\0,l\257\241\24\0\0\0(\0\250\177\360\177\1\200"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 121896960, [121896960], SEEK_SET) = 0
write(15, "\342\1\0\0\264\303?\7\24\0\0\0\234\tHP\360\177\1\200\334"...,
32768) = 32768
close(15)                               = 0
_llseek(14, 65536, [65536], SEEK_SET)   = 0
read(14, "\334\1\0\0\310u\252n\23\0\0\0\234\20\320=\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 41549824, [41549824], SEEK_SET) = 0
write(15, "\342\1\0\0\0\312B\7\24\0\0\0\234\7\350T\360\177\1\200\330"...,
32768) = 32768
close(15)                               = 0
open("/var/lib/pgsql/data/base/495616/1249", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 229376, [229376], SEEK_SET) = 0
read(15, "O\1\0\0\214\241\200\0\23\0\0\0\364\3\0\4\0\200\1\200\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 57147392, [57147392], SEEK_SET) = 0
write(16, "\342\1\0\0004\320G\7\24\0\0\0\374\7\200P\360\177\1\200"...,
32768)
= 32768
close(16)                               = 0
_llseek(15, 163840, [163840], SEEK_SET) = 0
read(15, "\21\1\0\0\214\3\224R\23\0\0\0\364\3\0\4\0\200\1\200\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 241893376, [241893376], SEEK_SET) = 0
write(16, "\342\1\0\0\220TK\7\24\0\0\0,\t`I\360\177\1\200\330\377"...,
32768)
= 32768
close(16)                               = 0
_llseek(12, 0, [0], SEEK_SET)           = 0
read(12, "O\1\0\0\350\340\316,\23\0\0\0X\3\230\3\0\200\1\200d\304"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834171", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 88702976, [88702976], SEEK_SET) = 0
write(16, "\342\1\0\0\324\326K\7\24\0\0\0`\v\370E\360\177\1\200\334"...,
32768) = 32768
close(16)                               = 0
_llseek(14, 32768, [32768], SEEK_SET)   = 0
read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\234\20\320=\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 152043520, [152043520], SEEK_SET) = 0
write(16, "\342\1\0\0\220fU\7\24\0\0\0l\n\320K\360\177\1\200\334\377"...,
32768) = 32768
close(16)                               = 0
_llseek(15, 0, [0], SEEK_SET)           = 0
read(15, "\0\0\0\0\20\0\0\0\1\0\0\0\364\3\0\4\0\200\1\200\200\377"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 70025216, [70025216], SEEK_SET) = 0
write(16, "\342\1\0\0\370\rk\7\24\0\0\0 \10\250O\360\177\1\200\330"...,
32768)
= 32768
close(16)                               = 0
read(15, "\0\0\0\0\20\0\0\0\1\0\0\0\364\3\0\4\0\200\1\200\200\377"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16
_llseek(16, 152764416, [152764416], SEEK_SET) = 0
write(16, "\342\1\0\0008\222m\7\24\0\0\0\370\10\230J\360\177\1\200"...,
32768)
= 32768
close(16)                               = 0
open("/var/lib/pgsql/data/base/495616/16630", O_RDWR|O_LARGEFILE) = 16
read(16, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 17
_llseek(17, 143753216, [143753216], SEEK_SET) = 0
write(17, "\342\1\0\0\314!w\7\24\0\0\0\20\t\10J\360\177\1\200\330"...,
32768)
= 32768
close(17)                               = 0
read(16, "\340\1\0\0\340\204\264\241\24\0\0\0H\2Ty\360\177\1\200"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 17
_llseek(17, 192512000, [192512000], SEEK_SET) = 0
write(17, "\342\1\0\0`\253y\7\24\0\0\0\250\7\330G\360\177\1\200\324"...,
32768) = 32768
close(17)                               = 0
open("/var/lib/pgsql/data/base/495616/16390", O_RDWR|O_LARGEFILE) = 17
read(17, "\334\1\0\0t\242\23p\24\0\0\0\0\2\210\2\0\200\1\200\24\377"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/16396", O_RDWR|O_LARGEFILE) = 18
_llseek(18, 0, [32768], SEEK_END)       = 0
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 19
_llseek(19, 63471616, [63471616], SEEK_SET) = 0
write(19, "\342\1\0\0\2444\200\7\24\0\0\0$\10\240O\360\177\1\200\330"...,
32768) = 32768
close(19)                               = 0
_llseek(18, 0, [0], SEEK_SET)           = 0
read(18, "\0\0\0\0\20\0\0\0\1\0\0\0$\0\240}\0\200\1\200h\3770\1\320"...,
32768) = 32768
brk(0)                                  = 0x82db000
brk(0x82dd000)                          = 0x82dd000
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 19
_llseek(19, 64290816, [64290816], SEEK_SET) = 0
write(19, "\342\1\0\0<\265\200\7\24\0\0\0d\t`Q\360\177\1\200\334\377"...,
32768) = 32768
close(19)                               = 0
open("/var/lib/pgsql/data/base/495616/16605", O_RDWR|O_LARGEFILE) = 19
read(19, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 20
_llseek(20, 152731648, [152731648], SEEK_SET) = 0
write(20, "\342\1\0\0\264=\206\7\24\0\0\0\370\10\230J\360\177\1\200"...,
32768) = 32768
close(20)                               = 0
read(19, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3pq\360\177\1\200\300\363"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 20
_llseek(20, 150274048, [150274048], SEEK_SET) = 0
write(20, "\342\1\0\0\230\310\212\7\24\0\0\0\210\7lO\360\177\1\200"...,
32768)
= 32768
close(20)                               = 0
open("/var/lib/pgsql/data/base/495616/16398", O_RDWR|O_LARGEFILE) = 20
read(20, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3`_\0\200\1\200\334\377H"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 21
_llseek(21, 260046848, [260046848], SEEK_SET) = 0
write(21, "\342\1\0\0\4\322\220\7\24\0\0\0\264\2\320r\360\177\1\200"...,
32768) = 32768
close(21)                               = 0
open("/var/lib/pgsql/data/base/495616/16639", O_RDWR|O_LARGEFILE) = 21
read(21, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 22
_llseek(22, 174424064, [174424064], SEEK_SET) = 0
write(22, "\342\1\0\0\200\\\225\7\24\0\0\0D\t$H\360\177\1\200\330"...,
32768)
= 32768
close(22)                               = 0
read(21, "\0\0\0\0\20\0\0\0\1\0\0\0\200\t\254c\360\177\1\200\344"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 22
_llseek(22, 109084672, [109084672], SEEK_SET) = 0
write(22, "\342\1\0\0\310\335\226\7\24\0\0\0 \10\250O\360\177\1\200"...,
32768) = 32768
close(22)                               = 0
open("/var/lib/pgsql/data/base/495616/16392", O_RDWR|O_LARGEFILE) = 22
read(22, "\0\0\0\0\20\0\0\0\1\0\0\0X\3\350\3\0\200\1\200h\3770\1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 23
_llseek(23, 200900608, [200900608], SEEK_SET) = 0
write(23, "\342\1\0\0\314\344\232\7\24\0\0\0\344\7\304G\360\177\1"...,
32768)
= 32768
close(23)                               = 0
open("/var/lib/pgsql/data/base/495616/16606", O_RDWR|O_LARGEFILE) = 23
read(23, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 24
_llseek(24, 85426176, [85426176], SEEK_SET) = 0
write(24, "\342\1\0\0\30\345\232\7\24\0\0\0\264\7\360V\360\177\1\200"...,
32768) = 32768
close(24)                               = 0
read(23, "\0\0\0\0\20\0\0\0\1\0\0\0H\1 {\360\177\1\200P\377 \0@\377"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 24
_llseek(24, 156729344, [156729344], SEEK_SET) = 0
write(24, "\342\1\0\0\260e\233\7\24\0\0\0\30\n\334M\360\177\1\200"...,
32768)
= 32768
close(24)                               = 0
open("/var/lib/pgsql/data/base/495616/16400", O_RDWR|O_LARGEFILE) = 24
read(24, "\0\0\0\0\20\0\0\0\1\0\0\0H\1,u\0\200\1\200\334\377H\0\270"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 25
_llseek(25, 92995584, [92995584], SEEK_SET) = 0
write(25, "\342\1\0\0\244i\235\7\24\0\0\0\360\ttO\360\177\1\200\324"...,
32768) = 32768
close(25)                               = 0
open("/var/lib/pgsql/data/base/495616/16607", O_RDWR|O_LARGEFILE) = 25
read(25, "\0\0\0\0\264\341\v\t\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 26
_llseek(26, 209387520, [209387520], SEEK_SET) = 0
write(26, "\342\1\0\0<m\237\7\24\0\0\0\\\7\214H\360\177\1\200\320"...,
32768)
= 32768
close(26)                               = 0
read(25, "N\1\0\0X\227`\236\23\0\0\0\334\0\320|\360\177\1\200\340"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 26
_llseek(26, 108363776, [108363776], SEEK_SET) = 0
write(26, "\342\1\0\0\334\375\251\7\24\0\0\0\24\10`K\360\177\1\200"...,
32768)
= 32768
close(26)                               = 0
brk(0)                                  = 0x82dd000
brk(0x82de000)                          = 0x82de000
open("/var/lib/pgsql/data/base/495616/16384", O_RDWR|O_LARGEFILE) = 26
read(26, "N\1\0\0\244\1H\332\23\0\0\0\360\0\244N\0\200\1\200\270"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 27
_llseek(27, 85000192, [85000192], SEEK_SET) = 0
write(27, "\342\1\0\0\364\0\254\7\24\0\0\0008\txQ\360\177\1\200\334"...,
32768) = 32768
close(27)                               = 0
read(11, "\334\1\0\0\344\3422.\23\0\0\0t\1\320e\360\177\1\200\244"...,
32768)
= 32768
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
=
0x404b8000
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 27
_llseek(27, 255524864, [255524864], SEEK_SET) = 0
write(27, "\342\1\0\0\34\3\256\7\24\0\0\0\374\10\200J\360\177\1\200"...,
32768) = 32768
close(27)                               = 0
open("/var/lib/pgsql/data/base/495616/16640", O_RDWR|O_LARGEFILE) = 27
read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 202047488, [202047488], SEEK_SET) = 0
write(28, "\342\1\0\0\240\20\263\7\24\0\0\0\220\7\30H\360\177\1\200"...,
32768) = 32768
close(28)                               = 0
_llseek(27, 98304, [98304], SEEK_SET)   = 0
read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\34\0\224\177\360\177\1\200\350"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 141524992, [141524992], SEEK_SET) = 0
write(28, "\342\1\0\0$\36\274\7\24\0\0\0p\10(L\360\177\1\200\324\377"...,
32768) = 32768
close(28)                               = 0
_llseek(27, 32768, [32768], SEEK_SET)   = 0
read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\314\5\330\7\360\177\1\200\234"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 149422080, [149422080], SEEK_SET) = 0
write(28, "\342\1\0\0\250\36\274\7\24\0\0\0\214\3\230n\360\177\1\200"...,
32768) = 32768
close(28)                               = 0
_llseek(22, 65536, [65536], SEEK_SET)   = 0
read(22, "\0\0\0\0\20\0\0\0\1\0\0\0\370\2(\22\0\200\1\200h\3770\1"...,
32768)
= 32768
brk(0)                                  = 0x82de000
brk(0x82e2000)                          = 0x82e2000
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 125075456, [125075456], SEEK_SET) = 0
write(28, "\342\1\0\0\0\237\274\7\24\0\0\0<\t\0I\360\177\1\200\330"...,
32768)
= 32768
close(28)                               = 0
read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\314\3\3301\360\177\1\200\234"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 199000064, [199000064], SEEK_SET) = 0
write(28, "\342\1\0\0\304&\301\7\24\0\0\0\310\nlJ\360\177\1\200\334"...,
32768) = 32768
close(28)                               = 0
_llseek(22, 32768, [32768], SEEK_SET)   = 0
read(22, "\0\0\0\0\20\0\0\0\1\0\0\0X\3\350\3\0\200\1\200h\3770\1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 28
_llseek(28, 146145280, [146145280], SEEK_SET) = 0
write(28, "\342\1\0\0\224\252\303\7\24\0\0\0\200\2 r\360\177\1\200"...,
32768)
= 32768
close(28)                               = 0
open("/var/lib/pgsql/data/base/495616/16652", O_RDWR|O_LARGEFILE) = 28
read(28, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 29
_llseek(29, 244252672, [244252672], SEEK_SET) = 0
write(29, "\342\1\0\0\0003\310\7\24\0\0\0\260\0074H\360\177\1\200"...,
32768)
= 32768
close(29)                               = 0
read(28, "\340\1\0\0\30*\262\241\24\0\0\0\210\4\224r\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 29
_llseek(29, 2228224, [2228224], SEEK_SET) = 0
write(29, "\342\1\0\0h\264\310\7\24\0\0\0\34\10\300O\360\177\1\200"...,
32768)
= 32768
close(29)                               = 0
open("/var/lib/pgsql/data/base/495616/1247", O_RDWR|O_LARGEFILE) = 29
read(29, "\0\0\0\0\244\5\201\0\v\0\0\0H\3\224\3\0\200\1\200h\377"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 30
_llseek(30, 20316160, [20316160], SEEK_SET) = 0
write(30, "\342\1\0\0@\270\312\7\24\0\0\0P\10\210N\360\177\1\200\330"...,
32768) = 32768
close(30)                               = 0
open("/var/lib/pgsql/data/base/495616/16612", O_RDWR|O_LARGEFILE) = 30
read(30, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 31
_llseek(31, 12058624, [12058624], SEEK_SET) = 0
write(31, "\342\1\0\0\340\301\320\7\24\0\0\0l\7 N\360\177\1\200\334"...,
32768) = 32768
close(31)                               = 0
read(30, "\0\0\0\0\20\0\0\0\1\0\0\0\320\2\0u\360\177\1\200p\372 "..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 31
_llseek(31, 281968640, [281968640], SEEK_SET) = 0
write(31, "\342\1\0\0$\317\331\7\24\0\0\0\270\1,w\360\177\1\200\334"...,
32768) = 32768
close(31)                               = 0
open("/var/lib/pgsql/data/base/495616/16418", O_RDWR|O_LARGEFILE) = 31
read(31, "\0\0\0\0\20\0\0\0\1\0\0\0\320\2\354a\0\200\1\200\324\377"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 32
_llseek(32, 336003072, [336003072], SEEK_SET) = 0
write(32, "\342\1\0\0\264Z\340\7\24\0\0\0D\0104H\360\177\1\200\330"...,
32768)
= 32768
close(32)                               = 0
open("/var/lib/pgsql/data/base/495616/16641", O_RDWR|O_LARGEFILE) = 32
read(32, "\0\0\0\0\\\242I\0\10\0\0\0\24\0\360\177\360\177\1\200b"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 33
_llseek(33, 138903552, [138903552], SEEK_SET) = 0
write(33, "\342\1\0\0\300u\355\7\24\0\0\0\334\4\10h\360\177\1\200"...,
32768)
= 32768
close(33)                               = 0
_llseek(32, 98304, [98304], SEEK_SET)   = 0
read(32, "\0\0\0\0\\\242I\0\10\0\0\0\34\0\334\177\360\177\1\200\350"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 33
_llseek(33, 6062080, [6062080], SEEK_SET) = 0
write(33, "\342\1\0\0t~\360\7\24\0\0\0\4\10PP\360\177\1\200\330\377"...,
32768) = 32768
close(33)                               = 0
_llseek(32, 32768, [32768], SEEK_SET)   = 0
read(32, "\0\0\0\0\200\20\276\0\v\0\0\0\0\17,S\360\177\1\200\344"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 33
_llseek(33, 17596416, [17596416], SEEK_SET) = 0
write(33, "\342\1\0\0\314\376\360\7\24\0\0\0\24\10\360O\360\177\1"...,
32768)
= 32768
close(33)                               = 0
open("/var/lib/pgsql/data/base/495616/1255", O_RDWR|O_LARGEFILE) = 33
_llseek(33, 458752, [458752], SEEK_SET) = 0
read(33, "\0\0\0\0\270\10\276\0\v\0\0\0\300\1\370\1\0\200\1\2000"..., 32768)
=
32768
brk(0)                                  = 0x82e2000
brk(0x82e4000)                          = 0x82e4000
brk(0)                                  = 0x82e4000
brk(0x82e6000)                          = 0x82e6000
brk(0)                                  = 0x82e6000
brk(0x82e7000)                          = 0x82e7000
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 34
_llseek(34, 242122752, [242122752], SEEK_SET) = 0
write(34, "\342\1\0\0\340\201\362\7\24\0\0\0\224\2\334r\360\177\1"...,
32768)
= 32768
close(34)                               = 0
read(15, "\0\0\0\0\320r\37\0\5\0\0\0\364\3\0\4\0\200\1\200\200\377"...,
32768)
= 32768
brk(0)                                  = 0x82e7000
brk(0x82e8000)                          = 0x82e8000
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 34
_llseek(34, 242810880, [242810880], SEEK_SET) = 0
write(34, "\342\1\0\0d\202\362\7\24\0\0\0P\3\304o\360\177\1\200\334"...,
32768) = 32768
close(34)                               = 0
open("/var/lib/pgsql/data/base/495616/16629", O_RDWR|O_LARGEFILE) = 34
read(34, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 35
_llseek(35, 33292288, [33292288], SEEK_SET) = 0
write(35, "\342\1\0\0\234\231\375\7\24\0\0\0(\10xO\360\177\1\200\330"...,
32768) = 32768
close(35)                               = 0
read(34, "\340\1\0\0\244\204\264\241\24\0\0\0H\2Ty\360\177\1\200"..., 32768)
=
32768
brk(0)                                  = 0x82e8000
brk(0x82e9000)                          = 0x82e9000
brk(0)                                  = 0x82e9000
brk(0x82eb000)                          = 0x82eb000
brk(0)                                  = 0x82eb000
brk(0x82ec000)                          = 0x82ec000
brk(0)                                  = 0x82ec000
brk(0x82ed000)                          = 0x82ed000
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 35
_llseek(35, 4456448, [4456448], SEEK_SET) = 0
write(35, "\342\1\0\0\364\31\376\7\24\0\0\0\34\10\300O\360\177\1\200"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/16647", O_RDWR|O_LARGEFILE) = 36
read(36, "\0\0\0\0\4\307}\0\v\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
brk(0)                                  = 0x82ed000
brk(0x82ee000)                          = 0x82ee000
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 37
_llseek(37, 265158656, [265158656], SEEK_SET) = 0
write(37, "\342\1\0\0@\34\377\7\24\0\0\0\224\7lG\360\177\1\200\324"...,
32768)
= 32768
read(36, "\336\1\0\0000\327V\272\24\0\0\0\210\5 j\360\177\1\200@"..., 32768)
=
32768
_llseek(35, 161415168, [161415168], SEEK_SET) = 0
write(35, "\342\1\0\0p\35\0\10\24\0\0\0D\t\320H\360\177\1\200\330"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/16408", O_RDWR|O_LARGEFILE) = 38
read(38, "\336\1\0\0\224\273V\272\24\0\0\0H\2h\2\0\200\1\2004\377"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 39
_llseek(39, 133332992, [133332992], SEEK_SET) = 0
write(39, "\342\1\0\0\34\340(\10\24\0\0\0p\3\334l\360\177\1\200\330"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/16604", O_RDWR|O_LARGEFILE) = 40
read(40, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
_llseek(39, 244875264, [244875264], SEEK_SET) = 0
write(39, "\342\1\0\0\264\343*\10\24\0\0\0L\2Ht\360\177\1\200\334"...,
32768)
= 32768
read(40, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3pq\360\177\1\200\340\366"...,
32768)
= 32768
_llseek(37, 119635968, [119635968], SEEK_SET) = 0
write(37, "\342\1\0\0\350k.\10\24\0\0\0\334\6(N\360\177\1\200\324"...,
32768)
= 32768
_llseek(38, 65536, [65536], SEEK_SET)   = 0
read(38, "\336\1\0\0\214\272V\272\24\0\0\0\334\0\370\0\0\200\1\200"...,
32768)
= 32768
_llseek(37, 103841792, [103841792], SEEK_SET) = 0
write(37, "\342\1\0\0\24t3\10\24\0\0\0\300\6@M\360\177\1\200\324\377"...,
32768) = 32768
_llseek(38, 32768, [32768], SEEK_SET)   = 0
read(38, "\336\1\0\0\260\325V\272\24\0\0\0000\2\200\2\0\200\1\200"...,
32768)
= 32768
brk(0)                                  = 0x82ee000
brk(0x82f0000)                          = 0x82f0000
brk(0)                                  = 0x82f0000
brk(0x82f2000)                          = 0x82f2000
gettimeofday({1079482178, 920849}, NULL) = 0
brk(0)                                  = 0x82f2000
brk(0x82f4000)                          = 0x82f4000
brk(0)                                  = 0x82f4000
brk(0x82f6000)                          = 0x82f6000
brk(0)                                  = 0x82f6000
brk(0x82fa000)                          = 0x82fa000
brk(0)                                  = 0x82fa000
brk(0x8302000)                          = 0x8302000
_llseek(37, 79331328, [79331328], SEEK_SET) = 0
write(37, "\342\1\0\0\200\3747\10\24\0\0\0\300\0068N\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/16653", O_RDWR|O_LARGEFILE) = 41
read(41, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 42
_llseek(42, 262144, [262144], SEEK_SET) = 0
write(42, "\342\1\0\0000\3758\10\24\0\0\0\0\6Xb\360\177\1\200\320"...,
32768)
= 32768
read(41, "\340\1\0\0\224*\262\241\24\0\0\0\210\4T+\360\177\1\200"..., 32768)
=
32768
brk(0)                                  = 0x8302000
brk(0x8304000)                          = 0x8304000
gettimeofday({1079482178, 957454}, NULL) = 0
gettimeofday({1079482178, 957580}, NULL) = 0
send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988,
0)
= 988
send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988,
0)
= 988
send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988,
0)
= 988
send(6, "\4\0\0\0\274\1\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\6\0"..., 444, 0)
=
444
send(9, "T\0\0\0#\0\1QUERY PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377"..., 394,
0) = 394
recv(9, "X\0\0\0\4", 8192, 0)           = 5
exit_group(0)                           = ?

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> `strace -p 21882` run behind the below query and plan ... below that.

Hmm ... that took 20 seconds eh?

It is a fairly interesting trace.  It shows that the backend needed to
read 63 system catalog pages (that weren't already in shared memory),
which is not too unreasonable I think ... though I wonder if more of
them shouldn't have been in memory already.  The odd thing is that for
*every single read* it was necessary to first dump out a dirty page
in order to make a buffer free.  That says you are running with the
entire contents of shared buffer space dirty at all times.  That's
probably not the regime you want to be operating in.  I think we already
suggested increasing shared_buffers.  You might also want to think about
not using such a large checkpoint interval.  (The background-writing
logic already committed for 7.5 should help this problem, but it's not
there in 7.4.)
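
(Purely as an illustration: these settings can be checked from psql and
changed in postgresql.conf; the targets in the comments are guesses, not
values taken from this thread.)

show shared_buffers;        -- e.g. raise toward 10000 in postgresql.conf
show checkpoint_timeout;    -- a smaller value gives a shorter checkpoint interval
show checkpoint_segments;   -- fewer segments also forces checkpoints sooner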

Another interesting fact is that the bulk of the writes were "blind
writes", involving an open()/write()/close() sequence instead of keeping
the open file descriptor around for re-use.  This is not too surprising
in a freshly started backend, I guess; it's unlikely to have had reason
to create a relation descriptor for the relations it may have to dump
pages for.  In some Unixen, particularly Solaris, open() is fairly
expensive and so blind writes are bad news.  I didn't think it was a big
problem in Linux though.  (This is another area we've improved for 7.5:
there are no more blind writes.  But that won't help you today.)

What's not immediately evident is whether the excess I/O accounted for
all of the slowdown.  Could you retry the strace with -r and -T options
so we can see how much time is being spent inside and outside the
syscalls?

            regards, tom lane

Re: atrocious update performance

From
"Aaron Werman"
Date:
Quick observations:

1. We have an explanation for what's going on, based on the message being
exactly 666 lines long :-)
2. I'm clueless on the output, but perhaps Tom can see something. A quick
glance shows that the strace seemed to run 27 seconds, during which it did:
   count | call
  -------|---------
      84 | _llseek
      40 | brk
      54 | close
      88 | open
      63 | read
in other words, nothing much (though it did *a lot* of opens and closes of
db files to do nothing).

Can you do another strace for a few minutes against the actual update query
adding the -c/-t options and control-c out?

----- Original Message -----
From: "Rosser Schwarz" <rschwarz@totalcardinc.com>
To: "'Tom Lane'" <tgl@sss.pgh.pa.us>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, March 16, 2004 7:20 PM
Subject: Re: [PERFORM] atrocious update performance


while you weren't looking, Tom Lane wrote:

[trace]

`strace -p 21882` run behind the below query and plan ... below that.

# explain update account.cust set prodid = tempprod.prodid, subprodid =
tempprod.subprodid where origid = tempprod.debtid;

Re: atrocious update performance

From
Richard Huxton
Date:
On Tuesday 16 March 2004 00:08, Tom Lane wrote:
>
> I'm inclined to suspect an issue with foreign-key checking.  You didn't
> give us any details about foreign key relationships your "cust" table is
> involved in --- could we see those?  And the schemas of the other tables
> involved?

Two questions Tom:
1. Do the stats tables record FK checks, or just explicit table accesses?
2. If not, should they?

If the only real activity is this update then simple before/after views of the
stats might be revealing.
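
Something along these lines, say (assuming the stats collector is enabled;
the table names are the ones that come up in this thread):

select relname, seq_scan, idx_scan, n_tup_upd
  from pg_stat_user_tables
 where relname in ('cust', 'note', 'acct', 'orgacct');
-- snapshot before the UPDATE, run it again afterwards, and diff the counters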

--
  Richard Huxton
  Archonet Ltd

Re: atrocious update performance

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> Two questions Tom:
> 1. Do the stats tables record FK checks, or just explicit table accesses?

The stats record everything, IIRC.

> If the only real activity is this update then simple before/after
> views of the stats might be revealing.

That's quite a good thought, though since Rosser's system is live it
might be hard to get a view of just one query's activity.

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> What's not immediately evident is whether the excess I/O accounted for
> all of the slowdown.  Could you retry the strace with -r and -T options
> so we can see how much time is being spent inside and outside the
> syscalls?

Unlike the previous run (this is a trace of the explain), this one went
immediately.  No delay.

I also have, per Aaron's request, a trace -ct against the backend running
the explain analyze.  I killed it well before "a few minutes"; it's just
shy of 900K.  I don't think I'll be forwarding that on to the list, though
I can put it up on a web server somewhere easily enough.

Try <http://www.totalcardinc.com/pg/postmaster.trace>.

# `strace -rT -p 25075`
     0.000000 read(0, "\r", 1)          = 1 <5.514983>
     5.516215 write(1, "\n", 1)         = 1 <0.000034>
     0.000545 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013>
     0.000200 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig icanon echo ...})
= 0
 <0.000032>
     0.000162 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000013>
     0.000120 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART,
0x420
276f8}, {0x401ec910, [], SA_RESTORER, 0x420276f8}, 8) = 0 <0.000015>
     0.000154 rt_sigaction(SIGTERM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000014>
     0.000136 rt_sigaction(SIGQUIT, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000134 rt_sigaction(SIGALRM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000012>
     0.000164 rt_sigaction(SIGTSTP, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000140 rt_sigaction(SIGTTOU, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000135 rt_sigaction(SIGTTIN, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000135 rt_sigaction(SIGWINCH, {SIG_DFL}, {0x401ec9d0, [],
SA_RESTORER, 0x
420276f8}, 8) = 0 <0.000014>
     0.000250 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 <0.000013>
     0.000138 send(3, "Q\0\0\0}explain update account.cust"..., 126, 0) =
126 <0
.000032>
     0.000164 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 <0.000013>
     0.000132 poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) =
1 <
0.222093>
     0.222388 recv(3, "T\0\0\0#\0\1QUERY
PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377
"..., 16384, 0) = 394 <0.000031>
     0.000360 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo
...})
 = 0 <0.000019>
     0.000137 ioctl(1, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo
...})
 = 0 <0.000013>
     0.000135 ioctl(1, TIOCGWINSZ, {ws_row=64, ws_col=80, ws_xpixel=0,
ws_ypixel
=0}) = 0 <0.000015>
     0.000175 write(1, "                                "..., 92) = 92
<0.000038
>
     0.000184 write(1, "--------------------------------"..., 92) = 92
<0.000025
>
     0.000154 write(1, " Merge Join  (cost=0.00..232764."..., 59) = 59
<0.000023
>
     0.000136 write(1, "   Merge Cond: ((\"outer\".origid)"..., 65) = 65
<0.0000
23>
     0.000134 write(1, "   ->  Index Scan using ix_origi"..., 88) = 88
<0.000025
>
     0.000129 write(1, "   ->  Index Scan using ix_debti"..., 91) = 91
<0.000025
>
     0.000136 write(1, "(4 rows)\n", 9) = 9 <0.000022>
     0.000116 write(1, "\n", 1)         = 1 <0.000021>
     0.000144 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000013>
     0.000121 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART,
0x420
276f8}, {0x804d404, [], SA_RESTORER|SA_RESTART, 0x420276f8}, 8) = 0
<0.000015>
     0.000208 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013>
     0.000129 ioctl(0, TIOCGWINSZ, {ws_row=64, ws_col=80, ws_xpixel=0,
ws_ypixel
=0}) = 0 <0.000013>
     0.000102 ioctl(0, TIOCSWINSZ, {ws_row=64, ws_col=80, ws_xpixel=0,
ws_ypixel
=0}) = 0 <0.000014>
     0.000105 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo
...})
 = 0 <0.000013>
     0.000127 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig -icanon -echo
...}) =
 0 <0.000028>
     0.000147 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000012>
     0.000114 rt_sigaction(SIGINT, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {0
x804d404, [], SA_RESTORER|SA_RESTART, 0x420276f8}, 8) = 0 <0.000012>
     0.000149 rt_sigaction(SIGTERM, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000013>
     0.000136 rt_sigaction(SIGQUIT, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000012>
     0.000136 rt_sigaction(SIGALRM, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000012>
     0.000136 rt_sigaction(SIGTSTP, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000013>
     0.000136 rt_sigaction(SIGTTOU, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000012>
     0.000136 rt_sigaction(SIGTTIN, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.000013>
     0.000212 rt_sigaction(SIGWINCH, {0x401ec9d0, [], SA_RESTORER,
0x420276f8},
{SIG_DFL}, 8) = 0 <0.000012>
     0.000188 write(1, "\r\rtci=# \rtci=# ", 15) = 15 <0.000019>
     0.000112 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000012>
     0.000110 read(0, "\\", 1)          = 1 <18.366895>
    18.368284 write(1, "\rtci=# \\\rtci=# \\", 16) = 16 <0.000029>
     0.000134 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000013>
     0.000125 read(0, "q", 1)           = 1 <0.117572>
     0.117719 write(1, "\rtci=# \\q\rtci=# \\q", 18) = 18 <0.000020>
     0.000118 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000012>
     0.000107 read(0, "\r", 1)          = 1 <1.767409>
     1.767604 write(1, "\n", 1)         = 1 <0.000032>
     0.000140 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013>
     0.000138 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig icanon echo ...})
= 0
 <0.000030>
     0.000143 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000013>
     0.000111 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART,
0x420
276f8}, {0x401ec910, [], SA_RESTORER, 0x420276f8}, 8) = 0 <0.000014>
     0.000153 rt_sigaction(SIGTERM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000134 rt_sigaction(SIGQUIT, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000134 rt_sigaction(SIGALRM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000133 rt_sigaction(SIGTSTP, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000134 rt_sigaction(SIGTTOU, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000013>
     0.000134 rt_sigaction(SIGTTIN, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.000012>
     0.000134 rt_sigaction(SIGWINCH, {SIG_DFL}, {0x401ec9d0, [],
SA_RESTORER, 0x
420276f8}, 8) = 0 <0.000014>
     0.001271 rt_sigaction(SIGINT, {SIG_DFL}, {0x804d404, [],
SA_RESTORER|SA_RES
TART, 0x420276f8}, 8) = 0 <0.000013>
     0.000532 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 <0.000014>
     0.000145 send(3, "X\0\0\0\4", 5, 0) = 5 <0.000028>
     0.000126 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 <0.000013>
     0.000140 close(3)                  = 0 <0.000033>
     0.000147 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_DFL}, 8) = 0 <0.000013>
     0.000197 open("/var/lib/pgsql/.psql_history", O_WRONLY|O_CREAT|O_TRUNC,
060
0) = 3 <0.000168>
     0.000694 write(3, "\\d payment.batch\nalter sequence "..., 16712) =
16712 <
0.000209>
     0.000311 close(3)                  = 0 <0.000057>
     0.055587 munmap(0x40030000, 4096)  = 0 <0.000032>
     0.000130 exit_group(0)             = ?

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
>> Could you retry the strace with -r and -T options

> Unlike the previous run (this is a trace of the explain), this one went
> immediately.  No delay.

Hm.  It looks like you mistakenly traced psql rather than the backend,
but since the delay went away we wouldn't have learned anything anyhow.
Have you got any idea what conditions may have changed between seeing
delay and not seeing delay?

> I also have, per Aaron's request, a trace -ct against the backend running
> the explain analyze.  I killed it well before "a few minutes"; it's just
> shy of 900K.  I don't think I'll be forwarding that on to the list, though
> I can put it up on a web server somewhere easily enough.
> Try <http://www.totalcardinc.com/pg/postmaster.trace>.

This is pretty odd too.  It looks like it's doing checkpoints every so
often (look for the writes to pg_control), which a backend engaged in
a long-running query surely ought not be doing.  Need to think about
why that might be...

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> Hm.  It looks like you mistakenly traced psql rather than the backend,
> but since the delay went away we wouldn't have learned
> anything anyhow.
> Have you got any idea what conditions may have changed between seeing
> delay and not seeing delay?

None, offhand.  I have noticed that when a large query is running,
the machine can sporadically just freeze--or at least take inordinately
long for some other process, be it top or ls, another query, or whatever
to start.  Nothing looks saturated when it happens, and, while you can
count on it to happen, it's not consistent enough to reproduce.

> This is pretty odd too.  It looks like it's doing checkpoints every so
> often (look for the writes to pg_control), which a backend engaged in
> a long-running query surely ought not be doing.  Need to think about
> why that might be...

Does the fact that all the reads and writes are 32K mean anything out
of the ordinary?  $PGSRC/src/include/pg_config_manual.h has BLCKSZ
#defined to 16384.  I was running previously with a 32K BLCKSZ, but
that turned out to be rather sub-optimal for as heavily indexed as our
tables are.  I've dumped and rebuilt several times since then.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
> while you weren't looking, Tom Lane wrote:
>> Have you got any idea what conditions may have changed between seeing
>> delay and not seeing delay?

> None, offhand.  I have noticed that when a large query is running,
> the machine can sporadically just freeze--or at least take inordinately
> long for some other process, be it top or ls, another query, or whatever
> to start.  Nothing looks saturated when it happens, and, while you can
> count on it to happen, it's not consistent enough to reproduce.

Interesting.  You should leave "vmstat 1" running in the background and
see if you can correlate these freezes with bursts of disk I/O or swap.
I saw a couple of delays in your big strace that seemed odd --- a couple
of one-second-plus intervals, and a four-second-plus interval, with no
obvious reason for them.  Perhaps the same issue?

> Does the fact that all the reads and writes are 32K mean anything out
> of the ordinary?  $PGSRC/src/include/pg_config_manual.h has BLCKSZ
> #defined to 16384.  I was running previously with a 32K BLCKSZ, but
> that turned out to be rather sub-optimal for as heavily indexed as our
> tables are.  I've dumped and rebuilt several times since then.

I hate to break it to you, but that most definitely means you are
running with BLCKSZ = 32K.  Whatever you thought you were rebuilding
didn't take effect.

I agree that the larger blocksize is of dubious value.  People used to
do that back when the blocksize limited your row width, but these days
I think you're probably best off with the standard 8K.

Another thing that's fairly striking is the huge bursts of WAL activity
--- your trace shows that the thing is writing entire WAL segments (16
MB) at one go, rather than dribbling it out a page or two at a time as
the code is intended to do.  I think what is happening is that you have
wal_buffers = 1024 (correct?) yielding 32MB of WAL buffers, and since
there are no other transactions happening, nothing gets written until
you hit the "write when the buffers are half full" heuristic.  I would
suggest knocking wal_buffers down to something closer to the default
(maybe 4 or 8 buffers) to reduce these I/O storms.  (Memo to hackers:
we need to see to it that the new background writer process takes some
responsibility for pushing out filled WAL buffers, not only data
buffers.)
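
(Only as an illustration, not a value taken from this thread:)

show wal_buffers;           -- reportedly 1024 at the moment
-- then in postgresql.conf: wal_buffers = 8, and restart the postmaster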

If the big EXPLAIN ANALYZE is still running, would you get a dump of its
open files (see "lsof -p") and correlate those with the tables being
used in the query?  I'm trying to figure out what the different writes
and reads represent.
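
One way to do the correlation, assuming the numeric file names under
$PGDATA/base are relfilenodes (as they normally are):

select relname, relkind from pg_class where relfilenode = 6834168;
-- 6834168 being one of the files from the earlier strace output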

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Tom Lane wrote:

> I hate to break it to you, but that most definitely means you are
> running with BLCKSZ = 32K.  Whatever you thought you were rebuilding
> didn't take effect.

I saw that and thought so.  The other day, I was rooting around in
$PGDATA, and saw a lot of 32K files and wondered for a moment, too.
If that's the case, though, that's ... weird.

> I agree that the larger blocksize is of dubious value.  People used to
> do that back when the blocksize limited your row width, but these days
> I think you're probably best off with the standard 8K.

I'd been experimenting with larger blocksizes after we started seeing
a lot of seqscans in query plans.  32K proved quickly that it hurts
index scan performance, so I was--I thought--trying 16.

> If the big EXPLAIN ANALYZE is still running, would you get a dump of its
> open files (see "lsof -p") and correlate those with the tables being
> used in the query?  I'm trying to figure out what the different writes
> and reads represent.

It looks rather like it's hitting the foreign keys; one of the files
that shows is the account.note table, which has an fk to the pk of the
table being updated.  The file's zero size, but it's open.  The only
reason it should be open is if foreign keys are being checked, yes?

You'd said that the foreign keys were only checked if last-change is
after current-query, as of 7.3.4, yes?  `rpm -qa postgresql` comes up
with 7.3.2-3, which makes no sense, 'cos I know I removed it before
installing current; I remember making sure no-one was using pg on this
machine, and remember saying rpm -e.

Regardless, something thinks it's still there.  Is there any way that
it is, and that I've somehow been running 7.3.2 all along?  `which
psql`, &c show the bindir from my configure, but I'm not sure that's
sufficient.

How would I tell?  I don't remember any of the binaries having a
--version argument.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
"Rosser Schwarz"
Date:
I wrote:

> Regardless, something thinks it's still there.  Is there any way that
> it is, and that I've somehow been running 7.3.2 all along?  `which
> psql`, &c show the bindir from my configure, but I'm not sure that's
> sufficient.

The weird thing is that I know I never built 7.3.anything with 32K
BLCKSZ, never built 7.3.anything at all.  If 7.3 were installed, would
it have any problem reading a 7.4 cluster?

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
Tom Lane
Date:
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes:
>> Regardless, something thinks it's still there.  Is there any way that
>> it is, and that I've somehow been running 7.3.2 all along?  `which
>> psql`, &c show the bindir from my configure, but I'm not sure that's
>> sufficient.

"select version()" is the definitive test for backend version.

> The weird thing is that I know I never built 7.3.anything with 32K
> BLCKSZ, never built 7.3.anything at all.  If 7.3 were installed, would
> it have any problem reading a 7.4 cluster?

7.3 would refuse to start on a 7.4 cluster, and vice versa.

            regards, tom lane

Re: atrocious update performance

From
Greg Spiegelberg
Date:
I've been following this thread closely as I have the same problem
with an UPDATE.  Everything is identical here right down to the
strace output.

Has anyone found a workaround or resolved the problem?  If not,
I have test systems here which I can use to help us test and explore.

Greg

--
Greg Spiegelberg
  Sr. Product Development Engineer
  Cranel, Incorporated.
  Phone: 614.318.4314
  Fax:   614.431.8388
  Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.



Re: atrocious update performance

From
"Rosser Schwarz"
Date:
Greg Spiegelberg wrote:

> I've been following this thread closely as I have the same problem
> with an UPDATE.  Everything is identical here right down to the
> strace output.

> Has anyone found a workaround or resolved the problem?  If not,
> I have test systems here which I can use to help us test and explore.

I'm still gathering data.  The explain analyze I'd expected to finish
Thursday afternoon hasn't yet.  I'm going to kill it and try a few
smaller runs, increasing in size, until the behavior manifests.

Will advise.

/rls


Re: atrocious update performance

From
Greg Spiegelberg
Date:
Rosser Schwarz wrote:
> Greg Spiegelberg wrote:
>
>
>>I've been following this thread closely as I have the same problem
>>with an UPDATE.  Everything is identical here right down to the
>>strace output.
>
>
>>Has anyone found a workaround or resolved the problem?  If not,
>>I have test systems here which I can use to help us test and explore.
>
>
> I'm still gathering data.  The explain analyze I'd expected to finish
> Thursday afternoon hasn't yet.  I'm going to kill it and try a few
> smaller runs, increasing in size, until the behavior manifests.
>
> Will advise.

I've replaced my atrocious UPDATE with the following.

begin;
-- Drop all constraints
alter table ORIG drop constraint ...;
-- Drop all indexes
drop index ...;
-- Update
update ORIG set column=... where...;
commit;

Problem is when I recreate the indexes and add the constraints back
on ORIG I end up with the same long running process.  The original
UPDATE runs for about 30 minutes on a table of 400,000 rows with the
WHERE matching about 70% of the rows.  The above runs for about 2
minutes without the constraints or indexes; however, adding the
constraints and recreating the dropped indexes negates any gain.
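
For anyone scripting the same approach, the definitions can be captured
before dropping them so the recreate step is mechanical (a sketch; 'orig'
stands in for the real table name):

select indexdef from pg_indexes where tablename = 'orig';
select conname, pg_get_constraintdef(oid)
  from pg_constraint where conrelid = 'orig'::regclass;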

RedHat 7.3 + Kernel 2.4.24 + ext3 + PostgreSQL 7.3.5
Dual PIII 1.3'ishGHz, 2GB Memory
U160 OS drives and a 1Gbps test SAN on a Hitachi 9910

Greg

--
Greg Spiegelberg
  Sr. Product Development Engineer
  Cranel, Incorporated.
  Phone: 614.318.4314
  Fax:   614.431.8388
  Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.



Re: atrocious update performance

From
Tom Lane
Date:
Greg Spiegelberg <gspiegelberg@cranel.com> writes:
> RedHat 7.3 + Kernel 2.4.24 + ext3 + PostgreSQL 7.3.5
                                      ^^^^^^^^^^^^^^^^

Please try 7.4.

            regards, tom lane

Re: atrocious update performance

From
"Rosser Schwarz"
Date:
Greg Spiegelberg wrote:

> > Will advise.

After creating 100, 1K, 10K, 100K and 1M-row subsets of account.cust and
the corresponding rows/tables with foreign key constraints referring to
the table, I'm unable to reproduce the behavior at issue.

explain analyze looks like the following, showing the query run with the
join column indexed and not, respectively:

# explain analyze update test.cust100 set prodid = tempprod.prodid,
subprodid = tempprod.subprodid where origid = tempprod.debtid;
-- with index
                        QUERY PLAN
-----------------------------------------------------------------------
 Merge Join  (cost=0.00..25.64 rows=500 width=220) (actual
 time=0.241..13.091 rows=100 loops=1)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid_cust100 on cust100  (cost=0.00..11.50
       rows=500 width=204) (actual time=0.125..6.465 rows=100 loops=1)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26) (actual time=0.057..1.497 rows=101 loops=1)
 Total runtime: 34.067 ms
(5 rows)

-- without index
                        QUERY PLAN
----------------------------------------------------------------------
 Merge Join  (cost=7.32..16.71 rows=100 width=220) (actual
 time=4.415..10.918 rows=100 loops=1)
   Merge Cond: (("outer".debtid)::text = "inner"."?column22?")
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26) (actual time=0.051..1.291 rows=101 loops=1)
   ->  Sort  (cost=7.32..7.57 rows=100 width=204) (actual
       time=4.311..4.450 rows=100 loops=1)
         Sort Key: (cust100.origid)::text
         ->  Seq Scan on cust100  (cost=0.00..4.00 rows=100 width=204)
             (actual time=0.235..2.615 rows=100 loops=1)
 Total runtime: 25.031 ms
(7 rows)

With the join column indexed, it takes roughly .32ms/row on the first
four tests (100.. 100K), and about .48ms/row on 1M rows.  Without the
index, it runs 100 rows @ .25/row, 1000 @ .26, 10000 @ .27, 100000 @
.48 and .5 @ 1M rows.

In no case does the query plan reflect foreign key validation.  Failing
any other suggestions for diagnosis soon, I'm going to nuke the
PostgreSQL install, scour it from the machine and start from scratch.
Failing that, I'm going to come in some weekend and re-do the machine.

> Problem is when I recreate the indexes and add the constraints back
> on ORIG I end up with the same long running process.  The original
> UPDATE runs for about 30 minutes on a table of 400,000 with the
> WHERE matching about 70% of the rows.  The above runs for about 2
> minutes without adding the constraints or indexes however adding the
> constraints and creating the dropped indexes negates any gain.

Is this a frequently-run update?

In my experience, with my seemingly mutant install, dropping indices
and constraints to shave 14/15 off the update time would be worth the
effort.  Just script dropping, updating and recreating into one large
transaction.  It's a symptom-level fix, but re-creating the fifteen
indices on one of our 5M row tables doesn't take 28 minutes, and your
hardware looks to be rather less IO and CPU bound than ours.  I'd also
second Tom's suggestion of moving to 7.4.
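
A rough skeleton of what I mean (the FK constraint name here is invented;
ix_origid is from the schema posted earlier):

begin;
alter table account.note drop constraint note_custid_fk;  -- one per referencing FK
drop index account.ix_origid;                             -- one per index on the table
update account.cust
   set prodid = tempprod.prodid, subprodid = tempprod.subprodid
 where origid = tempprod.debtid;
create index ix_origid on account.cust (origid);
alter table account.note add constraint note_custid_fk
    foreign key (custid) references account.cust (custid);
commit;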

/rls


Re: atrocious update performance

From
"Rosser Schwarz"
Date:
After deinstalling and scrubbing PostgreSQL from my server and doing
a clean build using a vanilla 7.4.2 tree, I'm rather more confident
that foreign key validation is the cause of my performance problems.

I recreated my schemas and ran the original update, with foreign
keys referring to the identity column of the target table.  The
update took roughly two days, as I'd predicted based on my analysis
of the previous installation.  (I can't say how long with certainty,
beyond that it finished some time between when I left work one night
and came in the next morning, the second day after starting the
query.)  I'm not sure what was wrong with the previous install, such
that the update took several days; two-ish days is long enough.

Just this morning, however, I created a copy of the target table (all
4.7M rows), with absolutely no foreign keys referring to it, and ran
the update against the copy.  That update took 2300 seconds.  The
join columns were indexed in both cases.

I'm in the process of migrating the machine to run kernel 2.6.4,
following the thread started by Gary, though I suspect that the
kernel revision is moot with respect to whether or not foreign keys
are being incorrectly validated.  I can keep the 2.4 kernel and
modules around to run using the current versions for testing
purposes, though any such work would necessarily be off-hours.

Please advise of anything I can do to help narrow down the specific
cause of the issue; I know just enough C to be mildly dangerous.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
"Kevin Barnard"
Date:
On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:

> Just this morning, however, I created a copy of the target table (all
> 4.7M rows), with absolutely no foreign keys referring to it, and ran
> the update against the copy.  That update took 2300 seconds.  The
> join columns were indexed in both cases.

Have you added indexes for the custid column for tables account.acct, account.orgacct
and note?

I haven't followed the entire thread, but if you have cascading FKs on those tables
without an index on the column, that could cause your delay.
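
i.e. something along these lines if they're missing (index names invented;
tables and column as named above):

create index ix_acct_custid    on account.acct (custid);
create index ix_orgacct_custid on account.orgacct (custid);
create index ix_note_custid    on account.note (custid);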

Kevin Barnard
SpeedFC



Re: atrocious update performance

From
"Rosser Schwarz"
Date:
while you weren't looking, Kevin Barnard wrote:

> Have you added indexes for the custid column for tables
> account.acct accunt.orgacct and note?

They were indexed in the original case, yes.  There was no
need to index them in today's test case, as that was done
purely in an attempt to rule in or out foreign key validation
as the cause of the performance hit.  No foreign keys that
might be validated, no need to index the foreign key columns.

> I haven't followed the entire thread but it you have
> cascading FK on those tables without an index on the
> column that could cause your delay.

The issue is that the foreign keys are being validated at
all, when the column being referenced by those foreign keys
(account.cust.custid) is never touched.

Regardless of whether or not the referencing columns are
indexed, validating them at all--in this specific case--is
broken.  The column they refer to is never touched; they
should remain utterly ignorant of whatever happens to other
columns in the same row.

/rls

--
Rosser Schwarz
Total Card, Inc.


Re: atrocious update performance

From
"scott.marlowe"
Date:
On Mon, 5 Apr 2004, Kevin Barnard wrote:

> On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:
>
> > Just this morning, however, I created a copy of the target table (all
> > 4.7M rows), with absolutely no foreign keys referring to it, and ran
> > the update against the copy.  That update took 2300 seconds.  The
> > join columns were indexed in both cases.
>
> Have you added indexes for the custid column for tables account.acct, account.orgacct
> and note?
>
> I haven't followed the entire thread, but if you have cascading FKs on those tables
> without an index on the column, that could cause your delay.

also make sure the fk/pk types match, or the index likely won't get used
anyway.
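
A quick way to compare the types (table list as mentioned above):

select c.relname, a.attname, format_type(a.atttypid, a.atttypmod)
  from pg_class c join pg_attribute a on a.attrelid = c.oid
 where a.attname = 'custid'
   and c.relname in ('cust', 'acct', 'orgacct', 'note');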


possible improvement between G4 and G5

From
Qing Zhao
Date:
Hi,

We have got a G5 64-bit processor to replace an old G4 32-bit
processor.  Given everything else equal, should we see a big
improvement on PG's performance?

The other question I have is that when I tried different sizes for
shared_buffers (I used 10,000, 1,000, 528, 256) with max_connections = 32,
it gave me an error when I tried to start PG using pg_ctl start as
postgres.  It kept saying this is bigger than the system shared memory.
So finally I started PG using SystemStarter start PostgreSQL and it
seems to start OK.  Any idea?


Thanks a lot!

Qing Zhao


Re: atrocious update performance

From
Stephan Szabo
Date:
On Mon, 5 Apr 2004, Rosser Schwarz wrote:

> while you weren't looking, Kevin Barnard wrote:
>
> > Have you added indexes for the custid column for tables
> > account.acct accunt.orgacct and note?
>
> They were indexed in the original case, yes.  There was no
> need to index them in today's test case, as that was done
> purely in an attempt to rule in or out foreign key validation
> as the cause of the performance hit.  No foreign keys that
> might be validated, no need to index the foreign key columns.
>
> > I haven't followed the entire thread but it you have
> > cascading FK on those tables without an index on the
> > column that could cause your delay.
>
> The issue is that the foreign keys are being validated at
> all, when the column being referenced by those foreign keys
> (account.cust.custid) is never touched.
>
> Regardless of whether or not the referencing columns are
> indexed, validating them at all--in this specific case--is
> broken.  The column they refer to is never touched; they
> should remain utterly ignorant of whatever happens to other
> columns in the same row.

It shouldn't be checking the other table if the values of the key column
haven't changed.  The ri_KeysEqual check should cause it to return just
before actually doing the check on the other table (it still does a few
things before then, but nothing that should be particularly expensive).  In
some simple tests on my 7.4.2 machine, this appears to work for me on pk
cascade updates.  It would be interesting to know if it's actually doing
any checks for you; you might be able to poke around the triggers
(backend/utils/adt/ri_triggers.c).
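
As a starting point, the RI triggers that would fire on the update can at
least be listed from the catalogs (7.3+ syntax):

select tgname, tgfoid::regproc
  from pg_trigger
 where tgrelid = 'account.cust'::regclass;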

Re: possible improvement between G4 and G5

From
Tom Lane
Date:
Qing Zhao <qzhao@quotefx.net> writes:
> We have got a G5 64-bit processor to replace an old G4 32-bit
> processor.  Given everything else equal, should we see a big
> improvement on PG's performance?

Nope.  Database performance typically depends on disk performance first,
and RAM size second.  A 64-bit processor might help by allowing you to
install more RAM, but you didn't say that you had.

> The other question I have is that when I tried different sizes for
> shared_buffers (I used 10,000, 1,000, 528, 256) with max_connections = 32,
> it gave me an error when I tried to start PG using pg_ctl start as
> postgres.  It kept saying this is bigger than the system shared memory.

Out-of-the-box, Mac OS X has a very low SHMMAX limit.  See the PG admin
docs or the mail list archives about how to increase it.  You should do
this --- most people find that you want to set shared_buffers to 1000 or
10000 or so for best performance.

            regards, tom lane

Re: possible improvement between G4 and G5

From
"Jim C. Nasby"
Date:
On Tue, Apr 06, 2004 at 01:47:22AM -0400, Tom Lane wrote:
> Qing Zhao <qzhao@quotefx.net> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor.  Given everything else equal, should we see a big
> > improvement on PG's performance?
>
> Nope.  Database performance typically depends on disk performance first,
> and RAM size second.  A 64-bit processor might help by allowing you to
> install more RAM, but you didn't say that you had.

Memory bandwidth is a consideration too, so you might see some
performance improvements on a G5. We recently debated between Xeons and
Opterons in a new PGSQL server and a little poking around on the lists
indicated that the Opterons did perform better, presumably due to the
increased memory bandwidth. Incidentally, this is why you need about 2x
the CPUs on Sun hardware vs RS6000 hardware for database stuff (and that
gets expensive if you're paying per CPU!).
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: possible improvement between G4 and G5

From
"Aaron Werman"
Date:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Qing Zhao" <qzhao@quotefx.net>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, April 06, 2004 1:47 AM
Subject: Re: [PERFORM] possible improvement between G4 and G5


> Qing Zhao <qzhao@quotefx.net> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor.  Given everything else equal, should we see a big
> > improvement on PG's performance?
>
> Nope.  Database performance typically depends on disk performance first,
> and RAM size second.

I'm surprised by this thought. I tend to hit CPU bottlenecks more often than
I/O ones. In most applications, db I/O is a combination of buffer misses and
logging, which are both reasonably constrained. RAM size seems to me to be
the best way to improve performance, and then CPU, which is needed to perform
the in-memory searching, locking, versioning, and processing, and finally
I/O (this is not the case in small I/O subsystems - if you have less than a
dozen drives, you're easily I/O bound). I/O is often the thing I tune first,
because I can do it in place without buying hardware.

Conceptually, an RDBMS converts slow random I/O into in memory processing
and sequential logging writes. If successful, it should reduce the I/O
overhead.

/Aaron

Re: possible improvement between G4 and G5

From
Josh Berkus
Date:
Aaron,

> I'm surprised by this thought. I tend to hit CPU bottlenecks more often than
> I/O ones. In most applications, db I/O is a combination of buffer misses and
> logging, which are both reasonably constrained.

Not my experience at all.  In fact, the only times I've seen modern platforms
max out the CPU were when:
a) I had bad queries with bad plans, or
b) I had reporting queries that did a lot of calculation for display (think
OLAP).

Otherwise, on the numerous servers I administrate, RAM spikes, and I/O
bottlenecks, but the CPU stays almost flat.

Of course, most of my apps are large databases (i.e. too big for RAM) with a
heavy transaction-processing component.

What kind of applications are you running?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: possible improvement between G4 and G5

From
"Aaron Werman"
Date:

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Aaron Werman" <awerman2@hotmail.com>; "Qing Zhao" <qzhao@quotefx.net>;
"Tom Lane" <tgl@sss.pgh.pa.us>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, April 06, 2004 2:52 PM
Subject: Re: [PERFORM] possible improvement between G4 and G5


> Aaron,
>
> > I'm surprised by this thought. I tend to hit CPU bottlenecks more often
than
> > I/O ones. In most applications, db I/O is a combination of buffer misses
and
> > logging, which are both reasonably constrained.
>
> Not my experience at all.  In fact, the only times I've seen modern
platforms
> max out the CPU were when:
> a) I had bad queries with bad plans, or
> b) I had reporting queries that did a lot of calculation for display
(think
> OLAP).
>
> Otherwise, on the numerous servers I administrate, RAM spikes, and I/O
> bottlenecks, but the CPU stays almost flat.
>
> Of course, most of my apps are large databases (i.e. too big for RAM) with
a
> heavy transaction-processing component.
>
> What kind of applications are you running?
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>

<hot air>

I do consulting, so they're all over the place and tend to be complex. Very
few fit in RAM, but still are very buffered. These are almost all backed
with very high end I/O subsystems, with dozens of spindles with battery
backed up writethrough cache and gigs of buffers, which may be why I worry
so much about CPU. I have had this issue with multiple servers.

Consider an analysis db with 10G data. Of that, 98% of the access is read
and only 2% write (that is normal for almost anything that is not order
entry, even transaction processing with thorough cross validation). Almost
all the queries access 10%, or 1G of the data. Of the reads, they average ~3
level b-trees, with the first 2 levels certainly cached, and the last ones
often cached. Virtually all the I/O activity is logical reads against
buffer. A system with a 100 transactions which on average access 200 rows
does 98% of 200 rows x 100 transactions x 3 logical I/Os per read = 58,800
logical reads, of which actually maybe a hundred are physical reads.  It
also does 2% of 200 rows x 100 transactions x (1 table logical I/O and say 2
index logical writes) per write = 1,200 logical writes to log, of which
there are 100 transaction commit synch writes, and in reality less than that
because of queuing against logs (there are also 1,200 logical writes
deferred to checkpoint, of which it is likely to only be 40 physical writes
because of page overlaps).

Transaction processing is a spectrum between activity logging and database
centric design. The former, where actions are stored in the database, is
totally I/O bound, with the engine acting as a thin layer of logical to
physical mapping. Database centric processing makes the engine a functional
server of discrete actions - and is a big CPU hog.

What my CPU tends to be doing is a combination of general processing,
complex SQL processing: nested loops and sorting and hashing and triggers
and SPs.

I'm curious about you having flat CPU, which is not my experience. Are your
apps mature and stable?

</hot air>

/Aaron

Re: possible improvement between G4 and G5

From
Josh Berkus
Date:
Aaron,

> I do consulting, so they're all over the place and tend to be complex. Very
> few fit in RAM, but still are very buffered. These are almost all backed
> with very high end I/O subsystems, with dozens of spindles with battery
> backed up writethrough cache and gigs of buffers, which may be why I worry
> so much about CPU. I have had this issue with multiple servers.

Aha, I think this is the difference.  I never seem to be able to get my
clients to fork out for adequate disk support.   They are always running off
single or double SCSI RAID in the host server; not the sort of setup you
have.

> What my CPU tends to be doing is a combination of general processing,
> complex SQL processing: nested loops and sorting and hashing and triggers
> and SPs.

I haven't noticed SPs to be particularly CPU-hoggish, more RAM.

> I'm curious about you having flat CPU, which is not my experience. Are your
> apps mature and stable?

Well, "flat" was a bit of an exaggeration ... there are spikes ... but average
CPU load is < 30%.    I think the difference is that your clients listen to
you about disk access.   Mine are all too liable to purchase a quad-Xeon
machine but with an Adaptec RAID-5 card with 4 drives, and *then* call me and
ask for advice.

As a result, most intensive operations don't tend to swamp the CPU because
they are waiting for disk.

I have noticed the limitations on RAM for 64 vs. 32, as I find it easier to
convince a client to get 8GB RAM than four-channel RAID with 12 drives,
mostly because the former is cheaper.   Linux 2.4 + Bigmem just doesn't cut
it for making effective use of > 3GB of RAM.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: possible improvement between G4 and G5

From
Vivek Khera
Date:
>>>>> "JB" == Josh Berkus <josh@agliodbs.com> writes:

JB> Aaron,
>> I do consulting, so they're all over the place and tend to be complex. Very
>> few fit in RAM, but still are very buffered. These are almost all backed
>> with very high end I/O subsystems, with dozens of spindles with battery
>> backed up writethrough cache and gigs of buffers, which may be why I worry
>> so much about CPU. I have had this issue with multiple servers.

JB> Aha, I think this is the difference.  I never seem to be able to
JB> get my clients to fork out for adequate disk support.  They are
JB> always running off single or double SCSI RAID in the host server;
JB> not the sort of setup you have.

Even when I upgraded my system to a 14-spindle RAID5 with 128M cache
and 4GB RAM on a dual Xeon system, I still wind up being I/O bound
quite often.

I think it depends on what your "working set" turns out to be.  My
workload really spans a lot more of the DB than I can end up caching.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/