Thread: trying to use CLUSTER

trying to use CLUSTER

From
"Sahagian, David"
Date:
Version=9.1.7

INFO: clustering "my_cool_table"  using sequential scan and sort
INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row versions in 49762 pages
Detail: 1689396 dead row versions cannot be removed yet.
CPU 9.80s/4.98u sec elapsed 175.92 sec.

INFO: clustering "my_cool_table"  using sequential scan and sort
INFO: "my_cool_table": found 7552 removable, 21732 nonremovable row versions in 50007 pages
Detail: 11482 dead row versions cannot be removed yet.
CPU 0.01s/0.23u sec elapsed 36.29 sec.

INFO: clustering "my_cool_table"  using index scan on "pk_cool"
INFO: "my_cool_table": found 621462 removable, 36110 nonremovable row versions in 26135 pages
Detail: 25128 dead row versions cannot be removed yet.
CPU 0.02s/0.35u sec elapsed 0.79 sec.

So my_cool_table gets inserted into (but not updated) by regular processes doing their smallish CRUD transactions.

Concurrently, ONE process repeatedly "sweeps" a chunk of rows from the table every few seconds.
(i.e., it does DELETE ... RETURNING, and then commits the sweep)
Note that if the table has only a few rows, they will all be swept together.
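For concreteness, one sweep iteration might look something like the sketch below. The `id` column, the batch size, and the ordering are assumptions for illustration; the thread does not show the actual sweep criteria.

```sql
-- Hypothetical sketch of one sweep iteration ("id" and LIMIT 1000 are assumptions):
BEGIN;

DELETE FROM my_cool_table
WHERE id IN (
    SELECT id FROM my_cool_table
    ORDER BY id
    LIMIT 1000
)
RETURNING *;

COMMIT;
```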

It is possible for something to go wrong, resulting in:
  the table still being filled, but no longer being swept.

When the sweeping finally gets re-started, it must now chomp down a very large table.
When it finally sweeps down to near zero rows remaining, my idea was to do a CLUSTER on the table.
My expectation is that a VERY SMALL percentage of the row versions would actually get written to the new table!

My hope is that a smaller heap is better, now that the rate of sweeping is back to the rate of filling,
with the assumption that it will stay this way 99% of the time.


Can somebody tell me why some "dead row versions cannot be removed yet" ?
I assume that means CLUSTER must write them to the new table ?

It seems very costly to do the CLUSTER, if the new table is not really going to be a tiny fraction of the old table.
Is there a way for me to discover the approx number of "non-removables" BEFORE I do the CLUSTER ?
Some pg_* catalog query ? Maybe after an ANALYZE ?

Also, does the use of [index scan on "pk_cool"] basically depend on the ratio of removable/nonremovable row versions ?

Thanks,
-dvs-

Re: trying to use CLUSTER

From
Albe Laurenz
Date:
David Sahagian wrote:
> Version=9.1.7
>
> INFO: clustering "my_cool_table"  using sequential scan and sort
> INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row versions in 49762 pages
> Detail: 1689396 dead row versions cannot be removed yet.
> CPU 9.80s/4.98u sec elapsed 175.92 sec.

> My expectation is that a VERY SMALL percentage of the row versions would actually get written to the
> new table!

> Can somebody tell me why some "dead row versions cannot be removed yet" ?
>
> I assume that means CLUSTER must write them to the new table ?

I would say so.  The dead rows probably cannot be removed because
of a long-running transaction.
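One way to check for such a culprit is to look for transactions that have been open for a while. A sketch (on 9.1 the columns are named procpid and current_query; they were renamed in 9.2, and the 5-minute threshold here is arbitrary):

```sql
-- List transactions that have been open longer than 5 minutes (9.1 column names):
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '5 minutes'
ORDER BY xact_start;
```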

Is there a reason why you use CLUSTER and not VACUUM FULL?
Does VACUUM FULL show the same symptoms (dead row versions
cannot be removed yet)?

> Is there a way for me to discover the approx number of "non-removables" BEFORE I do the CLUSTER ?
> Some pg_* catalog query ? Maybe after an ANALYZE ?

SELECT n_live_tup, n_dead_tup
FROM pg_stat_all_tables
WHERE relname = 'my_cool_table';
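Note these counters come from the statistics collector and are only approximate, but they can be turned into a rough "what fraction would survive the CLUSTER" estimate, for example:

```sql
-- Approximate fraction of live row versions; run ANALYZE first to
-- freshen the statistics.  nullif() avoids division by zero.
SELECT n_live_tup,
       n_dead_tup,
       round(100.0 * n_live_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
           AS pct_live
FROM pg_stat_all_tables
WHERE relname = 'my_cool_table';
```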

> Also, does the use of [index scan on "pk_cool"] basically depend on the ratio of
> removable/nonremovable row versions ?

I guess it will use whatever is cheaper (faster).

Yours,
Laurenz Albe

Re: trying to use CLUSTER

From
Jeff Janes
Date:
On Tuesday, February 12, 2013, Sahagian, David wrote:

> Version=9.1.7
>
> INFO: clustering "my_cool_table"  using sequential scan and sort
> INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row versions in 49762 pages
> Detail: 1689396 dead row versions cannot be removed yet.
> CPU 9.80s/4.98u sec elapsed 175.92 sec.

...


> Can somebody tell me why some "dead row versions cannot be removed yet" ?
>
> I assume that means CLUSTER must write them to the new table ?

It means that while the CLUSTER itself considers them dead, they might
still look alive to some older transaction, so they need to be copied.
(That older transaction must not have touched the table yet, or else it
would hold a lock that would prevent the CLUSTER from taking place.)  You
might want to hunt down the source of those long-lived transactions and try
to eliminate them.



> It seems very costly to do the CLUSTER, if the new table is not really
> going to be a tiny fraction of the old table.
>
> Is there a way for me to discover the approx number of "non-removables"
> BEFORE I do the CLUSTER ?
>

Not that I can think of.  Well, other than doing a VACUUM VERBOSE, but that
itself would be very costly to do and wasteful if most of the table is
dead and immediately going to be CLUSTERed anyway.

You might be able to use the contrib module pageinspect to come up with
your own sampling technique.  Although I don't immediately see how to do
that.
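One possible (heuristic) starting point: inspect a handful of pages and count tuples whose xmax is set.  This is only a rough proxy, since a set xmax can also come from row locks rather than deletes, and it says nothing about whether the deleting transaction is still visible to others:

```sql
-- Requires the pageinspect contrib module (CREATE EXTENSION pageinspect;).
-- Counts tuples on page 0 whose t_xmax is set, as a crude dead-tuple proxy;
-- repeat for a random sample of page numbers to extrapolate.
SELECT sum(CASE WHEN t_xmax <> 0 THEN 1 ELSE 0 END) AS maybe_dead,
       count(*) AS total
FROM heap_page_items(get_raw_page('my_cool_table', 0));
```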

> Also, does the use of [index scan on "pk_cool"] basically depend on the
> ratio of removable/nonremovable row versions ?

I don't think so.  It mostly depends on the correlation (i.e. how well it
is already clustered), the maintenance_work_mem, the size of the table in
pages, and the estimated size/number of the "live" rows.  (The true costs
depend on live+nonremovable, but the planner does not have that number
available to it, so it uses "live" instead in making the estimates.)
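If you want to nudge that choice, raising maintenance_work_mem for the session makes the seq-scan-and-sort path cheaper and lets the sort stay in memory.  A sketch (the 512MB value is just an example, not a recommendation):

```sql
-- Give the sort more memory for this session only, then cluster on the index.
SET maintenance_work_mem = '512MB';
CLUSTER my_cool_table USING pk_cool;
RESET maintenance_work_mem;
```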

Cheers,

Jeff