Thread: sequences and "addval('myseq', value)"

sequences and "addval('myseq', value)"

From
pgsql@mohawksoft.com
Date:
I have an idea, let me know what you all think.

rather than do:

select sum( field) from largetable

I want (need) to be able to do something like this:

select setval('myseq', (select sum(field) from largetable));

and in a trigger

SELECT addval('myseq', field_size);

I have already created this function and am testing it. I basically took
"nextval", made the amount added ("byval") a parameter, and removed the
value-caching code.
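
To make that concrete, the setup I have in mind looks roughly like this
("addval" being my experimental function, not something that exists in
stock PostgreSQL, and the table and column names just for illustration):

create sequence myseq;

-- seed the accumulator with the current total, once
select setval('myseq', (select sum(field) from largetable));

-- trigger function that keeps the running total current
create or replace function track_field() returns trigger as '
begin
  perform addval(''myseq'', NEW.field);
  return NEW;
end;' language 'plpgsql';

create trigger largetable_track after insert on largetable
  for each row execute procedure track_field();

-- later, instead of scanning largetable:
select currval('myseq');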

Anyway, I'm not quite getting the idea of caching sequence values. I
understand the performance benefits, but it seems problematic across
multiple backends, almost ensuring "holes" in the sequence of numbers.




Re: sequences and "addval('myseq', value)"

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> Anyway, I'm not quite getting the idea of caching sequence values. I
> understand the performance benefits, but it seems problematic across
> multiple backends, almost ensuring "holes" in the sequence of numbers.

The point is to reduce lock contention on the sequence table.  Since
lack-of-holes is explicitly *not* a design goal, there is no downside
that I see.
        regards, tom lane


Re: sequences and "addval('myseq', value)"

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> Anyway, I'm not quite getting the idea of caching sequence values. I
>> understand the performance benefits, but it seems problematic across
>> multiple backends, almost ensuring "holes" in the sequence of numbers.
>
> The point is to reduce lock contention on the sequence table.  Since
> lack-of-holes is explicitly *not* a design goal, there is no downside
> that I see.
>
I knew that, but it wasn't until I thought of using a sequence as a shared
variable that it sort of hit me.

The question is, what do you think of an "addval" function for sequences.
As used:

Executed in a trigger:

select addval('mysum', val);

Executed:
select currval('mysum');

Instead of:

select sum(val) from largetable;


The problem that I (and, I know, many other people) am having is that large
sums cannot be obtained without a table scan. A summary table can be created,
but if you have any sort of volume, you have to vacuum the summary table
constantly.

Using the sequence construct as sort of an accumulator just might fit the
bill for this sort of thing.




Re: sequences and "addval('myseq', value)"

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> The question is, what do you think of an "addval" function for sequences.

It would have to be defined as "do the same thing as N successive
nextval calls would do", which would not be especially useful in the
presence of caching.

> The problem I, and I know many other people are having, is that large sums
> can not be obtained without a table scan. A summary table can be created,
> but if you have any sort of volume, you have to vacuum the summary table
> constantly.
> Using the sequence construct as sort of an accumulator just might fit the
> bill for this sort of thing.

How would it work at all?  Suppose your transaction rolls back after
doing the insert or update.  The table won't have changed ... but the
sequence change doesn't roll back.  You can't make such an easy end run
around the fundamental MVCC/locking problem.
        regards, tom lane


Re: sequences and "addval('myseq', value)"

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> The question is, what do you think of an "addval" function for
>> sequences.
>
> It would have to be defined as "do the same thing as N successive
> nextval calls would do", which would not be especially useful in the
> presence of caching.
>
>> The problem I, and I know many other people are having, is that large
>> sums
>> can not be obtained without a table scan. A summary table can be
>> created,
>> but if you have any sort of volume, you have to vacuum the summary table
>> constantly.
>> Using the sequence construct as sort of an accumulator just might fit
>> the
>> bill for this sort of thing.
>
> How would it work at all?  Suppose your transaction rolls back after
> doing the insert or update.  The table won't have changed ... but the
> sequence change doesn't roll back.  You can't make such an easy end run
> around the fundamental MVCC/locking problem.
>

I know that MVCC is an issue, but for this particular class of problem, it
isn't so much. Given any rollback, one should recalculate the value. This
sort of thing is one of those "no elegant solution" problems. A table scan
can't work because it takes too long and a summary table can't work
because you have to vacuum all the time and/or suffer the cost of all the
dead tuples waiting to be vacuumed.

If I understand correctly, and I've sort of just worked on this
assumption, a sequence does not necessarily produce a sequence of numbers.
It produces a succession of numbers that are guaranteed to increase, but
not necessarily with a specific interval (usually one).

The "addval" is similar, you can add a value to a sequence. It should be
OK, but is not guarented to be MVCC. The admin or developer will have to
correct any rollbacks. With all its imperfections, it could offer the
software developer who basically has full control over what is being done
to the database to maintain a value which can be used in OLAP or summary
information without forcing a tablescan and without incurring the cost of
constantly updating a row in a table.

I really can't think of a more MVCC-compatible way of solving this
problem. I'd love a more elegant solution, but it escapes me, and I know a
lot of developers have this same problem. In fact, whenever I have to
defend my choice of PG over something like MySQL, it is almost invariably
thrown in my face.


Re: sequences and "addval('myseq', value)"

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> If I understand correctly, and I've sort of just worked on this
> assumption, a sequence does not nessisarily produce a sequence of numbers.
> It produces a succession of numbers that are guarenteed to increase, but
> not nessisarily with a specific interval (usually one).

It produces a series of numbers that are guaranteed to be *different*.
The always-increasing property is valid as far as the numbers delivered
in a single session go, but it doesn't hold across backends, so I think
the extent to which you can rely on it is pretty limited.

> The "addval" is similar, you can add a value to a sequence. It should be
> OK, but is not guarented to be MVCC. The admin or developer will have to
> correct any rollbacks.

This strikes me as a complete nonstarter.  How would a failed
transaction fix anything, or even tell anyone else that there was now
something needing to be fixed?  You'd have to rely on the client to
start a fresh transaction and do the fixup.  In practice the total would
get out of sync in no time.

Also, I don't see how you would actually use this without needing addval
to accept negative increments (for decrementing totals on delete, etc).
That seems to create a whole new set of semantic issues, because now you
no longer have the "it's the same as N consecutive nextval's" rule to
define the behavior for you.
        regards, tom lane


Re: sequences and "addval('myseq', value)"

From
Bruno Wolff III
Date:
On Tue, Jun 08, 2004 at 10:18:42 -0400, pgsql@mohawksoft.com wrote:
> 
> If I understand correctly, and I've sort of just worked on this
> assumption, a sequence does not nessisarily produce a sequence of numbers.
> It produces a succession of numbers that are guarenteed to increase, but
> not nessisarily with a specific interval (usually one).

They are only guaranteed to increase within a single session, and that assumes
that the sequence doesn't roll over. Some people may need to worry about
32 bit sequences rolling over.


Re: sequences and "addval('myseq', value)"

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> If I understand correctly, and I've sort of just worked on this
>> assumption, a sequence does not nessisarily produce a sequence of
>> numbers.
>> It produces a succession of numbers that are guarenteed to increase, but
>> not nessisarily with a specific interval (usually one).
>
> It produces a series of numbers that are guaranteed to be *different*.
> The always-increasing property is valid as far as the numbers delivered
> in a single session go, but it doesn't hold across backends, so I think
> the extent to which you can rely on it is pretty limited.

Yup.

>
>> The "addval" is similar, you can add a value to a sequence. It should be
>> OK, but is not guarented to be MVCC. The admin or developer will have to
>> correct any rollbacks.
>
> This strikes me as a complete nonstarter.

Tom, I have to chuckle here. You HATE every suggestion I ever make. I
can't think of one thing I've suggested over the years that was ever met
with enthusiasm. Never change. :-)


> How would a failed
> transaction fix anything, or even tell anyone else that there was now
> something needing to be fixed?  You'd have to rely on the client to
> start a fresh transaction and do the fixup.  In practice the total would
> get out of sync in no time.

This is very true, and "expected." In some uses, the "client" is quite in
control of the database. Imagine this:

begin
...
...
...
...
commit

if(no error)   select addval('mysum', val);

In this case it is updated after the commit. It is known NOT to be a
reliable number, but it is good enough for a summary, and in practice that
is all it needs to be.

>
> Also, I don't see how you would actually use this without needing addval
> to accept negative increments (for decrementing totals on delete, etc).
> That seems to create a whole new set of semantic issues, because now you
> no longer have the "it's the same as N consecutive nextval's" rule to
> define the behavior for you.

Actually, the code does accept negative numbers. Also, I removed the
caching code.

This is *NOT* a perfect or elegant solution. There is, however, an
important problem: how do you maintain a value that is visible to the
database but does not incur the cost of a huge number of updates or a
full table scan? I'm talking about systems that need to take thousands of
inserts per minute.

Given a controlled environment, "addval" could do this. In fact, I can't
see anything BUT a controlled environment needing something like this.



Re: sequences and "addval('myseq', value)"

From
Greg Stark
Date:
pgsql@mohawksoft.com writes:

> This is *NOT* a perfect or elegant solution. There is, however, an
> important problem. How do you maintain a value that is visable to the
> database, but does not incure the cost of a huge number of updates or a
> full table scan? I'm talking about systems that need to take thousands of
> inserts per minute?

Hm, the real question here is how does postgres handle a particular
circumstance, namely a table with a small number of very busy rows taking
hundreds or thousands of updates per second, with few rollbacks.

In the actual case the busy records are always read in short transactions
(single-record reads), so there is a small number of active records,
probably usually just one, at any time.

Currently postgres handles this by creating a new version for every update and
then delaying the cleanup of the old version until a vacuum can be done.

Oracle on the other hand creates new versions and stores the old version in a
rollback segment. So only the most recent version is in the heap. This incurs
cleanup overhead in the critical path of the transaction.

I don't think it makes sense to move to the Oracle-style method of course. But
perhaps there's some way to move the cleanup closer to the end of the
transaction.

I'm thinking something along the lines of the bgwriter vacuuming every block
as it's written out. That forces every update to generate a vacuum of that
same block at some point in the near future.

I think something like this has been proposed before, and the objection was
that it would force index pages to be read in the midst of bgwriter trying to
get pages out.

But if we're talking about a small, very busy table then the index will
presumably be entirely in memory anyways. Perhaps every page in memory could
hold an update count and trigger an immediate vacuum if 2 updates have been
performed on the page while still in the shared pool, or some heuristic like
that.

But I think it would be worth experimenting with things like this. If
something like it could relieve a lot of the vacuum related angst then it
might be worth some other angst about extra i/o in bgwriter.

-- 
greg



Re: sequences and "addval('myseq', value)"

From
Jan Wieck
Date:
On 6/8/2004 11:46 AM, pgsql@mohawksoft.com wrote:

>>
>> This strikes me as a complete nonstarter.
> 
> Tom, I have to chuckle here. You HATE every suggestion I ever make. I
> can't think of one thing I've suggested over the years that was ever met
> with enthusiasm. Never change. :-)

I happen to agree with Tom on this entire thread. I do not think that 
sequences should be abused as a replacement for global shared variables.

I do think that PostgreSQL should instead have some sort of shared 
memory segment for user variables. The accessor functions would place 
locks and the like, and would have mechanisms like UNDO on rollback. 
With all the limitations that has (inflexible size, garbage collection
and so on), it would still be far superior to anything you've come up with
here.
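
To illustrate the kind of interface I mean (function names purely
hypothetical, nothing like this exists today):

-- hypothetical accessors backed by a shared memory segment; each call
-- would take the appropriate lock and register an UNDO action so the
-- change disappears again if the transaction rolls back
select sharedvar_set('largetable_sum', 0);
select sharedvar_add('largetable_sum', 25);
select sharedvar_get('largetable_sum');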


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: sequences and "addval('myseq', value)"

From
pgsql@mohawksoft.com
Date:
> On 6/8/2004 11:46 AM, pgsql@mohawksoft.com wrote:
>
>>>
>>> This strikes me as a complete nonstarter.
>>
>> Tom, I have to chuckle here. You HATE every suggestion I ever make. I
>> can't think of one thing I've suggested over the years that was ever met
>> with enthusiasm. Never change. :-)
>
> I happen to agree with Tom on this entire thread. I do not think that
> sequences should be abused as a replacement for global shared variables.
>
> I do think that PostgreSQL should instead have some sort of shared
> memory segment for user variables. The accessor functions would place
> locks and the like, and would have mechanisms like UNDO on rollback.
> With all the limitations that has, inflexible size, garbage collection
> and so on, it would yet be far superior to anything you've come up with
> here.

My original suggestion was to have some sort of global variable system. I
thought using an existing construct would have been more palatable.

I was wrong.
>
>
> Jan
>
> --
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me.                                  #
> #================================================== JanWieck@Yahoo.com #
>



Re: sequences and "addval('myseq', value)"

From
Christopher Browne
Date:
In the last exciting episode, pgsql@mohawksoft.com wrote:
>> pgsql@mohawksoft.com writes:
>>> Anyway, I'm not quite getting the idea of caching sequence values. I
>>> understand the performance benefits, but it seems problematic across
>>> multiple backends, almost ensuring "holes" in the sequence of numbers.
>>
>> The point is to reduce lock contention on the sequence table.  Since
>> lack-of-holes is explicitly *not* a design goal, there is no downside
>> that I see.
>>
> I knew that, but it wasn't until I thought of using a sequence as a shared
> variable that it sort of hit me.
>
> The question is, what do you think of an "addval" function for sequences.
> As used:
>
> Executed in a trigger:
>
> select addval('mysum', val);
>
> Executed:
> select currval('mysum');
>
> Instead of:
>
> select sum(val) from largetable;

That seems completely ridiculous, as far as I can see.

The fundamental problem with it is that sequence updates (at least the
ones resulting from pulling values from them) "commit themselves"
irrespective of whether the transaction does.

> The problem I, and I know many other people are having, is that
> large sums can not be obtained without a table scan. A summary table
> can be created, but if you have any sort of volume, you have to
> vacuum the summary table constantly.
>
> Using the sequence construct as sort of an accumulator just might
> fit the bill for this sort of thing.

No, it would not.

What would fit the bill would be the use of some sort of "accumulator
table" that would get an insert every time the main table got touched.
The inserts would be within the same transaction context, and
therefore "MVCC-valid."

create table summary_table (id serial not null unique, tname text not null, rows integer);

create table large_table (
    id serial not null unique,
    name text,
    txn_date timestamptz default now(),
    quantity integer not null,
    units text not null,
    price numeric(12,2),
    value numeric(12,2)
);

insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);
insert into large_table (name, quantity, units, price, value) values ('foo', 1, 'units', 25, 25);
insert into large_table (name, quantity, units, price, value) values ('foo', 2, 'units', 25, 50);
insert into large_table (name, quantity, units, price, value) values ('foo', 3, 'units', 25, 75);
insert into large_table (name, quantity, units, price, value) values ('foo', 4, 'units', 25, 100);


create or replace function tsummary_add() returns trigger as '
begin
  insert into summary_table (tname, rows) values (''large_table'', 1);
  return NULL;
end;' language 'plpgsql';

create or replace function tsummary_del() returns trigger as '
begin
  insert into summary_table (tname, rows) values (''large_table'', -1);
  return NULL;
end;' language 'plpgsql';

create or replace function tsummary_get() returns integer as '
declare
  prec record;
begin
  select sum(rows) as sum into prec from summary_table
    where tname = ''large_table'';
  return prec.sum;
end;' language 'plpgsql';

create or replace function tsummary_rewrite(integer) returns integer as '
declare
  prec record;
  c_max integer;
  total integer := 0;  -- must start at zero or the running sum stays NULL
begin
  if $1 > 1000 then    -- collapse at most 1000 summary rows per call
    c_max := 1000;
  else
    c_max := $1;
  end if;
  for prec in select rows, id from summary_table
      where tname = ''large_table'' limit c_max loop
    total := total + prec.rows;
    delete from summary_table where id = prec.id;
  end loop;
  insert into summary_table (tname, rows) values (''large_table'', total);
  return c_max;
end;' language 'plpgsql';

begin;
insert into summary_table (tname, rows) values ('large_table', (select count(*) from large_table));
create trigger tsummary_add after insert on large_table for each row
execute procedure tsummary_add(); 
create trigger tsummary_del after delete on large_table for each row
execute procedure tsummary_del(); 
commit;

Every time you add a row to large_table, an entry goes into
summary_table for it.  Once in a while, you want to run
tsummary_rewrite() to shorten summary_table.
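
For example, to read the current total and periodically compact the
summary:

-- MVCC-correct running total, no scan of large_table needed:
select tsummary_get();

-- every so often, collapse up to 1000 summary rows into a single row:
select tsummary_rewrite(1000);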

This isn't _quite_ right, but it's the kind of approach that could
work.
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/nonrdbms.html
"Open  Software and  freeing source  code isn't  socialism.   It isn't
socialist.  It's neither socialist nor capitalist; it just is."
-- Arthur <afrain@usa.net>


Why frequently updated tables are an issue

From
pgsql@mohawksoft.com
Date:
OK, the problem I am having with this whole discussion, on several fronts, is
the idea of performance. If performance and consistent behavior were not
*important* issues for a project, a summary table would work fine, and I
could just vacuum frequently.

Currently a client needs to vacuum two summary tables at least once every
two seconds. The performance of the system slowly declines with each
summary update, until the next vacuum. After a vacuum, the transaction
comes in at about 7ms; it creeps up to about 35ms~50ms, then we vacuum and
we're back to 7ms. When we vacuumed every 30 seconds, it would sometimes
get up to whole seconds.

There is an important issue here. Yes, MVCC is good. I agree, and no one
is arguing against it in the general case; however, there are classes of
problems for which MVCC, or at least PostgreSQL's implementation of it, is
not the best solution.

There are two basic problems which are fundamental issues I've had with
PostgreSQL over the years: summary tables and session tables.

The summary table takes the place of a "select sum(col) from table" over a
large table; the summary table itself is very small. The amount of vacuuming
required, and the steady degradation of performance prior to each vacuum, is
a problem that could be addressed by some global variable system.
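
The usual shape of it, just to be explicit (table and column names made
up):

create table summary (name text primary key, total numeric);
insert into summary values ('largetable', 0);

-- fired, in effect, from a trigger on every insert into the big table;
-- each one leaves behind a dead row version for vacuum to clean up
update summary set total = total + 25 where name = 'largetable';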

The session table is a different issue, but has the same problems. You
have an active website, hundreds or thousands of hits a second, and you
want to manage sessions for this site. Sessions are created, updated many
times, and deleted. Performance degrades steadily until a vacuum. Vacuum
has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
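
A minimal sketch of the kind of session table I mean (names made up):

create table sessions (
    sid       text primary key,
    userid    integer,
    payload   text,
    last_seen timestamptz default now()
);

-- every hit rewrites the row, leaving a dead tuple behind:
update sessions set payload = 'cart=3', last_seen = now()
 where sid = 'abc123';

-- so something like this has to run every few seconds just to keep
-- response times from sliding:
vacuum sessions;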

Both session tables and summary tables have another thing in common: they
are not vital data; they hold transient state information. Yea, sure,
data integrity is important, but if you lose these values, you can either
recreate them or they aren't all that important.

Why put that in a database at all? Because, in the case of sessions
especially, you need to access this information for other operations. In
the case of summary tables, OLAP usually needs to join against or include
this info.

PostgreSQL's behavior on these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open for suggestions on what can be done. I've suggested a number of
things, and admittedly they have all been pretty weak ideas, but they were
potentially workable.




Re: Why frequently updated tables are an issue

From
James Robinson
Date:
On Jun 10, 2004, at 10:30 AM, pgsql@mohawksoft.com wrote:

> Prior to lazy vacuum, this was impossible.

Do you know for sure that lazy vacuum and/or autovacuum does
not indeed solve / alleviate the symptoms of the general problem
of very high rate table updates?

Back to lurking!

----
James Robinson
Socialserve.com



Re: Why frequently updated tables are an issue

From
Shridhar Daithankar
Date:
pgsql@mohawksoft.com wrote:
> The session table is a different issue, but has the same problems. You
> have an active website, hundreds or thousands of hits a second, and you
> want to manage sessions for this site. Sessions are created, updated many
> times, and deleted. Performance degrades steadily until a vacuum. Vacuum
> has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
> 
> Both session tables and summary tables have another thing in common, they
> are not vital data, they hold transitive state information. Yea, sure,
> data integrity is important, but if you lose these values, you can either
> recreate it or it isn't too important.
> 
> Why put that is a database at all? Because, in the case of sessions
> especially, you need to access this information for other operations. In
> the case of summary tables, OLAP usually needs to join or include this
> info.
> 
> PostgreSQL's behavior on these cases is poor. I don't think anyone who has
> tried to use PG for this sort of thing will disagree, and yes it is
> getting better. Does anyone else consider this to be a problem? If so, I'm
> open for suggestions on what can be done. I've suggested a number of
> things, and admittedly they have all been pretty weak ideas, but they were
> potentially workable.

There is another approach, so far considered non-feasible and hence rejected.
Vacuum in postgresql is tied to entire relations/objects since indexes do not
have transaction visibility information.

It has been suggested in the past to add such visibility to the index tuple
header so that indexes and heaps can be cleaned out of order. In that case
other background processes, such as the background writer and the
soon-to-be-integrated autovacuum daemon, could vacuum pages/buffers rather
than relations. That way the most-used things would remain clean and the cost
of cleanup would stay outside the critical transaction-processing path.

However, increasing the index footprint seems to be a tough sell. Besides,
the FSM would need some rework to accommodate/autotune its behaviour.

I am quoting from memory, so don't flame me if I misquote it. Just adding
this to make the picture complete. From a performance point of view alone it
could solve quite a few problems, at least in theory.
 Shridhar


Re: Why frequently updated tables are an issue

From
"Glen Parker"
Date:
> It has been suggested in past to add such a visibility to index
> tuple header so
> that index and heaps can be cleaned out of order. In such a case
> other backround

It seems to me that the benefit of this wouldn't be all that impressive
*when accessing the cache*, which is the problem this discussion is about.
Disk access would occur more commonly with large tables, which I'll ignore.
Let's say total scan time for a query on a very dirty table is 100ms.  It
seems safe to assume that the scan time for the index would be *roughly*
half that of the heap.  If visibility could be determined by looking at just
the index tuple, you'd cut your query scan time down to 50ms.  When the clean
table case is 7ms total scan time, the difference between 50 and 100 ms is
not much of an issue; either way, it's still way too high!

> However increasing index footprint seems to be a tough sell.


And rightly so, IMO.

Glen Parker



Re: Why frequently updated tables are an issue

From
Shridhar Daithankar
Date:
Glen Parker wrote:

>>It has been suggested in past to add such a visibility to index
>>tuple header so
>>that index and heaps can be cleaned out of order. In such a case
>>other backround
> It seems to me that the benefit of this wouldn't be all that impressive
> *when accessing the cache*, which is the problem this discussion is about.

I doubt it. I have seen examples on the general list where people have
thousands of dead *pages* for a few hundred live tuples. If it is a problem
with cache access, it will spill to disk as the problem grows.

I don't think postgresql's cache is that bad. No matter how you do it, RAM is
RAM. The problem is with disk bandwidth. See past discussions about the
vacuum delay patch and the improvement it brought.

Vacuum costs disk bandwidth and that affects performance. That remains a fact.

> Disk access would occur more commonly with large tables, which I'll ignore.
> Let's say total scan time for a query on a very dirty table is 100ms.  It
> seems safe to assume that the scan time for the index would be *roughly*
> half that of the heap.  If visibilty could be determined by looking at just
> the index tuple, you'd cut you query scan time down to 50ms.  When the clean
> table case is 7ms total scan time, the difference between 50 and 100 ms is
> not much of an issue; either way, it's still way to high!
>>However increasing index footprint seems to be a tough sell.
> And rightly so, IMO.

Me too. Unless somebody comes up with a patch that demonstrates the
improvement. Obviously people can live with the cost of mandatory vacuum, so
this is not high priority. But one day it will be.

OTOH, if the perceived benefit is not there, at least it will have been proven
that it is not there. I plan to do it when I find time. But again, I don't
face the problem myself (I don't even use postgresql for anything important,
for that matter), so I haven't bothered spending any time on it.

As long as it is not high priority, it is going to be a tough sell. That's not
unusual.
 Shridhar


Re: Why frequently updated tables are an issue

From
Jan Wieck
Date:
On 6/10/2004 10:37 AM, Shridhar Daithankar wrote:

> pgsql@mohawksoft.com wrote:
>> The session table is a different issue, but has the same problems. You
>> have an active website, hundreds or thousands of hits a second, and you
>> want to manage sessions for this site. Sessions are created, updated many
>> times, and deleted. Performance degrades steadily until a vacuum. Vacuum
>> has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
>> 
>> Both session tables and summary tables have another thing in common, they
>> are not vital data, they hold transitive state information. Yea, sure,
>> data integrity is important, but if you lose these values, you can either
>> recreate it or it isn't too important.
>> 
>> Why put that is a database at all? Because, in the case of sessions
>> especially, you need to access this information for other operations. In
>> the case of summary tables, OLAP usually needs to join or include this
>> info.
>> 
>> PostgreSQL's behavior on these cases is poor. I don't think anyone who has
>> tried to use PG for this sort of thing will disagree, and yes it is
>> getting better. Does anyone else consider this to be a problem? If so, I'm
>> open for suggestions on what can be done. I've suggested a number of
>> things, and admittedly they have all been pretty weak ideas, but they were
>> potentially workable.
> 
> There is another as-of-non-feasible and hence rejected approach. Vacuum in 
> postgresql is tied to entire relations/objects since indexes do not have 
> transaction visibility information.
> 
> It has been suggested in past to add such a visibility to index tuple header so 
> that index and heaps can be cleaned out of order. In such a case other backround 
> processes such as background writer and soon-to-be integrated autovacuum daemon 
> can vacuum pages/buffers rather than relations. That way most used things will 
> remain clean and cost of cleanup will remain outside crtical transaction 
> processing path.

This is not feasible because at the time you update or delete a row you 
would have to visit all its index entries. The performance impact of 
that would be immense.

But a per-relation bitmap that tells whether a block is a) free of dead 
tuples and b) such that all remaining tuples in it are frozen could be used 
to let vacuum skip those blocks (there can't be anything to do in them). The 
bit would get reset whenever the block is marked dirty. This would cause 
vacuum to look mainly at recently touched blocks, likely to be found in the 
buffer cache anyway, and thus dramatically reduce the amount of IO and 
thereby make high-frequency vacuuming less expensive.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Why frequently updated tables are an issue

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> But a per relation bitmap that tells if a block is a) free of dead 
> tuples and b) all remaining tuples in it are frozen could be used to let 
> vacuum skip them (there can't be anything to do). The bit would get 
> reset whenever the block is marked dirty. This would cause vacuum to 
> look at mainly recently touched blocks, likely to be found in the buffer 
> cache anyway and thus dramatically reduce the amount of IO and thereby 
> make high frequent vacuuming less expensive.

I don't think it would help very much to define a bit like that --- I
can't believe that very many pages would contain only frozen tuples,
unless you were to adopt an aggressive policy of using VACUUM FREEZE
a lot.

It might be interesting though to have some kind of "fast vacuum" mode
that doesn't worry about freezing tuples, but only reclaiming dead ones.
This could look at only recently touched pages, with perhaps the
cooperation of the bgwriter to keep track of candidate pages.  (You'd
still have to do full-database scans for freezable tuples occasionally ...
but not very often.)

The main thing I don't see how to handle efficiently is getting rid of
the index entries for dead rows.  The present indexam API is designed
for getting rid of index entries wholesale, but you'd need something
that works better for retail removal to approach vacuuming this way.
        regards, tom lane


Re: Why frequently updated tables are an issue

From
Jan Wieck
Date:
On 6/12/2004 3:45 PM, Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> But a per relation bitmap that tells if a block is a) free of dead 
>> tuples and b) all remaining tuples in it are frozen could be used to let 
>> vacuum skip them (there can't be anything to do). The bit would get 
>> reset whenever the block is marked dirty. This would cause vacuum to 
>> look at mainly recently touched blocks, likely to be found in the buffer 
>> cache anyway and thus dramatically reduce the amount of IO and thereby 
>> make high frequent vacuuming less expensive.
> 
> I don't think it would help very much to define a bit like that --- I
> can't believe that very many pages would contain only frozen tuples,
> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> a lot.

I thought this implies an aggressive policy of freezing everything by 
default. But I guess there is something I am not aware of that makes 
aggressive freezing a bad thing.

> 
> It might be interesting though to have some kind of "fast vacuum" mode
> that doesn't worry about freezing tuples, but only reclaiming dead ones.
> This could look at only recently touched pages, with perhaps the
> cooperation of the bgwriter to keep track of candidate pages.  (You'd
> still have to do full-database scans for freezable tuples occasionally ...
> but not very often.)

Wouldn't that screw the current FSM population mechanisms? Not that my 
suggestions above wouldn't do that either :-)

> 
> The main thing I don't see how to handle efficiently is getting rid of
> the index entries for dead rows.  The present indexam API is designed
> for getting rid of index entries wholesale, but you'd need something
> that works better for retail removal to approach vacuuming this way.

Which is a problem so fundamentally embedded in the index implementation 
itself that it sounds more like a wholesale replacement of the index am 
than some nifty changes.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Why frequently updated tables are an issue

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
>> I don't think it would help very much to define a bit like that --- I
>> can't believe that very many pages would contain only frozen tuples,
>> unless you were to adopt an aggressive policy of using VACUUM FREEZE
>> a lot.

> I thought this implies an aggressive policy of freezing everything by 
> default. But I guess there is something I am not aware of that makes 
> aggressive freezing a bad thing.

Well, it means extra I/O to freeze tuples that you otherwise probably
never would.  So it's not obvious that aggressive freezing in hopes of
saving cycles later is a win.

>> It might be interesting though to have some kind of "fast vacuum" mode
>> that doesn't worry about freezing tuples, but only reclaiming dead ones.

> Wouldn't that screw the current FSM population mechanisms? Not that my 
> suggestions above wouldn't do that either :-)

Yeah, that's another "wholesale" mechanism that we'd have to look at
refining.
        regards, tom lane


Re: Why frequently updated tables are an issue

From
Bruce Momjian
Date:
Jan Wieck wrote:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
> 
> > Jan Wieck <JanWieck@Yahoo.com> writes:
> >> But a per relation bitmap that tells if a block is a) free of dead 
> >> tuples and b) all remaining tuples in it are frozen could be used to let 
> >> vacuum skip them (there can't be anything to do). The bit would get 
> >> reset whenever the block is marked dirty. This would cause vacuum to 
> >> look at mainly recently touched blocks, likely to be found in the buffer 
> >> cache anyway and thus dramatically reduce the amount of IO and thereby 
> >> make high frequent vacuuming less expensive.
> > 
> > I don't think it would help very much to define a bit like that --- I
> > can't believe that very many pages would contain only frozen tuples,
> > unless you were to adopt an aggressive policy of using VACUUM FREEZE
> > a lot.
> 
> I thought this implies an aggressive policy of freezing everything by 
> default. But I guess there is something I am not aware of that makes 
> aggressive freezing a bad thing.

Why are frozen tuples significant?  I assume any page that has no dead
tuples could be skipped by vacuum.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Why frequently updated tables are an issue

From
Bruce Momjian
Date:
Is there a TODO here?  No one explained why frozen tuples are important.

---------------------------------------------------------------------------

Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > On 6/12/2004 3:45 PM, Tom Lane wrote:
> >> I don't think it would help very much to define a bit like that --- I
> >> can't believe that very many pages would contain only frozen tuples,
> >> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> >> a lot.
> 
> > I thought this implies an aggressive policy of freezing everything by 
> > default. But I guess there is something I am not aware of that makes 
> > aggressive freezing a bad thing.
> 
> Well, it means extra I/O to freeze tuples that you otherwise probably
> never would.  So it's not obvious that aggressive freezing in hopes of
> saving cycles later is a win.
> 
> >> It might be interesting though to have some kind of "fast vacuum" mode
> >> that doesn't worry about freezing tuples, but only reclaiming dead ones.
> 
> > Wouldn't that screw the current FSM population mechanisms? Not that my 
> > suggestions above wouldn't do that either :-)
> 
> Yeah, that's another "wholesale" mechanism that we'd have to look at
> refining.
> 
>             regards, tom lane
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Why frequently updated tables are an issue

From
Manfred Spraul
Date:
pgsql@mohawksoft.com wrote a few months ago:

>PostgreSQL's behavior on these cases is poor. I don't think anyone who has
>tried to use PG for this sort of thing will disagree, and yes it is
>getting better. Does anyone else consider this to be a problem? If so, I'm
>open for suggestions on what can be done. I've suggested a number of
>things, and admittedly they have all been pretty weak ideas, but they were
>potentially workable.
>
>  
>
What about a dblink-style interface to a non-MVCC SQL database?
I think someone on this list mentioned that there are open source 
in-memory SQL databases.
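
For example, something along these lines using contrib/dblink, with the
external database, connection string and table entirely made up:

select total
  from dblink('dbname=counters',
              'select total from summaries where name = ''largetable''')
       as t(total bigint);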

--   Manfred


Re: Why frequently updated tables are an issue

From
David Fetter
Date:
On Wed, Oct 20, 2004 at 07:10:35PM +0200, Manfred Spraul wrote:
> pgsql@mohawksoft.com wrote a few months ago:
> 
> >PostgreSQL's behavior on these cases is poor.  I don't think anyone
> >who has tried to use PG for this sort of thing will disagree, and
> >yes it is getting better.  Does anyone else consider this to be a
> >problem?  If so, I'm open for suggestions on what can be done.
> >I've suggested a number of things, and admittedly they have all
> >been pretty weak ideas, but they were potentially workable.
> >
> What about a dblink style interface to a non-MVCC SQL database?  I
> think someone on this list mentioned that there are open source
> in-memory SQL databases.

Funny you should mention this.  I'm working on a generalized one that
uses PL/PerlU at

http://fetter.org/pg_ps/

Thanks to Josh Berkus for design ideas & implementation roadmap, and
to Josh Drake, Andrew Dunstan, Abhijit Menon-Sen and (of course) Tom
Lane for the infrastructure that makes this doable. :)

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!