Re: Slow count(*) again... - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: Slow count(*) again...
Date
Msg-id 4CB41D8E.2010302@gmail.com
In response to Re: Slow count(*) again...  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-performance
12.10.10 11:14, Craig Ringer wrote:
> On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:
>
>> BTW: There is a lot of talk about MVCC, but is the following solution
>> possible:
>> 1) Create a page information map that, for each page in the table, tells
>> you how many rows are within it and whether any writes (successful or
>> not) were done to this page. This could even be two maps, to make the
>> second one really small (a bit per page), so that it could stay in memory
>> most of the time.
>> 2) When you need to do a count(*) or an index check, first check whether
>> there were any writes to the page. If not, you can use the count
>> information from the page info/index data without going to the page itself.
>> 3) Let vacuum clear the bit after freezing all the tuples in the page (am
>> I using the terminology correctly?).
>
> Part of this already exists. It's called the visibility map, and is
> present in 8.4 and above. It's not currently used for queries, but can
> potentially be used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will have this bit off, and
>> index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all
> pages with visible tuples in them.
Only if one doesn't use partitioning. With proper partitioning, the filter
can simply select the relevant partitions.

Filtering can also be mapped onto an index lookup. And if one could join
the index hash with the visibility map, much like two indexes can be
bitmap-joined now, the count could be really fast for everything except
non-frozen tuples.
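
To illustrate what I mean, here is a rough Python sketch. It is only a model, not how PostgreSQL works: index_hits, all_visible and is_visible are hypothetical names, and I assume the index lookup returns (page, tuple) pairs and that the map can be read as a set of "clean" page numbers.

```python
def count_matching(index_hits, all_visible, is_visible):
    """Count index matches, visiting the heap only for pages not marked clean.

    index_hits  -- iterable of (page_no, tuple_id) pairs from a hypothetical
                   index lookup for the filter condition
    all_visible -- set of page numbers whose bit says "no writes since vacuum"
    is_visible  -- callback that reads the heap page and checks the tuple
                   (assumed to be the expensive part)
    """
    total = 0
    heap_visits = 0
    for page_no, tuple_id in index_hits:
        if page_no in all_visible:
            # Clean page: every tuple the index points to is visible.
            total += 1
        else:
            # Page was written to: fall back to the usual visibility check.
            heap_visits += 1
            if is_visible(page_no, tuple_id):
                total += 1
    return total, heap_visits
```

With mostly archive data, nearly all hits would land on clean pages, so the heap would be touched only for the few recently modified ones.
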
> So the visibility map helps speed up scanning of bloated tables, but
> doesn't provide a magical "fast count" except in the utterly trivial
> "select count(*) from tablename;" case, and can probably only be used
> for accurate results when there are no read/write transactions
> currently open.
Why so? You simply have to recount the pages that are marked dirty in the
usual way. But the count problem usually occurs when there is a lot of
archive data (you need to count over 100K records) that is not modified.
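
A minimal Python sketch of the counting scheme, under my own assumptions: PageInfo, fast_count and recount_page are made-up names, the map keeps a row count and a dirty flag per page, and recount_page stands for the normal MVCC-aware scan of a single page.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    cached_count: int  # rows counted when the page was last vacuumed
    dirty: bool        # set by any write to the page, successful or not


def fast_count(page_map, recount_page):
    """Sum cached counts for clean pages; rescan only the dirty ones.

    page_map     -- list of PageInfo, indexed by page number
    recount_page -- callback doing the usual visibility-checked count
                    of a single page (hypothetical)
    """
    total = 0
    recounted = 0
    for page_no, info in enumerate(page_map):
        if info.dirty:
            total += recount_page(page_no)
            recounted += 1
        else:
            total += info.cached_count
    return total, recounted
```

The cost is then proportional to the number of dirty pages plus the size of the map, not to the size of the table.
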

Best regards, Vitalii Tymchyshyn
