fast count(*) through statistics collector - Mailing list pgsql-hackers

From Artem Yazkov
Subject fast count(*) through statistics collector
Date
Msg-id 264101906.20080319101654@kemnet.ru
Whole thread Raw
Responses Re: fast count(*) through statistics collector
List pgsql-hackers
Hi hackers,

I'm novice in PostgreSQL codebase (and in English too :-)), but I'd be
glad to make a modest contribution to this great project.

By  viewing  this  list,  I see a lot of discussions on the problem of
"fast  count  (*)",  but acceptable decision have not been formulated.
Well,  I  make  bold  to  propose  own  view on the problem. It can be
described as: "Improve existing infrastructure of statistics collector
and use its for caching the number of rows in the table."

I plan to do the following steps to implement this principle:

1.  Eliminate  500  ms  lag  between transaction commit and statistics
portion   transfer   from  backend  to  collector.  For  this  purpose
pgstat_report_tabstat()  function  must  call in "force" mode only. We
pay   therefor   increased   backend<-->collector   traffic.  As  some
compensation could be invited to send TabStatusArray items, which have
not  changed with the last shipment. This measure will reduce the size
of a messages.
I  see  here  one  more  pitfall:  new transaction can start after the
changes  made  earlier  transaction became visible for other backends,
but  before  the statistics collector managed to take and process data
(despite  the  forced  transfer).  To  avoid  this,  one  may transfer
information  before  the  changes  made  transaction  will be visible,
collector, in one's turn, apply this info after that.
It  is  also  possible  that the use of shared memory instead of pipes
will help increase productivity.

2.  Eliminate  500 ms lag between recieve statistics portion and write
pgstat.stat file. Realize the next todo item for this purpose:
"Allow   statistics  collector  information  to  be  pulled  from  the
collector  process  directly,  rather  than requiring the collector to
write a filesystem file twice a second".
As  an  additional  effect,  we  will  be able to reduce the burden on
I/O channels.

3.  Field n_live_tuples of PgStat_StatTabEntry structure now holds the
number of inserted - deleted tuples for successful transactions, which
are  known  to  collector.  But we need field, which would contain the
number of inserted - deleted tuples for ALL successful transactions in
the  history  of  the  table,  or  it would be undefined (e.g. -1). If
n_live_tuples  not suited for this role, creating additional field. In
any case, I will call this field "live_tuples counter" below.

4.  Values in the live_tuples counters be questioned, if there was any
interruption  of  statistics collection. Therefore, if trac_counts was
set  to  false  in  cluster-wide  or the collector process crash, then
live_tuples  become  undefined  for  all  tables  in  the  cluster. If
pg_stat_reset() call, then live_tuples become undefined for all tables
in  DB.  If pg_stat_clear_snapshot() call, or trac_counts set to false
during user session, then live_tuples counters should undefine for all
tables covered during this transaction/session. If compile such a list
of tables is not possible, well, for all tables in DB.

5.  If  live_tuples  counter  contain  undefined value, but statistics
collector  work  normal,  the  counter  must be restored through first
seqscan.

I  hope  that  these  steps  will  give us mvcc-compliant counters and
overhead cost will increase little.

The next step is relatively simple:

6.  In  the  optimizer/plan/planagg.c  file  add a function similar to
optimize_minmax_aggregates   ()   that   return   null  for  undefined
tuples_count  counters  (and count(*) determine by regular way through
seqscan) or plan for computation such as:
   PgStat_StatTabEntry.live_tuples        +   PgStat_TableCounts.t_new_lived_tuples  +
PgStat_TableXactStatus.tuples_inserted-   PgStat_TableXactStatus.tuples_deleted
 

Restrictions:
1.   Uninterrupted   supply  of  statistics  collector  necessary  for
efficient use of this algorithm.
2. Works only for simplest queries like:    select count (*) from regular_table


Any comments are welcome

--   regards,  Artem  Yazkov



pgsql-hackers by date:

Previous
From: Jan Urbański
Date:
Subject: Text search selectivity improvements (was Re: Google Summer of Code 2008)
Next
From: "Warren Turkal"
Date:
Subject: Re: timestamp datatype cleanup