Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: shared-memory based stats collector
Date
Msg-id 20200323.171713.509041210070242668.horikyota.ntt@gmail.com
In response to Re: shared-memory based stats collector  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Thank you for looking at this.

At Thu, 19 Mar 2020 16:51:59 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> > This seems like a failure prone API.
> 
> If I understand correctly, the only purpose of the seqscan_running
> variable is to control that behaviour ^^^.  That is, to make
> dshash_delete_entry() keep the partition lock if you delete an entry
> while doing a seq scan.  Why not get rid of that, and provide a
> separate interface for deleting while scanning?
> dshash_seq_delete(dshash_seq_status *scan, void *entry).  I suppose it
> would be most common to want to delete the "current" item in the seq
> scan, but it could allow you to delete anything in the same partition,
> or any entry if using the "consistent" mode.  Oh, I see that Andres
> said the same thing later.

The attached v25 in [1] is the new version.
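
Just to make the usage concrete, with that interface deleting during a
scan would look roughly like the sketch below.  The entry type and the
"dropped" flag are made up for illustration; only the dshash_seq_*
calls follow the proposed interface, and the names in the actual patch
may differ.

    /*
     * Sketch: drop dead entries while scanning a dshash table.  The scan
     * is started with exclusive partition locks so that deleting under
     * the scan's own lock is legal.
     */
    static void
    drop_dead_table_entries(dshash_table *tabhash)
    {
        dshash_seq_status scan;
        PgStat_TabEntry *ent;               /* hypothetical entry type */

        dshash_seq_init(&scan, tabhash, true);  /* exclusive mode */

        while ((ent = dshash_seq_next(&scan)) != NULL)
        {
            if (ent->dropped)                   /* hypothetical flag */
                dshash_seq_delete(&scan, ent);  /* delete current entry */
        }

        dshash_seq_term(&scan);
    }

Since the scan itself holds the partition lock, dshash_delete_entry()
no longer needs the special seqscan_running behavior.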

> > Why does this patch add the consistent mode? There are no users currently?
> > Without it, it's not clear that we need a separate _term function, I think?
> 
> +1, let's not do that if we don't need it!

Yes, it is removed.

> > The fact that you're locking the per-database entry unconditionally once
> > for each table almost guarantees contention - and you're not using the
> > 'conditional lock' approach for that. I don't understand.
> 
> Right, I also noticed that:

I think I have fixed all such cases, except those like DROP that
really need an exclusive lock.

> So pgstat_update_tabentry() goes to great trouble to take locks
> conditionally, but then pgstat_update_dbentry() immediately does:
> 
>     LWLockAcquire(&dbentry->lock, LW_EXCLUSIVE);
>     dbentry->n_tuples_returned += stat->t_counts.t_tuples_returned;
>     dbentry->n_tuples_fetched += stat->t_counts.t_tuples_fetched;
>     dbentry->n_tuples_inserted += stat->t_counts.t_tuples_inserted;
>     dbentry->n_tuples_updated += stat->t_counts.t_tuples_updated;
>     dbentry->n_tuples_deleted += stat->t_counts.t_tuples_deleted;
>     dbentry->n_blocks_fetched += stat->t_counts.t_blocks_fetched;
>     dbentry->n_blocks_hit += stat->t_counts.t_blocks_hit;
>     LWLockRelease(&dbentry->lock);
> 
> Why can't we be "lazy" with the dbentry stats too?  Is it really
> important for the table stats and DB stats to agree with each other?
> Even if it were, your current coding doesn't achieve that: the table
> stats are updated before the DB stat under different locks, so I'm not
> sure why it can't wait longer.

It is now done the lazy way.
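
Concretely, "lazy" here means something like the sketch below: try the
lock without waiting and, on failure, keep the numbers in a
backend-local pending area to be flushed on a later attempt, matching
the conditional-lock pattern pgstat_update_tabentry() already uses.
The function and struct names are only illustrative, not necessarily
the ones used in the patch.

    /*
     * Sketch of a lazy per-database update.  On contention the counts are
     * parked in a backend-local pending entry and flushed by a later call.
     */
    static bool
    update_dbentry_lazy(PgStat_StatDBEntry *dbentry,
                        PgStat_TableStatus *stat,
                        PgStat_StatDBEntry *pending)
    {
        if (!LWLockConditionalAcquire(&dbentry->lock, LW_EXCLUSIVE))
        {
            /* Somebody else holds the lock: accumulate locally instead. */
            pending->n_tuples_returned += stat->t_counts.t_tuples_returned;
            pending->n_tuples_fetched += stat->t_counts.t_tuples_fetched;
            /* ... same for the other counters ... */
            return false;
        }

        /* Got the lock: flush the new counts plus anything pending. */
        dbentry->n_tuples_returned +=
            stat->t_counts.t_tuples_returned + pending->n_tuples_returned;
        dbentry->n_tuples_fetched +=
            stat->t_counts.t_tuples_fetched + pending->n_tuples_fetched;
        /* ... same for the other counters ... */
        LWLockRelease(&dbentry->lock);

        memset(pending, 0, sizeof(*pending));
        return true;
    }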

> Hmm.  Even if you change the above code use a conditional lock, I am
> wondering (admittedly entirely without data) if this approach is still
> too clunky: even trying and failing to acquire the lock creates
> contention, just a bit less.  I wonder if it would make sense to make
> readers do more work, so that writers can avoid contention.  For
> example, maybe PgStat_StatDBEntry could hold an array of N sets of
> counters, and readers have to add them all up.  An advanced version of

I considered that kind of solution, but it needs additional memory
multiplied by the number of backends.  If the contention turns out not
to be negligible, we can go back to a stats collector process connected
via sockets, which would then publish the results in shared memory.
The original motivation was to avoid the file I/O when backends read
the stats.
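
For reference, the striped-counter idea would look roughly like the
sketch below (none of these names are in the patch), and it is the
extra slots here that multiply the memory per entry:

    #define DBENTRY_NSLOTS 16           /* illustrative stripe count */

    typedef struct DBCounterSlot
    {
        LWLock          lock;
        PgStat_Counter  n_tuples_returned;
        PgStat_Counter  n_tuples_fetched;
        /* ... remaining per-database counters ... */
    } DBCounterSlot;

    /* Writers touch only "their" slot, picked here from the backend PID. */
    static inline DBCounterSlot *
    dbentry_my_slot(DBCounterSlot *slots)
    {
        return &slots[MyProcPid % DBENTRY_NSLOTS];
    }

    /* Readers pay the cost instead: sum every slot to get the totals. */
    static void
    dbentry_sum_slots(DBCounterSlot *slots, PgStat_StatDBEntry *result)
    {
        for (int i = 0; i < DBENTRY_NSLOTS; i++)
        {
            result->n_tuples_returned += slots[i].n_tuples_returned;
            result->n_tuples_fetched += slots[i].n_tuples_fetched;
            /* ... and so on for the other counters ... */
        }
    }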

> this idea would use a reasonably fresh copy of something like
> sched_getcpu() and numa_node_of_cpu() to select a partition to
> minimise contention and cross-node traffic, with a portable fallback
> based on PID or something.  CPU core/node awareness is something I
> haven't looked into too seriously, but it's been on my mind to solve
> some other problems.

I have been asked about CPU core/node awareness several times, so
there may well be a real need for it.
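
If we ever go that way, the slot choice in the previous sketch could be
made CPU-aware roughly as below.  sched_getcpu() is Linux/glibc only
and the HAVE_SCHED_GETCPU guard is a hypothetical configure check, so a
PID-based fallback is kept for other platforms.

    #include <sched.h>                  /* sched_getcpu(), Linux/glibc */

    static inline int
    dbentry_pick_slot(void)
    {
    #ifdef HAVE_SCHED_GETCPU            /* hypothetical configure check */
        int         cpu = sched_getcpu();

        if (cpu >= 0)
            return cpu % DBENTRY_NSLOTS;
    #endif
        /* Portable fallback: spread backends over slots by PID. */
        return MyProcPid % DBENTRY_NSLOTS;
    }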

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


