Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Date
Msg-id CAH2-WznMsUdEvQbpHjox+Gqjz8bn7DgxkEjUGp6iL-0O6UTzGg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jul 7, 2017 at 5:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> How is that possible?  In a Zipfian distribution, no matter how big
> the table is, almost all of the updates will be concentrated on a
> handful of rows - and updates to any given row are necessarily
> serialized, or so I would think.  Maybe MongoDB can be fast there
> since there are no transactions, so it can just lock the row slam in
> the new value and unlock the row, all (I suppose) without writing WAL
> or doing anything hard.

If you're not using the Wired Tiger storage engine, than the locking
is at the document level, which means that a Zipfian distribution is
no worse than any other as far as lock contention goes. That's one
possible explanation. Another is that indexed organized tables
naturally have much better locality, which matters at every level of
the memory hierarchy.

> I'm more curious about why we're performing badly than I am about a
> general-purpose random_zipfian function.  :-)

I'm interested in both. I think that a random_zipfian function would
be quite helpful for modeling certain kinds of performance problems,
like CPU cache misses incurred at the page level.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "Wong, Yi Wen"
Date:
Subject: [HACKERS] replication_slot_catalog_xmin not explicitly initialized whencreating procArray
Next
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] [WIP] Zipfian distribution in pgbench