Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Alik Khilazhev
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Date
Msg-id 46958A39-D273-456D-A2D6-E6655BA2B4DC@postgrespro.ru
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
List pgsql-hackers
Hello, Fabien!

> Your description is not very precise. What version of Postgres is used? If there is a decline, compared to which version? Is there a link to these results?

The benchmarks were run on master (v10). I am attaching an image with the results.

> Indeed, the function computation is over expensive, and the numerical precision of the implementation is doubtful.

> If there is no better way to compute this function, ISTM that it should be summed in reverse order to accumulate small values first, from (1/n)^s + ... + (1/2)^s. As 1/1 == 1, the corresponding term is 1, no point in calling pow for this one, so it could be:
>
>       double ans = 0.0;
>       for (i = n; i >= 2; i--)
>             ans += pow(1. / i, theta);
>       return 1.0 + ans;

You are right, it’s better to reverse the order.

> If the function when actually used is likely to be called with different parameters, then some caching beyond the last value would seem in order. Maybe a small fixed size array?

> However, it should be somehow thread safe, which does not seem to be the case with the current implementation. Maybe a per-thread cache? Or use a lock only to update a shared cache? At least it should avoid locking to read values…

Yes, I forgot about thread-safety. I will implement a per-thread cache with a small fixed-size array.

> Given the explanations, the random draw mostly hits values at the beginning of the interval, so when the number of clients goes higher one just gets locking contention on the updated row?

Yes, exactly. 

> ISTM that also having the tps achieved with a flat distribution would allow to check this hypothesis.

On Workload A with a uniform distribution, PostgreSQL shows better results than MongoDB and MySQL (see attachment). You can also notice that for a small number of clients the type of distribution does not affect tps on MySQL.


It is also important to mention that Postgres was run with synchronous_commit=off, to match the durability of MongoDB's writeConcern=1&journaled=false. In this mode there is a possibility of losing all changes made in the last second. If we ran Postgres at maximum durability, MongoDB would lag far behind.
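For reference, the reduced-durability setting mentioned here is a single postgresql.conf line (a sketch of the setting only, not the full benchmark configuration):

```
# Trade durability for throughput: commits are acknowledged before
# their WAL is flushed, so changes committed in roughly the last
# second can be lost on a crash (comparable to MongoDB's
# writeConcern=1&journaled=false).
synchronous_commit = off
```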
---
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
