[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking inserializable transactions - Mailing list pgsql-hackers

From Kevin Grittner
Subject [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking inserializable transactions
Date
Msg-id CACjxUsNBTsHn-QBB-vQSi6++jvgqfTHZrqYhBKt9f9jko3r5VA@mail.gmail.com
Whole thread Raw
In response to [HACKERS] [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking inserializable transactions  ("Mengxing Liu" <liu-mx15@mails.tsinghua.edu.cn>)
Responses [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions  ("Mengxing Liu" <liu-mx15@mails.tsinghua.edu.cn>)
List pgsql-hackers
On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu
<liu-mx15@mails.tsinghua.edu.cn> wrote:

> My name is Mengxing Liu. I am interested in the project "Eliminate
> O(N^2) scaling from rw-conflict tracking in serializable
> transactions”. After discussing with Kevin off-list, I think it's
> time to post discussion here. I am afraid that there are two things
> that I need your help. Thank you very much.

Welcome to the -hackers list!  This is a key part of how development
happens in the community.  Don't be shy about posting questions and
ideas, but also expect fairly blunt discussion to occur at times;
don't let that put you off.

>>> So the task is clear. We can use a tree-like or hash-like data
>>> structure to speed up this function.
>>
>> Right -- especially with a large number of connections holding a
>> large number of conflicts.  In one paper with high concurrency, they
>> found over 50% of the CPU time for PostgreSQL was going to these
>> functions (including functions called by them).  This seems to me to
>> be due to the O(N^2) (or possibly worse) performance from the number
>> of connections.
>
> Anyone knows the title of this paper? I want to reproduce its
> workloads.

I seem to remember there being a couple other papers or talks, but
this is probably the most informative:

http://sydney.edu.au/engineering/it/research/tr/tr693.pdf

>> Remember, I think most of the work here is going to be in
>> benchmarking.  We not only need to show improvements in simple test
>> cases using readily available tools like pgbench, but think about
>> what types of cases might be worst for the approach taken and show
>> that it still does well -- or at least not horribly.  It can be OK
>> to have some slight regression in an unusual case if the common
>> cases improve a lot, but any large regression needs to be addressed
>> before the patch can be accepted.  There are some community members
>> who are truly diabolical in their ability to devise "worst case"
>> tests, and we don't want to be blind-sided by a bad result from one
>> of them late in the process.
>>
>
> Are there any documents or links introducing how to test and
> benchmark PostgreSQL? I may need some time to learn about it.

There is pgbench:

https://www.postgresql.org/docs/devel/static/pgbench.html

A read-only load and a read/write mix should both be tested,
probably with a few different scales and client counts to force the
bottleneck to be in different places.  The worst problems have been
seen with 32 or more cores on 4 or more sockets with a large number
of active connections.  I don't know whether you have access to a
machine capable of putting this kind of stress on it (perhaps at
your university?), but if not, the community has access to various
resources we should be able to schedule time on.

It may pay for you to search through the archives of the last year
or two to look for other benchmarks and see what people have
previously done.

--
Kevin Grittner



pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: [HACKERS] Need a builtin way to run all tests faster manner
Next
From: Jim Nasby
Date:
Subject: Re: [HACKERS] Index usage for elem-contained-by-const-range clauses