[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions - Mailing list pgsql-hackers

From Mengxing Liu
Subject [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions
Date
Msg-id 1ec4e20c.1d454.15ac062fdde.Coremail.liu-mx15@mails.tsinghua.edu.cn
Whole thread Raw
In response to [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking inserializable transactions  (Kevin Grittner <kgrittn@gmail.com>)
Responses [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict trackingin serializable transactions  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers


> -----Original Messages-----
> From: "Kevin Grittner" <kgrittn@gmail.com>
> Sent Time: 2017-03-12 04:24:29 (Sunday)
> To: "Mengxing Liu" <liu-mx15@mails.tsinghua.edu.cn>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
>
> On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu
> <liu-mx15@mails.tsinghua.edu.cn> wrote:
>
> > My name is Mengxing Liu. I am interested in the project "Eliminate
> > O(N^2) scaling from rw-conflict tracking in serializable
> > transactions”. After discussing with Kevin off-list, I think it's
> > time to post discussion here. I am afraid that there are two things
> > that I need your help. Thank you very much.
>
> Welcome to the -hackers list!  This is a key part of how development
> happens in the community.  Don't be shy about posting questions and
> ideas, but also expect fairly blunt discussion to occur at times;
> don't let that put you off.
>

Thanks, Kevin. I will keep that in mind.

> >>> So the task is clear. We can use a tree-like or hash-like data
> >>> structure to speed up this function.
> >>
> >> Right -- especially with a large number of connections holding a
> >> large number of conflicts.  In one paper with high concurrency, they
> >> found over 50% of the CPU time for PostgreSQL was going to these
> >> functions (including functions called by them).  This seems to me to
> >> be due to the O(N^2) (or possibly worse) performance from the number
> >> of connections.
> >
> > Anyone knows the title of this paper? I want to reproduce its
> > workloads.
>
> I seem to remember there being a couple other papers or talks, but
> this is probably the most informative:
>
> http://sydney.edu.au/engineering/it/research/tr/tr693.pdf
>

Thanks, It's exactly what I need.

> >> Remember, I think most of the work here is going to be in
> >> benchmarking.  We not only need to show improvements in simple test
> >> cases using readily available tools like pgbench, but think about
> >> what types of cases might be worst for the approach taken and show
> >> that it still does well -- or at least not horribly.  It can be OK
> >> to have some slight regression in an unusual case if the common
> >> cases improve a lot, but any large regression needs to be addressed
> >> before the patch can be accepted.  There are some community members
> >> who are truly diabolical in their ability to devise "worst case"
> >> tests, and we don't want to be blind-sided by a bad result from one
> >> of them late in the process.
> >>
> >
> > Are there any documents or links introducing how to test and
> > benchmark PostgreSQL? I may need some time to learn about it.
>
> There is pgbench:
>
> https://www.postgresql.org/docs/devel/static/pgbench.html
>
> A read-only load and a read/write mix should both be tested,
> probably with a few different scales and client counts to force the
> bottleneck to be in different places.  The worst problems have been
> seen with 32 or more cores on 4 or more sockets with a large number
> of active connections.  I don't know whether you have access to a
> machine capable of putting this kind of stress on it (perhaps at
> your university?), but if not, the community has access to various
> resources we should be able to schedule time on.
>

There is a NUMA machine ( 120 cores, 8 sockets) in my lab. I think it's enough to put this kind of stress.
Anyway, I would ask for help if necessary.

> It may pay for you to search through the archives of the last year
> or two to look for other benchmarks and see what people have
> previously done.
>

I would try.



--
Mengxing Liu











pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Write Ahead Logging for Hash Indexes
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)