Re: Patch: ResourceOwner optimization for tables with many partitions - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Patch: ResourceOwner optimization for tables with many partitions
Date
Msg-id 20160125191827.615fe277@fujitsu
Whole thread Raw
In response to Re: Patch: ResourceOwner optimization for tables with many partitions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Patch: ResourceOwner optimization for tables with many partitions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello, Tom

> Also, there are defined ways to convert between Datum format and
> other formats (DatumGetPointer() etc).  This code isn't using 'em.

Fixed. But I was not sure how to deal with File and Buffer types since
they are ints (see fd.h and buf.h) and there is no DatumGetInt macro,
only DatumGetInt32/Int64. I don't know if there is a good reason for
this. So for now I just added these definitions right into resowner.c:

```
/* Convert File to Datum */
#define FileGetDatum(file) ((Datum)(file))

/* Convert Datum to File */
#define DatumGetFile(datum)((File)(datum))

/* Convert Buffer to Datum */
#define BufferGetDatum(buffer)((Datum)(buffer))

/* Convert Datum to Buffer */
#define DatumGetBuffer(datum)((Buffer)(datum))
```

... to make all code look similar and all intentions --- clear.

I have a feeling you could suggest a better solution :)

> This is certainly not doing what I had in mind for communication
> between ResourceOwnerGetAny and ResourceOwnerRelease, because the
> latter pays zero attention to "lastidx", but instead does a blind
> hash search regardless.
>
> In general, I'm suspicious of the impact of this patch on "normal"
> cases with not-very-large resource arrays.  It might be hard to
> measure that because in such cases resowner.c is not a bottleneck;
> but that doesn't mean I want to give up performance in cases that
> perform well today.  You could probably set up a test harness that
> exercises ResourceOwnerAdd/Release/etc in isolation and get good
> timings for them that way, to confirm or disprove whether there's
> a performance loss for small arrays.
>
> I still suspect it would be advantageous to have two different
> operating modes depending on the size of an individual resource
> array, and not bother with hashing until you got to more than a
> couple dozen entries. Given that you're rehashing anyway when you
> enlarge the array, this wouldn't cost anything except a few more
> lines of code, ISTM --- and you already have a net code savings
> because of the refactoring.

To be honest I don't have much faith in such micro benchmarks. They
don't consider how code will be executed under real load. Therefore any
results of such a benchmark wouldn't give us a clear answer which
solution is preferable. Besides different users can execute the same
code in different fashion.

I compared two implementations - "always use hashing" (step2a.path) and
"use hashing only for large arrays" (step2b.path). Both patches give
the same performance according to benchmark I described in a first
message of this thread.

Since none of these patches is better in respect of problem I'm trying
to solve I suggest to use step2b.path. It seems to be a good idea not to
use hashing for searching in small arrays. Besides in this case
behaviour of PostgreSQL will be changed less after applying a patch.
Users will probably appreciate this.

Best regards,
Aleksander
Attachment

pgsql-hackers by date:

Previous
From: "Shulgin, Oleksandr"
Date:
Subject: Re: More stable query plans via more predictable column statistics
Next
From: Pavel Stehule
Date:
Subject: Re: [patch] Proposal for \crosstabview in psql