Thread: Performance difference in accessing different columns in a Postgres Table

Performance difference in accessing different columns in a Postgres Table

From: Dinesh Kumar
Hello All,

I created a table with 200 bigint columns and 200 varchar columns (Postgres 10.4):

create table i200c200 (pk bigint primary key, int1 bigint, int2 bigint, ..., int200 bigint, char1 varchar(255), ..., char200 varchar(255));

I inserted values only into the pk, int1, and int200 columns, with some random data (from generate_series); the remaining columns are all null. The table has 1,000,000 rows.
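The population step was presumably something along these lines (a sketch, not the exact script used):

insert into i200c200 (pk, int1, int200)
select g,
       (random() * 1000000)::bigint,   -- some random data for int1
       (random() * 1000000)::bigint    -- and for int200
from generate_series(1, 1000000) g;    -- every other column stays null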

I found a performance difference between accessing the int1 and int200 columns which is quite large.

Reports from pg_stat_statements:

                  query                   | total_time | min_time | max_time | mean_time |    stddev_time
------------------------------------------+------------+----------+----------+-----------+--------------------
 select pk,int1 from i200c200 limit 200   |       0.65 |    0.102 |    0.138 |      0.13 | 0.0140142784330839
 select pk,int199 from i200c200 limit $1  |      1.207 |     0.18 |    0.332 |    0.2414 | 0.0500583659341773
 select pk,int200 from i200c200 limit 200 |       1.67 |    0.215 |    0.434 |     0.334 | 0.0697825193010399
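The report above comes from the pg_stat_statements extension and can be reproduced with a query along these lines (a sketch, using the pre-v13 column names that apply to 10.4):

select query, total_time, min_time, max_time, mean_time, stddev_time
from pg_stat_statements
where query like '%i200c200%';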
Explain Analyse:

explain analyse select pk,int1 from i200c200 limit 1000;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..23.33 rows=1000 width=16) (actual time=0.014..0.390 rows=1000 loops=1)
   ->  Seq Scan on i200c200  (cost=0.00..23334.00 rows=1000000 width=16) (actual time=0.013..0.268 rows=1000 loops=1)
 Planning time: 0.066 ms
 Execution time: 0.475 ms

explain analyse select pk,int200 from i200c200 limit 1000;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..23.33 rows=1000 width=16) (actual time=0.012..1.001 rows=1000 loops=1)
   ->  Seq Scan on i200c200  (cost=0.00..23334.00 rows=1000000 width=16) (actual time=0.011..0.894 rows=1000 loops=1)
 Planning time: 0.049 ms
 Execution time: 1.067 ms
I am curious to understand this Postgres behaviour and its internals.

Note: I tried the same query with the int199 column, which is null in all rows, and it still shows the performance variation. Since Postgres doesn't store null values in the data itself but in a null bitmap, there should not be this variation (because I have data only for pk, int1, and int200). I am wondering whether the null bitmap lookup is what slows this down, because each row in my table has a null bitmap of size 408 bits. As a newbie, I am wondering whether the null bitmap lookup for a later column takes too much time (scanning the null bitmap itself). Am I thinking in the right way?
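One way to look at the null bitmap directly is the pageinspect extension (a sketch, assuming the extension is available; t_bits is the per-tuple null bitmap and HEAP_HASNULL is bit 0x0001 of t_infomask):

create extension if not exists pageinspect;
select lp,
       t_infomask & 1 as has_nulls,   -- HEAP_HASNULL flag
       t_bits                         -- the null bitmap itself
from heap_page_items(get_raw_page('i200c200', 0))
limit 3;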

Thanks in advance,

Dineshkumar.P

Postgres Newbie.





On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
> I found a performance difference between accessing the int1 and int200
> columns which is quite large.

Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
that tuples are deformed starting at the first attribute. If you ask
for attribute 200 then it must deform 1-199 first.
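A quick way to observe this (a sketch; exact numbers will differ per machine):

\timing on
-- deforms only up to attribute 2 (pk, int1):
select count(int1) from i200c200;
-- must walk attributes 1..201 for every row:
select count(int200) from i200c200;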

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
>> I found a performance difference between accessing the int1 and int200
>> columns which is quite large.

> Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> that tuples are deformed starting at the first attribute. If you ask
> for attribute 200 then it must deform 1-199 first.

Note that that can be optimized away in some cases, though evidently
not the one the OP is testing.  From memory, you need a tuple that
contains no nulls, and all the columns to the left of the target
column have to be fixed-width datatypes.  Otherwise, the offset to
the target column is uncertain, and we have to search for it.
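A sketch of a layout where that fast path applies (an assumed example; the cached offsets are the attcacheoff logic in the tuple descriptor):

-- All columns fixed width and NOT NULL: the offset of c3 within the
-- tuple is a constant, so no walk over c1 and c2 is required.
create table all_fixed (
    c1 bigint not null,
    c2 bigint not null,
    c3 bigint not null
);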

            regards, tom lane




2018-07-30 1:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
>> I found a performance difference between accessing the int1 and int200
>> columns which is quite large.

> Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> that tuples are deformed starting at the first attribute. If you ask
> for attribute 200 then it must deform 1-199 first.

Note that that can be optimized away in some cases, though evidently
not the one the OP is testing.  From memory, you need a tuple that
contains no nulls, and all the columns to the left of the target
column have to be fixed-width datatypes.  Otherwise, the offset to
the target column is uncertain, and we have to search for it.

JIT decreases the overhead of this.

Regards

Pavel




On Mon, Jul 30, 2018 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2018-07-30 1:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
>> I found a performance difference between accessing the int1 and int200
>> columns which is quite large.

> Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> that tuples are deformed starting at the first attribute. If you ask
> for attribute 200 then it must deform 1-199 first.

Note that that can be optimized away in some cases, though evidently
not the one the OP is testing.  From memory, you need a tuple that
contains no nulls, and all the columns to the left of the target
column have to be fixed-width datatypes.  Otherwise, the offset to
the target column is uncertain, and we have to search for it.

JIT decreases the overhead of this.

The bottleneck here is such a simple construct that I don't see how JIT could improve it by much.

And indeed, in my hands JIT makes it almost 3 times worse.

Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT execution and 4594.994 ms with JIT=off.
 
Cheers,

Jeff


2018-07-30 13:19 GMT+02:00 Jeff Janes <jeff.janes@gmail.com>:
On Mon, Jul 30, 2018 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2018-07-30 1:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
>> I found a performance difference between accessing the int1 and int200
>> columns which is quite large.

> Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> that tuples are deformed starting at the first attribute. If you ask
> for attribute 200 then it must deform 1-199 first.

Note that that can be optimized away in some cases, though evidently
not the one the OP is testing.  From memory, you need a tuple that
contains no nulls, and all the columns to the left of the target
column have to be fixed-width datatypes.  Otherwise, the offset to
the target column is uncertain, and we have to search for it.

JIT decreases the overhead of this.

The bottleneck here is such a simple construct that I don't see how JIT could improve it by much.

And indeed, in my hands JIT makes it almost 3 times worse.

Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT execution and 4594.994 ms with JIT=off.


Regards

Pavel

 
Cheers,

Jeff

On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:
> On Mon, Jul 30, 2018 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
> 
> > 2018-07-30 1:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
> >
> >> David Rowley <david.rowley@2ndquadrant.com> writes:
> >> > On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
> >> >> I found a performance difference between accessing the int1 and int200
> >> >> columns which is quite large.
> >>
> >> > Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> >> > that tuples are deformed starting at the first attribute. If you ask
> >> > for attribute 200 then it must deform 1-199 first.
> >>
> >> Note that that can be optimized away in some cases, though evidently
> >> not the one the OP is testing.  From memory, you need a tuple that
> >> contains no nulls, and all the columns to the left of the target
> >> column have to be fixed-width datatypes.  Otherwise, the offset to
> >> the target column is uncertain, and we have to search for it.
> >>
> >
> > JIT decreases the overhead of this.
> >
> 
> The bottleneck here is such a simple construct that I don't see how JIT
> could improve it by much.

The deforming can become quite a bit faster with JITing, because we know
the column types and widths. If intervening columns are NOT NULL and
fixed width, we can even optimize away processing them at runtime nearly
entirely.


> And indeed, in my hands JIT makes it almost 3 times worse.

Not in my measurement. Your example won't use JIT at all, because it's
below the cost threshold. So I think you might just be seeing cache +
hint bit effects?
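Whether JIT actually fired for an execution is visible in EXPLAIN output (a quick check, as a sketch):

show jit_above_cost;   -- 100000 by default in v11
explain (analyze) select pk, int200 from i200c200;
-- a "JIT:" section appears in the plan output only when JIT was used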

> Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT
> execution and 4594.994 ms with JIT=off.

Even with a debug LLVM build, which greatly increases compilation
overhead, I actually see quite the benefit when I force JIT to be used:


postgres[26832][1]=# ;SET jit_above_cost = -1; set jit_optimize_above_cost = 0; set jit_inline_above_cost = 0;
postgres[26832][1]=# explain (analyze, buffers, timing off) select pk, int200 from i200c200;
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            QUERY PLAN                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual rows=10000000 loops=1) │
│   Buffers: shared hit=133334                                                                      │
│ Planning Time: 0.069 ms                                                                           │
│ Execution Time: 3645.069 ms                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)



postgres[26832][1]=# ;SET jit_above_cost = 0; set jit_optimize_above_cost = 0; set jit_inline_above_cost = 0;
postgres[26832][1]=# explain (analyze, buffers, timing off) select pk, int200 from i200c200;
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            QUERY PLAN                                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual rows=10000000 loops=1) │
│   Buffers: shared hit=133334                                                                      │
│ Planning Time: 0.070 ms                                                                           │
│ JIT:                                                                                              │
│   Functions: 2                                                                                    │
│   Inlining: true                                                                                  │
│   Optimization: true                                                                              │
│ Execution Time: 3191.683 ms                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Now that's not *huge*, but not nothing either.  And it's a win even though
JITing takes its own good time (we need to improve on that).


If I force all the bigint columns to be NOT NULL DEFAULT 0, the results
get more drastic:
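(Presumably via something along these lines; a sketch, not the exact commands used. In practice, rewriting the table would be faster than 200 full-table UPDATEs.)

do $$
begin
  for i in 1..200 loop
    execute format('update i200c200 set int%s = 0 where int%s is null', i, i);
    execute format('alter table i200c200 alter column int%s set default 0', i);
    execute format('alter table i200c200 alter column int%s set not null', i);
  end loop;
end $$;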

postgres[28528][1]=#  ;SET jit_above_cost = 0; set jit_optimize_above_cost = 0; set jit_inline_above_cost = 0;
postgres[28528][1]=# explain (analyze, buffers, timing off) select pk, int200 from i200c200;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                             QUERY PLAN                                              │
├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on i200c200  (cost=0.00..2600000.00 rows=10000000 width=16) (actual rows=10000000 loops=1) │
│   Buffers: shared hit=2500000                                                                       │
│ Planning Time: 0.066 ms                                                                             │
│ JIT:                                                                                                │
│   Functions: 2                                                                                      │
│   Inlining: true                                                                                    │
│   Optimization: true                                                                                │
│ Execution Time: 4837.872 ms                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

postgres[28528][1]=#  ;SET jit_above_cost = -1; set jit_optimize_above_cost = 0; set jit_inline_above_cost = 0;
postgres[28528][1]=# explain (analyze, buffers, timing off) select pk, int200 from i200c200;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                             QUERY PLAN                                              │
├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on i200c200  (cost=0.00..2600000.00 rows=10000000 width=16) (actual rows=10000000 loops=1) │
│   Buffers: shared hit=2500000                                                                       │
│ Planning Time: 0.067 ms                                                                             │
│ Execution Time: 8192.236 ms                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘

that's because the JITed version essentially now boils down to a near-
optimal loop around the intervening bigint columns (which we deform
because we use a slot - at some point we're going to have to do
better).  No checks for the NULL bitmap, no alignment considerations,
all that's optimized away.

Greetings,

Andres Freund


On Mon, Jul 30, 2018 at 12:01 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-07-30 13:19 GMT+02:00 Jeff Janes <jeff.janes@gmail.com>:
On Mon, Jul 30, 2018 at 12:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2018-07-30 1:00 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 July 2018 at 17:38, Dinesh Kumar <dns98944@gmail.com> wrote:
>> I found a performance difference between accessing the int1 and int200
>> columns which is quite large.

> Have a look at slot_deform_tuple and heap_deform_tuple. You'll see
> that tuples are deformed starting at the first attribute. If you ask
> for attribute 200 then it must deform 1-199 first.

Note that that can be optimized away in some cases, though evidently
not the one the OP is testing.  From memory, you need a tuple that
contains no nulls, and all the columns to the left of the target
column have to be fixed-width datatypes.  Otherwise, the offset to
the target column is uncertain, and we have to search for it.

JIT decreases the overhead of this.

The bottleneck here is such a simple construct that I don't see how JIT could improve it by much.

And indeed, in my hands JIT makes it almost 3 times worse.

Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT execution and 4594.994 ms with JIT=off.



The opt1 patch did get performance back to "at least do no harm" territory, but it didn't improve over JIT=off.  Adding the other two didn't get any further improvement.

I don't know where the time is going with the as-committed JIT.  None of the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to anything close to the slow-down I'm seeing.  Shouldn't compiling and optimization time show up there?

Cheers,

Jeff
Hi,

On 2018-07-30 18:01:34 +0200, Pavel Stehule wrote:
> look on
> http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html
> thread, please.

Given the results I just posted in the sibling email I don't think those
issues apply here.

Greetings,

Andres Freund


Hi,

On 2018-07-30 13:31:33 -0400, Jeff Janes wrote:
> I don't know where the time is going with the as-committed JIT.  None of
> the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to anything
> close to the slow-down I'm seeing.  Shouldn't compiling and optimization
> time show up there?

As my timings showed, I don't see the slowdown you're reporting. Could
you post a few EXPLAIN ANALYZEs?

Greetings,

Andres Freund


On Mon, Jul 30, 2018 at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:

> And indeed, in my hands JIT makes it almost 3 times worse.

Not in my measurement. Your example won't use JIT at all, because it's
below the cost threshold. So I think you might just be seeing cache +
hint bit effects?

No, it is definitely JIT.  The explain plans show it, and the cost of the query is 230,000 while the default setting of jit_above_cost is 100,000.  It is fully reproducible by repeatedly toggling the JIT setting.  It doesn't seem to be the cost of compiling the code that slows it down (I'm assuming the code is compiled once per tuple descriptor, not once per tuple), but rather the efficiency of the compiled code.

 

> Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT
> execution and 4594.994 ms with JIT=off.

Even with a debug LLVM build, which greatly increases compilation
overhead, I actually see quite the benefit when I force JIT to be used:

I don't see a change when I compile without --enable-debug and with jit_debugging_support off (nonexistent in 11beta2).  How can I know whether I have a debug LLVM build, and turn it off if I do?
 


postgres[26832][1]=# ;SET jit_above_cost = -1; set jit_optimize_above_cost = 0; set jit_inline_above_cost = 0;
postgres[26832][1]=# explain (analyze, buffers, timing off) select pk, int200 from i200c200;

Lowering jit_optimize_above_cost does redeem this for me.  It brings it back to being a tie with JIT=OFF.  I don't see any further improvement by lowering jit_inline_above_cost, and overall it is just a statistical tie with JIT=off, not an improvement as you get, but at least it isn't a substantial loss.

Under what conditions would I want to do jit without doing optimizations on it?  Is there a rule of thumb that could be documented, or do we just use the experimental method for each query?

I don't know how sensitive JIT is to hardware.  I'm using Ubuntu 16.04 on VirtualBox (running on Windows 10) on an i5-7200U, which might be important.

I had previously done a poor-man's JIT where I created 4 versions of the main 'for' loop in slot_deform_tuple.  I did a branch on "if(hasnulls)", and then each branch had two loops, one for when 'slow' is false, and then one for after 'slow' becomes true so we don't have to keep setting it true again once it already is, in a tight loop.  I didn't see noticeable improvement there (although perhaps I would have on different hardware), so didn't see how JIT could help with this almost-entirely-null case.  I'm not trying to address JIT in general, just as it applies to this particular case.

Unrelated to JIT and relevant to the 'select pk, int199' case but not the 'select pk, int200' case, it seems we have gone to some length to make slot deforming be efficient for incremental use, but then just deform in bulk anyway up to maximum attnum used in the query, at least in this case.  Is that because incremental deforming is not cache efficient?

Cheers,

Jeff
Hi,

On 2018-07-31 12:56:26 -0400, Jeff Janes wrote:
> On Mon, Jul 30, 2018 at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:
> >
> > > And indeed, in my hands JIT makes it almost 3 times worse.
> >
> > Not in my measurement. Your example won't use JIT at all, because it's
> > below the cost threshold. So I think you might just be seeing cache +
> > hint bit effects?
> >
> 
> No, it is definitely JIT.  The explain plans show it, and the cost of the
> query is 230,000 while the default setting of jit_above_cost is 100,000.
> It is fully reproducible by repeatedly toggling the JIT setting.  It
> doesn't seem to be the cost of compiling the code that slows it down (I'm
> assuming the code is compiled once per tuple descriptor, not once per
> tuple), but rather the efficiency of the compiled code.

Interesting. I see a smaller benefit without opt, but still one. I guess
that depends on code emission.


> > > Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT
> > > execution and 4594.994 ms with JIT=off.
> >
> > Even with a debug LLVM build, which greatly increases compilation
> > overhead, I actually see quite the benefit when I force JIT to be used:
> >
> 
> I don't see a change when I compile without --enable-debug and with
> jit_debugging_support off (nonexistent in 11beta2).  How can I know
> whether I have a debug LLVM build, and turn it off if I do?

llvm-config --assertion-mode should tell you.


> > postgres[26832][1]=# ;SET jit_above_cost = -1; set jit_optimize_above_cost
> > = 0; set jit_inline_above_cost = 0;
> > postgres[26832][1]=# explain (analyze, buffers, timing off) select pk,
> > int200 from i200c200;
> >
> 
> Lowering jit_optimize_above_cost does redeem this for me.  It brings it
> back to being a tie with JIT=OFF.  I don't see any further improvement by
> lowering jit_inline_above_cost, and overall it is just a statistical tie
> with JIT=off, not an improvement as you get, but at least it isn't a
> substantial loss.

Interesting, as posted, I do see quite measurable improvements. What's
your version of LLVM?


> Under what conditions would I want to do jit without doing optimizations on
> it?  Is there a rule of thumb that could be documented, or do we just use
> the experimental method for each query?

I don't think we quite know yet. Optimization for larger queries can
take a while. For expression-heavy queries there's a window where JITing
can help but optimizing doesn't yet pay off.
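For reference, the three thresholds that define those tiers, with their v11 defaults:

show jit_above_cost;            -- 100000: generate JITed code at all
show jit_optimize_above_cost;   -- 500000: additionally run the LLVM optimizer
show jit_inline_above_cost;     -- 500000: additionally inline functions/operators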


> I had previously done a poor-man's JIT where I created 4 versions of the
> main 'for' loop in slot_deform_tuple.  I did a branch on "if(hasnulls)",
> and then each branch had two loops, one for when 'slow' is false, and then
> one for after 'slow' becomes true so we don't have to keep setting it true
> again once it already is, in a tight loop.  I didn't see noticeable
> improvement there (although perhaps I would have on different hardware), so
> didn't see how JIT could help with this almost-entirely-null case.  I'm not
> trying to address JIT in general, just as it applies to this particular
> case.

I don't see how it follows from that observation that JITing can't be
beneficial? The bitmap access alone can be optimized if you unroll the
loop (as now the offsets into it are constant). The offset computations
into tts_values/isnull aren't dynamic anymore. The loop counter is
gone. And nearly all tuples have hasnulls set, so specializing for that
case isn't going to get you that much; it's perfectly predictable.


> Unrelated to JIT and relevant to the 'select pk, int199' case but not the
> 'select pk, int200' case, it seems we have gone to some length to make slot
> deforming be efficient for incremental use, but then just deform in bulk
> anyway up to maximum attnum used in the query, at least in this case.  Is
> that because incremental deforming is not cache efficient?

Well, that's not *quite* how it works: We always deform up to the point
used in a certain "level" of the query. E.g. if a select's where clause
needs something up to attribute 3, the seqscan might deform only up to
there, even though an aggregate on top of that might need up to 10.  But
yes, you're right, uselessly incrementally deforming isn't cache
efficient.
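A sketch of that "level" point using the thread's table (int3 is attribute 4, int10 is attribute 11):

-- The qual on int3 is evaluated first; rows that fail it never have
-- attributes 5..11 deformed at all.
select sum(int10) from i200c200 where int3 > 0;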

I think before long we're going to have to change the slot mechanism so
we don't deform columns we don't actually need. I.e. we'll need something
like a bitmap of needed columns and skip over unneeded ones. When not
JITed that'll allow us to skip copying such columns (removing an 8 and 1
byte write); when JITing we can do better, and e.g. entirely skip
processing fixed-width NOT NULL columns that aren't needed.

Greetings,

Andres Freund


On Mon, Jul 30, 2018 at 3:02 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-07-30 13:31:33 -0400, Jeff Janes wrote:
> I don't know where the time is going with the as-committed JIT.  None of
> the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to anything
> close to the slow-down I'm seeing.  Shouldn't compiling and optimization
> time show up there?

As my timings showed, I don't see the slowdown you're reporting. Could
you post a few EXPLAIN ANALYZEs?


I don't think you showed any timings where jit_above_cost < query cost < jit_optimize_above_cost, which is where I saw the slowdown.  (That is also where things naturally land for me using default settings.)

I've repeated my test case on a default build (./configure --with-llvm --prefix=....) and default postgresql.conf, using the post-11BETA2 commit 5a71d3e.


I've attached the full test case, and the full output.

Here are the last two executions, with jit=on and jit=off, respectively.  Doing it with TIMING OFF doesn't meaningfully change things, nor does increasing shared_buffers beyond the default.



                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual time=29.317..11966.291 rows=10000000 loops=1)
 Planning Time: 0.034 ms
 JIT:
   Functions: 2
   Generation Time: 1.589 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 9.002 ms
   Emission Time: 19.948 ms
 Execution Time: 12375.493 ms
(11 rows)

Time: 12376.281 ms (00:12.376)
SET
Time: 1.955 ms
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual time=0.063..3897.302 rows=10000000 loops=1)
 Planning Time: 0.037 ms
 Execution Time: 4292.400 ms
(3 rows)

Time: 4293.196 ms (00:04.293)

Cheers,

Jeff

Re: Performance difference in accessing different columns in a Postgres Table

From: Dinesh Kumar
Hi All,
I was wondering whether this case is solved or still open. As a Postgres newbie, I can't understand the terms (JIT, tuple deforming) mentioned above. Could anyone please let me know the current status?

Thanks,
Dineshkumar.

On Wed, Aug 1, 2018 at 8:51 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jul 30, 2018 at 3:02 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-07-30 13:31:33 -0400, Jeff Janes wrote:
> I don't know where the time is going with the as-committed JIT.  None of
> the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to anything
> close to the slow-down I'm seeing.  Shouldn't compiling and optimization
> time show up there?

As my timings showed, I don't see the slowdown you're reporting. Could
you post a few EXPLAIN ANALYZEs?


I don't think you showed any timings where jit_above_cost < query cost < jit_optimize_above_cost, which is where I saw the slowdown.  (That is also where things naturally land for me using default settings.)

I've repeated my test case on a default build (./configure --with-llvm --prefix=....) and default postgresql.conf, using the post-11BETA2 commit 5a71d3e.


I've attached the full test case, and the full output.

Here are the last two executions, with jit=on and jit=off, respectively.  Doing it with TIMING OFF doesn't meaningfully change things, nor does increasing shared_buffers beyond the default.



                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual time=29.317..11966.291 rows=10000000 loops=1)
 Planning Time: 0.034 ms
 JIT:
   Functions: 2
   Generation Time: 1.589 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 9.002 ms
   Emission Time: 19.948 ms
 Execution Time: 12375.493 ms
(11 rows)

Time: 12376.281 ms (00:12.376)
SET
Time: 1.955 ms
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on i200c200  (cost=0.00..233332.28 rows=9999828 width=16) (actual time=0.063..3897.302 rows=10000000 loops=1)
 Planning Time: 0.037 ms
 Execution Time: 4292.400 ms
(3 rows)

Time: 4293.196 ms (00:04.293)

Cheers,

Jeff

Re: Performance difference in accessing different columns in a Postgres Table

From: Jeff Janes
On Wed, Sep 5, 2018 at 12:21 AM Dinesh Kumar <dns98944@gmail.com> wrote:
Hi All,
I was wondering whether this case is solved or still open. As a Postgres newbie, I can't understand the terms (JIT, tuple deforming) mentioned above. Could anyone please let me know the current status?


JIT is just-in-time compilation, which is new in v11.  Tuple deforming is how you get the row from the on-disk format to the in-memory format.

Some people see small improvements in tuple deforming using JIT in your situation, some see large regressions, depending on settings and apparently on hardware.  But regardless, JIT is not going to reduce your particular use case (many nullable and actually-null columns, referencing a high-numbered column) down to a constant-time operation in the number of preceding columns.  Maybe JIT will reduce the penalty for accessing a high-numbered column by 30%, but it won't reduce the penalty 30-fold.  Put your NOT NULL columns first and the most frequently accessed nullable columns right after them, if you can.

Cheers,

Jeff

Re: Performance difference in accessing different columns in a Postgres Table

From: Jeff Janes


On Wed, Sep 5, 2018 at 12:00 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Sep 5, 2018 at 12:21 AM Dinesh Kumar <dns98944@gmail.com> wrote:
Hi All,
I was wondering whether this case is solved or still open. As a Postgres newbie, I can't understand the terms (JIT, tuple deforming) mentioned above. Could anyone please let me know the current status?


JIT is just-in-time compilation, which is new in v11.  Tuple deforming is how you get the row from the on-disk format to the in-memory format.

Some people see small improvements in tuple deforming using JIT in your situation, some see large regressions, depending on settings and apparently on hardware.  But regardless, JIT is not going to reduce your particular use case (many nullable and actually-null columns, referencing a high-numbered column) down to a constant-time operation in the number of preceding columns.  Maybe JIT will reduce the penalty for accessing a high-numbered column by 30%, but it won't reduce the penalty 30-fold.  Put your NOT NULL columns first and the most frequently accessed nullable columns right after them, if you can.

Correction: put NOT NULL columns with fixed-width types first. Then, of the columns which are either nullable or of variable-width types, put the most frequently accessed earliest.
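A sketch of that layout advice (a hypothetical table, illustrative only):

create table example (
    id         bigint      not null,  -- fixed-width NOT NULL columns first
    created_at timestamptz not null,
    hot_value  bigint,                -- most frequently read nullable column next
    note       text,                  -- variable-width / rarely read columns last
    misc       jsonb
);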

Re: Performance difference in accessing different columns in a Postgres Table

From: Dinesh Kumar
Ok, will do that. Thanks a lot.

On Wed, Sep 5, 2018 at 9:37 PM Jeff Janes <jeff.janes@gmail.com> wrote:


On Wed, Sep 5, 2018 at 12:00 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Sep 5, 2018 at 12:21 AM Dinesh Kumar <dns98944@gmail.com> wrote:
Hi All,
I was wondering whether this case is solved or still open. As a Postgres newbie, I can't understand the terms (JIT, tuple deforming) mentioned above. Could anyone please let me know the current status?


JIT is just-in-time compilation, which is new in v11.  Tuple deforming is how you get the row from the on-disk format to the in-memory format.

Some people see small improvements in tuple deforming using JIT in your situation, some see large regressions, depending on settings and apparently on hardware.  But regardless, JIT is not going to reduce your particular use case (many nullable and actually-null columns, referencing a high-numbered column) down to a constant-time operation in the number of preceding columns.  Maybe JIT will reduce the penalty for accessing a high-numbered column by 30%, but it won't reduce the penalty 30-fold.  Put your NOT NULL columns first and the most frequently accessed nullable columns right after them, if you can.

Correction: put NOT NULL columns with fixed-width types first. Then, of the columns which are either nullable or of variable-width types, put the most frequently accessed earliest.