Re: Wierd context-switching issue on Xeon - Mailing list pgsql-performance

From Tom Lane
Subject Re: Wierd context-switching issue on Xeon
Date
Msg-id 1407.1082419316@sss.pgh.pa.us
Whole thread Raw
In response to Re: Wierd context-switching issue on Xeon  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Wierd context-switching issue on Xeon  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Wierd context-switching issue on Xeon  (Joe Conway <mail@joeconway.com>)
Re: Wierd context-switching issue on Xeon  (Jeff <threshar@torgo.978.org>)
Re: Wierd context-switching issue on Xeon  ("Matt Clark" <matt@ymogen.net>)
Re: Wierd context-switching issue on Xeon  ("J. Andrew Rogers" <jrogers@neopolitan.com>)
Re: Wierd context-switching issue on Xeon  (ohp@pyrenet.fr)
List pgsql-performance
Here is a test case.  To set up, run the "test_setup.sql" script once;
then launch two copies of the "test_run.sql" script.  (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.)  Check that you get a
nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.

What I actually see on Josh's client's machine is a context swap storm:
"vmstat 1" shows CS rates around 170K/sec.  strace'ing the backends
shows a corresponding rate of semop() syscalls, with a few delaying
select()s sprinkled in.  top(1) shows system CPU percent of 25-30
and idle CPU percent of 16-20.

I haven't bothered to check how long the test_run query takes, but if it
ends while you're still examining the behavior, just start it again.

Note the test case assumes you've got shared_buffers set to at least
1000; with smaller values, you may get some I/O syscalls, which will
probably skew the results.

            regards, tom lane

drop table test_data;

create table test_data(f1 int);

insert into test_data values (random() * 100);
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;

create index test_index on test_data(f1);

vacuum verbose analyze test_data;
checkpoint;
-- force nestloop indexscan plan
set enable_seqscan to 0;
set enable_mergejoin to 0;
set enable_hashjoin to 0;

explain
select count(*) from test_data a, test_data b, test_data c
where a.f1 = b.f1 and b.f1 = c.f1;

select count(*) from test_data a, test_data b, test_data c
where a.f1 = b.f1 and b.f1 = c.f1;

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Wierd context-switching issue on Xeon
Next
From: Tom Lane
Date:
Subject: Re: Wierd context-switching issue on Xeon