Thread: OFFSET/LIMIT - Disparate Performance w/ Go application

OFFSET/LIMIT - Disparate Performance w/ Go application

From
Matt Silverlock
Date:
Hi all. This might be tricky in so much as there’s a few moving parts (when isn’t there?), but I’ve tried to test the postgres side as much as possible.

Trying to work out a potential database bottleneck with a HTTP application (written in Go):
  • Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
  • Pages that perform a SELECT on a single row net about ~6.6k req/s: db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
  • Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s
  • There’s very little “extra” logic around these queries: you can find the code here (about 39 lines for both functions) https://gist.github.com/elithrar/b2497b0b473da64932b5
Other pertinent details:
  • It’s always been about this slow to my knowledge
  • The table is a test database with about 40 rows, although performance doesn’t change noticeably even with a few hundred (it’s unlikely to ever be more than 10,000 rows over its lifetime)
  • Running PostgreSQL 9.3.4 on OS X w/ a 3.5GHz i7, 12GB RAM, 128GB PCI-E SSD.
  • The Go application peaks at about 40MB memory when hitting 37k req/s — so there doesn’t appear to be an issue of it eating into the available RAM on the machine
  • I’m also aware that HTTP benchmarks aren’t the most reliable thing, but I’m using wrk -c 400 -t 32 -d 15s to stress it out
The application has a connection pool via the lib/pq driver (https://github.com/lib/pq) with MaxOpen set to 256 connections. Stack size is 8GB and max socket connections are set to 1024 (running out of FDs isn’t the problem here from what I can see).
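For reference, the pool setup is roughly the sketch below (the connection string, package layout and function name are illustrative; the db.Get above is sqlx, so this assumes sqlx on top of lib/pq):

import (
	"log"

	"github.com/jmoiron/sqlx"
	_ "github.com/lib/pq" // registers the "postgres" driver with database/sql
)

// openDB opens the pool used by the handlers (connection string is illustrative).
func openDB() *sqlx.DB {
	db, err := sqlx.Connect("postgres", "postgres://app@localhost/listings?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// MaxOpen = 256 as described above; database/sql manages the pooling itself.
	db.SetMaxOpenConns(256)
	return db
}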

Relevant postgresql.conf settings — everything else should be default, including fsync/synchronous commits (on) for obvious reasons:
max_connections = 512
shared_buffers = 2048MB
temp_buffers = 16MB
work_mem = 4MB
wal_buffers = 16
checkpoint_segments = 16
random_page_cost = 2.0
effective_cache_size = 8192MB
The query in question is: http://explain.depesz.com/s/7g8 and the table schema is as below:


Table "public.listings"
┌───────────────┬──────────────────────────┬───────────┐
│    Column     │           Type           │ Modifiers │
├───────────────┼──────────────────────────┼───────────┤
│ id            │ character varying(17)    │ not null  │
│ title         │ text                     │           │
│ company       │ text                     │           │
│ location      │ text                     │           │
│ description   │ text                     │           │
│ rendered_desc │ text                     │           │
│ term          │ text                     │           │
│ commute       │ text                     │           │
│ company_url   │ text                     │           │
│ rep           │ text                     │           │
│ rep_email     │ text                     │           │
│ app_method    │ text                     │           │
│ app_email     │ text                     │           │
│ app_url       │ text                     │           │
│ posted_date   │ timestamp with time zone │           │
│ edited_date   │ timestamp with time zone │           │
│ renewed_date  │ timestamp with time zone │           │
│ expiry_date   │ timestamp with time zone │           │
│ slug          │ text                     │           │
│ charge_id     │ text                     │           │
│ sponsor_id    │ text                     │           │
│ tsv           │ tsvector                 │           │
└───────────────┴──────────────────────────┴───────────┘
Indexes:
    "listings_pkey" PRIMARY KEY, btree (id)
    "fts" gin (tsv)
    "listings_expiry_date_idx" btree (expiry_date)
    "listings_fts_idx" gin (to_tsvector('english'::regconfig, (((((((title || ' '::text) || company) || ' '::text) || location) || ' '::text) || term) || ' '::text) || commute))
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON listings FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('tsv', 'pg_catalog.english', 'title', 'company', 'location', 'term', 'commute')


The single row query has a query plan here: http://explain.depesz.com/s/1Np (this is where I see 6.6k req/s at the application level).

Some pgbench results from this machine as well:
$ pgbench -c 128 -C -j 4 -T 15 -M extended -S
starting vacuum...end.
transaction type: SELECT only
scaling factor: 1
query mode: extended
number of clients: 128
number of threads: 4
duration: 15 s
number of transactions actually processed: 17040
tps = 1134.481459 (including connections establishing)
tps = 56884.093652 (excluding connections establishing)
Ultimately I'm not expecting a miracle—database ops are nearly always the slowest part of a web server outside the latency to the client itself—but I'd expect something a little closer (even 10% of 33k would be a lot better). And of course, this is somewhat "academic" because I don't expect to see four million hits an hour—but I'd also like to catch problems for future reference.

Thanks in advance.




Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Tom Lane
Date:
Matt Silverlock <matt@eatsleeprepeat.net> writes:
> Hi all. This might be tricky in so much as there’s a few moving parts (when isn’t there?), but I’ve tried to test the postgres side as much as possible.
> Trying to work out a potential database bottleneck with a HTTP application (written in Go):
> Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
> Pages that perform a SELECT on a single row net about ~6.6k req/s: db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
> Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s

You don't show us exactly what you're doing with OFFSET/LIMIT, but I'm
going to guess that you're using it to paginate large query results.
That's basically always going to suck: Postgres has no way to implement
OFFSET except to generate and then throw away that number of initial rows.
If you do the same query over again N times with different OFFSETs, it's
going to cost you N times as much as the base query would.

If the application's interaction with the database is stateless then you
may not have much choice, but if you do have a choice I'd suggest doing
pagination by means of fetching from a cursor rather than independent
queries.

            regards, tom lane


Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Merlin Moncure
Date:
On Thu, Jun 12, 2014 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matt Silverlock <matt@eatsleeprepeat.net> writes:
>> Hi all. This might be tricky in so much as there’s a few moving parts (when isn’t there?), but I’ve tried to test the postgres side as much as possible.
>> Trying to work out a potential database bottleneck with a HTTP application (written in Go):
>> Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
>> Pages that perform a SELECT on a single row net about ~6.6k req/s: db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
>> Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s
>
> You don't show us exactly what you're doing with OFFSET/LIMIT, but I'm
> going to guess that you're using it to paginate large query results.
> That's basically always going to suck: Postgres has no way to implement
> OFFSET except to generate and then throw away that number of initial rows.
> If you do the same query over again N times with different OFFSETs, it's
> going to cost you N times as much as the base query would.
>
> If the application's interaction with the database is stateless then you
> may not have much choice, but if you do have a choice I'd suggest doing
> pagination by means of fetching from a cursor rather than independent
> queries.

Well, you can also do client side pagination using the row-wise
comparison feature, implemented by you :-).  Cursors can be the best
approach, but it's nice to know the client side approach if you're
really stateless and/or want to be able to pick up external changes
during the browse.

SELECT * FROM listings
WHERE (id, expiry_date) > (last_id_read, last_expiry_date_read)
ORDER BY id, expiry_date LIMIT x.

then you just save off the highest id, date pair and feed it back into
the query.   This technique is usefui for emulating ISAM browse
operations.
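
On the Go side that might look something like the sketch below (the struct,
column subset and function name are illustrative; it assumes the sqlx
wrapper the OP is already using):

import (
	"time"

	"github.com/jmoiron/sqlx"
)

// Listing holds just the columns this sketch selects.
type Listing struct {
	ID         string    `db:"id"`
	Title      string    `db:"title"`
	ExpiryDate time.Time `db:"expiry_date"`
}

// nextListings fetches the page that starts strictly after the last
// (id, expiry_date) pair the client saw, so the server stays stateless.
func nextListings(db *sqlx.DB, lastID string, lastExpiry time.Time, pageSize int) ([]Listing, error) {
	var listings []Listing
	err := db.Select(&listings, `
		SELECT id, title, expiry_date
		FROM listings
		WHERE (id, expiry_date) > ($1, $2)
		ORDER BY id, expiry_date
		LIMIT $3`, lastID, lastExpiry, pageSize)
	return listings, err
}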

merlin


Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Andreas Joseph Krogh
Date:
På torsdag 12. juni 2014 kl. 16:58:06, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Matt Silverlock <matt@eatsleeprepeat.net> writes:
> Hi all. This might be tricky in so much as there’s a few moving parts (when isn’t there?), but I’ve tried to test the postgres side as much as possible.
> Trying to work out a potential database bottleneck with a HTTP application (written in Go):
> Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
> Pages that perform a SELECT on a single row net about ~6.6k req/s: db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
> Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s

You don't show us exactly what you're doing with OFFSET/LIMIT, but I'm
going to guess that you're using it to paginate large query results.
That's basically always going to suck: Postgres has no way to implement
OFFSET except to generate and then throw away that number of initial rows.
If you do the same query over again N times with different OFFSETs, it's
going to cost you N times as much as the base query would.
 
Are there any plans to make PG implement OFFSET more efficiently, so it doesn't have to "read and throw away"?
 
I used SQL Server back in 2011 in a project and seem to remember they implemented offset pretty fast. Paging in a resultset of millions was much faster than in PG.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Merlin Moncure
Date:
On Thu, Jun 12, 2014 at 2:48 PM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
>
> På torsdag 12. juni 2014 kl. 16:58:06, skrev Tom Lane <tgl@sss.pgh.pa.us>:
>
> Matt Silverlock <matt@eatsleeprepeat.net> writes:
> > Hi all. This might be tricky in so much as there’s a few moving parts (when isn’t there?), but I’ve tried to test the postgres side as much as possible.
> > Trying to work out a potential database bottleneck with a HTTP application (written in Go):
> > Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
> > Pages that perform a SELECT on a single row net about ~6.6k req/s: db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
> > Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s
>
> You don't show us exactly what you're doing with OFFSET/LIMIT, but I'm
> going to guess that you're using it to paginate large query results.
> That's basically always going to suck: Postgres has no way to implement
> OFFSET except to generate and then throw away that number of initial rows.
> If you do the same query over again N times with different OFFSETs, it's
> going to cost you N times as much as the base query would.
>
> Are there any plans to make PG implement OFFSET more efficiently, so it doesn't have to "read and throw away"?
>
> I used SQL Server back in 2011 in a project and seem to remember they implemented offset pretty fast. Paging in a resultset of millions was much faster than in PG.

I doubt it.  OFFSET is widely regarded as being pretty dubious. SQL
has formally defined the way to do this (cursors), and optimizing
OFFSET would be complex for so little benefit.  Speaking generally,
SQL Server also has some trick optimizations of other constructs,
like fast count(*), but in my experience it underperforms PG in many
areas.

merlin


Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Jeff Janes
Date:
On Thu, Jun 12, 2014 at 12:08 AM, Matt Silverlock
<matt@eatsleeprepeat.net> wrote:
>
> Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s

Is that tested at the OFFSET and LIMIT of 0 and 15, as shown in the
explain plan?


> The query in question is: http://explain.depesz.com/s/7g8 and the table schema is as below:

The reported runtime of 0.078 ms should be able to sustain nearly 10
times the reported rate of 1.3k/s, so the bottleneck would seem to be
elsewhere.

Perhaps the bottleneck is formatting the result set in postgres to be
sent over the wire, then sending it over the wire, then parsing it in
the Go connection library to hand back to the Go user code, and then
the Go user code doing something meaningful with it.

What happens if you get rid of the offset and the order by, and just
use limit?  I bet it doesn't change the speed much (because that is
not where the bottleneck is).

You seem to be selecting an awful lot of wide columns.  Do you really
need to see all of them?

>
> Some pgbench results from this machine as well:
>
> $ pgbench -c 128 -C -j 4 -T 15 -M extended -S

This is just benchmarking how fast you can make and break connections
to the database.

Because your app is using an embedded connection pooler, this
benchmark isn't very relevant to your situation.


>
> Ultimately I'm not expecting a miracle—database ops are nearly always the slowest part
> of a web server outside the latency to the client itself—but I'd expect something a little
> closer (even 10% of 33k would be a lot better). And of course, this is somewhat "academic"
> because I don't expect to see four million hits an hour—but I'd also like to catch problems
> for future reference.

I think you have succeeded in doing that.  If you want to get
substantially faster than the current speed in the future, you will
need a web-app-side results cache for this type of query.

I can't imagine the results of such a query change more than 1300
times/s, nor that anyone would notice or care if the observed results
were stale by two or three seconds.

That type of cache is a PITA, and I've never needed one because I also
don't expect to get 4 million hits an hour.  But if this is what your
future looks like, you'd be best off to embrace it sooner rather than
later.
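
Something like the sketch below would be enough (illustrative only; it
assumes the listings query already returns a slice of some Listing
struct and that results a few seconds stale are acceptable):

import (
	"sync"
	"time"
)

// listingCache is a tiny time-based cache: the first request after the
// TTL expires re-runs the query; every other request reuses the cached slice.
type listingCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetched time.Time
	cached  []Listing
}

func (c *listingCache) get(fetch func() ([]Listing, error)) ([]Listing, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cached != nil && time.Since(c.fetched) < c.ttl {
		return c.cached, nil
	}
	listings, err := fetch()
	if err != nil {
		return nil, err
	}
	c.cached = listings
	c.fetched = time.Now()
	return listings, nil
}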


Cheers,

Jeff


Re: OFFSET/LIMIT - Disparate Performance w/ Go application

From
Matt Silverlock
Date:
Thanks for the replies Jeff, Tom and Merlin.

Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s

Is that tested at the OFFSET and LIMIT of 0 and 15, as shown in the
explain plan?


Yes: 0 (OFFSET) and 16 (LIMIT), or 15 and 31 (i.e. the “second page” of results). There’s no difference on that front. For context, OFFSET is always a multiple of 15 (i.e. 15 results per page) and LIMIT is always 15 + 1: I fetch one extra result, check the len of the returned slice, and if there are more than 15 rows I set paginate = true and slice the extra result off.



The query in question is: http://explain.depesz.com/s/7g8 and the table schema is as below:

The reported runtime of 0.078 ms should be able to sustain nearly 10
times the reported rate of 1.3k/s, so the bottleneck would seem to be
elsewhere.

Perhaps the bottleneck is formatting the result set in postgres to be
sent over the wire, then sending it over the wire, then parsing it in
the Go connection library to hand back to the Go user code, and then
the Go user code doing something meaningful with it.

What happens if you get rid of the offset and the order by, and just
use limit?  I bet it doesn't change the speed much (because that is
not where the bottleneck is).

You seem to be selecting an awful lot of wide columns.  Do you really
need to see all of them?

- Testing SELECT * FROM … with just LIMIT 15 and no OFFSET yields 1299 req/s at the front end of the application.
- Testing SELECT id, title, company, location, commute, term, expiry_date (a subset of fields) with LIMIT 15 and no OFFSET yields 1800 req/s at the front end.

There’s definitely an increase to be realised there (I’d say by just tossing the rendered HTML field). 

Based on your comments about the Go side of things, I ran a quick test by cutting the table down to 6 records, from the 39 used in all previous tests. This skips the pagination logic (below) and yields 3068 req/s on the front-end.

// Determine if we have more than one page of results.
// If so, trim the extra result off and set pagination = true.
if len(listings) > opts.PerPage {
	paginate = true
	listings = listings[:opts.PerPage]
}

So there certainly appears to be a bottleneck on the Go side as well (outside of even the DB driver), probably from the garbage generated from slicing the slice, although I’d be keen to know if there’s a better way to approach returning a paginated list of results.

Well, you can also do client side pagination using the row-wise
comparison feature, implemented by you :-).  Cursors can be the best
approach, but it's nice to know the client side approach if you're
really stateless and/or want to be able to pick up external changes
during the browse.


What would be a better approach here? The cursor approach isn’t ideal in my case (although I could make it work), but what other options are there that are stateless?
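
For reference, my understanding of the cursor approach is roughly the sketch below (assuming lib/pq; the cursor name, ordering and helper names are illustrative). The transaction has to stay open across page requests, which is what makes it awkward for stateless HTTP handlers:

import (
	"database/sql"
	"fmt"
)

// openBrowse starts a transaction and declares a cursor over the listings.
// The *sql.Tx has to be kept open (and eventually rolled back) for the whole browse.
func openBrowse(db *sql.DB) (*sql.Tx, error) {
	tx, err := db.Begin()
	if err != nil {
		return nil, err
	}
	_, err = tx.Exec(`DECLARE listings_cur CURSOR FOR
		SELECT * FROM listings
		WHERE expiry_date > current_date
		ORDER BY posted_date DESC`)
	if err != nil {
		tx.Rollback()
		return nil, err
	}
	return tx, nil
}

// nextPage fetches the next pageSize rows from the open cursor.
func nextPage(tx *sql.Tx, pageSize int) (*sql.Rows, error) {
	return tx.Query(fmt.Sprintf("FETCH %d FROM listings_cur", pageSize))
}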



Some pgbench results from this machine as well:

$ pgbench -c 128 -C -j 4 -T 15 -M extended -S

This is just benchmarking how fast you can make and break connections
to the database.

Because your app is using an embedded connection pooler, this
benchmark isn't very relevant to your situation.

Noted — thanks.


On 13 Jun 2014, at 4:46 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

<snip>