Re: Performance issues - Mailing list pgsql-performance

From Vivekanand Joshi
Subject Re: Performance issues
Date
Msg-id 849e75d9463938372dd4c516f5c5d89c@mail.gmail.com
Whole thread Raw
In response to Re: Performance issues  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: Performance issues  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-performance
Hi Guys,

So here is the full information attached as well as in the link provided
below:

http://pgsql.privatepaste.com/41207bea45

I can provide new information as well.

Would like to see if queries of these type can actually run in postgres
server?

If yes, what would be the minimum requirements for hardware? We would like
to migrate our whole solution on PostgreSQL as we can spend on hardware as
much as we can but working on a proprietary appliance is becoming very
difficult for us.

Vivek


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Saturday, March 14, 2015 3:56 AM
To: Varadharajan Mukundan
Cc: vjoshi@zetainteractive.com; Tomas Vondra;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On Fri, Mar 13, 2015 at 4:03 PM, Varadharajan Mukundan
<srinathsmn@gmail.com> wrote:
>> We might even consider taking experts advice on how to tune queries
>> and server, but if postgres is going to behave like this, I am not
>> sure we would be able to continue with it.
>>
>> Having said that, I would day again that I am completely new to this
>> territory, so I might miss lots and lots of thing.
>
> My two cents: Postgres out of the box might not be a good choice for
> data warehouse style queries, that is because it is optimized to run
> thousands of small queries (OLTP style processing) and not one big
> monolithic query. I've faced similar problems myself before and here
> are few tricks i followed to get my elephant do real time adhoc
> analysis on a table with ~45 columns and few billion rows in it.
>
> 1. Partition your table! use constraint exclusion to the fullest
> extent 2. Fire multiple small queries distributed over partitions and
> aggregate them at the application layer. This is needed because, you
> might to exploit all your cores to the fullest extent (Assuming that
> you've enough memory for effective FS cache). If your dataset goes
> beyond the capability of a single system, try something like Stado
> (GridSQL)
> 3. Storing index on a RAM / faster disk disk (using tablespaces) and
> using it properly makes the system blazing fast. CAUTION: This
> requires some other infrastructure setup for backup and recovery 4. If
> you're accessing a small set of columns in a big table and if you feel
> compressing the data helps a lot, give this FDW a try -
> https://github.com/citusdata/cstore_fdw

Agreed here. IF you're gonna run reporting queries against postgresql you
have to optimize for fast seq scan stuff. I.e. an IO subsystem that can read
a big table in hundreds of megabytes per second.
Gigabytes if you can get it. A lot of spinning drives on a fast RAID card or
good software raid can do this on the cheapish, since a lot of times you
don't need big drives if you have a lot. 24 cheap 1TB drives that each can
read at ~100 MB/s can gang up on the data and you can read a 100GB in a few
seconds. But you can't deny physics. If you need to read a 2TB table it's
going to take time.

If you're only running 1 or 2 queries at a time, you can crank up the
work_mem to something crazy like 1GB even on an 8GB machine. Stopping sorts
from spilling to disk, or at least giving queries a big playground to work
in can make a huge difference. If you're gonna give big work_mem then
definitely limit connections to a handful. If you need a lot of persistent
connections then use a pooler.

The single biggest mistake people make in setting up reporting servers on
postgresql is thinking that the same hardware that worked well for
transactional stuff (a handful of SSDs and lots of memory) might not help
when you're working with TB data sets. The hardware you need isn't the same,
and using that for a reporting server is gonna result in sub-optimal
performance.

--
To understand recursion, one must first understand recursion.

Attachment

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Performance issues
Next
From: Tomas Vondra
Date:
Subject: Re: Performance issues