Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future? - Mailing list pgsql-general

From Peter J. Holzer
Subject Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future?
Date
Msg-id 20170505182853.GB19234@hjp.at
Whole thread Raw
In response to Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future?  ("Sven R. Kunze" <srkunze@mail.de>)
Responses Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future?
List pgsql-general
On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:
> On 03.05.2017 12:57, Thomas Güttler wrote:
> >Am 02.05.2017 um 05:43 schrieb Jeff Janes:
> >>No.  You can certainly use PostgreSQL to store blobs.  But then, you
> >>need to store the PostgreSQL data **someplace**.
> >>If you don't store it in S3, you have to store it somewhere else.
> >
> >I don't understand what you mean here. AFAIK storing blobs in PG is not
> >recommended since it is not very efficient.
>
> Seems like several people here disagree with this conventional wisdom.

I think it depends very much on what level of "efficiency" you need. On
my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a piece of
junk, but not super powerful either) I can retrieve a small blob from a
100GB table in about 0.1 ms, and for large blobs the speed approaches
200MB/s. For just about everything I'd do on that server (or even at
work) this is easily fast enough.

Sure, just telling the kernel "send data from file descriptor A (which
happens to be a file) to file descriptor B (a socket)" is a lot more
efficient than copying data from disk into a postgresql process, then
from that process to an application server, from that to the webserver
and that finally sends it to the socket. But if that just lets my server
be 99.9% idle instead of 99.0% idle, I haven't gained much. Similarly,
if my server spends 90% of it's resources doing other stuff, I won't
gain much by optimizing this (I should better optimize that other stuff
it's spending so much time on).

I am in this regard a firm believer in not optimizing prematurely. Do
whatever makes sense from an application point of view. If the blobs are
logically part of some other data (e.g. PDFs in a literature database),
store them together (either all of them in PostgreSQL, or all in some
NoSQL database, or maybe on stone tablets, if that makes sense for some
reason). Only if you have good reason[1] to believe that physically
separating data which logically belongs together will resolve a
bottleneck, then by all means separate them.

        hp

[1] "I read somewhere on the internet" is usually not a good reason.

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | hjp@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html

Attachment

pgsql-general by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future?
Next
From: John R Pierce
Date:
Subject: Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis,amqp, s3 in the future?