Re: [SPAM] Re: Architectural question - Mailing list pgsql-performance
From | Moreno Andreo
Subject | Re: [SPAM] Re: Architectural question
Date |
Msg-id | 56F25E5E.4090609@evolu-s.it
In response to | Re: [SPAM] Re: Architectural question (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses | Re: [SPAM] Re: Architectural question
List | pgsql-performance
On 11/03/2016 17:37, Jim Nasby wrote:
> On 2/22/16 8:40 AM, Moreno Andreo wrote:
>> On 18/02/2016 21:33, Jim Nasby wrote:
>>> Depending on your needs, you could use synchronous replication as
>>> part of that setup. You can even do that at a per-transaction level,
>>> so maybe you use sync rep most of the time, and just turn it off when
>>> inserting or updating BLOBS.
>> This sounds good, and when everything is OK we have I/O operations split
>> across the two servers; a small delay in synchronizing blobs should not
>> be a big deal, even if something bad happens (because of XLOG), right?
>
> It all depends on what you can tolerate. You also don't have to use
> synchronous replication; normal streaming replication is async, so if
> you can stand to lose some data if one of the servers dies then you
> can do that.

I can't tolerate data loss, so synchronous replication is mandatory (this week I had a customer ask for an old document that I couldn't find in the database, even though the "attach present" flag was true... and I had a bit of a hard time convincing the customer it was his fault... :-) )

>
>>>> Last thing: should blobs (or the whole database directory itself) go in
>>>> a different partition, to optimize performance, or is this no longer a
>>>> concern in a VM environment?
>>>
>>> First: IMO concerns about blobs in the database are almost always
>>> overblown.
>> In many of the places I've looked, the bottom line is "BLOBs are slow".
>> So I considered this another point to analyze while designing the server
>> architecture. If you say not to worry about it, then I won't.
>
> It all depends. They're certainly a lot slower than handling a single
> int, but in many cases the difference just doesn't matter.

The main goal is to be *quick*. A doctor with a patient on the other side of his desk does not want to wait, say, 30 seconds for a clinical record to open. Let me explain the main problem (actually there are two problems).

1. I'm handling health data, and sometimes they store large images (say a hi-res image of an x-ray). When their team mates (spread all over the city, not in the same building) ask for that bitmap (say, 20 megabytes), it can't be cached (images are loaded only when the user requests them), and searching a 35k-row, 22 GB table for the matching image is unlikely to be that fast, even with proper indexing (on the patient record number).

2. When I load the patient list, their photos must be available as well, because when I click on a table row a small preview is shown (including a small thumbnail of the patient's photo). Obviously I can't load all the thumbs while loading the whole patient list (the list can have up to 4-5000 records and photo size is about 4-500 kB, so it would be an enormous amount of data to download) -- see the lazy-loading sketch below.

>
>>> 30GB of blobs on modern hardware really isn't a big deal, and there's
>>> a *lot* to be said for not having to write the extra code to manage
>>> all that by hand.
>> What do you mean? Extra code?
>
> If the blob is in the database then you have nothing extra to do. It's
> handled just like all your other data.
>
> If it's a file in a file system then you need to:
>
> - Have application code that knows how and where to get at the file
> - Have a way to make those files available on all your webservers
> - Have completely separate backup and recovery plans for those files
>
> That's a lot of extra work. Sometimes it's necessary, but many times
> it's not.
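As a concrete illustration of the per-transaction control Jim mentions: synchronous_commit can be lowered with SET LOCAL just for the transactions that write blobs, while everything else keeps waiting for the standby. A minimal sketch, assuming a synchronous standby is already configured and using made-up table and column names:

    -- Cluster-wide sync rep is assumed, e.g. synchronous_standby_names = 'standby1'
    -- and synchronous_commit = on in postgresql.conf.

    BEGIN;
    -- For this transaction only: wait for the local WAL flush,
    -- but do not wait for the synchronous standby to confirm.
    SET LOCAL synchronous_commit = 'local';   -- or 'off' to skip the local flush wait too
    UPDATE patient_images                     -- hypothetical blob table
       SET image = $1                         -- bytea bound from the application
     WHERE patient_record_no = $2;
    COMMIT;

The SET LOCAL only lasts until COMMIT, so ordinary transactions keep the full synchronous guarantee.

On problem 2 above, one possible shape for deferring the image loads, again with illustrative names only: keep every bytea column out of the list query and fetch a single thumbnail or image only when a row is actually selected.

    -- List query: no bytea columns touched, so only small rows travel.
    SELECT patient_record_no, last_name, first_name, birth_date
      FROM patients
     ORDER BY last_name, first_name;

    -- On row click: fetch just that patient's thumbnail.
    SELECT thumbnail
      FROM patient_photos
     WHERE patient_record_no = $1;

    -- Full image only on explicit request.
    SELECT image
      FROM patient_images
     WHERE patient_record_no = $1;

    -- Index so lookups in the big (22 GB) blob table stay index scans.
    CREATE INDEX patient_images_record_idx
        ON patient_images (patient_record_no);

Since PostgreSQL stores large bytea values out of line (TOAST), the big images do not weigh on queries that never select the bytea column, which is what keeps the list query small.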
In my case I think it's not necessary, since all blobs go into a bytea field in a table that is just for them. It's an approach that helps us comply with privacy requirements, since all blobs are encrypted and can be accessed only by the application.

>
>>> When it comes to your disk layout, the first things I'd look at
>>> would be:
>>>
>>> - Move the temporary statistics directory to a RAM disk
>>> - Move pg_xlog to its own partition
>> So I need another vDisk, not that big, for pg_xlog?
>
> Yeah, but note that with virtualization that may or may not help.

I was afraid of that. With virtualization we are bound to the hardware behind us, which we can neither see nor control. Even if we create two vDisks, they would probably be backed by the same host spindles, so having two vDisks would be completely useless. I'm thinking of increasing checkpoint_segments instead, so checkpoints are spaced further apart.

In the next two weeks I should have the VM deployed, so I'll see what I get in terms of speed and response (looking at the amount we are paying, I hope it will be a very FAST machine... :-D)

Thanks
Moreno.-
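For reference, the tuning points touched on above are ordinary configuration knobs. A rough sketch, assuming a 9.4 server (so ALTER SYSTEM is available and checkpoint_segments still exists; on 9.5+ the equivalent knob is max_wal_size, and on older versions the same settings go directly in postgresql.conf); paths and values are only examples:

    -- Temporary statistics directory on a tmpfs/RAM-disk mount (example path).
    ALTER SYSTEM SET stats_temp_directory = '/run/pg_stats_tmp';

    -- Space checkpoints further apart instead of splitting vDisks.
    ALTER SYSTEM SET checkpoint_segments = 64;            -- pre-9.5 only
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread checkpoint I/O

    -- Both settings take effect on reload, no restart needed.
    SELECT pg_reload_conf();

Moving pg_xlog to its own partition is not a setting: it is done at the filesystem level, by moving the directory onto the other volume and symlinking it back into the data directory while the server is stopped.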