Re: PostgreSQL Write Performance - Mailing list pgsql-general

From: Alban Hertroys
Subject: Re: PostgreSQL Write Performance
Date:
Msg-id: 8A4E4050-34DE-4459-979C-3752E9C02F21@solfertje.student.utwente.nl
In response to: Re: PostgreSQL Write Performance (Yan Cheng Cheok <yccheok@yahoo.com>)
List: pgsql-general
On 5 Jan 2010, at 8:30, Yan Cheng Cheok wrote:

>>> What is the actual problem you are trying to solve?
>
> I am currently developing a database system for a high speed measurement machine.
>
> The time taken to perform a measurement per unit is on the order of ~30 milliseconds. We need to record the
> measurement result for every single unit. Hence, the time taken to record the measurement result must be far less
> than a millisecond, so that it has nearly zero impact on the machine speed (if not, the machine needs to wait for
> the database to finish writing before it can measure the next unit).
>
> Previously, we were using a flat file. However, a flat file is quite a mess when it comes to generating reports
> to customers.


That flat file can help you with clustering INSERTs together and also means you'll have all your measurements ready for
a single INSERT. The latter is useful if you're planning on using arrays to store your measurements, as updating array
values requires the entire array to be rewritten to the database. I don't know how your measurements would arrive
without the flat file, but I wouldn't be surprised if the measurements for a single unit came out at different points
in time, which would be a bit painful with arrays (not quite as much with a measurements table, though).
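
For comparison, such a measurements table would let you insert each value the moment it's measured. A minimal sketch
(the table and column names are just my guesses at your domain):

CREATE TABLE measurement (
    unit_id    bigint        NOT NULL,  -- references unit (id)
    name       text          NOT NULL,  -- e.g. 'Width' or 'Height'
    value      numeric(4,3)  NOT NULL,
    PRIMARY KEY (unit_id, name)
);

-- Each value can be stored as soon as it arrives, no array rewrite needed:
INSERT INTO measurement (unit_id, name, value) VALUES (1, 'Width', 0.001);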

A safe approach (with minimal risk of data loss) would be to split your flat file every n units (earlier in this thread
a number of n=1000 was mentioned) and store that data using COPY, in the format COPY expects. You will probably also
want to keep a queue table (which is just a normal table, but it's used like a queue) with the names of the flat files
that need processing.

I haven't done this kind of thing before, but I envision it something like this:

CREATE TABLE unit (
    id            bigserial    PRIMARY KEY,
    date          date         NOT NULL DEFAULT CURRENT_DATE,
    measured      text[],                       -- measurement names, e.g. {Width,Height}
    measurements  numeric(4,3)[]                -- measured values, in the same order
);

CREATE TABLE queue (
    file        text        PRIMARY KEY,    -- name of a flat file that needs processing
    definitive  boolean     DEFAULT False   -- False while the file name is only predicted
);
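
Since nothing will have enqueued anything when you start out, you'd presumably seed the queue with the first file's
name by hand; something like:

-- One-time seed: enqueue the first flat file manually (True assumes the
-- file already exists). From then on, each processed file enqueues the
-- predicted name of its successor.
INSERT INTO queue (file, definitive) VALUES ('file-2010-01-05-00000001', True);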

---file-2010-01-05-00000001---
/* Update in its own transaction so that we know we tried to process this file
 * even if the transaction rolls back.
 */
UPDATE queue SET definitive = True
 WHERE file = 'file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00000001';

/* Start work */
BEGIN;
/* List the columns so the date column gets its DEFAULT; COPY's text
 * format expects tab-separated fields. */
COPY unit (id, measured, measurements) FROM STDIN;
1    {Width,Height}    {0.001,0.021}
2    {Width,Height}    {0.002,0.019}
...
999    {Width,Height}    {0.000,0.018}
\.

/* This file was processed and can be removed from the queue */
DELETE FROM queue WHERE file='file-2010-01-05-00000001';
COMMIT;

/* This will probably be the name of the next flat file, but we don't know that
 * for sure yet. It needs to be outside the transaction as otherwise CURRENT_DATE
 * will have the date of the start of the transaction and we need to know what the
 * next batch will be regardless of whether this one succeeded.
 */
INSERT INTO queue (file, definitive)
VALUES ('file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00001000', False);
---end of file---

You'd need a little program (a script will probably work) to read that queue table and send the commands in those files
to the database; a sketch of its core query follows below. Don't forget that at the start the queue table will be empty
;) I recall some of this list's members wrote up a webpage about how to implement queue tables reliably.
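
The core of that script could be little more than polling the queue for the next file (a sketch; with more than one
worker you'd want SELECT ... FOR UPDATE or an advisory lock so two workers can't claim the same file):

/* Pick the next flat file to load. The commands inside the file itself
 * (see above) remove it from the queue and enqueue its successor, so the
 * script only has to feed the file to psql and poll again. */
SELECT file
  FROM queue
 ORDER BY file
 LIMIT 1;

If the name returned is the predicted next file and it doesn't exist on disk yet, the script just sleeps and retries.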

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.




