Thread: Inefficient escape codes.
This is my first post to the list. I have a deep low-level background in computer programming, but I am a total newbie to SQL databases. I am using Postgres because of its commercial-friendly license.
My problem is with storing large values. I have a database that stores large amounts of data (each row consisting of up to 5MB). After carefully reading the Postgres 8.0 manual (the version I'm using), I concluded that the best option was to create a bytea field.
Large objects are out of the question here since we have lots of tables.
As I understand it, the client needs to put the data into the server using a text-based command. This makes the 5MB of data grow up to 5x, making it 25MB in the worst case. (Example: 0x01 -> \\001).
My question is:
1) Is there any way for me to send the binary field directly without needing escape codes?
2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this?
Thanks for any light on the subject,
Rodrigo
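[Editor's aside: the blow-up Rodrigo describes can be reproduced client-side with libpq's PQescapeBytea(), which builds exactly the escaped text a plain text-mode INSERT would have to carry. PQescapeBytea() is not mentioned in the thread itself; the sample bytes below are invented, and no database connection is needed. A minimal sketch:]

    #include <libpq-fe.h>
    #include <cstdio>

    int main()
    {
        // One "worst case" byte: 0x01 escapes to the five characters \\001.
        const unsigned char raw[] = { 0x01, 0x02, 0x00, 0xff };
        size_t escaped_len = 0;

        unsigned char *escaped = PQescapeBytea(raw, sizeof(raw), &escaped_len);
        if (escaped == NULL)
            return 1;

        // escaped_len counts the terminating NUL as well.
        std::printf("%lu raw bytes -> %lu escaped characters: %s\n",
                    (unsigned long) sizeof(raw),
                    (unsigned long) (escaped_len - 1),
                    (const char *) escaped);

        PQfreemem(escaped);
        return 0;
    }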
On Tue, Oct 18, 2005 at 06:07:12PM +0000, Rodrigo Madera wrote:

> 1) Is there any way for me to send the binary field directly without needing
> escape codes?

In 7.4 and later the client/server protocol supports binary data transfer. If you're programming with libpq you can use PQexecParams() to send and/or retrieve values in binary instead of text.

http://www.postgresql.org/docs/8.0/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN

APIs built on top of libpq or that implement the protocol themselves might provide hooks to this capability; check your documentation. What language and API are you using?

See also COPY BINARY:

http://www.postgresql.org/docs/8.0/interactive/sql-copy.html

> 2) Will this mean that the client actually wastes my network bandwidth
> converting binary data to text? Or does the client transparently manage
> this?

Binary transfer sends data in binary, not by automatically converting to and from text.

--
Michael Fuhr
[Please copy the mailing list on replies so others can participate in and learn from the discussion.]

On Tue, Oct 18, 2005 at 07:09:08PM +0000, Rodrigo Madera wrote:

> > What language and API are you using?
>
> I'm using libpqxx. A nice STL-style library for C++ (I am 101% C++).

I've only dabbled with libpqxx; I don't know if or how you can make it send data in binary instead of text. See the documentation or ask in a mailing list like libpqxx-general or pgsql-interfaces.

> > Binary transfer sends data in binary, not by automatically converting
> > to and from text.
>
> Uh, I'm sorry I didn't get that... If I send: insert into foo
> values('\\001\\002') will libpq send 0x01, 0x02 or "\\\\001\\\\002"??

If you do it that way libpq will send the string as text with escape sequences; you can use a sniffer like tcpdump or ethereal to see this for yourself. To send the data in binary you'd call PQexecParams() with a query like "INSERT INTO foo VALUES ($1)". The $1 is a placeholder; the other arguments to PQexecParams() provide the data itself, the data type and length, and specify whether the data is in text format or binary. See the libpq documentation for details.

--
Michael Fuhr
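[Editor's aside: a minimal, hedged sketch of what the PQexecParams() call Michael describes might look like for a single bytea value sent in binary. The table name foo and the $1 placeholder come from the thread; the connection string and the made-up 5MB payload are only placeholders, and error handling is reduced to the essentials.]

    #include <libpq-fe.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        PGconn *conn = PQconnectdb("dbname=test");          // placeholder conninfo
        if (PQstatus(conn) != CONNECTION_OK) {
            std::fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        std::vector<char> blob(5 * 1024 * 1024, '\1');       // pretend 5MB payload

        const char *values[1]  = { &blob[0] };
        int         lengths[1] = { static_cast<int>(blob.size()) };
        int         formats[1] = { 1 };                      // 1 = binary parameter

        // $1 is a placeholder; no escaping, the bytes go over the wire as-is.
        PGresult *res = PQexecParams(conn,
                                     "INSERT INTO foo VALUES ($1)",
                                     1,     /* one parameter */
                                     NULL,  /* let the server infer the type */
                                     values,
                                     lengths,
                                     formats,
                                     0);    /* text-format results */

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            std::fprintf(stderr, "%s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }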
On 18/10/05, Michael Fuhr <mike@fuhr.org> wrote:
> [Please copy the mailing list on replies so others can participate
> in and learn from the discussion.]
>
> On Tue, Oct 18, 2005 at 07:09:08PM +0000, Rodrigo Madera wrote:
> > > What language and API are you using?
> >
> > I'm using libpqxx. A nice STL-style library for C++ (I am 101% C++).
>
> I've only dabbled with libpqxx; I don't know if or how you can make
> it send data in binary instead of text. See the documentation or
> ask in a mailing list like libpqxx-general or pgsql-interfaces.
>
> > > Binary transfer sends data in binary, not by automatically converting
> > > to and from text.
> >
> > Uh, I'm sorry I didn't get that... If I send: insert into foo
> > values('\\001\\002') will libpq send 0x01, 0x02 or "\\\\001\\\\002"??
>
> If you do it that way libpq will send the string as text with escape
> sequences; you can use a sniffer like tcpdump or ethereal to see this
> for yourself. To send the data in binary you'd call PQexecParams()
> with a query like "INSERT INTO foo VALUES ($1)". The $1 is a
> placeholder; the other arguments to PQexecParams() provide the data
> itself, the data type and length, and specify whether the data is in
> text format or binary. See the libpq documentation for details.

You could base64 encode your data, admittedly increasing it by 1/3, but it does at least convert it to text, which means that it's more understandable. base64 is also pretty standard, being what's used in e-mails for MIME attachments.

Peter
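[Editor's aside: binary transfer avoids text conversion entirely, but for reference, the roughly 1/3 growth Peter mentions comes from every 3 raw bytes becoming 4 output characters. A small self-contained sketch in plain C++, no particular library assumed:]

    #include <string>
    #include <vector>
    #include <iostream>

    // Encode arbitrary bytes as base64; output is ~4/3 the input size.
    std::string base64_encode(const std::vector<unsigned char> &in)
    {
        static const char tbl[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        std::string out;
        size_t i = 0;
        while (i + 2 < in.size()) {                      // full 3-byte groups
            unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
            out += tbl[(v >> 18) & 0x3f];
            out += tbl[(v >> 12) & 0x3f];
            out += tbl[(v >> 6) & 0x3f];
            out += tbl[v & 0x3f];
            i += 3;
        }
        if (i + 1 == in.size()) {                        // one byte left over
            unsigned v = in[i] << 16;
            out += tbl[(v >> 18) & 0x3f];
            out += tbl[(v >> 12) & 0x3f];
            out += "==";
        } else if (i + 2 == in.size()) {                 // two bytes left over
            unsigned v = (in[i] << 16) | (in[i + 1] << 8);
            out += tbl[(v >> 18) & 0x3f];
            out += tbl[(v >> 12) & 0x3f];
            out += tbl[(v >> 6) & 0x3f];
            out += '=';
        }
        return out;
    }

    int main()
    {
        std::vector<unsigned char> data(3000, 0x01);     // 3000 raw bytes
        std::cout << base64_encode(data).size()
                  << " encoded characters\n";            // prints 4000
        return 0;
    }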
Rodrigo wrote:
$$
As I understand it, the client needs to put the data into the server using a text-based command. This makes the 5MB of data grow up to 5x, making it 25MB in the worst case. (Example: 0x01 -> \\001).

My question is:

1) Is there any way for me to send the binary field directly without needing escape codes?

2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this?
$$
[snip]

I think the fastest, most efficient binary transfer of data to PostgreSQL via C++ is an STL wrapper around libpq. Previously I would not have recommended libpqxx for this purpose, although this may have changed with the later releases.

As others have commented, you most certainly want to do this with the ExecParams/ExecPrepared interface. If you want to exclusively use libpqxx then you need to find out if it exposes/wraps this function (IIRC libpqxx builds on top of libpq). You can of course 'roll your own' libpq wrapper via STL pretty easily. For example, here is how I am making my SQL calls from my COBOL apps:

    typedef vector<string>      stmt_param_strings;
    typedef vector<int>         stmt_param_lengths;
    typedef vector<const char*> stmt_param_values;
    typedef vector<int>         stmt_param_formats;
    [...]
    res = PQexecPrepared(_connection,
                         stmt.c_str(),
                         num_params,
                         &param_values[0],
                         &param_lengths[0],
                         &param_formats[0],
                         result_format);

Executing data this way is a direct data injection to/from the server, no parsing/unparsing, no plan generating, etc. Also, STL vectors play very nicely with the libpq interface because it accepts strings that are not null-terminated.

Merlin
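[Editor's aside: in the same spirit as Merlin's fragment, a hedged sketch of how those vectors might be filled and handed to PQprepare()/PQexecPrepared() with the format flag set to binary. The statement name ins_foo, the INSERT statement, and the helper function are invented for illustration; in real code you would prepare once per connection and execute many times.]

    #include <libpq-fe.h>
    #include <string>
    #include <vector>

    typedef std::vector<std::string>  stmt_param_strings;
    typedef std::vector<const char*>  stmt_param_values;
    typedef std::vector<int>          stmt_param_lengths;
    typedef std::vector<int>          stmt_param_formats;

    // Prepare a statement and send one blob as a single binary parameter.
    bool insert_blob(PGconn *conn, const std::string &blob)
    {
        PGresult *prep = PQprepare(conn, "ins_foo",
                                   "INSERT INTO foo VALUES ($1)", 1, NULL);
        bool ok = (PQresultStatus(prep) == PGRES_COMMAND_OK);
        PQclear(prep);
        if (!ok)
            return false;

        stmt_param_values  values;
        stmt_param_lengths lengths;
        stmt_param_formats formats;

        values.push_back(blob.data());
        lengths.push_back(static_cast<int>(blob.size()));
        formats.push_back(1);                   // binary, no escaping

        PGresult *res = PQexecPrepared(conn, "ins_foo", 1,
                                       &values[0], &lengths[0], &formats[0],
                                       0 /* text-format results */);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok;
    }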
I guess you should check if a blob field and large object access is suitable for you - no escaping etc., just raw binary large objects.

AFAIK, PQexecParams is not the right solution for you. Refer to the "Large Objects" section:

"28.3.5. Writing Data to a Large Object

The function

    int lo_write(PGconn *conn, int fd, const char *buf, size_t len);

writes len bytes from buf to large object descriptor fd. The fd argument must have been returned by a previous lo_open. The number of bytes actually written is returned. In the event of an error, the return value is negative."

regards,
Narcus

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Michael Fuhr
Sent: Tuesday, October 18, 2005 22:47
To: Rodrigo Madera
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Inefficient escape codes.

[snip]
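[Editor's aside: a rough sketch of the lo_creat()/lo_open()/lo_write() sequence Narcus refers to, assuming large objects were acceptable; note that large object calls must run inside a transaction. The helper name is invented and error handling is only illustrative.]

    #include <libpq-fe.h>
    #include "libpq/libpq-fs.h"      // INV_READ / INV_WRITE
    #include <vector>

    // Store one blob as a large object and return its OID (InvalidOid on error).
    Oid store_blob(PGconn *conn, const std::vector<char> &data)
    {
        PQclear(PQexec(conn, "BEGIN"));         // large object calls need a transaction

        Oid loid = lo_creat(conn, INV_READ | INV_WRITE);
        if (loid == InvalidOid) {
            PQclear(PQexec(conn, "ROLLBACK"));
            return InvalidOid;
        }

        int fd = lo_open(conn, loid, INV_WRITE);
        if (fd < 0) {
            PQclear(PQexec(conn, "ROLLBACK"));
            return InvalidOid;
        }

        int written = lo_write(conn, fd, &data[0], data.size());
        lo_close(conn, fd);

        if (written < 0 || static_cast<size_t>(written) != data.size()) {
            PQclear(PQexec(conn, "ROLLBACK"));
            return InvalidOid;
        }

        PQclear(PQexec(conn, "COMMIT"));
        return loid;                            // store this OID in your row
    }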
Hi!

I'm experiencing a very slow deletion of records, which I think is not right. I have a Dual Xeon server with 6 GB of memory. I am only deleting about 22,000 records, but it took me more than 1 hour to finish this.

What could I possibly do to make this fast?

Here is the code inside my function:

    FOR temp_rec IN SELECT * FROM item_qc_doer LOOP
        DELETE FROM qc_session WHERE item_id = temp_rec.item_id;
        DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id;
    END LOOP;

The item_qc_doer table contains 22,000 records.

I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html
Christian,

Do you have foreign keys pointing to your table with ON DELETE CASCADE...? Because in that case you're not only deleting your 22,000 records, but the whole tree of cascades. And if you don't have an index on one of those foreign keys, then you might get a sequential scan of the child table on each deleted row... I would check the foreign keys.

HTH,
Csaba.

On Thu, 2005-10-20 at 10:43, Christian Paul B. Cosinas wrote:
> Hi!
>
> I'm experiencing a very slow deletion of records, which I think is not right.
> I have a Dual Xeon server with 6 GB of memory.
> I am only deleting about 22,000 records but it took me more than 1 hour to
> finish this.
>
> What could I possibly do to make this fast?
>
> Here is the code inside my function:
>
> FOR temp_rec IN SELECT * FROM item_qc_doer LOOP
>     DELETE FROM qc_session WHERE item_id = temp_rec.item_id;
>     DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id;
> END LOOP;
>
> The item_qc_doer table contains 22,000 records.
>
> I choose Polesoft Lockspam to fight spam, and you?
> http://www.polesoft.com/refer.html
> What could I possibly do to make this fast?
>
> Here is the code inside my function:
>
> FOR temp_rec IN SELECT * FROM item_qc_doer LOOP
>     DELETE FROM qc_session WHERE item_id = temp_rec.item_id;
>     DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id;
> END LOOP;
>
> The item_qc_doer table contains 22,000 records.

I'd check to see if I have foreign keys on those tables and whether the columns that refer to them are properly indexed. (For cascade delete or even just checking RESTRICT.)

Chris
Hi,

> What could I possibly do to make this fast?
>
> Here is the code inside my function:
>
> FOR temp_rec IN SELECT * FROM item_qc_doer LOOP
>     DELETE FROM qc_session WHERE item_id = temp_rec.item_id;
>     DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id;
> END LOOP;

What about just using:

    DELETE FROM qc_session WHERE item_id IN (SELECT item_id FROM item_qc_doer);
    DELETE FROM item_qc_doer;

It doesn't make sense to run 2 x 22,000 separate DELETE statements instead of only two...

And... what about using a foreign key?

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com
> Here is the code inside my function:
>
> FOR temp_rec IN SELECT * FROM item_qc_doer LOOP
>     DELETE FROM qc_session WHERE item_id = temp_rec.item_id;
>     DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id;
> END LOOP;
>
> The item_qc_doer table contains 22,000 records.

Also, check that you have an index on both those item_id columns.

And why don't you just drop the loop and do this instead:

    DELETE FROM qc_session WHERE item_id IN (SELECT item_id FROM item_qc_doer);
    DELETE FROM item_qc_doer;

Chris
Hi!
I am having some confusion about the memory handling of PostgreSQL.
Here is the scenario:
I rebooted my server, which is PostgreSQL 8.0 running on Red Hat 9, a Dual Xeon server with 6 GB of memory.
Of course there is not much memory used yet, since it has just been restarted.
But after a number of accesses to the tables, memory gets used up and is not freed. Even after these accesses, when the database server is just idle,
the memory is still used up. I am monitoring this using the "free" command, which gives me about 5.5 GB of used memory and the rest free.
Is there something I should do to minimize and free up the used memory?
Thank you.
I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html
--On Friday, October 21, 2005 03:40:47 +0000 "Christian Paul B. Cosinas" <cpc@cybees.com> wrote:

> I am having a confusion to the memory handling of postgreSQL.
> I rebooted my Server which is a PostgreSQL 8.0 Running on Redhat 9, which
> is a Dual Xeon Server and 6 gig of memory.
>
> Of course there is not much memory still used since it is just restarted.
>
> But after a number of access to the tables the memory is being used and
> it is not being free up. Actually after this access to the database and
> the server is just idle
>
> The memory is still used up. I am monitoring this using the "free"
> command which gives me about 5.5 gig of used memory and the rest free.

I suppose you looked at the top row of the free output? Because there the disk cache is counted as "used"... Have a look at the second row, where buffers are counted as free, which they more or less are.

> Is there something that I should do to minimize and free up the used
> memory?

No, the buffers make your database faster because they reduce direct disk access.

> I choose Polesoft Lockspam to fight spam, and you?
> http://www.polesoft.com/refer.html

I don't :)

Kind regards,
Jens Schicke
--
Jens Schicke		j.schicke@asco.de
asco GmbH		http://www.asco.de
Mittelweg 7		Tel 0531/3906-127
38106 Braunschweig	Fax 0531/3906-400
But as long as the memory is in the cache, my database becomes much slower. What could be the cause of this? When I restart the database, it is back to normal processing.

-----Original Message-----
From: Jens-Wolfhard Schicke [mailto:ml+pgsql-performance@asco.de]
Sent: Friday, October 21, 2005 7:23 AM
To: Christian Paul B. Cosinas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Used Memory

[snip]

I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html
Also, does creating temporary tables in a function and not dropping them affect the performance of the database?

-----Original Message-----
From: Jens-Wolfhard Schicke [mailto:ml+pgsql-performance@asco.de]
Sent: Friday, October 21, 2005 7:23 AM
To: Christian Paul B. Cosinas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Used Memory

[snip]

I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html
On Fri, 21 Oct 2005 03:40:47 -0000 "Christian Paul B. Cosinas" <cpc@cybees.com> wrote:

> But after a number of access to the tables the memory is being used
> and it is not being free up. Actually after this access to the
> database and the server is just idle

I noticed this behavior on my SUSE Linux box as well. I thought it was a memory leak in something (I think there was an actual memory leak in the kernel shared memory stuff, which I fixed by upgrading my kernel to 2.6.13-ck8).

It turns out that some file systems are better than others when it comes to increasing the performance of I/O on Linux. ReiserFS was what I put on originally, and by the end of the day the box would be using all of its available memory for caching inodes. I kept rebooting and trying to get the memory usage to go down, but it never did. All but 500MB of it was disk cache.

I let my apps just run, and when the application server needed more memory it reclaimed it from the disk cache, so there weren't side effects to the fact that top and free always reported full memory usage. They tell me that this is a good thing, as it reduces disk I/O and increases performance. That's all well and good, but it's entirely unnecessary in our situation.

Despite that, I can't turn it off, because my research into the issue has shown that kernel developers don't want users to be able to turn off disk caching. There is a value in /proc/sys/vm/vfs_cache_pressure that can be changed, which will affect the propensity of the kernel to cache files in RAM (google it to find the suggestions on what value to set it to), but there isn't a setting to turn it off altogether.

Rolling my own CK-based kernel, switching to XFS, and tweaking the nice and CPU affinity of my database process (I use schedtool in my CK kernel to run it at SCHED_FIFO, nice -15, and CPU affinity confined to the second processor in my dual Xeon eServer) has got me to the point that the perpetually high memory usage doesn't affect my application server.

Hope any of this helps.

Jon Brisbin
Webmaster
NPC International, Inc.
> to the second processor in my dual Xeon eServer) has got me to the
> point that the perpetually high memory usage doesn't affect my
> application server.
I'm curious - how does the high memory usage affect your application server?
Alex
> I guess you should check if a blob field and large object access is suitable for you - no escaping etc., just raw binary large objects.
> AFAIK, PQexecParams is not the right solution for you. Refer to the "Large Objects" section:
>
> "28.3.5. Writing Data to a Large Object
>
> The function
>
>     int lo_write(PGconn *conn, int fd, const char *buf, size_t len);
>
> writes len bytes from buf to large object descriptor fd. The fd argument must have been returned by a previous lo_open. The number of bytes actually written is returned. In the event of an error, the return value is negative."
Well, I read that large objects are kept in only ONE table. No matter what, only the LOID would be kept. I can't afford that since I have lots of tables (using the image-album-app analogy, imagine that we have pictures from several cities, each table corresponding to a city, like Memphis_Photos, Chicago_Photos, etc.).
This is one major drawback, isn't it?
Rodrigo
             total       used       free     shared    buffers     cached
Mem:       6192460    6137424      55036          0      85952    5828844
-/+ buffers/cache:     222628    5969832
Swap:      2096472          0    2096472
Here is the result of the "free" command I am talking about.
What does this result mean?
I just noticed that when the free memory in the first row (which is 55036 right now) gets low, the response of the database server gets slower.
I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html
Christian Paul B. Cosinas wrote:
>
> Here is the result of the "free" command I am talking about.
>
> What does this result mean?

I seem to recall the Linux man page for 'free' being most unenlightening, so have a look at:

http://gentoo-wiki.com/FAQ_Linux_Memory_Management

(For Gentoo, but should be applicable to RHEL.)

The basic idea is that modern operating systems try to make as much use of the memory as possible. PostgreSQL depends on this behavior - e.g. a page that has previously been fetched from disk will be cached, so it can be read from memory next time, as this is faster(!)

> I just noticed that as long as the free memory in the first row (which
> is 55036 as of now) became low, the slower is the response of the
> database server.

Well, you could be swapping - what does the swap line of 'free' show then?

Also, how about posting your postgresql.conf (or just the non-default parameters) to this list?

Some other stuff that could be relevant:

- Is the machine just a database server, or does it run (say) Apache + PHP?
- When the slowdown is noticed, does this coincide with certain activities - e.g. backup, daily maintenance, data load(!) etc.

regards

Mark

> I choose Polesoft Lockspam to fight spam, and you?
> http://www.polesoft.com/refer.html

Nope, not me either.
Please clarify this:
I have 5000 tables, one for each city:
City1_Photos, City2_Photos, ... City5000_Photos.
Each of these tables is: CREATE TABLE CityN_Photos (location text, lo_id largeobjecttypeiforgot)
So, what's the limit for these large objects? I heard I could only have 4 billion records for the whole database (not for each table). Is this true? If this isn't true, then would Postgres manage to create all the large objects I ask it to?
Also, this would be a performance penalty, wouldn't it?
Much thanks for the knowledge shared,
Rodrigo
But what I really want to know is this:
1) All large objects of all tables inside one DATABASE are kept in only one table. True or false?
Thanks =o)
Rodrigo
Oh, btw, no harm, but: having 5000 tables only to gain access via city name is a major design flaw. You might consider putting it all into one table, working with a distributed index over your table (city, loc_text, blobfield), i.e. creating a partitioned index over city.

best regards

-----Original Message-----

Now this interests me a lot.
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rodrigo Madera
Sent: Monday, October 24, 2005 21:12
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Inefficient escape codes.
Please clarify this:
I have 5000 tables, one for each city:
City1_Photos, City2_Photos, ... City5000_Photos.
Each of these tables is: CREATE TABLE CityN_Photos (location text, lo_id largeobjecttypeiforgot)
So, what's the limit for these large objects? I heard I could only have 4 billion records for the whole database (not for each table). Is this true? If this isn't true, then would Postgres manage to create all the large objects I ask it to?
Also, this would be a performance penalty, wouldn't it?
Much thanks for the knowledge shared,
Rodrigo
It affects my application since the database server starts to slow down, and hence functions return very slowly.
Any more ideas about this everyone?
Please….
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Alex Turner
Sent: Friday, October 21, 2005 3:42 PM
To: Jon Brisbin
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Used Memory
[snip]
> to the second processor in my dual Xeon eServer) has got me to the
> point that the perpetually high memory usage doesn't affect my
> application server.
I'm curious - how does the high memory usage affect your application server?
Alex
I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html