Re: 8.3 / 8.2.6 restore comparison - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: 8.3 / 8.2.6 restore comparison |
Date | |
Msg-id | 200803040140.m241e2A09710@momjian.us Whole thread Raw |
In response to | Re: 8.3 / 8.2.6 restore comparison ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses |
Re: 8.3 / 8.2.6 restore comparison
Re: 8.3 / 8.2.6 restore comparison |
List | pgsql-hackers |
Heikki, are there any TODO items here? --------------------------------------------------------------------------- Heikki Linnakangas wrote: > Joshua D. Drake wrote: > > On Sun, 24 Feb 2008 00:43:18 +0000 > > "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > > > >> Incidentally, I've been working on a patch to speed up CopyReadLine. > >> I was going to run some more tests first, but since we're talking > >> about it, I guess I should just post the patch. I'll post to > >> pgsql-patches shortly. > > > > On your post to patches you mentioned only about a 5% improvement. > > Don't get me wrong, 5% is 5% and I respect it greatly but as far as I > > can tell we are about 300% behind the curve. > > Yeah. Looking at the profile, the time is spent really all over the > place. There's no one clear bottleneck to focus on. I think we could do > a few more ~5% improvements, but > > At some point, I think we have to bite the bullet and find a way to use > multiple CPUs for a single load. I don't have any good ideas or plans > for that, but hopefully someone does. > > > My tests were maxing out at ~22G an hour. On hardware that can do > > in 360G an hour and that is assuming > 50% overhead between OS, libs, > > etc... I have no choice but to conclude we have a much, much deeper and > > fundamental issue going on with COPY. I am inspired by Itagaki Takahiro > > and his batch update of indexes which should help greatly overall but > > doesn't help my specific issue. > > Yep, the index build idea is an I/O improvement, not a CPU one. > > > Forgive me for not being a C programmer and Alvaro is not online so I > > would vet these questions with him first. > > > > I know that copy is in theory a bulk loader but, when performing the > > readline how many lines are we reading? Do we read up to 8192? Or do we > > shove in say 8megs of data before we invoke DoCopy? > > We read 64 KB at a time, and then CopyReadLineText returns one line at a > time from that buffer. > > Looking at your profile more, and after the memchr patch, the "raw input > side" of copy, consisting of reading the data from disk in 64KB blocks, > splitting that into lines, and splitting lines into columns, still takes > ~20% of the CPU time. I suspect CopyReadAttributesText is the biggest > culprit there. > > You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by > creating the table (or truncating it) in the same transaction with the COPY. > > After that, heap_formtuple is high on the list. I wonder if we could do > something about that. > > > I am just curious if there is some simple low hanging fruit that is > > possibly missing. > > I don't see any piece of code that's causing problems. We can shave off > a few percents here and there I think, but don't expect a 300% > improvement anytime soon. A few ideas I've thought about are: > > - use a specialized version of strtol, for base 10. That won't help on > your table, but I've seen strtol consume a significant amount of time on > tables with numeric/integer columns. > > - Instead of pallocing and memcpying the text fields, leave a little bit > of room between fields in the attribute_buf, and write the varlen header > there directly. This might help you since your table has a lot of text > fields. > > - Instead of the normal PG function calling conventions, provide > specialized fastpath input functions for the most common data types. > InputFunctionCall consumed 4.5% of the CPU time in your profile. > > - Use a simpler memory context implementation, that's like a stack with > no pfree support, for the per-tuple context. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: