You don't say which platform you are on, but if it is UNIX-like, then you may want to try the read() system call, which does not perform the extra layer of buffering that fgets() et al do. This may speed things up slightly; on the other hand, it may well slow things down. The only way to find out is to suck it and see.
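A rough sketch of what "doing your own buffering" on top of read() can look like (my illustration, not part of the original answer; it assumes a POSIX system, that no line is longer than the buffer, and process_line() is a placeholder):

    #include <string.h>
    #include <unistd.h>

    #define BUF_SIZE 65536

    /* Read fd in big chunks and hand each complete line to process_line().
       A line that straddles a chunk boundary is kept in 'carry' until the
       next read completes it. Assumes no line is longer than BUF_SIZE. */
    static void read_lines(int fd, void (*process_line)(const char *, size_t))
    {
        char buf[BUF_SIZE], carry[BUF_SIZE];
        size_t carry_len = 0;
        ssize_t n;

        while ((n = read(fd, buf, sizeof buf)) > 0) {
            char *start = buf, *end = buf + n, *nl;

            while ((nl = memchr(start, '\n', end - start)) != NULL) {
                if (carry_len > 0) {
                    /* this piece completes the fragment carried over */
                    memcpy(carry + carry_len, start, nl - start);
                    process_line(carry, carry_len + (nl - start));
                    carry_len = 0;
                } else {
                    process_line(start, nl - start);
                }
                start = nl + 1;
            }
            memcpy(carry + carry_len, start, end - start);   /* stash the tail */
            carry_len += end - start;
        }
        if (carry_len > 0)
            process_line(carry, carry_len);   /* last line had no trailing '\n' */
    }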
This turned out to be the fastest method of all. I eventually went down this route. It was simpler than I had thought to do "my own buffering" and it turned out to be much, much faster (almost 4 times) than using fgets().
– dreamlax Jan 16 '10 at 2:44 Ironically, for me pread performed 4 times worse than fgets. – abirvalg Oct 15 '11 at 19:33.
Use fgets_unlocked(), but read carefully what it does first. Alternatively, get the data with fgetc() or fgetc_unlocked() instead of fgets(). With fgets(), your data is copied into memory twice: first by the C runtime library from the file to an internal buffer (stream I/O is buffered), then again from that internal buffer to an array in your program.
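A minimal sketch of the character-at-a-time idea (my illustration; since fgetc_unlocked() is a GNU extension, this uses the POSIX getc_unlocked() instead, and assumes the stream is not shared with other threads):

    #include <stdio.h>

    /* Build one line at a time with getc_unlocked(), skipping the per-call
       stream locking that getc()/fgets() pay for. Returns the line length,
       or -1 at end of file. */
    static int get_line_unlocked(FILE *fp, char *out, int max)
    {
        int i = 0, c = EOF;

        flockfile(fp);                  /* lock once for the whole line */
        while (i < max - 1 && (c = getc_unlocked(fp)) != EOF) {
            out[i++] = (char)c;
            if (c == '\n')
                break;
        }
        funlockfile(fp);

        out[i] = '\0';
        return (i == 0 && c == EOF) ? -1 : i;
    }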
Sometimes it comes from the disk, sometimes it is fed through stdin, but in both cases the time spent in fgets is roughly the same. Even creating a RAM disk for the file doesn't speed things up much. – dreamlax Apr 9 '09 at 1:38 After edit: the problem is that this application will be run on end users' computers, that's why performance is quite important.
– dreamlax Apr 9 '09 at 1:49.
You might try minimizing the amount of time you spend reading from the disk by reading large amounts of data into RAM then working on that. Reading from disk is slow, so minimize the amount of time you spend doing that by reading (ideally) the entire file once, then working on it. Sorta like the way CPU cache minimizes the time the CPU actually goes back to RAM, you could use RAM to minimize the number of times you actually go to disk.
I'm writing a program where performance is quite important, but not critical. Currently I am reading in text from a FILE* line by line and I use fgets to obtain each line. After using some performance tools, I've found that 20% to 30% of the time my application is running, it is inside fgets.
Are there faster ways to get a line of text? My application is single-threaded with no intentions to use multiple threads. Input could be from stdin or from a file.
Thanks in advance. Tags: c, optimization, file-io, stdin, fgets. Asked Apr 9 '09 at 1:24 by dreamlax.
This helps determine the fastest way to access them. – Juliano Apr 9 '09 at 2:13 @Juliano, the lines are always less than 260 characters in length. I have already avoided a line-building loop.
– dreamlax Apr 9 '09 at 2:29 Do you control the input format? Could you make it more compact? – Dave Apr 29 '09 at 17:17 @Dave, no, I have no control over the input format.
– dreamlax Apr 29 '09 at 21:28.
Thanks for the suggestion, but I forgot to mention I am using Mac OS X. fgets_unlocked() is not available since it is a GNU extension. I will look into using fgetc_unlocked().
– dreamlax Apr 30 '09 at 5:11 Well, OS X is running GCC, you should get the GNU extensions, right? – Martin Cote May 18 '09 at 3:09 @Martin: It is not an extension of the GNU compiler, but of the GNU C runtime library. – dreamlax Aug 6 '09 at 8:13.
If the data is coming from disk, you could be IO bound. If that is the case, get a faster disk (but first check that you're getting the most out of your existing one...some Linux distributions don't optimize disk access out of the box (hdparm)), stage the data into memory (say by copying it to a RAM disk) ahead of time, or be prepared to wait. If you are not IO bound, you could be wasting a lot of time copying.
You could benefit from so-called zero-copy methods. Something like memory map the file and only access it through pointers. That is a bit beyond my expertise, so you should do some reading or wait for more knowledgeable help.
BTW-- You might be getting into more work than the problem is worth; maybe a faster machine would solve all your problems... NB-- It is not clear that you can memory map the standard input either...
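A sketch of the memory-mapping idea on a POSIX system (my illustration, not from the answer; it assumes the input is a regular file rather than stdin, and most error handling is left out):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int process_mapped(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        fstat(fd, &st);

        /* Map the file read-only; the kernel pages it in on demand, so the
           data is never copied through a user-space read buffer. */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) {
            close(fd);
            return -1;
        }

        const char *p = data, *end = data + st.st_size;
        while (p < end) {
            const char *nl = memchr(p, '\n', end - p);
            size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
            /* ... work on the line [p, p + len) here ... */
            p += len + 1;
        }

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }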
– Paul Tomblin Apr 9 '09 at 1:53 I think so but I'm sure it's less than a megabyte, so reading more than that should still help. – GManNickG Apr 9 '09 at 2:29.
Depending on your environment, using setvbuf() to increase the size of the internal buffer used by file streams may or may not improve performance. The syntax is: setvbuf(InputFile, NULL, _IOFBF, BUFFER_SIZE); where InputFile is a FILE* to a file just opened using fopen() and BUFFER_SIZE is the size of the buffer (which is allocated for you by this call). You can try various buffer sizes to see if any have a positive influence.
Note that this is entirely optional, and your runtime may do absolutely nothing with this call.
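For example (a small sketch; the file name and the 1 MiB size are just values to experiment with):

    #include <stdio.h>

    #define BUFFER_SIZE (1024 * 1024)   /* try several sizes and measure */

    int main(void)
    {
        FILE *InputFile = fopen("input.txt", "r");
        if (InputFile == NULL)
            return 1;

        /* Must be called after fopen() but before any other I/O on the stream. */
        if (setvbuf(InputFile, NULL, _IOFBF, BUFFER_SIZE) != 0) {
            /* not fatal: the stream simply keeps its default buffer */
        }

        /* ... read with fgets() as before ... */
        fclose(InputFile);
        return 0;
    }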
Read the whole file in one go into a buffer. Process the lines from that buffer. That's the fastest possible solution.
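A sketch of that approach (my illustration; it assumes the whole file fits in memory and omits most error handling):

    #include <stdio.h>
    #include <stdlib.h>

    /* Slurp the entire file into one heap buffer and NUL-terminate it so the
       lines can then be scanned in memory (e.g. with memchr/strchr). */
    static char *slurp(const char *path, size_t *out_len)
    {
        FILE *fp = fopen(path, "rb");
        if (fp == NULL)
            return NULL;

        fseek(fp, 0, SEEK_END);
        long size = ftell(fp);
        rewind(fp);

        char *buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            *out_len = fread(buf, 1, (size_t)size, fp);
            buf[*out_len] = '\0';
        }
        fclose(fp);
        return buf;
    }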
If the OS supports it, you can try asynchronous file reading, that is, the file is read into memory whilst the CPU is busy doing something else. So, the code goes something like:

    start asynchronous read
    loop:
        wait for asynchronous read to complete
        if end of file
            goto exit
        start asynchronous read
        do stuff with data read from file
        goto loop
    exit:

If you have more than one CPU then one CPU reads the file and parses the data into lines, and the other CPU takes each line and processes it. Skizz.
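One possible concrete shape for this on POSIX systems is the <aio.h> interface (a double-buffered sketch of my own, not from the answer; error handling is skipped, and on some systems you may need to link with -lrt):

    #include <aio.h>
    #include <string.h>

    #define CHUNK 65536

    /* While the CPU processes one chunk, the next chunk is already being
       read in the background into the other buffer. */
    static void read_async(int fd, void (*process)(const char *, size_t))
    {
        static char bufs[2][CHUNK];
        struct aiocb cb;
        off_t offset = 0;
        int cur = 0;

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = bufs[cur];
        cb.aio_nbytes = CHUNK;
        cb.aio_offset = offset;
        aio_read(&cb);                      /* start asynchronous read */

        for (;;) {
            const struct aiocb *list[1] = { &cb };
            aio_suspend(list, 1, NULL);     /* wait for it to complete */
            ssize_t n = aio_return(&cb);
            if (n <= 0)
                break;                      /* end of file (or error) */

            int done = cur;
            offset += n;
            cur = 1 - cur;

            memset(&cb, 0, sizeof cb);      /* kick off the next read ... */
            cb.aio_fildes = fd;
            cb.aio_buf    = bufs[cur];
            cb.aio_nbytes = CHUNK;
            cb.aio_offset = offset;
            aio_read(&cb);

            process(bufs[done], (size_t)n); /* ... while this chunk is handled */
        }
    }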
Look into fread(). It reads much faster for me, especially if the buffer for fread is set to 65536. Cons: you have to do a lot of work and essentially write your own getline function to convert from binary read to text.
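A stripped-down sketch of the fread() route (my illustration; splitting the chunks into lines then works the same way as in the read()-based sketch further up):

    #include <stdio.h>

    #define FREAD_BUF 65536   /* the buffer size mentioned above */

    int main(void)
    {
        char buf[FREAD_BUF];
        size_t n;
        FILE *fp = fopen("input.txt", "rb");   /* read the data as raw bytes */

        if (fp == NULL)
            return 1;
        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            /* split buf[0..n) into lines here, e.g. with memchr() and a small
               carry-over buffer for lines that cross chunk boundaries */
        }
        fclose(fp);
        return 0;
    }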
Check out: file I/O.
I've written a program that takes an uncompressed file and reads it line by line, which works perfectly. Now I want to be able to open the compressed files in memory and run my little program. I've looked into zlib but I can't find a good solution.
Loading the entire file is impossible using gzread(gzFile, void *, unsigned), because of the 32-bit unsigned int limitation. I've also looked into "buffering", such as splitting the gzread process into multiple 2 GB chunks, finding the last newline using strrchr, and then setting the gzseek. But gzseek will emulate a total file decompression, which is very slow.
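One way to avoid gzseek() entirely is to keep reading forward in chunks and carry the unfinished last line over to the next chunk, so the stream is only decompressed once (a sketch of my own; it assumes no single line is longer than the chunk buffer):

    #include <string.h>
    #include <zlib.h>

    #define GZ_CHUNK (1 << 20)   /* 1 MiB; anything below the 'unsigned' limit works */

    static void gz_lines(const char *path, void (*process_line)(const char *, size_t))
    {
        static char buf[GZ_CHUNK];
        size_t carry = 0;         /* bytes of an unfinished line kept from the last chunk */
        int n;

        gzFile gz = gzopen(path, "rb");
        if (gz == NULL)
            return;

        while ((n = gzread(gz, buf + carry, (unsigned)(sizeof buf - carry))) > 0) {
            char *start = buf, *end = buf + carry + n, *nl;

            while ((nl = memchr(start, '\n', end - start)) != NULL) {
                process_line(start, nl - start);
                start = nl + 1;
            }
            carry = end - start;
            memmove(buf, start, carry);   /* move the partial tail to the front */
        }
        if (carry > 0)
            process_line(buf, carry);     /* final line without a trailing newline */
        gzclose(gz);
    }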