Reading a specific number of lines from a file in C (scanf, fseek, fgets)?



I have a master process that spawns N child processes, which communicate with the parent through unnamed pipes. The parent must open the file and then send each child a struct telling it to read from line min to line max, and all the children will be reading at the same time. So I don't know: 1) how to divide total_lines among the N maps, and 2) how to make each child read only the lines it is supposed to. My problem does not concern the OS concepts, only the file operations :S Perhaps fseek?

I can't mmap the log file (some are larger than 1 GB). I would appreciate some ideas. Thank you in advance.

EDIT: I'm trying to make the children read their respective lines without using fseek and chunk offsets. Could someone please tell me if this is valid?

    //somewhere in the parent process:
    char line[1026]; /* room for a 1024-byte line plus '\n' and '\0' */
    FILE* logFile = fopen(filename, "r");
    while (fgets(line, sizeof line, logFile) != NULL) {
        num_lines++;
    }
    rewind(logFile);
    int prev = 0;
    for (i = 0; i …

Asked Oct 13 '10 at 11:28 by neverMind, edited Oct 13 '10 at 19:38.
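A minimal sketch of both parts of the first question, assuming the file is opened in binary mode. count_lines() counts '\n' bytes in large fread() blocks, so a line longer than the fgets() buffer cannot be counted twice; split_lines() gives every child total_lines / n lines and the first total_lines % n children one extra. struct line_range and both function names are hypothetical; the parent could send each child its struct with write() on that child's pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message the parent writes down each child's pipe. */
struct line_range {
    long min_line; /* first line to read, 0-based */
    long max_line; /* last line to read, inclusive; min_line > max_line means none */
};

/* Count lines by scanning fixed-size blocks for '\n'.
   A final line without a trailing newline still counts.
   Returns -1 on a read error. */
long count_lines(FILE *f)
{
    char buf[BUFSIZ];
    size_t n;
    long lines = 0;
    int last = '\n';

    rewind(f);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
        last = (unsigned char)buf[n - 1];
    }
    if (ferror(f))
        return -1;
    if (last != '\n')
        lines++; /* unterminated last line */
    return lines;
}

/* Split total_lines lines among n children as evenly as possible. */
void split_lines(long total_lines, int n, struct line_range *ranges)
{
    long base, extra, next = 0;
    int i;

    assert(n > 0 && total_lines >= 0);
    base = total_lines / n;
    extra = total_lines % n;
    for (i = 0; i < n; i++) {
        long count = base + (i < extra ? 1 : 0);
        ranges[i].min_line = next;
        ranges[i].max_line = next + count - 1;
        next += count;
    }
}
```

For example, 10 lines over 3 children gives the ranges 0..3, 4..6 and 7..9. Because parent and children are the same program after fork(), sending the raw struct through the pipe is safe here.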

I don't know of any way to compute the count of lines in a file (your total_lines variable) without reading the whole file once. Is that acceptable? – Frédéric Hamidi Oct 13 '10 at 11:34

Don't make the children read the file; that is just a bottleneck that can't be improved. Make the master read the file and then send the lines to the children. – Loki Astari Oct 13 '10 at 11:38
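A minimal POSIX sketch of that suggestion, assuming fork() and pipe() are available: the parent forks n children, each with its own pipe, then reads the log once with fgets() and deals the lines out round-robin. distribute_lines() and child_main() are hypothetical names, and error handling is minimal (on failure, already-forked children are not cleaned up). Here each child only counts the lines it receives and reports the count through its exit status, which holds just 8 bits, so that part is for demonstration only; a real child would extract the IP address from each line and send results back through a second pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64
#define MAX_LINE 1026 /* assumed limit: a 1024-byte line plus '\n' and '\0' */

/* Child: read lines from the pipe until the parent closes it.
   This demo only counts them. */
static void child_main(int read_fd)
{
    char line[MAX_LINE];
    int count = 0;
    FILE *in = fdopen(read_fd, "r");

    if (in == NULL)
        _exit(255);
    while (fgets(line, sizeof line, in) != NULL)
        count++;
    fclose(in);
    _exit(count & 0xff); /* exit status carries only 8 bits: demo only */
}

/* Parent: fork n children, then read the log once and send line k
   to child k % n. Returns the total the children report, or -1 on error. */
long distribute_lines(FILE *log, int n)
{
    FILE *out[MAX_CHILDREN];
    char line[MAX_LINE];
    long sent = 0, received = 0;
    int i, status;

    assert(n > 0 && n <= MAX_CHILDREN);
    fflush(NULL); /* don't let children inherit unflushed output */
    for (i = 0; i < n; i++) {
        int fds[2];
        pid_t pid;

        if (pipe(fds) == -1 || (pid = fork()) == -1)
            return -1;
        if (pid == 0) {
            int j;
            close(fds[1]);
            for (j = 0; j < i; j++)
                fclose(out[j]); /* drop write ends inherited from earlier pipes */
            child_main(fds[0]); /* never returns */
        }
        close(fds[0]);
        if ((out[i] = fdopen(fds[1], "w")) == NULL)
            return -1;
    }
    while (fgets(line, sizeof line, log) != NULL)
        fputs(line, out[sent++ % n]);
    for (i = 0; i < n; i++)
        fclose(out[i]); /* each child sees EOF once its pipe is closed */
    for (i = 0; i < n; i++) {
        if (wait(&status) == -1 || !WIFEXITED(status))
            return -1;
        received += WEXITSTATUS(status);
    }
    return received;
}
```

Closing the inherited write ends in each child matters: otherwise a later child would keep an earlier child's pipe open, and that earlier child would never see EOF.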

If you don't care about dividing the file exactly evenly, and the distribution of line lengths is somewhat even over the entire file, you can avoid having the parent read the entire file once.

1. Get the file size.
2. chunk_size = file_size / number_of_children
3. When you spawn each child, do in the parent:
   - seek to (child_num+1) * chunk_size;
   - read forward until you find a newline;
   - spawn the child, telling it to start at the end of the previous chunk (or 0 for the first child), and the actual length of the chunk.
4. Each child seeks to its start and reads the actual length of its chunk.

That's a rough sketch of the strategy.
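A minimal sketch of steps 1 to 3, assuming the file is opened in binary mode ("rb") so that ftell() returns plain byte offsets. struct chunk and compute_chunks() are hypothetical names. Unlike the step-by-step description, it computes every boundary up front; each chunk is the byte range [start, end), every boundary is moved forward to just past the next newline so no line (and no IP address) is cut in half, and the last chunk simply runs to the end of the file.

```c
#include <assert.h>
#include <stdio.h>

/* Byte range [start, end) of the file that one child should process. */
struct chunk {
    long start;
    long end;
};

/* Split f into n pieces of roughly file_size / n bytes each, moving every
   boundary forward to just past the next '\n'.
   Returns 0 on success, -1 on an I/O error. */
int compute_chunks(FILE *f, int n, struct chunk *chunks)
{
    long file_size, chunk_size, prev_end = 0;
    int i, ch;

    assert(n > 0);
    if (fseek(f, 0, SEEK_END) != 0 || (file_size = ftell(f)) < 0)
        return -1;                          /* step 1: file size */
    chunk_size = file_size / n;             /* step 2 */
    for (i = 0; i < n; i++) {
        long end = file_size;               /* the last chunk runs to EOF */

        chunks[i].start = prev_end;
        if (i < n - 1) {
            long target = (long)(i + 1) * chunk_size;
            if (target < prev_end)          /* a long line overshot this boundary */
                target = prev_end;
            if (fseek(f, target, SEEK_SET) != 0)
                return -1;
            do {                            /* step 3: read forward to a newline */
                ch = fgetc(f);
            } while (ch != EOF && ch != '\n');
            if ((end = ftell(f)) < 0)       /* just past the '\n', or at EOF */
                return -1;
        }
        chunks[i].end = end;
        prev_end = end;
    }
    return 0;
}
```

Each child then receives its start and end offsets; chunks can be uneven, or even empty, when line lengths vary a lot.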

Edited to simplify things a bit. Edit: here's some untested code for step 3, and for step 4 below. I haven't been careful about off-by-one errors, but it gives you an idea of how to use fseek and ftell, which sounds like what you are looking for.

    // Assume FILE* f is open to the file, chunk_size is the average expected size,
    // child_num is the id of the current child, spawn_child() is a function that
    // handles the logic of spawning a child and telling it where to start reading,
    // and how much to read. child_chunks is an array of structs to keep track of
    // where the chunks start and how big they are.
    int ch;
    if (fseek(f, (child_num + 1) * chunk_size, SEEK_SET) != 0) {
        /* FIXME: handle the error */
    }
    while ((ch = fgetc(f)) != EOF && ch != '\n') { /*empty*/ } // FIXME: needs to handle EOF properly.
    child_chunks[child_num].end = ftell(f); // FIXME: needs error check.
    child_chunks[child_num + 1].start = child_chunks[child_num].end; /* ftell() is already past the '\n' */
    spawn_child(child_num);

Then in your child (step 4), assume the child has access to child_chunks and knows its child_num:

    void this_is_the_child(int child_num)
    {
        char line[1026]; /* a 1024-byte line plus '\n' and '\0' */
        /* ... */
        fseek(f, child_chunks[child_num].start, SEEK_SET); // FIXME: handle error
        while (ftell(f) < child_chunks[child_num].end && fgets(line, sizeof line, f) != NULL) {
            /* ... process one line ... */
        }
    }
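A self-contained version of step 4, assuming the [start, end) offsets both fall on line boundaries (as the parent-side code guarantees) and that lines fit in the assumed 1024-byte limit. read_chunk() and its handle_line callback are hypothetical. Each child should fopen() the file itself rather than reuse a FILE* inherited across fork(): the inherited descriptor shares its file offset with the parent and the other children, so their fseek() calls would interfere.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE 1026 /* assumed limit: a 1024-byte line plus '\n' and '\0' */

/* Read whole lines from byte offset start up to (not including) end.
   handle_line may be NULL; a real child might extract the IP address there.
   Returns the number of lines handled, or -1 on a seek error. */
long read_chunk(FILE *f, long start, long end,
                void (*handle_line)(const char *line))
{
    char line[MAX_LINE];
    long count = 0;

    assert(start <= end);
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;
    /* Check the position *before* each read, so the line that begins at
       end (the next child's first line) is never consumed here. */
    while (ftell(f) < end && fgets(line, sizeof line, f) != NULL) {
        if (handle_line != NULL)
            handle_line(line);
        count++;
    }
    return count;
}
```

Checking ftell() before fgets() is the design choice that avoids the off-by-one: testing it after the read would process one line too many.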

The problem is that the lines have very different sizes, though the maximum is 1024 bytes. The children have to read an IP address from each line, and if I estimate wrong, a child may start in the middle of an IP address, which of course cannot happen. – neverMind Oct 13 '10 at 12:14

The second bullet in step 5 prevents the estimate from breaking a line in half: read forward until you find a newline. If the distribution of line sizes is fairly uniform, you could skip estimating the expected line size and just divide the file length by the number of children. – bstpierre Oct 13 '10 at 12:21

Can you please illustrate with some code? My question doesn't concern the operating-system concepts, just working with the file :( – neverMind Oct 13 '10 at 12:38

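    #include <stdio.h>
    #include <stdlib.h>

    /* Get an array with the line start positions (file offsets); returns the number of lines. */
    long readLineBegins(FILE *f, long **begins)
    {
        long num = 0, mark = 0;
        int ch, atLineStart = 1;

        *begins = NULL;
        while ((ch = fgetc(f)) != EOF) {
            if (atLineStart) {                   /* first byte of a new line */
                long *tmp = realloc(*begins, (num + 1) * sizeof **begins);
                if (tmp == NULL)
                    return -1;                   /* *begins is still valid; caller frees it */
                *begins = tmp;
                (*begins)[num++] = mark;
                atLineStart = 0;
            }
            if (ch == '\n') {
                mark = ftell(f);                 /* next line starts right after '\n' */
                atLineStart = 1;
            }
        }
        return num;
    }

    /* Print lines from..to (0-based, inclusive) without their line terminators. */
    void workLineBlocks(FILE *f, const long *begins, long from, long to)
    {
        int ch;
        for (; from <= to; ++from) {
            fseek(f, begins[from], SEEK_SET);
            while ((ch = fgetc(f)) != EOF && ch != '\n' && ch != '\r')
                putchar(ch);
            puts("");
        }
    }

    int main(void)
    {
        FILE *f = fopen("file.txt", "rb");
        long *lineBegins;                                  /* array with line start positions */
        long lb;

        if (f == NULL)
            return 1;
        lb = readLineBegins(f, &lineBegins);               /* get number of lines */
        if (lb >= 2) {
            workLineBlocks(f, lineBegins, lb - 2, lb - 1); /* print last two lines */
            workLineBlocks(f, lineBegins, 0, 1);           /* print first two lines */
        }
        fclose(f);
        free(lineBegins);
        return 0;
    }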

Compiling this code, with the respective includes, gives a lot of errors. Could you tell me if you've tested it? Thanks. – neverMind Oct 13 '10 at 18:31

My MinGW GCC doesn't report any errors; replace fpos_t with long, like codepad.org/g3OpxZoP – user411313 Oct 13 '10 at 21:17

Here is another, better solution without compiler errors: ideone.com/trPG4 – user411313 Oct 13 '10 at 21:33

This code is great, but instead of printing to stdout, I would prefer to store each line in a string. Could you tell me how to do that? I'm sick of trying without success :( – neverMind 13 Oct3 at 1:25

Try it and be happy: ideone.com/8ySsH – user411313 Oct 14 '10 at 18:24
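For the follow-up about storing a line in a string instead of printing it: a minimal sketch, assuming the line-start offsets are kept as long (as suggested above) and the file is opened in binary mode. read_line_at() is a hypothetical helper that copies one line into a caller-supplied buffer and strips its "\n" or "\r\n"; to keep many lines at once, the caller can malloc() a buffer per line and copy into it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy line `index` (0-based) into buf without its line terminator.
   begins[i] is the byte offset where line i starts.
   Lines longer than size - 1 bytes are truncated.
   Returns 0 on success, -1 on error. */
int read_line_at(FILE *f, const long *begins, long index,
                 char *buf, size_t size)
{
    assert(begins != NULL && buf != NULL && size > 1);
    if (fseek(f, begins[index], SEEK_SET) != 0 ||
        fgets(buf, (int)size, f) == NULL)
        return -1;
    buf[strcspn(buf, "\r\n")] = '\0'; /* strip "\n" or "\r\n" */
    return 0;
}
```

With a buffer of 1026 bytes, any line within the 1024-byte limit is copied whole.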

I think this can help you: Read specific range of lines from a text file.
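A minimal sketch of reading a line range without fseek, for the case where each child opens the file itself and is told its min..max range (0-based, inclusive). It skips the earlier lines by counting newlines with fgetc(), which has no line-length limit, then reads the wanted lines with fgets(). read_line_range() and the handle_line callback are hypothetical. Every child still scans everything before its range, so later children do more work; the byte-offset approach avoids that.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE 1026 /* assumed limit: a 1024-byte line plus '\n' and '\0' */

/* Handle lines min_line..max_line (0-based, inclusive) of f and return
   how many were handled. handle_line may be NULL. */
long read_line_range(FILE *f, long min_line, long max_line,
                     void (*handle_line)(const char *line))
{
    char line[MAX_LINE];
    long n = 0, handled = 0;
    int ch;

    assert(min_line >= 0 && min_line <= max_line);
    rewind(f);
    /* Skip the first min_line lines by counting newlines. */
    while (n < min_line && (ch = fgetc(f)) != EOF) {
        if (ch == '\n')
            n++;
    }
    for (; n <= max_line && fgets(line, sizeof line, f) != NULL; n++) {
        if (handle_line != NULL)
            handle_line(line);
        handled++;
    }
    return handled;
}
```

A child would call it as read_line_range(f, r.min_line, r.max_line, extract_ip) on its own FILE*, where extract_ip is a hypothetical per-line function.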

Thanks, I have already coded something like what you have here! I count the lines and then give each child a min-max range to read (each child opens the file). It may not be the fastest algorithm (perhaps fseek and file-position pointers would be faster), but it is easier for me to understand and debug. – neverMind Oct 15 '10 at 21:54

How can I measure the time? When I call start = clock(); in the parent, the clock starts, but the parent goes to sleep while the children are working, and the clock pauses while the parent process is in state S (sleeping). How can I make the clock keep counting? Thanks – John Doe Oct 16 '10 at 15:47
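clock() measures the processor time used by the calling process, so it does not advance while the parent is blocked in wait() or read(). For elapsed (wall-clock) time, a minimal sketch using POSIX clock_gettime() with CLOCK_MONOTONIC, assuming a POSIX system; elapsed_seconds() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Usage in the parent:
 *
 *     struct timespec t0, t1;
 *     clock_gettime(CLOCK_MONOTONIC, &t0);
 *     ... fork the children and wait() for all of them ...
 *     clock_gettime(CLOCK_MONOTONIC, &t1);
 *     printf("%.3f s\n", elapsed_seconds(&t0, &t1));
 */
```

CLOCK_MONOTONIC is preferable to the time of day here because it is not affected if someone changes the system clock during the run.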

