Reading integers from a memory mapped formatted file?



I have memory mapped a large formatted (text) file containing one integer per line, like so:

    123
    345
    34324
    3232
    ...

So I have a pointer to the memory at the first byte and also a pointer to the memory at the last byte. I am trying to read all those integers into an array as fast as possible. Initially I created a specialized std::streambuf class to work with std::istream to read from that memory, but it seems to be relatively slow.

Do you have any suggestions on how to efficiently parse a string like "1231232\r\n123123\r\n123\r\n1231\r\n2387897..." into an array {1231232,123123,123,1231,2387897,...}? The number of integers in the file is not known beforehand.

Tags: c++, windows, mmap, streambuf, memory-mapping

asked Nov 16 '10 at 20:03 by Peter Jansson; edited Nov 17 '11 at 17:14 by Charles

    std::vector<int> array;
    char * p = ...; // start of memory mapped block
    while ( not end of memory block )
    {
        array.push_back(static_cast<int>(strtol(p, &p, 10)));
        while (not end of memory block && !isdigit(*p))
            ++p;
    }

This code is a little unsafe since there's no guarantee that strtol will stop at the end of the memory mapped block, but it's a start. It should go very fast even with additional checking added.

One optimization could be to replace isdigit with (*p & 0xF0). – ronag Nov 16 '10 at 20:54

You forgot the array.reserve(...). The performance penalty of that could be severe. – Yippie-Kai-Yay Nov 16 '10 at 21:08

@HardCoder1986, std::vector will grow the array exponentially so the performance penalty is only log(n). The question said explicitly that the number of integers is unknown. I agree that making a reasonable guess could help. – Mark Ransom Nov 16 '10 at 22:09

@ronag, your expression doesn't do quite the same thing. I doubt that isdigit is going to be the bottleneck anyway. – Mark Ransom Nov 16 '10 at 22:10

Furthermore, calling isdigit with a (possibly signed) char other than EOF invokes undefined behavior. – Roland Illig Nov 16 '10 at 22:40
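To make the difference between these digit tests concrete, here is a small sketch of three variants. The function names are hypothetical, and the mask version is tightened to compare against 0x30, which assumes an ASCII-compatible character set; the expression as written in the comment, `(*p & 0xF0)`, is nonzero for any byte of 0x10 or above.

```cpp
#include <cctype>

// Range test: the C++ standard guarantees that '0'..'9' are contiguous,
// so this works for any char value, signed or not.
inline bool is_digit_range(char c)
{
    return c >= '0' && c <= '9';
}

// <cctype> version: cast to unsigned char first, because passing a
// negative value other than EOF to std::isdigit is undefined behavior.
inline bool is_digit_ctype(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Mask version (assumes ASCII): accepts 0x30..0x3F, which covers the ten
// digits but also ':' ';' '<' '=' '>' '?'.
inline bool is_digit_mask(char c)
{
    return (c & 0xF0) == 0x30;
}
```

For input that contains only digits, '-', '\r' and '\n', all three agree; they differ only on bytes such as ':' or values above 0x7F.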

This was a really interesting task for me to learn a bit more about C++. Admittedly, the code is quite large and has a lot of error checking, but that only shows how many different things can go wrong during parsing.

    #include <algorithm>
    #include <cctype>
    #include <climits>
    #include <cstdio>
    #include <cstdlib>
    #include <iterator>
    #include <string>
    #include <vector>

    static void die(const char *reason)
    {
        fprintf(stderr, "aborted (%s)\n", reason);
        exit(EXIT_FAILURE);
    }

    // Reads an unsigned decimal number from [*begin_ref, end).
    // On success, advances *begin_ref past the digits and stores the value.
    template <class BytePtr>
    static bool read_uint(BytePtr *begin_ref, BytePtr end, unsigned int *out)
    {
        const unsigned int MAX_DIV = UINT_MAX / 10;
        const unsigned int MAX_MOD = UINT_MAX % 10;

        BytePtr begin = *begin_ref;
        unsigned int n = 0;

        while (begin != end && '0' <= *begin && *begin <= '9') {
            unsigned digit = *begin - '0';
            if (n > MAX_DIV || (n == MAX_DIV && digit > MAX_MOD))
                die("unsigned overflow");
            n = 10 * n + digit;
            begin++;
        }

        if (begin == *begin_ref)
            return false;

        *begin_ref = begin;
        *out = n;
        return true;
    }

    // Parses whitespace-separated signed integers and writes them to out.
    template <class BytePtr, class IntConsumer>
    void parse_ints(BytePtr begin, BytePtr end, IntConsumer out)
    {
        while (true) {
            while (begin != end && *begin == (unsigned char) *begin && isspace(*begin))
                begin++;
            if (begin == end)
                return;

            bool negative = *begin == '-';
            if (negative) {
                begin++;
                if (begin == end)
                    die("minus at end of input");
            }

            unsigned int un;
            if (!read_uint(&begin, end, &un))
                die("no number found");

            if (!negative && un > INT_MAX)
                die("too large positive");
            if (negative && un > -((unsigned int)INT_MIN))
                die("too small negative");

            int n = negative ? -un : un;
            *out++ = n;
        }
    }

    static void print(int x)
    {
        printf("%d\n", x);
    }

    int main()
    {
        std::vector<int> result;
        std::string input("2147483647 -2147483648 0 00000 1 2 32767 4 -17 6");

        parse_ints(input.begin(), input.end(), std::back_inserter(result));

        std::for_each(result.begin(), result.end(), print);
        return 0;
    }

I tried hard not to invoke any kind of undefined behavior, which can get quite tricky when converting unsigned numbers to signed numbers or invoking isspace on an unknown data type.

+1 for the effort – John Dibling Nov 16 '10 at 22:38

Since the input is memory mapped, simply copying the chars of each line to a small stack array and running atoi on it, writing the result into an integer array that sits on top of another memory mapped file, would be very efficient. This way the paging file is not used for these big buffers at all.

    open memory mapped file to output int buffer
    declare small stack buffer of 20 chars
    while not end of char array
        while current char not line feed
            copy chars to stack buffer
        end while
        null terminate the buffer two chars back
        copy result of atoi to the int output buffer
        increment the output buffer pointer
    end while

While this doesn't use a library, it has the advantage of limiting memory usage to the memory mapped files, so the only temporary buffers are the one on the stack and whatever atoi uses internally. The output buffer can be thrown away or left saved to the file as needed.

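NOTE: This answer has been edited a few times. Reads memory line by line (based on link and link).

    class line {
        std::string data;
    public:
        friend std::istream &operator>>(std::istream &is, line &l) {
            std::getline(is, l.data);
            return is;
        }
        // const, because istream_iterator hands out const references
        operator std::string() const { return data; }
    };

    // std::streambuf has a protected constructor and setg is a protected
    // member, so the get area is set from a small derived class.
    struct membuf : std::streambuf {
        // setg expects a one-past-the-end pointer, so ptr + size - 1
        // leaves out the final byte of the block.
        membuf(char *ptr, std::size_t size) { setg(ptr, ptr, ptr + size - 1); }
    };

    membuf osrb(ptr, size);
    std::istream istr(&osrb);
    std::vector<int> ints;

    std::istream_iterator<line> begin(istr);
    std::istream_iterator<line> end;
    std::transform(begin, end, std::back_inserter(ints), &boost::lexical_cast<int, std::string>);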

It seems to kind of defeat the point if he's going to copy a memory-mapped file into memory… – Steve M Nov 16 '10 at 20:22

Indeed, it does defeat the point. – ronag Nov 16 '10 at 20:24

Well I think the edited answer is ok... it was an honest try at least. – ronag Nov 16 '10 at 20:51

This is almost the same solution as I did with a specialized streambuf and istream which was relatively slow. It is elegant code though. – Peter Jansson Nov 16 '10 at 21:09
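For reference, the streambuf-over-memory approach discussed in this thread can be sketched without Boost as below. The names `membuf` and `read_ints_stream` are hypothetical, and this is not the asker's actual class; it only illustrates the general technique he describes as relatively slow, using formatted extraction with `operator>>` in place of getline plus lexical_cast.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <vector>

// Hypothetical read-only stream buffer over an existing block of memory.
// The get area is set once; the default underflow() reports end of file
// when it is exhausted.
struct membuf : std::streambuf {
    membuf(char* begin, std::size_t size)
    {
        setg(begin, begin, begin + size);  // third argument is one past the end
    }
};

// Reads whitespace-separated integers from the block through std::istream.
// Extraction stops at the end of the block or at the first token that is
// not a valid int.
std::vector<int> read_ints_stream(char* begin, std::size_t size)
{
    membuf buf(begin, size);
    std::istream in(&buf);

    std::vector<int> ints;
    int value;
    while (in >> value)  // operator>> skips '\r' and '\n' as whitespace
        ints.push_back(value);
    return ints;
}
```

Nothing is copied out of the mapped block here, but every extraction still goes through the stream's formatted-input machinery, in contrast to the hand-written loops in the other answers.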

