String.Concat inefficient code?

Well for one thing, it means that the contents of the new array can be trusted to be non-null.... and unchanging.

Up vote 18 down vote favorite 2 share g+ share fb share tw.

I was investigating String. Concat : (Reflector) very strange : the have the values array , they creating a NEW ARRAY for which later they send him to ConcatArray. Question : Why they created a new array?

They had values from the first place... c# string . Net-4.0 concatenation link|improve this question asked Mar 28 at 15:11Royi Namir10.9k1425 96% accept rate.

1 Probably because it makes ConcatArray more efficient, not having to special case for nulls. You do know it isn't copying all of the strings right? It's just creating a new array of references to the existing strings.

– Chris Shain Mar 28 at 15:14 I'm not sure I understand how totalLength could ever be less than zero. What silly thing am I overlooking? – LarsTech Mar 28 at 19:28 2 @LarsTech when overflowexception occures - the int is negative.... – Royi Namir Mar 28 at 19:55 Wait, how were you inspecting Concat?

I've been trying to figure out how to view the code inside built in methods, but all I've found is that I need to use cecil. – Walkerneo Mar 29 at 0:33 I'm wondering why it throws OutOfMemoryException() when totalLength – Nawaz Mar 29 at 12:28.

Well for one thing, it means that the contents of the new array can be trusted to be non-null.... and unchanging. Without that copying, another thread could modify the original array during the call to ConcatArray, which presumably could throw an exception or even trigger a security bug. With the copying, the input array can be changed at any time - each element will be read exactly once, so there can be no inconsistency.

(The result may be a mixture of old and new elements, but you won't end up with memory corruption. ) Suppose ConcatArray is trusted to do bulk copying out of the strings in the array it's passed, without checking for buffer overflow. Then if you change the input array at just the right time, you could end up writing outside the allocated memory.

Badness. With this defensive copy, the system can be sure1 that the total length really is the total length. 1 Well, unless reflection is used to change the contents of a string.

But that can't be done without fairly high permissions - whereas changing the contents of an array is easy.

– Jon Skeet Mar 28 at 15:18 Another thread could modify the original during the copy too. This seems to me to not be correct reasoning. (I didn't downvote, just didn't upvote.

) – Jason Mar 28 at 15:18 3 Your conjecture is correct; this is to ensure safety if someone is modifying the array on another thread. – Eric Lippert Mar 28 at 15:23 1 @Jon Skeet: I get that, but there's still a race condition that is not being avoided by this. – Jason Mar 28 at 15:27 5 @Jason: You are right; I should have called out that the "safety" I am talking about is not "thread safety" -- whatever that is -- but rather memory safety.

The string code is behind the scenes going to use unsafe code to fill in the new buffer, and we must at all costs avoid corrupting memory. If the result is nonsensical because someone mutated an array that was being used on another thread, that's their problem; at least there is a result and an uncorrupted heap. – Eric Lippert Mar 28 at 15:40.

I can confirm Jon's conjecture; I have the original source code in front of me. The comments indicate that the reason for the copy is because some foolish person could be mutating the array that was passed in on another thread. What could then happen?

The calculation of length could say that there will be a hundred bytes of string data in the result, but by the time the copy happens, there could be a million bytes of string data in the array. That would be bad. The problem is prevented easily by making a copy.

1 The problem is mitigated, not prevented; the other thread could still mutate the source array before the copy is completed. – phoog Mar 28 at 15:26 4 @phoog: Right; the problem that is prevented is memory corruption. The output string can still be completely unrelated to the input strings due to race conditions, but memory will not be corrupted.

– Eric Lippert Mar 28 at 15:41 Ah, I see, because the code reads the length of the string after it is written to the new array, so the totalLength variable is always consistent with the total length of the strings in the new array. – phoog Mar 28 at 15:47 3 Perhaps this might be a good time to remind the readers that one does't need to be a C# compiler developer (or even a Microsoft employee) to actually see the original source for most framework assemblies, thanks to the Microsoft Reference Source program. – Kevin Cathcart Mar 28 at 21:12.

They created a new array to normalize out null entries into String.Empty. This couldn't be done on the provided values array because then they would be modifying the input.

Inefficient No, it doesn't matter. That array creation and copy is blazing fast relative to the concatenation, it's only copying references. It looks like they do it to convert null strings in the input array into String.

Empty (they can't do this on values because it would be modifying the input which is a no no), and to detect concatentations that would overflow before actually doing the concatenation (that's what the if(totalLength.

It doesn't matter in most cases, but if the array is huge, the memory overhead will not be insignificant, and the likelihood of large object heap fragmentation will be significantly higher. – phoog Mar 28 at 15:18 1 @phoog: If the array is huge then you are almost certainly making an even huger string, and so you already have bigger problems to worry about than the array copy. – Eric Lippert Mar 28 at 15:20 1 @EricLippert That is, of course, a very good point.

– phoog Mar 28 at 15:25.

Probably to make sure it doesn't change over the lifetime of the method. Which would lead to totalLength not fitting the content of the array anymore. I suspect that ConcatArray uses some unsafe memory copying, and doesn't recheck the string Lengths again.

One could rewrite it to avoid the allocation, but one additional small, short-lived allocation is pretty cheap.

As far as I can see they do some work with this - guess you don't want to mutate the original one ... remember you want your strings immutable same for the Concat-function of course - don't change a parameter-reference if not stated ....

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions