How do I split a string by strings and include the delimiters using .NET?

Despite your reluctance to use regex, it nicely preserves the delimiters when you wrap the pattern in a capturing group and pass it to the Regex.Split method:

    string input = "123xx456yy789";
    string pattern = "(xx|yy)";
    string[] result = Regex.Split(input, pattern);

If you remove the parentheses from the pattern, using just "xx|yy", the delimiters are not preserved.

Be sure to use Regex.Escape on the delimiters if they contain any metacharacters that hold special meaning in regex. These characters include \, *, +, ?, |, {, [, (, ), ^, $, ., # and white space. For instance, a delimiter of . should be escaped as \. Given a list of delimiters, you need to "OR" them using the pipe | symbol, and that too is a character that gets escaped. To properly build the pattern use the following code (thanks to @gabe for pointing this out):

    var delimiters = new List<string> { ".", "xx", "yy" };
    string pattern = "(" + String.Join("|", delimiters.Select(d => Regex.Escape(d)).ToArray()) + ")";

The parentheses are concatenated rather than included in the pattern since they would be incorrectly escaped for your purposes.

EDIT: In addition, if the delimiters list happens to be empty, the final pattern would incorrectly be () and this would cause blank matches. To prevent this, a check for the delimiters can be used. With all this in mind the snippet becomes:

    string input = "123xx456yy789";
    // to reach the else branch set delimiters to new List<string>();
    var delimiters = new List<string> { ".", "xx", "yy", "()" };
    if (delimiters.Count > 0)
    {
        string pattern = "(" + String.Join("|", delimiters.Select(d => Regex.Escape(d)).ToArray()) + ")";
        string[] result = Regex.Split(input, pattern);
        foreach (string s in result)
        {
            Console.WriteLine(s);
        }
    }
    else
    {
        // nothing to split
        Console.WriteLine(input);
    }

If you need a case-insensitive match for the delimiters use the RegexOptions.IgnoreCase option: Regex.Split(input, pattern, RegexOptions.IgnoreCase)

EDIT #2: The solution so far matches split tokens that might be a substring of a larger string. If the split token should be matched completely, rather than as part of a substring, such as a scenario where words in a sentence are used as the delimiters, then the word-boundary \b metacharacter should be added around the pattern.

For example, consider this sentence (yea, it's corny): "Welcome to stackoverflow... where the stack never overflows!" If the delimiters were { "stack", "flow" } the current solution would split "stackoverflow" and return 3 strings { "stack", "over", "flow" }. If you needed an exact match, then the only place this would split would be at the word "stack" later in the sentence and not "stackoverflow".

To achieve an exact match behavior alter the pattern to include \b as in \b(delim1|delim2|delimN)\b:

    string pattern = @"\b(" + String.Join("|", delimiters.Select(d => Regex.Escape(d))) + @")\b";

Finally, if trimming the spaces before and after the delimiters is desired, add \s* around the pattern as in \s*(delim1|delim2|delimN)\s*. This can be combined with \b as follows:

    string pattern = @"\s*\b(" + String.Join("|", delimiters.Select(d => Regex.Escape(d))) + @")\b\s*";
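The pieces of this answer (escaping, the capturing group, the empty-list check and the optional \b) can be collected into one helper. The following is only a sketch: the name SplitKeepingDelimiters and its wholeWord parameter are made up for illustration, and it assumes C# 9 or later so that top-level statements and local functions can be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer): splits on any of
// the delimiters and keeps the delimiters in the returned array.
static string[] SplitKeepingDelimiters(
    string input, IEnumerable<string> delimiters, bool wholeWord = false)
{
    // Escape each delimiter so characters such as "." or "|" match literally.
    string[] escaped = delimiters.Select(d => Regex.Escape(d)).ToArray();

    // An empty list would give the pattern "()", so return the input as-is.
    if (escaped.Length == 0)
        return new[] { input };

    // The capturing group is what makes Regex.Split keep the delimiters.
    string pattern = "(" + string.Join("|", escaped) + ")";
    if (wholeWord)
        pattern = @"\b" + pattern + @"\b";

    return Regex.Split(input, pattern);
}

string sentence = "Welcome to stackoverflow... where the stack never overflows!";
string[] words = { "stack", "flow" };

// Prints three parts: the text before " stack ", "stack", and the rest.
foreach (string part in SplitKeepingDelimiters(sentence, words, wholeWord: true))
    Console.WriteLine("[" + part + "]");
```

With wholeWord left at false the same call also splits inside "stackoverflow" and "overflows", which is the substring behaviour described in EDIT #2.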


That's a nice solution. I do like regex, I just thought it's too big of a tool for a job so simple that a very similar version was included in .NET's string class. – mafutrct Mar 20 '10 at 22:39

You would need to do pattern = "(" + String.Join("|", (from d in delimeters select Regex.Escape(d)).ToArray()) + ")" because any of the delimiters could have a . or | or whatever in them. – Gabe Mar 20 '10 at 22:41

+1 I didn't know you could do that! Very nice. You just need to fix the Regex.Escape code... – Mark Byers Mar 20 '10 at 22:42

@gabe good point, I missed that. Will edit now. – Ahmad Mageed Mar 20 '10 at 22:44

@Mark thanks, and done :) – Ahmad Mageed Mar 20 '10 at 22:56

Ok, sorry, maybe this one:

    string source = "123xx456yy789";
    foreach (string delimiter in delimiters)
        source = source.Replace(delimiter, ";" + delimiter + ";");
    string[] parts = source.Split(';');

Fails for delimiters that include ;. – mafutrct Mar 20 '10 at 22:13

@mafutrct - he actually presented a workable idea, though. Perhaps have a list of possible new delimiters, could be one or more characters each. Iterate over the list, check if the possible delimiter exists, and use Nagg's logic for the first delimiter that passes the test. – Anthony Pegram Mar 20 '10 at 22:17

True, but I'd really like a solution that is not dependent on the non-existence of certain delimiter literals in the string. I don't see how this is possible with this idea, except with some mapping that would likely hurt the performance too badly. I'm open for counter examples though, of course. – mafutrct Mar 20 '10 at 22:23
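One way to read Anthony Pegram's suggestion is sketched below. The helper name SplitViaPlaceholder and the candidate list are assumptions made for illustration, and the code assumes C# 9 or later for top-level statements. It still depends on at least one candidate being unused, which is the limitation mafutrct points out.

```csharp
using System;
using System.Linq;

// Hypothetical helper: wraps each delimiter in a placeholder character that
// occurs neither in the input nor in any delimiter, then splits on it.
static string[] SplitViaPlaceholder(string source, string[] delimiters)
{
    // Arbitrary candidates chosen for this sketch.
    char[] candidates = { ';', '|', '\u0001', '\u0002' };

    foreach (char c in candidates)
    {
        // Skip a candidate that already appears somewhere.
        if (source.IndexOf(c) >= 0 || delimiters.Any(d => d.IndexOf(c) >= 0))
            continue;

        string marker = c.ToString();
        foreach (string delimiter in delimiters)
            source = source.Replace(delimiter, marker + delimiter + marker);
        return source.Split(c);
    }

    throw new InvalidOperationException("No unused placeholder character available.");
}

// ';' occurs in the input, so the sketch falls through to '|'.
string[] parts = SplitViaPlaceholder("123xx;456yy789", new[] { "xx", "yy" });
Console.WriteLine(string.Join(",", parts)); // 123,xx,;456,yy,789
```

The sketch also misbehaves when one delimiter is contained in another (for example "x" and "xx"), because the later Replace call rewrites text that an earlier one already wrapped.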

I came up with a solution for something similar a while back. To efficiently split a string you can keep a list of the next occurrence of each delimiter. That way you minimise the times that you have to look for each delimiter. This algorithm will perform well even for a long string and a large number of delimiters:

    string input = "123xx456yy789";
    string[] delimiters = { "xx", "yy" };

    // next known position of each delimiter, -1 if it does not occur again
    int[] nextPosition = delimiters.Select(d => input.IndexOf(d)).ToArray();
    List<string> result = new List<string>();
    int pos = 0;
    while (true)
    {
        // find the delimiter that occurs first
        int firstPos = int.MaxValue;
        string delimiter = null;
        for (int i = 0; i < nextPosition.Length; i++)
        {
            if (nextPosition[i] != -1 && nextPosition[i] < firstPos)
            {
                firstPos = nextPosition[i];
                delimiter = delimiters[i];
            }
        }
        if (firstPos != int.MaxValue)
        {
            result.Add(input.Substring(pos, firstPos - pos));
            result.Add(delimiter);
            pos = firstPos + delimiter.Length;
            // only search again for delimiters whose position is now behind us
            for (int i = 0; i < nextPosition.Length; i++)
            {
                if (nextPosition[i] != -1 && nextPosition[i] < pos)
                {
                    nextPosition[i] = input.IndexOf(delimiters[i], pos);
                }
            }
        }
        else
        {
            result.Add(input.Substring(pos));
            break;
        }
    }

(With reservations for any bugs, I just threw this version together now and I haven't tested it thoroughly.)

Seems to work fine for standard input. – mafutrct Mar 21 '10 at 21:41

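Here's a solution that doesn't use a regular expression and doesn't make more strings than necessary:

    public static List<string> Split(string searchStr, string[] separators)
    {
        List<string> result = new List<string>();
        int length = searchStr.Length;
        int lastMatchEnd = 0;
        for (int i = 0; i < length; i++)
        {
            for (int j = 0; j < separators.Length; j++)
            {
                string str = separators[j];
                int sepLen = str.Length;
                // cheap first-character test before the full ordinal comparison
                if (((searchStr[i] == str[0]) && (sepLen <= (length - i))) &&
                    ((sepLen == 1) || (String.CompareOrdinal(searchStr, i, str, 0, sepLen) == 0)))
                {
                    result.Add(searchStr.Substring(lastMatchEnd, i - lastMatchEnd));
                    result.Add(separators[j]);
                    i += sepLen - 1;
                    lastMatchEnd = i + 1;
                    break;
                }
            }
        }
        if (lastMatchEnd != length)
            result.Add(searchStr.Substring(lastMatchEnd));
        return result;
    }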

I noticed this produces an output different from all others. Sometimes an item is missing, apparently. – mafutrct Oct 14 '10 at 11:41
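One difference that is easy to reproduce, and which may or may not be what this comment refers to: the method above appends the tail only when lastMatchEnd != length, so an input that ends with a delimiter yields no trailing empty string, whereas Regex.Split does include one. A small sketch of the Regex.Split side (assuming C# 9 or later for top-level statements):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Input that ends with a delimiter.
string[] viaRegex = Regex.Split("123xx", "(xx)");

// Regex.Split reports the empty string that follows the final match.
Console.WriteLine(viaRegex.Length); // 3: "123", "xx", ""

// Dropping empty entries makes the two outputs comparable.
string[] nonEmpty = viaRegex.Where(s => s.Length > 0).ToArray();
Console.WriteLine(string.Join(",", nonEmpty)); // 123,xx
```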

A naive implementation:

    public IEnumerable<string> SplitX (string text, string[] delimiters)
    {
        var split = text.Split (delimiters, StringSplitOptions.None);

        foreach (string part in split) {
            yield return part;
            text = text.Substring (part.Length);

            string delim = delimiters.FirstOrDefault (x => text.StartsWith (x));
            if (delim != null) {
                yield return delim;
                text = text.Substring (delim.Length);
            }
        }
    }

My first post/answer... this is a recursive approach.

    static void Split(string src, string[] delims, ref List<string> final)
    {
        if (src.Length == 0)
            return;

        int endTrimIndex = src.Length;
        foreach (string delim in delims)
        {
            // get the index of the first occurrence of this delim
            int indexOfDelim = src.IndexOf(delim);
            // check to see if this delim is at the beginning of src
            if (indexOfDelim == 0)
            {
                endTrimIndex = delim.Length;
                break;
            }
            // see if this delim comes before previously searched delims
            else if (indexOfDelim < endTrimIndex && indexOfDelim != -1)
                endTrimIndex = indexOfDelim;
        }
        final.Add(src.Substring(0, endTrimIndex));
        Split(src.Remove(0, endTrimIndex), delims, ref final);
    }

This will have identical semantics to String.Split default mode (so not including empty tokens). It can be made faster by using unsafe code to iterate over the source string, though this requires you to write the iteration mechanism yourself rather than using yield return. It allocates the absolute minimum (a substring per non-separator token plus the wrapping enumerator), so realistically to improve performance you would have to:

- use even more unsafe code (by using 'CompareOrdinal' I effectively am), mainly in avoiding the overhead of character lookup on the string with a char buffer
- make use of domain specific knowledge about the input sources or tokens:
  - you may be happy to eliminate the null check on the separators
  - you may know that the separators are almost never individual characters

The code is written as an extension method:

    public static IEnumerable<string> SplitWithTokens(
        string str,
        string[] separators)
    {
        if (separators == null || separators.Length == 0)
        {
            yield return str;
            yield break;
        }
        int prev = 0;
        for (int i = 0; i < str.Length; i++)
        {
            ...
        }
        if (str.Length - prev > 0)
            yield return str.Substring(prev, str.Length - prev);
    }
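A sketch of how the matching loop might look, going only by the description above (CompareOrdinal, no empty tokens, separators returned without allocating new strings). The loop body and the name SplitWithTokensSketch are assumptions, not ShuggyCoUk's original code, and the example assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: the inner loop is an assumption based on the prose.
static IEnumerable<string> SplitWithTokensSketch(string str, string[] separators)
{
    if (separators == null || separators.Length == 0)
    {
        yield return str;
        yield break;
    }
    int prev = 0;
    for (int i = 0; i < str.Length; i++)
    {
        foreach (string sep in separators)
        {
            if (string.IsNullOrEmpty(sep))
                continue;

            // Ordinal comparison of the text starting at i against sep.
            if (sep.Length <= str.Length - i &&
                string.CompareOrdinal(str, i, sep, 0, sep.Length) == 0)
            {
                if (i > prev)
                    yield return str.Substring(prev, i - prev); // skip empty tokens
                yield return sep;        // reuse the separator instance
                i += sep.Length - 1;     // the loop's i++ steps past the separator
                prev = i + 1;
                break;
            }
        }
    }
    if (str.Length - prev > 0)
        yield return str.Substring(prev, str.Length - prev);
}

Console.WriteLine(string.Join("|",
    SplitWithTokensSketch("123xx456yy789", new[] { "xx", "yy" }))); // 123|xx|456|yy|789
```

Because empty tokens are skipped, an input such as "xx123xx" yields only "xx", "123", "xx", which differs from the Regex.Split output for the same input.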

Ah - realised I am similar to gabe in implementation. Mine saves some allocations but is fundamentally the same concept. – ShuggyCoUk Mar 21 '10 at 1:26

How does your implementation save allocations? – Gabe Mar 21 '10 at 5:41

@gabe I do not create substrings for the separator tokens, a minor improvement trivial to add to yours (which I see you have already done) – ShuggyCoUk Mar 21 '10 at 17:18

Yes, but your foreach loop allocates a new enumerator for the separator array for every character of the input string. – Gabe Mar 21 '10 at 18:40

@gabe foreach on a (compile time known) array does not allocate an enumerator. Try it and see. – ShuggyCoUk Mar 21 '10 at 22:36

