Is there a way to escape a CDATA end token in xml?

Clearly, this question is purely academic. Fortunately, it has a very definite answer.

You cannot escape a CDATA end sequence. Production rule 20 of the XML specification is quite clear:

    [20] CData ::= (Char* - (Char* ']]>' Char*))

EDIT: This production rule literally means "A CData section may contain anything you want BUT the sequence ']]>'. No exception."

EDIT2: The same section also reads:

    Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

In other words, it's not possible to use entity references, markup or any other form of interpreted syntax. The only parsed text inside a CDATA section is ]]>, and it terminates the section.

Hence, it is not possible to escape ]]> within a CDATA section.

EDIT3: The same section also reads:

    2.7 CDATA Sections

    [Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]

So there may be a CDATA section anywhere character data may occur, including multiple adjacent CDATA sections in place of a single CDATA section. That makes it possible to split the ]]> token and put its two parts in adjacent CDATA sections.

Ex:

    <![CDATA[Certain tokens like ]]> can be difficult and <invalid>]]>

should be written as

    <![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid>]]>
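A minimal Python sketch of the splitting trick described above. The helper name `to_cdata` is made up for illustration, and the standard-library parser is used only to check that the text survives a round trip; treat it as one possible way to do this rather than a canonical implementation.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two adjacent sections.

    Hypothetical helper: the name and signature are assumptions, not a library API.
    """
    # Each ']]>' becomes ']]' + end-of-section + start-of-section + '>'.
    safe = text.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + safe + CDATA_END


original = "Certain tokens like ]]> can be difficult"
document = "<doc>" + to_cdata(original) + "</doc>"

# The parser reports the adjacent CDATA sections as one run of character data.
parsed = ET.fromstring(document).text
print(document)
print(parsed == original)
```

The replacement never leaves a literal `]]>` inside a section, because the `>` always ends up as the first character of the following section.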

Indeed. Well, I'm not an academic type but as I said in the question, I'm just curious about this. To be honest, I'll just take your word on this, because I can barely make sense out of the syntax used for the rule. Thanks for your answer. – Juan Pablo Califano Oct 21 '08 at 23:17

It reads like this: Char* (the set of all character sequences) - (except) Char* ']]>' Char* (the set of all character sequences that include the substring ']]>'). – ddaa Oct 22 '08 at 9:12

Thanks for the extra clarification. I'm accepting your answer as the one that better addresses the question I asked. (S. Lott's answer provides a work-around, which is fine, although it doesn't specifically deal with an actual escape char or sequence.) – Juan Pablo Califano Oct 22 '08 at 12:01

This is not an academic question. Think about an RSS feed of a blog post that contains a discussion about CDATA. – usr Jul 12 at 15:05
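ddaa's reading of production 20 can be written down as a small predicate. This is only a sketch: the function name `is_cdata_content` is invented here, and the character class is my transcription of the `Char` production from XML 1.0.

```python
import re

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_cdata_content(text: str) -> bool:
    """True if text fits CData: a Char sequence that does not contain ']]>'."""
    only_chars = _NOT_XML_CHAR.search(text) is None  # Char*
    has_end_token = "]]>" in text                    # Char* ']]>' Char*
    return only_chars and not has_end_token


print(is_cdata_content("<b>bold</b> & more"))  # markup characters are allowed
print(is_cdata_content("a ]] > b"))            # not the exact three-character token
print(is_cdata_content("a ]]> b"))             # contains the end token
```

The set difference in the grammar maps directly onto the `and not` in the return statement.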

You have to break your data into pieces to conceal the ]]>.

Here's the whole thing:

    <![CDATA[]]]]><![CDATA[>]]>

The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.
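To check that the two pieces really do add up to the end token, one can feed the string to a parser. A quick sketch with Python's standard library (the wrapper element name `x` is arbitrary):

```python
import xml.etree.ElementTree as ET

# Two adjacent CDATA sections: the first holds "]]", the second holds ">".
whole_thing = "<![CDATA[]]]]><![CDATA[>]]>"

root = ET.fromstring("<x>" + whole_thing + "</x>")

# The element's text is the concatenation of both sections.
print(repr(root.text))
```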

Thanks for your answer. I was rather looking for something like a backslash equivalent (within strings in C, PHP, Java, etc). According to the rule quoted by ddaa, it seems like there's no such thing. – Juan Pablo Califano Oct 21 '08 at 23:11

This should be the accepted answer. Escaping is a slightly ambiguous term, but this answer definitely addresses the spirit of escaping. Too bad it doesn't fit the OP's narrow conception of escaping, which arbitrarily requires the backslash character to be involved for some reason. – gWiz Jan 14 at 16:36

This is the correct answer. The question is wrong. – Pacerier Sep 12 at 12:31

S. Lott's answer is right: you don't encode the end tag, you break it across multiple CDATA sections.

How to run across this problem in the real world: using an XML editor to create an XML document that will be fed into a content-management system, try to write an article about CDATA sections. Your ordinary trick of embedding code samples in a CDATA section will fail you here. You can imagine how I learned this.

But under most circumstances, you won't encounter this, and here's why: if you want to store (say) the text of an XML document as the content of an XML element, you'll probably use a DOM method, e.g.:

    XmlElement elm = doc.CreateElement("foo");
    elm.InnerText = "<![CDATA[...]]>";

And the DOM quite reasonably escapes the < and the >, which means that you haven't inadvertently embedded a CDATA section in your document.

Oh, and this is interesting:

    XmlDocument doc = new XmlDocument();
    XmlElement elm = doc.CreateElement("doc");
    doc.AppendChild(elm);
    string data = "<![CDATA[...]]>";
    XmlCDataSection cdata = doc.CreateCDataSection(data);
    elm.AppendChild(cdata);

This is probably an idiosyncrasy of the .NET DOM, but that doesn't throw an exception. The exception gets thrown here:

    Console.Write(doc.OuterXml);

I'd guess that what's happening under the hood is that the XmlDocument is using an XmlWriter to produce its output, and the XmlWriter checks for well-formedness as it writes.
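For comparison, Python's standard-library `xml.dom.minidom` appears to draw the line in a similar place: building the node is accepted, and the complaint comes at serialization time. This is a sketch in a different library, so it says nothing about how the .NET DOM is implemented; the sample string is made up.

```python
from xml.dom.minidom import Document

doc = Document()
elm = doc.createElement("doc")
doc.appendChild(elm)

# Creating the node is accepted even though the data contains "]]>".
cdata = doc.createCDATASection("an embedded ]]> end token")
elm.appendChild(cdata)

try:
    doc.toxml()
    serialized = True
except ValueError as exc:
    # minidom refuses to write a CDATA section whose data contains "]]>".
    serialized = False
    print("serialization failed:", exc)
```

In both libraries the in-memory tree can hold data that has no valid CDATA representation, so the check can only happen when the tree is turned back into markup.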

Well, I had an almost "real world" example. I usually load Xml from Flash that contains html markup within CDATA sections. Having a way to escape it could be useful, I guess. But anyway, in that case, the CDATA content is usually valid XHTML, and so the "outer" CDATA could be avoided altogether. – Juan Pablo Califano Oct 22 '08 at 0:18

CDATA can nearly always be avoided altogether. I find that people who struggle with CDATA very frequently don't understand what they're really trying to do and/or how the technology they're using really works. – Robert Rossney Oct 24 '08 at 8:44

Oh, I should also add that the only reason that the CMS I alluded to in my answer used CDATA was that I wrote it, and I didn't understand what I was really trying to do and/or how the technology works. I didn't need to use CDATA. – Robert Rossney Oct 24 '08 at 8:48

If you're using .Net, the preceding comment about CDATA being avoidable is spot on - just write the content as a string and the framework will do all the escaping (and unescaping on read) for you. From the real world:

    xmlStream.WriteStartElement("UnprocessedHtml");
    xmlStream.WriteString(UnprocessedHtml);
    xmlStream.WriteEndElement();

– Mark Mullin Aug 8 '10 at 15:28
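The same "just write a string" approach can be sketched with Python's `xml.etree.ElementTree`. The element name is borrowed from the comment above; the sample HTML is invented for the example.

```python
import xml.etree.ElementTree as ET

unprocessed_html = '<p class="note">CDATA ends with ]]> &amp; that is all</p>'

elem = ET.Element("UnprocessedHtml")
elem.text = unprocessed_html  # plain string, no CDATA section involved

# The serializer turns markup characters in the text into entity references.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the result gives back the original string unchanged.
restored = ET.fromstring(serialized).text
print(restored == unprocessed_html)
```

Because the escaping is applied character by character, a `]]>` in the content needs no special handling at all.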

Breaking the CDATA into two is the right solution. The problem is by no means academic. One of the systems I am using exports XHTML templates to an XML file and does not treat CDATA right (it was in tag). This means it was unable to import back its own backups without the trick. Thanks S. Lott.

You do not escape the ]]> but you escape the > after ]] by inserting ]]><![CDATA[ before the >. Think of this just like a \ in a C/Java/PHP/Perl string, but only needed before a > and after a ]]. BTW, S. Lott's answer is the same as this, just worded differently.

I don't know; maybe if you had an xml embedded in an xml node. It's a contrived example, I know; and I have never had this problem, actually. I'm just curious to know if it's possible. – Juan Pablo Califano Oct 21 '08 at 22:18

Use case: You may want to enclose free-form documentation text inside a CDATA block if you want to include HTML elements in it (assuming your XML schema doesn't allow elements from the HTML namespace). Then suppose part of the text is explaining how CDATA blocks are opened and closed. – Ates Goral Oct 22 '08 at 0:16

You make a good point and it definitely looks like a valid use case. So concealing the CDATA ending is the way to go? Or maybe html-encoding it? (In case there's no other choice and that's valid within a CDATA section) – Juan Pablo Califano Oct 22 '08 at 0:22

It's probably not a valid use case. Free-form text containing HTML markup should probably be stored in a text node with the markup characters escaped - which will be done automatically by any DOM. Your text explaining CDATA would itself have its markup characters escaped. – Robert Rossney Oct 24 '08 at 8:47

A use case (or two): We wrap text diffs inside XML. Those diffs are included in a CDATA block. Sometimes the diffs are of XML files which contain CDATA. When checking in the code for this we found the build system wrapped check-in comments in CDATA for RSS - the check-in comment for the change to handle this included ]]> (and broke the build server's RSS feed). – Ian G May 17 at 9:10
