Why is binary serialization faster than XML serialization?

Binary serialization is more efficient because it writes raw data directly to the stream, whereas XML serialization must format the data into, and later parse it back out of, a valid XML structure. Additionally, depending on what sort of data your objects hold, the XML may contain a lot of redundant data.

Consider serializing a double, for example:

- binary serialization: write 8 bytes from the memory address to the stream
- binary deserialization: read the same 8 bytes back
- XML serialization: write the opening tag, convert the number to text, write the closing tag; nearly three times the I/O and perhaps 1000x the CPU utilization
- XML deserialization: read and validate the opening tag, read the string and parse it into a number, read and validate the closing tag; a little more overhead for I/O and considerably more for CPU
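The contrast above can be sketched in Python, using struct for the raw binary form and a hand-built element for the XML form (the tag name <value> is made up for illustration; the answers themselves are about .NET serializers):

```python
import struct
import xml.etree.ElementTree as ET

value = 3.141592653589793

# Binary: the 8 raw bytes of the IEEE-754 double go straight to the stream.
binary = struct.pack("<d", value)
assert struct.unpack("<d", binary)[0] == value   # exact round-trip

# XML: opening tag, the number converted to text, closing tag; reading it
# back means tag validation plus string-to-float parsing.
xml_text = "<value>{!r}</value>".format(value)
parsed = float(ET.fromstring(xml_text).text)
assert parsed == value

print(len(binary))    # 8 bytes
print(len(xml_text))  # 32 bytes for this value: roughly 4x the I/O
```

The exact ratio depends on the value (short numbers produce shorter text), but the binary form is always a fixed 8 bytes with no parsing step.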

Actually, like all things, it depends on the data and the serializer. Commonly (although perhaps unwisely) people mean BinaryFormatter when they say "binary", but this has a number of foibles:

- it adds lots of type metadata (which all takes space)
- by default it includes field names (which can be verbose, especially for automatically implemented properties)

Conversely, XML generally has overheads such as:

- tags adding space and I/O
- the need to parse tags (which is remarkably expensive)
- lots of text encoding/decoding

Of course, XML is easily compressed, adding CPU but hugely reducing bandwidth. But that doesn't mean one is faster; I would refer you to some sample stats from here (with full source included), to which I've annotated the serializer base (binary, xml, text, etc.).

Look in particular at the first two results; it looks like XmlSerializer trumped BinaryFormatter on every value, while retaining the cross-platform advantages. Of course, protobuf then trumps XmlSerializer ;p These numbers tie in quite well to ServiceStack's benchmarks, here.

BinaryFormatter *** binary
Length: 1314, Serialize: 6746, Deserialize: 6268

XmlSerializer *** xml
Length: 1049, Serialize: 3282, Deserialize: 5132

DataContractSerializer *** xml
Length: 911, Serialize: 1411, Deserialize: 4380

NetDataContractSerializer *** binary
Length: 1139, Serialize: 2014, Deserialize: 5645

JavaScriptSerializer *** text (json)
Length: 528, Serialize: 12050, Deserialize: 30558

(protobuf-net v2) *** binary
Length: 112, Serialize: 217, Deserialize: 250

Well, first of all, XML is a bloated format. Every byte you send in binary form will typically take at least 2 or 3 bytes in XML. For example, to send the number 44 in binary you need just one byte. In XML you need an element tag plus two bytes for the number itself, e.g. <num>44</num>, which is a lot more data.
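You can check that size difference directly (the <num> tag name is just an illustrative choice, not anything standard):

```python
import struct

binary = struct.pack("B", 44)   # the value 44 as a single unsigned byte
xml_text = "<num>44</num>"      # the same value wrapped in a made-up XML tag

print(len(binary))    # 1 byte
print(len(xml_text))  # 13 bytes
```

Even with a very short tag name, the XML representation is an order of magnitude larger for this value.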

One difference is the encoding/decoding time required to handle the message. Since binary data is so compact, it won't eat up many clock cycles. If the binary data is a fixed structure, you could probably load it directly into memory and access every element from it without the need to parse/unparse the data.
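A minimal sketch of that fixed-structure idea, assuming a hypothetical record layout (the field names and types here are invented for illustration):

```python
import struct

# Hypothetical fixed record layout: id (uint32), price (double), qty (uint16).
# "<" means little-endian with no padding, so every field sits at a fixed,
# known offset; no text scanning or tag matching is needed to find it.
RECORD = struct.Struct("<IdH")

buf = RECORD.pack(7, 19.99, 3)              # what a writer would emit
record_id, price, qty = RECORD.unpack(buf)  # direct field access, no parsing

print(RECORD.size)             # 14 bytes per record
print(record_id, price, qty)   # 7 19.99 3
```

In C you would get the same effect by overlaying a struct on the buffer; the point is that fixed offsets replace parsing entirely.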

XML is a text-based format which needs a few more steps to be processed. First, the format is bloated, so it eats up more memory. Furthermore, all data is text, and you might need it in binary form, so the XML needs to be parsed.

This parsing still takes time, no matter how fast your code is. ASN.1 is a "binary XML" format that provides a good alternative to XML, but it still needs to be parsed, just like XML. Plus, if most of the data you use is text rather than numeric, binary formats won't make a big difference.

Another speed factor is the total size of your data. When you just load and save a binary file of 1 KB or an XML file of 3 KB then you probably won't notice any speed difference. This is because disks use blocks of a specific size to store data.

Up to 4 KB fits easily within most disk blocks. Thus, for the disk it doesn't matter if it needs to read 1 KB or 3 KB since it reads the whole 4KB block. But when the binary file is 1 megabyte and the XML is 3 megabytes, the disk will need to read a lot more blocks to just read the XML.

(Or to write it.) And then it even matters whether your XML is 3 MB or just 2.99 MB or 3.01 MB. When binary data is transported over text-based protocols (e-mail, for example), it will often be encoded, with UU-encoding or Base64.

With such an encoding, your binary data grows by 1 byte for every 3 bytes of data. XML data does not need this encoding, so the size difference becomes smaller, and thus the speed difference becomes less. Still, the binary data will be faster, since the encoding/decoding routines can be very fast.
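That growth ratio is easy to verify. The sketch below uses Base64 (as a common stand-in for UU-encoding; both emit 4 output bytes for every 3 input bytes, i.e. 1 extra byte per 3):

```python
import base64

payload = bytes(range(256)) * 12       # 3072 bytes of arbitrary binary data
encoded = base64.b64encode(payload)    # 4 output bytes per 3 input bytes

print(len(payload))   # 3072
print(len(encoded))   # 4096, i.e. +1 byte for every 3 bytes of input
```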

Basically, size matters. :-) But with XML you have an additional alternative: you can send and store the XML in a ZIP file format.

Microsoft Office does this in its newer versions. A Word document is created as an XML file, yet stored as part of a bigger ZIP file. This combines the best of both worlds, since Word documents are mostly text, so a binary format would not add much of a speed increase.

Zipping the XML makes storing and sending the data a lot faster, simply by making it binary. Even more interesting, a compressed XML file can end up smaller than an uncompressed binary file, in which case the zipped XML becomes the faster one. (But that's cheating, since the XML is now binary...)
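How well this works can be sketched with gzip (a DEFLATE-based cousin of ZIP); the XML content below is invented, but its repetitive shape is typical of serialized object graphs, which is exactly why it compresses so well:

```python
import gzip

# Repetitive XML, the way serialized object graphs usually look.
xml_text = "<items>" + "".join(
    "<item><id>{}</id><name>widget</name></item>".format(i) for i in range(1000)
) + "</items>"

raw = xml_text.encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw))          # tens of kilobytes of text
print(len(compressed))   # a small fraction of that
```

The exact ratio depends entirely on the data; highly repetitive tags compress dramatically, while already-dense content compresses far less.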

