I'm no cryptanalyst, but if you know something about the characteristics of the files you might have a chance. For example, let's assume you know that both original plaintexts:

- contain plain ASCII English text
- are articles about sports (or whatever)

Given those two pieces of information, one approach is to scan through the ciphertext 'decrypting' with words you'd expect to find in them, such as "football", "player", "score", etc. Perform the decryption using "football" at position 0 of the ciphertext, then at position 1, then 2 and so on. If the result of decrypting a sequence of bytes appears to be a word or word fragment, there's a good chance you've found plaintext from both files.
That may give you a clue as to some surrounding plaintext, and you can see if that results in a sensible decryption. And so on. Repeat this process with other words/phrases/fragments that you might expect to be in the plaintexts.
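A minimal sketch of that word-guessing scan in Python, for illustration only. The helper names (`xor_bytes`, `looks_like_text`, `crib_drag`), the sample plaintexts, and the crude "looks like text" filter are all assumptions made for this example; a real attempt would want a better plausibility test and a human looking at the hits.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def looks_like_text(chunk: bytes) -> bool:
    """Crude filter (an assumption): letters, spaces and basic punctuation only."""
    allowed = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,'"
    return all(c in allowed for c in chunk)


def crib_drag(c1: bytes, c2: bytes, crib: bytes):
    """Try the guessed word at every offset; return (offset, fragment) hits."""
    mixed = xor_bytes(c1, c2)  # equals p1 XOR p2, the key has cancelled out
    hits = []
    for offset in range(len(mixed) - len(crib) + 1):
        fragment = xor_bytes(mixed[offset:offset + len(crib)], crib)
        if looks_like_text(fragment):
            hits.append((offset, fragment))
    return hits


# Made-up example data: two same-length messages encrypted with one key.
p1 = b"the football player scored late"
p2 = b"rain stopped play at the ground"
key = os.urandom(len(p1))
c1, c2 = xor_bytes(p1, key), xor_bytes(p2, key)

for offset, fragment in crib_drag(c1, c2, b"football"):
    print(offset, fragment)
```

With this sample data, guessing "football" at offset 4 reveals `b" stopped"` from the other message. Other offsets may also pass the filter by chance, which is why each hit still needs to be checked against the surrounding text.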
In response to your question's edit: what Schneier is talking about is that if someone has two ciphertexts that have been XOR encrypted using the same key, XORing those ciphertexts will 'cancel out' the keystream, since:

    (A ^ k)              - ciphertext of A
    (B ^ k)              - ciphertext of B
    (A ^ k) ^ (B ^ k)    - the two ciphertexts XOR'ed together

which simplifies to:

    A ^ B ^ k ^ k
    A ^ B ^ 0
    A ^ B

So now the attacker has a new ciphertext that's composed only of the two plaintexts. If the attacker knows one of the plaintexts (say the attacker has legitimate access to A, but not B), that can be used to recover the other plaintext:

    A ^ (A ^ B)
    (A ^ A) ^ B
    0 ^ B
    B

Now the attacker has the plaintext for B. It's actually worse than this: if the attacker has A and the ciphertext for A, then he can already recover the keystream. But the guessing approach I gave above is a variant of this, with the attacker using (hopefully good) guesses instead of a known plaintext.
Obviously it's not as easy, but it's the same concept, and it can be done without starting with known plaintext. Now the attacker has a ciphertext that 'tells' him when he's correctly guessed some plaintext (because it results in other plaintext from the decryption). So even if the key used in the original XOR operation is random gibberish, an attacker can use the file that has that random gibberish 'removed' to gain information when he's making educated guesses.
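The cancellation is easy to check for yourself. Here is a small Python sketch with made-up messages and a random key; the names `a`, `b`, `k` mirror the A, B and k used above, and `xor_bytes` is just a helper written for this example.

```python
import os


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(p ^ q for p, q in zip(x, y))


a = b"attack at dawn"        # plaintext A (known to the attacker)
b = b"retreat at six"        # plaintext B (unknown to the attacker)
k = os.urandom(len(a))       # keystream, reused for both messages

ca = xor_bytes(a, k)         # ciphertext of A
cb = xor_bytes(b, k)         # ciphertext of B

mixed = xor_bytes(ca, cb)    # (A ^ k) ^ (B ^ k)
assert mixed == xor_bytes(a, b)   # the key is gone: only A ^ B remains

assert xor_bytes(a, mixed) == b   # knowing A recovers B
assert xor_bytes(a, ca) == k      # knowing A and its ciphertext recovers k
```

None of the assertions depend on what the key is, which is the whole point: however random `k` is, it drops out as soon as it is used twice.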
You hit the nail right on the head! I just read a page written by Tanenbaum (pg. 749, Computer Networks, 4th Edition, 2003) and his advice was the same as yours! I believe this attack is called "keystream reuse attack". Thanks! – OckhamsRazor Apr 14 at 22:59

This isn't a weakness of "xor encryption" - xor is simply used by stream ciphers to combine the keystream with the plaintext. The weakness is using the same IV and key for two different messages, resulting in the same keystream. – Nick Johnson Apr 15 at 1:08

@Nick: You're right. I'll remove that paragraph. – Michael Burr Apr 15 at 7:28

@Michael: This is very helpful. Once I realized the xor round-circle was a known-plaintext attack. ;) Do you know offhand if there's a ciphertext & non-"guess & check" solution here? – Paul Nathan Aug 23 at 18:46
You need to take advantage of the fact that both files are plain text. A lot of implications can be derived from that fact. Assuming that both texts are English, you can use the fact that some letters are much more common than others.
See this article. Another hint is to note the structure of correct English text. For example, every time one sentence ends and the next begins, there is a (period, space, capital letter) sequence.
Note that in ASCII, space is binary "0010 0000", and flipping that bit in a letter changes its case (lower to upper and vice versa). There will be a lot of XORing with spaces if both files are plain text, right? Analyse the printable characters table on this page.
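To make the space hint concrete, here is a rough Python sketch. It relies on one observation: XORing two ASCII letters never produces a letter value (the result is always below 0x40), while XORing a letter with a space gives the same letter in the other case. So a letter showing up in the XOR of the two texts suggests that one of them has a space at that position. The function name `likely_space_positions` and the sample strings are made up for this example, and the heuristic can be fooled by digits and punctuation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def is_ascii_letter(v: int) -> bool:
    return 0x41 <= v <= 0x5A or 0x61 <= v <= 0x7A


def likely_space_positions(mixed: bytes) -> list:
    """Offsets where p1 XOR p2 is a letter, hinting at space-vs-letter."""
    return [i for i, v in enumerate(mixed) if is_ascii_letter(v)]


# Space is 0x20, so XORing with it flips the case bit of a letter.
assert ord("a") ^ 0x20 == ord("A")

# Made-up plaintexts; in practice you would only have their XOR.
p1 = b"hello world"
p2 = b"cryptograms"
mixed = xor_bytes(p1, p2)

for i in likely_space_positions(mixed):
    # The byte seen is the other text's letter with its case flipped.
    print(i, chr(mixed[i]), "->", chr(mixed[i] ^ 0x20))
```

This does not tell you which of the two texts holds the space, only that one of them probably does, so it works best as a way to narrow down where to try word guesses.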
Also, at the end you can use a spell checker. I know I didn't provide a solution to your question. I just gave you some hints.
Have fun, and please share your findings. It's really an interesting task.
That is interesting. The Schneier book does indeed say that it is easy to break this. And then he kind of leaves it hanging at that.
I guess you have to leave some exercises up to the reader! There is an article by Dawson and Nielsen that apparently describes an automated process for this task for text files. It's a bit on the $$ side to buy the single article.
However, a second paper, titled A Natural Language Approach to Automated Cryptanalysis of Two-time Pads, references the Dawson and Nielsen work and describes some assumptions they made (primarily that the text was limited to a 27-character alphabet). This second paper describes the authors' own system. I don't know for sure that it is free, but it is openly available on a Johns Hopkins University server.
That paper is about 10 pages long and looks interesting. I don't have time to read it at the moment but may later. I find it interesting (and telling) that it takes a 10 page paper to describe a task that another cryptographer describes as "easy".
I don't think you can - not without knowing anything about the structure of the two files.
Schneier's piece was with regard to the fact that in his example, you do know a fair amount of detail about the structure of those files. – Rory Alsop Apr 15 at 9:35.
Unless you have one of the plaintext files, you can't get the original information of the other. Mathematically expressed:

    p1 XOR p2 = en

You have one equation with two unknowns, so you can't possibly get something meaningful out of it.