The paper claimed that the break was so "dangerous" that they would not reveal the method itself; Instead, they include a table ("Table 6") which they claimed could only be created if they had in fact broken SIMON-32/64 in a way that let them recover keys based on 2 chosen plaintexts with a 2.5% (?) chance of success. They claimed that they did roughly the following:
Aside from one error in Table 6, where the hexadecimal value of a cyphertext is shown incorrectly, the table does check out. (though some of the 4-grams don't appear anywhere in pg10.txt)
So, is it proof of a serious break in SIMON? No. There's another way to generate these values:
In fact, without any attempt at optimization (aside from trivial parallelization), an i7-4790k can find about a thousand examples in 8 seconds; about 10% of all keys yielded at least one set of matching blocks.
There are around 48,000 distinct 4-grams in "pg10.txt", so for any given key and 4-byte plaintext, there's about a 1-in-90,000 chance for it to encrypt to some other 4-gram. Since the probability is independent for each 4-gram, the odds of getting 1 are 1/2, and the odds of getting 2 are 1/4. This extremely rough calculation but not too far off the 1/10 actually obtained.
The attached program, which adapts an implementation of SIMON from github, can be built with g++-6 on Linux. It needs "pg10.txt" in the current directory. For parallelization, pass "-fopenmp". `trolled.txt` is one possible output of the program, and the few entries that I back-checked with an independent (Python) SIMON implementation also from github.
I just hope that, whatever the authors actually did to make "table 6", it didn't really take 120 days on two cluster computers.
Update: Several commenters believe the paper takes two, 8-byte blocks from the chosen text. If this is true, then even fewer of the blocks shown actually match "pg10.txt". For instance, I based my "4, 4-byte blocks" assumption on the appearance of "LORDhard" as a cyphertext. If this is the case, then my program would take about 48,000 times longer; since when you find two texts, the odds that they're "1 location" different out of 48,000 locations is about 1 in 48,000. However, since their Table 6 is full of 8-grams (and even 4-grams) that don't come from pg10.txt, I don't feel TOO bad that my program presents examples that aren't either.
Files currently attached to this page:
search.c | 5.0kB |
trolled.txt | 80.7kB |