Hi! I am trying to make a code that cracks any symbol to letter code. So if a friend had a code in which a circle was “a”, a star was “b” and so on, it would be able to solve it. I am having trouble with a small piece of it. I am trying to generate (and for now just print) every combination of the 26 English letters possible. I made a functional mini version. combinations_test.py · GitHub I tried to scale it up, though, and got an error about too many nested loops. I have no idea how to write this differently. Help appreciated. Thanks in advance!
I think I found a way to beat the system. Cut the nested loops in half and assign the other half to a function underneath. The code is here. generator.py · GitHub It is working, but is awkward. Suggestions on alternatives would be appreciated. Thanks!
A few thoughts.
Your code is limited to being able to generate keys of a specific length. To change your key length, you have to write a different program, with the necessary number of loops. If your friend sends a message which is 10 symbols long, you need to write a program with 10 loops; the next message is 15 symbols long, and you have to write a new program with 15 loops. And so on.
So to upgrade your generator, I’d suggest re-thinking it to be a program which can produce keys of a provided length. You want a routine which takes a target length value, and produces all keys of that length.
Along those lines, I’d point out that a key of length “N” is every letter appended to all possible keys of length “N-1”. I.e. all keys of length 3 can be made by adding every letter to all the keys of length 2. Tasks like this are most readily performed using recursion.
Stepping back from your code, I’d point out another possible approach. Your first post mentions “symbol to letter code”, which is commonly known as a “substitution cipher”. If you know what the substitutions are [circle is a, star is b, square is c, etc] then you might consider making a lookup table:
codes = [circle, star, square, triangle, hexagon ...]
Where each symbol is in the table positioned to match a letter. Then, given a symbol message, look up each symbol in the table and replace with the matching letter.
E.g. given the message “star hexagon circle triangle”, you go through each symbol: first is star, which is in the table at 1, which matches ‘b’. hexagon is at 5 which means ‘e’, circle is 0, ‘a’, and the final message is “bead”.
Thanks for you response. In regard to limiting the key length, I am not sure what you mean. For any key in which there is a symbol for each letter there will be a set. As I understand, this key that it generates could be applied to a message two words or two pages long. For example, if a = 2, b = 6, c = 1, d = 5, e = 3, f = 4, and so on, and the message was “bad”, the code would look like 625. My code would receive “625”, split it ( if applicable) into words and letters assigned to certain variables. Then it would search for the key. Eventually it would find set = [‘c’, ‘a’, ‘e’, ‘f’, ‘d’, ‘b’] My code right now would just print that. I am working on the code that would then say “ok, we have 625. Let’s see what letters are at set[6 - 1], set[2-1], and set[5-1] (the -1 is because python counts from 0.)” It would get and compile those letters and then check if that is a real word in the dictionary. It is, it would check the other words. If they all were real words according to the key, you can conclude you have the only possible message. It would then speak, email to me, and print out the message.
I think that concept works; if not please show me. However, the difficulty lies in the fact that if the code tried (as it does) 5,000 combinations/min, it would take centuries, possibly hundreds of centuries to find the one.
So I need another solving method or a way to make to code run faster. For example, if I could make it only generate combinations in which each letter only appeared once, it would drastically reduce then number of combinations.
Here is the code I have now, and thank you so much for you time and help! newest.py · GitHub
OK, it seems to me you are not working on a substitution cipher. If that were so, then both you and the sender “know” the substitution rules. You wouldn’t be searching for the key because you already know it.
If I understand it then, you are trying to determine the substitution rules from an arbitrary message. Sort of like the Enigma code breakers in World War II.
A short message is going to be problematic to decipher. Take your example, “625”. You suggested “bad”, but it’d also work for cad, lad, sad, mad, cat, bat, bet, fed, and so on. A brute-force approach is going to provide multiple possible matches for any given message.
Even a long message will be hard - you don’t mention word separators, so you’re making guesses as to how the words are broken out.
A common technique for cipher breaking is to take letter/word frequency into account. For the English language, data can be found on how often specific letters or short words occur. I.e. letters have the frequency like “e, t, o, a, n,” - I can’t remember them all off-hand. If you have word breaks, then any one letter words come from a very small set; two letter words from a (relatively) small set, etc. “The” is one of the most common three letter words in the English lexicon.
if I could make it only generate combinations in which each letter
only appeared once
Oh yes, there are only 403,291,461,126,605,635,584,000,000 possible non-repeating combinations of the 26 letters of the alphabet. [26!, or 26-factorial.] Much smaller than the
6,156,119,580,207,157,310,796,674,288,400,203,776 possible repeating combinations…
First, if you look at the code, there is word division. Agreed, without it, it would be virtually impossible. Second, yes, while finding only the non-repetitions helps (a lot), it is still an immense number. At 5000/sec, it would take 8.1430153e+23 centuries! We need a different approach or another limiter similar to the no repetition. Thirdly, you are obviously correct. A three letter word could have any number of meanings. However, since there are word separators assumed and coded, anything too much longer will have only one functional key. I will have to think about how to implement letter frequency. While possibly helpful, it could be very difficult to code. Thanks for the help! Any other thought would be appreciated!
I’d follow some hints from here. If the message has any doubled letters, or one or two letter words, then you could use those as a limiter to discard invalid keys. Letter frequency is only likely to be helpful with very long messages.
Thanks! It will be a chore trying to come up with a flowchart concept and code it to do that effectually. I’ll have to think on that. Thanks again! I’ll post other thoughts or progress.