Moderator: mosher

Starting Alphabet Found

fiziwig
Elite member
fiziwig
Elite member
Joined: June 17th, 2010, 12:31 am

July 12th, 2010, 12:57 am #1

This is "A" starting alphabet. I'm not sure if it is "the" starting alphabet for reasons discussed in another post.

This is for the "ALLGOODQQUICK..." text and was derived from the first 85 letters of the text, and is guaranteed to encrypt/decrypt at least that much of the text.


CEDQRSTIXYLMOPZABFVGUHWJKN
ayzbqdsefghlwinkcmoprtuvjx


The search took 1,889,781,533 Trials. (About 4 hours on a 3.2 GHz Windows PC)
Next I need to write a quick program to test it against the balance of the text and make sure it isn't just a "ghost" alphabet.

--gary

Last edited by fiziwig on July 12th, 2010, 1:05 am, edited 2 times in total.
Quote
Like
Share

mosher
Super member
mosher
Super member
Joined: May 26th, 2009, 10:24 am

July 12th, 2010, 3:50 am #2

Hi fiziwig,

I've been following your energetic and most interesting research on Chaocipher, and I look forward to reading more of your output.

Regarding your post, the alphabets you found are very close to the correct ones, In fact, in each alphabet, there is exactly one letter that needs to be moved to another position for everything to work out. Given the high error propagation characteristic of Chaocipher, the deciphered text breaks down after a few iterations:
wrote:Session options
===============
Left starting alphabet: CEDQRSTIXYLMOPZABFVGUHWJKN
Right starting alphabet: AYZBQDSEFGHLWINKCMOPRTUVJX
Left zenith: B
Right zenith: C
PT disk pattern: R
Input file: Chaocipher\Raw\exhibit-1.ct.txt
Output file:
Mode: decipher

Input text has 13615 characters

ALLGOMSUSRIKAQMCXMNXNWVDQXOBLJRRDRNMHENOOWZYLTVUVLMDMJQINBQSZBTOVARLCJDGWANNEBNP ...
Should you want the correct alphabets, just say the word.

BTW, the starting alphabets for the first 55-letter "ALLGOOD..." will unlock the entire exhibit, not only the first line.

Regards,

Moshe
Quote
Like
Share

fiziwig
Elite member
fiziwig
Elite member
Joined: June 17th, 2010, 12:31 am

July 12th, 2010, 6:09 am #3

mosher wrote:Hi fiziwig,

I've been following your energetic and most interesting research on Chaocipher, and I look forward to reading more of your output.

Regarding your post, the alphabets you found are very close to the correct ones, In fact, in each alphabet, there is exactly one letter that needs to be moved to another position for everything to work out. Given the high error propagation characteristic of Chaocipher, the deciphered text breaks down after a few iterations:
wrote:Session options
===============
Left starting alphabet: CEDQRSTIXYLMOPZABFVGUHWJKN
Right starting alphabet: AYZBQDSEFGHLWINKCMOPRTUVJX
Left zenith: B
Right zenith: C
PT disk pattern: R
Input file: Chaocipher\Raw\exhibit-1.ct.txt
Output file:
Mode: decipher

Input text has 13615 characters

ALLGOMSUSRIKAQMCXMNXNWVDQXOBLJRRDRNMHENOOWZYLTVUVLMDMJQINBQSZBTOVARLCJDGWANNEBNP ...
Should you want the correct alphabets, just say the word.

BTW, the starting alphabets for the first 55-letter "ALLGOOD..." will unlock the entire exhibit, not only the first line.

Regards,

Moshe
OOPS! Stop the presses.

The alphabet I posted is the alphabet as is appears AFTER shuffling for the first encipherment a:C Notice that l:L are aligned already, and poised to encipher the second letter.

.CEDQRSTIXYLMOPZABFVGUHWJKN
ayzbqdsefghlwinkcmoprtuvjx

That explains the misplaced letters. Use the alphabet to decipher starting with the second letter of the text and all will be well. Meanwhile, when I get a few minutes to spare I will manually unshuffle the alphabet one more time so it will be correct. (And notice the alignment. I've printed the alphabets so that the pt alphabet starts at -1 (25) rather than 0. I did this to make it easier to visually verify the shuffling. My bad for failing to mention that. So wrap the pt "a" around to the right end of the row before copying the alphabets into your program.)

Here's the output of my log file which was produced in reverse order as the program un-shuffled the alphabet after successfully filling in all the slots and encrypting all the text I had entered (86 letters). I forgot to have the program un-shuffle one more time to get me back to the state before the first letter was enciphered. You can see by looking at the left end of each row that the enciphering is correct even up to the 86th letter which is more than halfway into the second line.

I used this log for debugging to be sure the shuffling was being done correctly.


.MASIXLPJBQTKNWRGUOFYCHZVED
ranktbucofimvsehqgdpwxjylz textIndex: 85
.IXLPJBQTKNRGUOFYCHZVEDMWAS
ektbucofimvshqgdpwxjylzran textIndex: 84
.GUFYCHZVEDMWASIOXLPJBQTKNR
vshqdpwxjylzranektgbucofim textIndex: 83
.KNRGSUFYCHZVEDMWAIOXLPJBQT
ofimvshnqdpwxjylzraektgbuc textIndex: 82
.VEMWAIOXLPJBQTKDNRGSUFYCHZ
pwxjlzraektgbucofiymvshnqd textIndex: 81
.UFYCHZVTEMWAIOXLPJBQKDNRGS
mvshnqdpwxcjlzraektgbuofiy textIndex: 80

...
.KNEDQRSTXYOPBZAFCVGUHIWJLM
qdshlicmoprvjegfxwnkayzbtu textIndex: 8
.ZNEDQRSTXYOPBAFCVGUHIWJLMK
qdsghlicmoprvjefxwnkayzbtu textIndex: 7
.NEDQRSTXYOPBFCVGUHIWJLMKZA
dseghlicmoprvjfxwnkayzbtuq textIndex: 6
.PBFVGUHIWJLMKZANCEDQRSTXYO
oprvjxwnkayzbtuqdsefghlicm textIndex: 5
.ZBFVGUHIWJLMKANCEDQRSTXYOP
opruvjxwnkayzbtqdsefghlicm textIndex: 4
.TXYOPZABFVGUHIWJLMKNCEDQRS
ghlicmoprtuvjxwnkayzbqdsef textIndex: 3
.YOPZABFVGUHWJLMKNCEDQRSTIX
lwicmoprtuvjxnkayzbqdsefgh textIndex: 2
.LOPZABFVGUHWJMKNCEDQRSTIXY
lwikcmoprtuvjxnayzbqdsefgh textIndex: 1
.CEDQRSTIXYLMOPZABFVGUHWJKN
ayzbqdsefghlwinkcmoprtuvjx textIndex: 0

1,889,781,533 Trials

@@@ NEED: the program to do one more unshuffle to make the alphabet
like it was BEFORE enciphering the letter at index 0.




--gary
Last edited by fiziwig on July 12th, 2010, 6:19 am, edited 2 times in total.
Quote
Like
Share

mosher
Super member
mosher
Super member
Joined: May 26th, 2009, 10:24 am

July 12th, 2010, 7:29 am #4

Hi fiziwig,

Cool! Then you've got it down pat -- congratulations on solving the known-plaintext problem.

Actually, you solved a more generic, and very valuable, problem: solving for the alphabets given any plaintext. You may be able to speed up your algorithm by about two magnitudes by selecting the best plaintext for the job. Here's an explanation of the method I used.

The first 86 characters in Exhibit 1 are:
wrote:PT: ALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVER
CT: CLYTZPNZKLDDQGFBOOTYSNEPUAGKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFUTHXNUVKGIM
Using your algorithm (which, by the way, is the same used by others, see Ben Norrington's implementation in Haskell :) -- great minds think alike), the logic runs like this:

(1) Insert (pt,ct)=(A,C) into the alphabets, then permute

(2) The next (pt,ct) pair is (L,L). Since neither the left nor the right alphabets have an 'L' in them (what I call a 'hole' for lack of a better name) we iterate through the empty slots in the alphabets, recursively trying (L,L) in each position. Permute the alphabets.

(3) The next pair is (L,Y). Since the right (pt) alphabet has an 'L' we get a free ride and can insert the 'L' in the right alphabet.

(4) The next pair is (G,T), This is a hole, so we recursively insert it in the empty slots in the alphabets. This is the second nested recursion.

(5) The next pair is (O,Z), again a hole, so we recurse for the third time.

When we have reached the 86th pair, we have identified all 26 letters in both the left and right alphabets (technically we only need to identify 25 letters in each alphabet, in which case we implicitly know the 26th, but waiting to find all 26 is just cleaner algorithmically ;) ).

It turns out that using these first 86 pt/ct letters means dealing with 14 holes:
wrote:     * ** *   * *** *   * *  *  *     *
PT: ALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVER
CT: CLYTZPNZKLDDQGFBOOTYSNEPUAGKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFUTHXNUVKGIM
The bottom line is that there are 14 nested recursions, requiring approximately 26x25x24...x14 = 64,764,752,532,480,000 combinations (this may be less because of diminishing empty slots, but you get the idea).

I wrote a simple program to locate sequences of plaintext/ciphertext in Exhibit 1 that (a) account for all 26 letters in both alphabets, and (b) gives the minimal number of holes. Here's the output:
wrote:Found! (length: 86, #holes: 14, len(ptAlph)=26 len(ctAlph)=26)

PT: ALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOX
ESJUMPOVER

CT: CLYTZPNZKLDDQGFBOOTYSNEPUAGKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFU
THXNUVKGIM

Hole Data:

1, (pt,ct)=(L,L)
3, (pt,ct)=(G,T)
4, (pt,ct)=(O,Z)
6, (pt,ct)=(D,N)
10, (pt,ct)=(I,D)
12, (pt,ct)=(K,Q)
13, (pt,ct)=(B,G)
14, (pt,ct)=(R,F)
16, (pt,ct)=(W,O)
20, (pt,ct)=(X,S)
22, (pt,ct)=(S,E)
25, (pt,ct)=(M,A)
28, (pt,ct)=(V,I)
34, (pt,ct)=(Y,R)

-----------------------------------------------------

Found! (length: 70, #holes: 13, len(ptAlph)=26 len(ctAlph)=26)

PT: OXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZ

CT: YSNEPUAGKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFUTHXNUVKGIMVEZ

Hole Data:

20, (pt,ct)=(X,S)
21, (pt,ct)=(E,N)
22, (pt,ct)=(S,E)
23, (pt,ct)=(J,P)
24, (pt,ct)=(U,U)
25, (pt,ct)=(M,A)
26, (pt,ct)=(P,G)
28, (pt,ct)=(V,I)
33, (pt,ct)=(Z,C)
34, (pt,ct)=(Y,R)
45, (pt,ct)=(H,O)
47, (pt,ct)=(I,F)
66, (pt,ct)=(C,W)

-----------------------------------------------------

Found! (length: 66, #holes: 12, len(ptAlph)=26 len(ctAlph)=26)

PT: JUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZ

CT: PUAGKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFUTHXNUVKGIMVEZ

Hole Data:

24, (pt,ct)=(U,U)
25, (pt,ct)=(M,A)
26, (pt,ct)=(P,G)
27, (pt,ct)=(O,K)
28, (pt,ct)=(V,I)
30, (pt,ct)=(R,N)
33, (pt,ct)=(Z,C)
34, (pt,ct)=(Y,R)
45, (pt,ct)=(H,O)
47, (pt,ct)=(I,F)
66, (pt,ct)=(C,W)
67, (pt,ct)=(K,Y)

-----------------------------------------------------

Found! (length: 63, #holes: 11, len(ptAlph)=26 len(ctAlph)=26)

PT: POVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZ

CT: GKIUNKNCRINRCVKJNHTOAFQPDPNCVLTVFICOTSSLWYYIHBICFUTHXNUVKGIMVEZ

Hole Data:

27, (pt,ct)=(O,K)
28, (pt,ct)=(V,I)
29, (pt,ct)=(E,U)
30, (pt,ct)=(R,N)
33, (pt,ct)=(Z,C)
34, (pt,ct)=(Y,R)
45, (pt,ct)=(H,O)
47, (pt,ct)=(I,F)
66, (pt,ct)=(C,W)
67, (pt,ct)=(K,Y)
78, (pt,ct)=(J,X)

-----------------------------------------------------

Found! (length: 97, #holes: 10, len(ptAlph)=26 len(ctAlph)=26)

PT: ROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZY
DOGTOSAVETHEIRPARTYWA

CT: SYYHZUVLFFURRHRIIFFDZMTTOVKLZOVLPVPPGVGEWWEFRFYHKXOPKXRQSZKLCZKHZWXRJXLMVFGG
FGYIFDAEINIWPOMOUVRFB

Hole Data:

290, (pt,ct)=(O,Y)
292, (pt,ct)=(N,H)
293, (pt,ct)=(F,Z)
295, (pt,ct)=(X,V)
296, (pt,ct)=(E,L)
297, (pt,ct)=(S,F)
300, (pt,ct)=(M,R)
308, (pt,ct)=(Z,D)
310, (pt,ct)=(D,M)
313, (pt,ct)=(T,O)

-----------------------------------------------------

Found! (length: 96, #holes: 9, len(ptAlph)=26 len(ctAlph)=26)

PT: OWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZYD
OGTOSAVETHEIRPARTYWA

CT: YYHZUVLFFURRHRIIFFDZMTTOVKLZOVLPVPPGVGEWWEFRFYHKXOPKXRQSZKLCZKHZWXRJXLMVFGGF
GYIFDAEINIWPOMOUVRFB

Hole Data:

292, (pt,ct)=(N,H)
293, (pt,ct)=(F,Z)
295, (pt,ct)=(X,V)
296, (pt,ct)=(E,L)
297, (pt,ct)=(S,F)
300, (pt,ct)=(M,R)
308, (pt,ct)=(Z,D)
310, (pt,ct)=(D,M)
313, (pt,ct)=(T,O)

-----------------------------------------------------

Found! (length: 82, #holes: 8, len(ptAlph)=26 len(ctAlph)=26)

PT: ERLAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRP
ARTYWA

CT: IIFFDZMTTOVKLZOVLPVPPGVGEWWEFRFYHKXOPKXRQSZKLCZKHZWXRJXLMVFGGFGYIFDAEINIWPOM
OUVRFB

Hole Data:

306, (pt,ct)=(L,F)
308, (pt,ct)=(Z,D)
309, (pt,ct)=(Y,Z)
310, (pt,ct)=(D,M)
311, (pt,ct)=(O,T)
313, (pt,ct)=(T,O)
315, (pt,ct)=(S,K)
329, (pt,ct)=(W,W)

-----------------------------------------------------

Found! (length: 80, #holes: 7, len(ptAlph)=26 len(ctAlph)=26)

PT: LAZYDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPAR
TYWA

CT: FFDZMTTOVKLZOVLPVPPGVGEWWEFRFYHKXOPKXRQSZKLCZKHZWXRJXLMVFGGFGYIFDAEINIWPOMOU
VRFB

Hole Data:

308, (pt,ct)=(Z,D)
309, (pt,ct)=(Y,Z)
310, (pt,ct)=(D,M)
311, (pt,ct)=(O,T)
313, (pt,ct)=(T,O)
315, (pt,ct)=(S,K)
329, (pt,ct)=(W,W)

-----------------------------------------------------

Found! (length: 77, #holes: 6, len(ptAlph)=26 len(ctAlph)=26)

PT: YDOGTOSAVETHEIRPARTYWALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYW
A

CT: ZMTTOVKLZOVLPVPPGVGEWWEFRFYHKXOPKXRQSZKLCZKHZWXRJXLMVFGGFGYIFDAEINIWPOMOUVRF
B

Hole Data:

310, (pt,ct)=(D,M)
311, (pt,ct)=(O,T)
313, (pt,ct)=(T,O)
315, (pt,ct)=(S,K)
316, (pt,ct)=(A,L)
329, (pt,ct)=(W,W)

-----------------------------------------------------

Found! (length: 198, #holes: 5, len(ptAlph)=26 len(ctAlph)=26)

PT: EANDEQUALSTATIONTOWHICHTHELAWSOFNATUREANDOFNATUREXSGODENTITLETHEMQADECENTRES
PECTTOTHEOPINIONSOFMANKINDREQUIRESTHATTHEYSHOULDDECLARETHECAUSESWHICHIMPELTHEMTO
THESEPARATIONZWEHOLDTHESETRUTHSTOBESELFJEV

CT: FFQBVLDYYBBQTLBAOWLOZSDYZFQECGTATTDSEXFVKJAYYVNUQWYZXXDYLGVXZPXOGLZRBZNMXFFW
NUCTFXKVFBSPUMKMJSAEVBPKLBQOYMJRXQJQGRHSQKYYKISZUOEEDCYRQPHZJCDIRIKQFRLTOBPLDBOL
WBVLJHJNRJUYYFPUOXMQELBIHKWFFWUPOKOOKHZMPR

Hole Data:

5929, (pt,ct)=(N,Q)
5930, (pt,ct)=(D,B)
5932, (pt,ct)=(Q,L)
5933, (pt,ct)=(U,D)
5948, (pt,ct)=(C,S)
This simple program found that the plaintext starting at offset 5929 and having a length of 198 characters ("EANDEQUALSTATIONTOWHICHTHELAWS...") gives all 26 letters in both alphabets with only five (5) holes, meaning only five nested recursions. Thus, at most the program will need to text 26x25x24x23x22 = 7,893,600 cases.

Once you've solved for the starting alphabets beginning with offset 5929, you can rewind the alphabets to the beginning (offset 0) by deciphering the known ciphertext (the plaintext for the ciphertext starting with offset 5500 is unknown, so you cannot depend on it).

Having said all of this, there will definitely be times (e.g., using a known or supposed crib) when we cannot choose our plaintext. In that case, the algorithm will just need to examine as many cases as it needs to.

I hope this is clear and that it helps reduce your algorithm's running time.

Regards,

Moshe
Last edited by mosher on April 8th, 2015, 6:15 pm, edited 2 times in total.
Quote
Like
Share

fiziwig
Elite member
fiziwig
Elite member
Joined: June 17th, 2010, 12:31 am

July 12th, 2010, 3:00 pm #5

mosher,

That's a very cool idea! I like it. I can load the whole available text and then do a search for spans with minimal holes.

I have another idea for an algorithm I want to try out. I have no idea if it would be faster or slower.

Start out with the whole alphabet in normal alphabetical order and start enciphering (or deciphering) the sample text. As soon as an error is encountered, then either in the pt alphabet or the ct alphabet (random with 50% probability) find the correct letter and the incorrect letter in the original ordered alphabet and swap them. Then reset everything and start over enciphering. That avoids recursion and all the overhead time spent in backtracking and unshuffling. Each trial is a clean slate with an alphabet that is theoretically closer to the right one, as the original ordered alphabet get mutated each round.

Before going to the trouble of writing code I'm going to try it out by hand with my 6-letter Chaocipher machine. I have a little Windows pop-up app that encrypts/decrypts messages based on a six-letter alphabet at the click of a button.

--gary

ON EDIT: My second idea mentioned above turns out to be terrible! I'll stick with optimizing the first method.

Last edited by fiziwig on July 12th, 2010, 5:23 pm, edited 2 times in total.
Quote
Like
Share

james
Elite member
james
Elite member
Joined: May 14th, 2010, 3:48 pm

December 18th, 2010, 5:14 pm #6

I have tried out Mosher's idea on my Maze algorithm and it cut solving time for the alphabets of Exhibit 1 from 35 minutes to 3 seconds.

(I actually was a bit lazy and just searched the first 5500 letters of Exhibit 1 and found a stretch with 6 holes, compared to mosher's 5.)

Definitely cool, or to use my jargon 'a very good idea'.

Quote
Like
Share

mosher
Super member
mosher
Super member
Joined: May 26th, 2009, 10:24 am

April 8th, 2015, 4:58 pm #7

For the record, on July 15, 2010 Carl Scheffler posted a talkback to Nick Pelling's Cipher Mysteries post, entitled "The Chaocipher Revealed!", in which he explained how he found a plaintext/ciphertext sequence with only 4 (!) holes:
wrote:Indeed, I followed very nearly exactly the same method as in your post: searching for the sequence of characters with the fewest holes and searching through the possible ways of plugging the holes. I see in your post that you found the smallest number of holes to be 5. With a very slight modification in the method, it is possible to do it with only 4 holes. This is not so important for Exhibit 1 (a very long text) but could be important for shorter texts where we have a smaller chance of finding good sequences of characters.

In my version, I search not only forward in the text for characters that have already been seen, but also backwards. This provides a better chance of filling in more of what we already know about the alphabet. To search backwards in the text it is necessary to unpermute the alphabets, i.e. make the opposite permutation of what is required for searching forwards in the text. Other than that the search remains the same.
As Carl wrote, the trick is to search both forward and backward when extending the sequences. In Carl's case, the best plaintext/ciphertext match is at offset 7187 in Exhibit #1:

                      32222222222111         1111111333333333444444444455555
                   <--098765432109871234567890123456123456789012345678901234-->
pt:  ...COLONIESUANDSUCHISNOWTHENECESSITYWHICHCONSTRAINSTHEMTOALTERTHEIRFORM...
ct:  ...THYVNGZLXELVTZDCQMVFLCBBYKBMESGHSOEPSKPKEWMEQWCOQNBURIIQBNQOGAAXPEIT...


This should be understood as follows:

[*] The pt/ct pair S/E is the first pair in the chaining process.
[*] The pair S/S is the second pair, with the plaintext 'S' already encountered.
[*] The pair I/G is next. This is a 'hole' because 'I' and 'G' have not yet been encountered. We use this hole because the pair E/M to the left of S/E is also a hole.
[*] The pair T/H is next, and is the second hole encountered.
[*] The pair Y/S is next, with the ciphertext 'S' already encountered.
[*] The pair W/O is another hole, making that three (3) holes so far.
[*] We now have a good run of ten (10) non-hole pairs on the right-hand side, H/E to R/E, that have plaintext or ciphertext letters already encountered.
[*] The next right-hand pair, A/Q, is a hole, but the pair on the left-hand side, E/M, is not longer a hole, so we now extended the sequence on the left-hand side.

If we continue as per the diagram and the ordered take-out indices, we will have reconstructed the entire left and right alphabets using a plaintext/ciphertext sequence of 54 letters, and all this with only four (4) holes.

In summary, Scheffler's insight of extending on both the right and the left sides of the pt/ct sequences reduced the number of holes encountered from five (5) to four (4). This can be a very significant reduction.


Last edited by mosher on June 5th, 2015, 11:31 am, edited 5 times in total.
Quote
Like
Share