For this test, I used the complete Project Gutenberg text of Charlotte Perkins Gilman's "The Yellow Wallpaper". It is about 33,000 characters long and is full of repeated word patterns and common matches (lots of "the", lots of double-l's, lots of similar things). I figured it was written about as "poorly" as a plain text could be for the purposes of ciphering.
I passed it through two different keystreams. The first was the keyphrase "white whale" (I was originally going to test with Moby Dick, which simply took too long for what I wanted). The second was a random-ish 3000-character keystream (yellowtest.txt). After encrypting the text both ways, I tested the output files against the original for digraph, trigraph and 4-graph matching.
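For anyone who wants to reproduce this, here is a minimal Python sketch of encrypting with a repeating key. It is not the program used for the test above: the function names, and the choice to strip everything except A-Z before encrypting, are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def clean(text):
    """Uppercase the text and keep only A-Z (assumed preprocessing)."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)


def vigenere_encrypt(plaintext, key):
    """Shift each plaintext letter by the matching key letter, repeating the key."""
    plain = clean(plaintext)
    key = clean(key)
    if not key:
        raise ValueError("key must contain at least one letter")
    out = []
    for i, ch in enumerate(plain):
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)
```

Under these assumptions, `vigenere_encrypt(text, "white whale")` would use the ten letters WHITEWHALE over and over, while a 3000-character keystream would cycle only about eleven times across the 33,000-character text.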
For a group to count, the same letter group (two, three, four or however many letters) had to show up more than once in the cipher text, and each occurrence had to sit over the same letters in the original text. That is, if "BZQ" shows up five times, I only care if each time it shows up in the place where "THE" was in the original text.
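One way to read that rule in code is sketched below. It is an interpretation, not the script that produced the results here: `true_repeats` is a hypothetical name, and it assumes the plain and cipher texts have already been reduced to the same letters-only form so that positions line up one to one.

```python
from collections import Counter


def true_repeats(plain, cipher, n, min_count=2):
    """Find cipher n-grams that repeat over the same plaintext n-gram.

    Returns a dict mapping (cipher_ngram, plain_ngram) to the number of
    positions where that exact pairing occurs, keeping only pairings
    seen at least `min_count` times.
    """
    if len(plain) != len(cipher):
        raise ValueError("plain and cipher must be the same length")
    pairs = Counter(
        (cipher[i:i + n], plain[i:i + n])
        for i in range(len(cipher) - n + 1)
    )
    return {pair: count for pair, count in pairs.items() if count >= min_count}
```

Calling it with `n=2`, `n=3` and `n=4` would correspond to the digraph, trigraph and 4-graph checks. Because every window is counted, a repeated "THE" also contributes its "TH" and "HE" to the digraph totals.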
The expectation was that the 3000 random key characters, being about 10% of the total, would produce fewer true matches than the 10 key characters of "white whale" (being about one part in 3000).
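The reasoning behind that expectation can be sketched in a few lines. With a repeating key, two copies of the same plaintext group always encrypt to the same cipher group when they start at the same offset into the key. The hypothetical helper below (an illustration, assuming a letters-only text and a key that simply repeats) groups the occurrences of one plaintext group by that offset.

```python
from collections import Counter


def same_offset_groups(plain, ngram, key_len):
    """Count occurrences of `ngram` in `plain` by start position mod key_len.

    Occurrences that share an offset are enciphered with the same stretch
    of a repeating key, so they come out as identical cipher groups.
    """
    offsets = Counter()
    start = plain.find(ngram)
    while start != -1:
        offsets[start % key_len] += 1
        start = plain.find(ngram, start + 1)
    return offsets
```

A 10-letter key has only 10 possible offsets, so the many copies of "THE" in a text this long are forced to share them. A 3000-character key spreads the same copies over 3000 offsets, so far fewer of them should line up.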
If you count only duplications that occur at the exact same place between the plain and cipher texts, you get what follows (keep in mind that it matches all sets of letters, so "the" would be a match of "th" and "he"):
What this means, to me, is a verification of several things I said in the original Vigenere article. The ratio of key to plain text needs to approach one, and common phrases must be avoided, altered, or kept sparse (if you look at the trigraph results for the 3000-character sample, notice how many of them were originally "THE").
For those interested, here are some result files:
Written by W Doug Bolden
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.