Jul 2006

Diction & Style

Writing is hard: anyone who claims otherwise has never attempted serious writing. Technical writing is harder still because it must balance technical information against good (or at least passable) prose. Writing with only a basic ASCII editor such as vi can be daunting because of an apparent lack of tools to assist the writer. Although editors like vi do lack built-in tools, there is no shortage of external tools and utilities to help. One such pair of writing tools is diction and style.

Forcing Reasonable Writing

Diction and style do not apply a pure set of grammar-checking rules; instead, they follow a set of algorithms that drive towards heuristic correction. Diction pushes towards more correct, but not exact, writing: it is accurate but not precise. Style, by contrast, grades words, sentences, paragraphs and ultimately an entire body of text using a set of grading algorithms. No single style algorithm may offer exactly what a writer wants, but each is precise on its own terms. Coupled together, diction and style help the writer find a more correct direction in which to push the material. The ultimate reward of using tools like diction and style is that the more one uses them, the more correct first drafts become over time. The same cannot be said for most word processing tools.

Understanding Results

The results of diction and style do have some peculiarities. For instance, diction alone is not well suited to overly technical material simply because it is designed for common writing. Not all of the algorithm grades from style apply to every topic; indeed, some may be thrown out completely depending upon the subject. Most technical writing, for example, is only concerned with three or four of the algorithms. Using external tools also presents the problem of not editing in place: a file must be processed by diction and style outside of the editor, although it can be edited at the same time (say, in another terminal window). A little practice demonstrates how easily diction and style can help.
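
The edit-in-one-window, check-in-another routine might be wrapped in a small shell function. This is only a sketch: the name check_draft and the file name draft.txt are made up for illustration, and it assumes both tools are on the PATH and accept a file name as their argument.

```shell
# check_draft FILE: run diction, then style, on a saved draft.
# check_draft is a hypothetical helper name, not part of either tool.
check_draft() {
    draft=$1
    # Refuse early if the draft is missing or unreadable.
    if [ ! -r "$draft" ]; then
        echo "check_draft: cannot read '$draft'" >&2
        return 1
    fi
    # Make sure both tools are installed before running either one.
    for tool in diction style; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "check_draft: $tool is not installed" >&2
            return 2
        fi
    done
    diction "$draft"
    style "$draft"
}
```

With the draft saved from the editor, running check_draft draft.txt in the second terminal shows the wording suggestions followed by the readability grades.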

Obtaining and Installing

The tools can be obtained via a package manager or directly from GNU's Free Software Directory. The build and install process is the same as most GNU packaged software:

tar xzvf diction-X.XX.tar.gz
cd diction-X.XX/
./configure 
make 
# as root or via sudo
make install

A Sample Session of Diction

To understand how diction and style work, an example is needed. The following text is purposefully flawed:

This is some text, some of this text is easy to read and may well help
you comprehend the very very difficult tasks that lay ahead regarding
your new system.

The example is fraught with obvious errors. Following is the output of diction after one pass:

test:1: This is some text, some of this text is 
[easy -> (weak definition)] to read and [may -> = 
Do not confuse with "can".] well help you comprehend the 
[very -> (use sparingly; try to use words that are 
strong in themselves for emphasis)] [very -> (use sparingly; 
try to use words that are strong in themselves for emphasis)] 
difficult tasks that [lay -> A transitive verb, not to be 
confused with the intransitive verb "lie". You "lie" down, 
and you "lay" an egg. However, note that the past tense 
of ``lie'' is ``lay'': Yesterday, I lay down and laid an egg.] 
ahead regarding your new [system -> Frequently used without need.].

The output of diction can be daunting; breaking it up makes the example easier to follow:

test:1: This is some text, some of this text is
[easy -> (weak definition)]

What diction is saying is that the sentence fragment is not a strong sentence. In most cases, when a weak definition is found, rewording or tossing out the sentence is the best solution.

[may -> = Do not confuse with "can".]

Good advice when it applies, but does it help here? On rereading the sentence, using "may" and "well" so close together does not sound right.

[very -> (use sparingly; try to use words that are
strong in themselves for emphasis)] 

Good advice: the sentence does not need "very" in it at all.

[lay -> A transitive verb...

A straightforward, fair warning. Do not use the word unless the definition actually applies:

 [lay -> A transitive verb, not to be
confused with the intransitive verb "lie". You "lie" down,
and you "lay" an egg. However, note that the past tense
of ``lie'' is ``lay'': Yesterday, I lay down and laid an egg.]

So once again the text would likely be better served without it.

ahead regarding your new [system -> Frequently used without need.].

The system warning can be ignored; remember, diction works from a common grammar base, not from technical material alone. A more succinct sentence:

echo "Within this simple text is information to help setup a system." \
        | diction
(stdin):1: Within this simple text is information to help setup a 
[system -> Frequently used without need.].
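
The drop in flagged words between the first attempt and the rewrite can also be counted mechanically. The sketch below assumes every flag is rendered as "[phrase -> advice]", as in the output shown above. The name count_flags is made up for illustration, and the pattern only recognises phrases made of letters, spaces and hyphens.

```shell
# count_flags: read diction output on stdin, print how many phrases it flagged.
# Assumes each flag looks like "[phrase -> advice]"; count_flags is a
# hypothetical helper, not something shipped with diction.
count_flags() {
    awk '
        # gsub returns the number of matches on the line; "&" leaves it unchanged.
        { n += gsub(/\[[A-Za-z -]+ -> /, "&") }
        END { print n + 0 }
    '
}
```

Piping a draft through it, as in diction draft.txt | count_flags, gives a single number that can be compared between revisions.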

A Sample Session of Style

Running style returns the results of a variety of readability tests. The ones this site looks at closely are the Kincaid formula (developed for military training manuals), the Flesch formula and the Fog and SMOG indexes. Only a few are used because of the technical nature of the material. The manual page describes all of the results. Here is some sample output:

[mui@vela:~$] lynx --dump http://systhread.net/texts/200607subver.php | style
readability grades:
        Kincaid: 9.5
        ARI: 11.4
        Coleman-Liau: 12.9
        Flesch Index: 62.0
        Fog Index: 13.2
        Lix: 48.6 = school year 9
        SMOG-Grading: 11.9
sentence info:
        5760 characters
        1181 words, average length 4.88 characters = 1.48 syllables
        60 sentences, average length 19.7 words
        55% (33) short sentences (at most 15 words)
        16% (10) long sentences (at least 30 words)
        24 paragraphs, average length 2.5 sentences
        0% (0) questions
        70% (42) passive sentences
        longest sent 73 wds at sent 38; shortest sent 4 wds at sent 33
word usage:
        verb types:
        to be (56) auxiliary (22) 
        types as % of total:
        conjunctions 5% (58) pronouns 2% (20) prepositions 8% (98)
        nominalizations 2% (25)
sentence beginnings:
        pronoun (2) interrogative pronoun (0) article (17)
        subordinating conjunction (2) conjunction (0) preposition (6)

As per the man page, the Kincaid grade is geared towards technical documents and grades difficulty from 5.5 to 16.3. A grade of around 14 is where an author might become concerned if the material is meant to be digested by a large audience.
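
To see where a Kincaid grade comes from, the commonly published form of the formula can be recomputed from the sentence info in the sample output (1181 words, 60 sentences, 1.48 syllables per word). This is a sketch: the function name kincaid is made up, and it is an assumption that style computes the grade exactly this way.

```shell
# kincaid WORDS SENTENCES SYLLABLES_PER_WORD
# Commonly published Flesch-Kincaid grade formula:
#   11.8 * syllables/word + 0.39 * words/sentence - 15.59
# The function name is hypothetical; style does this work internally.
kincaid() {
    awk -v w="$1" -v s="$2" -v spw="$3" \
        'BEGIN { printf "%.2f\n", 11.8 * spw + 0.39 * (w / s) - 15.59 }'
}

kincaid 1181 60 1.48   # figures taken from the sample output
```

The result lands close to the 9.5 that style reported. A small difference is expected because the syllable average in the output is already rounded.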

The Flesch formula grades in reverse on a scale of 0-100, with higher scores meaning easier reading, and is targeted towards school texts ranging from grades 3-12. Flesch by itself may not prove useful, but it helps to a degree. For instance, a low result might mean that the technical material is fine but general readability is difficult.
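
The reverse scale is easier to see with numbers. The sketch below uses the commonly published Flesch reading ease formula with the figures from the sample output, then with the same words split into 90 shorter sentences. The 90 is invented purely for comparison, the function name flesch is made up, and it is an assumption that style computes the index exactly this way.

```shell
# flesch WORDS SENTENCES SYLLABLES_PER_WORD
# Commonly published Flesch reading ease formula:
#   206.835 - 84.6 * syllables/word - 1.015 * words/sentence
# Higher results mean easier reading. The function name is hypothetical.
flesch() {
    awk -v w="$1" -v s="$2" -v spw="$3" \
        'BEGIN { printf "%.2f\n", 206.835 - 84.6 * spw - 1.015 * (w / s) }'
}

flesch 1181 60 1.48   # figures taken from the sample output
flesch 1181 90 1.48   # same words in 90 shorter sentences (invented figure)
```

The first result sits near the 62.0 that style reported. The second is higher, which shows that shorter sentences alone push the index towards "easier".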

The Fog index is another school grade, and generally a 12 or above is too hard. A high Fog index is not necessarily bad; in the example it illustrates that someone with no concept of the topic or related topics cannot understand it, which makes sense. If the topic had been the color to paint a shed, then the author should be concerned. The SMOG grading maps directly to the school grade level required to read the text. A value of 11.9 is actually good considering the subject material.

The key to using style is looking for better scores for the type of material the author is writing. Not every score will be great; generally, scoring decently on at least half of the scores an author is concerned with does the job.
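
Watching only the scores that matter can be scripted. The sketch below pulls a single grade out of style output and compares it against a limit. It assumes the grade lines look like those in the sample output (a name, a colon, then a number). The names grade and over_limit are made up for illustration, and the limits in the usage note are the rough figures mentioned in the text, not values defined by style.

```shell
# grade NAME: read style output on stdin, print the number reported for NAME.
# Assumes grade lines look like "        Kincaid: 9.5"; grade is a
# hypothetical helper, not part of style.
grade() {
    awk -v name="$1" -F': *' '
        { sub(/^[ \t]+/, "") }              # strip the leading indent
        $1 == name { print $2 + 0; exit }   # "+ 0" keeps only the leading number
    '
}

# over_limit VALUE LIMIT: succeed when VALUE is at or above LIMIT.
over_limit() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v >= l) }'
}
```

For example, fog=$(style draft.txt | grade "Fog Index") followed by over_limit "$fog" 12 && echo "Fog index is high" flags a draft that crosses the rough limit of 12 discussed above. The same pattern works for Kincaid with a limit of 14.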

Summary

While the ASCII editing world does not have straightforward grammar checking tools, non-intrusive tools like diction and style exist to point writers in the right direction.