The title, Writing Documentation, sounds somewhat formal. However, in this article I use the term documentation in a broad sense. It covers not only documentation accompanying a particular piece of software, but any related textual information. This textual information could be as short as a few lines and, for example, describe how to start a program with all of its command-line options and environment variables set correctly. On the other hand, the text could be several tens of thousands of lines long, elaborating all the tricks a group of users has learned over the years while using a large software system.
With today's GNU/Linux distributions, the aspiring documentation writer immediately finds herself in fat city: there are several systems to choose from! Three documentation systems will be introduced in this article series. Here, I start with POD. Next month I'll address LaTeX in conjunction with latex2html, and in part 3, DocBook.
The systems cater to different documentation needs, and all have their highs and lows. But before assessing the pros and cons of the different systems, let me set out some requirements that I want to impose on the documentation systems.
The sources of the documentation should be:
Requiring portability ensures that the texts' sources can be read and modified on a wide variety of computer systems, thereby making the documentation accessible to other programmers, which is what Open Source Software is all about.
Just as I require certain features of the documentation's source format, I also require certain features of the output.
HTML support in turn requires ``hyperlinks'', that is, references between documents or parts thereof that can be followed conveniently. References also help to implement the Modular Requirement in my list of source format features.
Let us now look at a particularly easy-to-use format: Perl's Plain Old Documentation.
The ``Plain Old Documentation'' system that ships with every Perl distribution is the simplest documentation system in my selection. It is simple to learn and simple to use, but (and I hesitate to write therefore) it is also the most limited of the three. Anyhow, the article you are currently reading (yes, this one!) has been prepared with POD. If it is good for the goose, it can't be bad for the gander...
The big advantages of POD are
pod2man --help
to see if it is installed.
The POD format defines three different kinds of paragraphs. Paragraphs are separated from each other by one or more completely (!) empty lines.
Ordinary paragraphs will be filled and justified (if the output format allows for justification) when output.
Command paragraphs start with ``='' in column zero, immediately followed by an identifier. Usually, command paragraphs consist of single lines. Yet they are syntactically paragraphs, because they are separated by blank lines before and after them.
Text is sectioned by =headN commands, like
=head1 primary_heading

=head2 secondary_heading

=head3 tertiary_heading
These commands also define the section headings primary_heading, etc. How many heading levels are actually accepted (that is, the largest N permitted) depends on the POD-to-something converter. For example, pod2man allows only two levels, while pod2html allows up to six.
I have added line and column numbers to the source of the examples. The line numbers do not appear in the real source. They are included here to point out the empty lines that must separate the command paragraphs, that is, those starting with an equal sign in column 0. Additionally, I have added a column-number ruler at the beginning of the next example to clarify where column 0 starts.
Example:
             1         2         3         4         5
   012345678901234567890123456789012345678901234567890123456789
 1 =head1 Hardware
 2
 3 The physical parts of your computer are called "hardware".
 4
 5 =head1 CPU
 6
 7 The CPU is the most important part of your computer.
 8
 9 =head1 Mass Storage
10
11 Mass storage devices store data permanently.
12
13 =head2 Hard Disk Drives
14
15 Hard disk drives provide fast random access to data.
16
17 =head2 Magnetic Tapes
18
19 Magnetic tapes provide slow sequential access to data.
20
21 =head1 Software
22
23 This is where the trouble starts ...
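Two rules make the example work. Paragraphs are separated by completely empty lines, and command paragraphs start with ``='' in column 0. Both are simple enough to sketch in a few lines of Python. The following is only an illustration under simplifying assumptions. It does not distinguish verbatim paragraphs, it ignores =pod/=cut, and the heading limit is modelled by a hypothetical max_level parameter. It is not how pod2man or pod2html work internally, and the names split_paragraphs and outline are made up for this sketch.

```python
import re

def split_paragraphs(source: str) -> list[str]:
    """Split POD source at completely empty lines.

    A line containing only spaces is NOT a separator here, mirroring the
    "completely (!) empty" rule described in the text.
    """
    paragraphs, current = [], []
    for line in source.split("\n"):
        if line == "":
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs

# A =headN command paragraph: "=head", the level N, whitespace, the heading.
HEAD = re.compile(r"=head(\d+)\s+(.*)", re.DOTALL)

def outline(source: str, max_level: int = 6) -> list[tuple[int, str]]:
    """Return (level, heading) pairs for every =headN command paragraph.

    max_level stands in for the converter-dependent limit on N; this
    sketch simply rejects deeper headings (an assumption).
    """
    result = []
    for para in split_paragraphs(source):
        m = HEAD.match(para)  # match() anchors at column 0
        if m:
            level = int(m.group(1))
            if level > max_level:
                raise ValueError(f"=head{level} not supported (max {max_level})")
            result.append((level, m.group(2).strip()))
    return result

example = """=head1 Hardware

The physical parts of your computer are called "hardware".

=head1 CPU

The CPU is the most important part of your computer.

=head1 Mass Storage

Mass storage devices store data permanently.

=head2 Hard Disk Drives

Hard disk drives provide fast random access to data.

=head2 Magnetic Tapes

Magnetic tapes provide slow sequential access to data.

=head1 Software

This is where the trouble starts ...
"""

# Print an indented table of contents for the example above.
for level, title in outline(example, max_level=2):
    print("  " * (level - 1) + title)
```

With max_level=2 (mimicking the two-level limit mentioned for pod2man), a =head3 in the source makes this sketch raise an error instead of being silently accepted.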
Itemized, enumerated, or description lists are produced with

=over N

=item label

=item label

...

=back

where =over N starts a list that is indented at least N spaces and extends until =back. Depending on the first label, the POD-to-something translators generate an itemized list (label = *), a numbered list (label = 1) or a description list (label starts with a letter).
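The rule for choosing the list type from the first label can be written down as a small function. This sketch covers only the rule as stated here, not any particular translator's code. The name list_kind is hypothetical. So are the exact tests it uses: a leading digit for numbered lists, a leading letter for description lists, and None for anything the rule does not cover.

```python
import re
from typing import Optional

# The first "=item" line inside the block; the label is the rest of that line.
FIRST_ITEM = re.compile(r"^=item[ \t]+(\S.*)$", re.MULTILINE)

def list_kind(over_block: str) -> Optional[str]:
    """Classify an =over ... =back block by its FIRST =item label."""
    m = FIRST_ITEM.search(over_block)
    if m is None:
        return None                  # no =item at all
    label = m.group(1).strip()
    if label == "*":
        return "itemized"            # bullets (roughly HTML <ul>)
    if label[0].isdigit():
        return "enumerated"          # numbers (roughly HTML <ol>)
    if label[0].isalpha():
        return "description"         # terms (roughly HTML <dl>)
    return None                      # not covered by the rule as stated

print(list_kind("=over 4\n\n=item *\n\nFruit ...\n\n=back\n"))         # itemized
print(list_kind("=over 4\n\n=item 1.\n\nPlug in ...\n\n=back\n"))      # enumerated
print(list_kind("=over 8\n\n=item Robert\n\nLead singer\n\n=back\n"))  # description
```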
Example: itemized list
Again, I have added line numbers to alert the reader to the (many) empty lines used for separating the command paragraphs.
Source
 1 =over 4
 2
 3 =item *
 4
 5 Fruit, particularly non-imported fruit like ...
 6
 7 =item *
 8
 9 Though not tasty, vegetables should make up a large part of your
10 daily diet.
11
12 =item *
13
14 Fish is much easier to digest than meat. Therefore, ...
15
16 =back
Result
Example: enumerated list
Source
 1 =over 4
 2
 3 =item 1.
 4
 5 Ensure that the power switch is in position "OFF".
 6
 7 =item 2.
 8
 9 Plug in the power cord.
10
11 =item 3.
12
13 Switch the power switch in position "ON".
14
15 =back
Result
Example: description list
Source
 1 =over 8
 2
 3 =item Robert
 4
 5 Lead singer
 6
 7 =item Jimmy
 8
 9 Lead guitar
10
11 =item John-Paul
12
13 Bass guitar
14
15 =item John
16
17 Drums and percussion
18
19 =back
Result
Within ordinary paragraphs, several markup commands are recognized. All markup commands start with a single capital letter and enclose their argument in angle brackets: LETTER<argument>. The argument can consist of multiple words, which can span more than one line.
I<argument> corresponds to the HTML tags em and var; thus, it is primarily used for emphasizing words or marking up variables.
Examples:
Do not remove your Linux kernel!
is produced by
Do I<not> remove your Linux kernel!
Use cd directory to change your working directory to directory.
is generated with
Use B<cd> I<directory> to change your working directory to I<directory>.
B<argument> corresponds to the HTML tag b. It is used for emphasis in text and to mark up program names or switches.
Examples:
Always shut down your machine before switching it off.
comes from
B<Always> shut down your machine before switching it off.
podchecker accepts the options -warnings and -nowarnings.
is the result of
B<podchecker> accepts the options B<-warnings> and B<-nowarnings>.
C<argument> marks up code or anything else that is to be taken literally. The corresponding HTML tags are code, samp, and tt.
Examples:
Every C-program must have a function called main.
is generated by
Every C-program must have a function called C<main>.
Boolean false is represented by [1 1 0], and boolean true by [1 1 1].
is produced by
Boolean false is represented by C<[1 1 0]>, and boolean true by C<[1 1 1]>.
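Because I, B, and C map so directly onto HTML tags, a toy converter for these three commands fits in a few lines. It assumes that arguments contain no angle brackets and no nested markup. Real POD has ways around both, for example the escape E<gt>. From the tag choices listed above, it picks em, b, and code. The name inline_to_html is hypothetical, and this is not pod2html's actual code.

```python
import html
import re

# HTML tags picked from the correspondences listed above.
TAGS = {"I": "em", "B": "b", "C": "code"}

# A capital I, B or C followed by <argument>. Simplification: the argument
# may span lines but must not contain '<', '>' or nested markup commands.
INLINE = re.compile(r"([IBC])<([^<>]*)>")

def inline_to_html(text: str) -> str:
    """Translate I<...>, B<...> and C<...> in an ordinary paragraph to HTML."""
    # With two groups, split() yields [plain, letter, arg, plain, ..., plain].
    parts = INLINE.split(text)
    out = []
    for i in range(0, len(parts), 3):
        out.append(html.escape(parts[i], quote=False))      # plain text
        if i + 2 < len(parts):
            tag = TAGS[parts[i + 1]]
            arg = html.escape(parts[i + 2], quote=False)
            out.append(f"<{tag}>{arg}</{tag}>")
    return "".join(out)

print(inline_to_html("Use B<cd> I<directory> to change your working directory to I<directory>."))
# Use <b>cd</b> <em>directory</em> to change your working directory to <em>directory</em>.
```

Plain text outside the markup is HTML-escaped too, so a stray ``<'' in ordinary text does not end up as a broken tag.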
L<argument> is a bit tricky. Therefore, I have devoted the next section to it.
The L-command is distantly related to HTML's <a href = "reference">description</a>; however, in POD, reference is not a general uniform resource locator (URL).
The reference can only refer to labels that the POD-to-something translator generates automatically. These labels are inserted for every =headN and =item. The label associated with =headN heading is heading downcased, but otherwise unchanged, e.g.
=head1 A Multi-Word Heading (MWH)
automatically gets assigned the label
a multi-word heading (mwh)
The labels of =items are prefixed by item_, spaces are replaced by underscores, and non-alphanumeric characters are replaced by their hexadecimal ASCII code prefixed by a percent sign. Did anybody expect an easy rule? For example, one of the items in this article,
=item Automatic Reference Generation.
has the label
item_Automatic_Reference_Generation%2E
because the ASCII code of the period is 46 in decimal, or 2E in hexadecimal.
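The two labelling rules are easy to state as code. The functions below (head_label and item_label are hypothetical names) implement the rules exactly as described in this section. Individual translators, and later versions of the Pod tools, may generate labels differently, so treat this as an illustration rather than a specification. In particular, this sketch escapes an underscore that already appears in the item text, because the rule says every non-alphanumeric character is escaped.

```python
def head_label(heading: str) -> str:
    """Label for '=headN heading': the heading downcased, otherwise unchanged."""
    return heading.lower()

def item_label(item: str) -> str:
    """Label for '=item text' per the rules above: prefix 'item_', turn
    spaces into underscores, and replace every other non-alphanumeric
    character with '%' plus its two-digit hexadecimal ASCII code.
    (Non-ASCII text is not handled meaningfully by this sketch.)
    """
    out = []
    for ch in item:
        if ch == " ":
            out.append("_")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{ord(ch):02X}")   # '.' -> %2E, '-' -> %2D
    return "item_" + "".join(out)

print(head_label("A Multi-Word Heading (MWH)"))       # a multi-word heading (mwh)
print(item_label("Automatic Reference Generation."))  # item_Automatic_Reference_Generation%2E
```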
Example:
Source
=head1 Introduction

Section L<"concepts"> introduces the basics of the field.

=head1 Concepts

...

=head1 Synchronization

=over 4

=item Deadlocks

=item Race Conditions

=item Recovering from Deadlocks

=back

How to cope with deadlocks was discussed in L<Deadlocks|"item_Deadlocks">, and L<Recovering from Deadlocks|"item_Recovering_from_Deadlocks">.
Result
Introduction
Section concepts introduces the basics of the field.
Concepts
...
Synchronization
- Deadlocks
- Race Conditions
- Recovering from Deadlocks
How to cope with deadlocks was discussed in Deadlocks, and Recovering from Deadlocks.
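The following sketch shows how the L-commands in this example relate to the generated labels. It rewrites the two forms used above, L<"label"> and L<text|"label">, into HTML anchors of the form <a href="#label">text</a>. It only handles these two forms with a quoted label inside the same document, and it leaves all other text untouched. The name resolve_links is hypothetical, and pod2html's real output differs in detail.

```python
import html
import re

# L<"label">  or  L<text|"label">: a quoted label within the same document.
LINK = re.compile(r'L<(?:([^<>|]*)\|)?"([^"<>]*)">')

def resolve_links(text: str) -> str:
    """Rewrite the two L-command forms above into HTML anchors."""
    def to_anchor(m):
        label = m.group(2)
        # Without an explicit "text|" part, the label itself is shown.
        shown = label if m.group(1) is None else m.group(1)
        return f'<a href="#{html.escape(label)}">{html.escape(shown)}</a>'
    return LINK.sub(to_anchor, text)

print(resolve_links('Section L<"concepts"> introduces the basics of the field.'))
# Section <a href="#concepts">concepts</a> introduces the basics of the field.
```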
The L-command is of very limited use, because the writer cannot insert places to refer to with an L-command; HTML-like ``anchors'' are missing.
A second limiting factor is that some POD translators try to be smart and decorate links with additional text. For example, pod2latex mangles both references to items in the above example:
How to cope with deadlocks was discussed in the \textsf{Deadlocks$|$"item\_Deadlocks"} entry elsewhere in this document, and the \textsf{Recovering from Deadlocks$|$"item\_Recovering\_from\_Deadlocks"} entry elsewhere in this document.
Here pod2latex has added the words ``the'' and ``entry elsewhere in this document''. Clearly, we want a better mechanism. Such a mechanism exists: format-specific paragraphs.
We have just seen that the L-command is somewhat difficult to control. Why can't we simply use an HTML reference? The terse answer, ``because POD is not HTML'', leads to the solution. If we had a way to say ``this text is for HTML, this line is for LaTeX, and this paragraph is for SnaFoo'', we could use the specific markup provided by these formats.
The special command
=for format paragraph_of_text
tells a translator to look at format before processing paragraph_of_text. If the translator feels responsible for handling format, it transforms paragraph_of_text according to its own rules; otherwise it completely ignores the paragraph. The second part of the translator's name usually specifies which format it takes care of. For example, pod2man transforms =for man paragraphs, pod2html processes =for html paragraphs, and so on.
Like all command paragraphs, a =for format paragraph ends at the first completely empty line that follows the introducing =for.
A consistent document structure will show ``forks'' wherever specific formats are used, because a =for format clause ought to appear for each desired output format; otherwise we punch a logical hole into the document.
Example:

This is an ordinary paragraph, which is processed by all translators.

=for html <p>This paragraph only appears if the file is processed with <b>pod2html</b>.</p>

=for latex This very paragraph is only treated by {\bf pod2latex}.

=for text I am a paragraph for the *pod2text* formatter.

We now continue with the ordinary text for all formatters.
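The selection a translator makes here boils down to comparing the format name with its own and dropping everything else. The sketch below models only that selection step. It works on an already split list of paragraphs, uses a hypothetical function name select_for, and does not model how each translator then renders the text it keeps. It also shows why an invented format such as comment simply disappears: no translator claims it.

```python
import re

# "=for", whitespace, the format name, then the rest of the paragraph.
FOR = re.compile(r"=for\s+(\S+)\s*(.*)", re.DOTALL)

def select_for(paragraphs: list[str], my_format: str) -> list[str]:
    """Return the paragraphs a translator for my_format would process.

    Paragraphs that are not =for paragraphs pass through unchanged. A
    '=for format ...' paragraph is kept (minus its '=for format' prefix)
    only when format equals my_format; any other format is dropped.
    """
    kept = []
    for para in paragraphs:
        m = FOR.match(para)
        if m is None:
            kept.append(para)
        elif m.group(1) == my_format:
            kept.append(m.group(2))
    return kept

doc = [
    "This is an ordinary paragraph, which is processed by all translators.",
    "=for html <p>This paragraph only appears if the file is processed with <b>pod2html</b>.</p>",
    "=for latex This very paragraph is only treated by {\\bf pod2latex}.",
    "=for text I am a paragraph for the *pod2text* formatter.",
    "=for comment Can someone clarify the next section?",
    "We now continue with the ordinary text for all formatters.",
]

# What a plain-text translator would see: two ordinary paragraphs plus
# the =for text paragraph; html, latex and comment are ignored.
for para in select_for(doc, "text"):
    print(para)
```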
The translators ignore unknown formats, which means we can invent special paragraphs for our own purposes! For example, to ``comment out'' a paragraph, write
=for comment Can someone clarify the next section?
Another popular use is the emacs format :-) To switch emacs into text-mode when preparing a POD-file, start the file with
=for emacs -*- text -*-
or end it with
=for emacs
Local Variables:
mode: text
End:
Emacs users with the Hyperbole add-on can convert their "dumb" POD-files into hyperlinked collections of files (well, Hyperbole can do a lot more than that, but hyperlinks are a beginning) with
=for hyperbole <(std-reference)>
where <(std-reference)> is a Hyperbole button. When you click it in emacs, it takes you to another file that holds the reference documentation of std.
pod2html, pod2man, pod2latex, pod2text: Translators from POD to HTML, UN*X manual pages, LaTeX, and plain text, respectively.
podchecker: Simple syntax checker for POD files.
Manual pages of perlpod(1), pod2man(1), pod2html(1), pod2latex(1), pod2text(1), and podchecker(1).
Next month: LaTeX in conjunction with latex2html.