%
% $Header: /home/cvs/root/haskell-report/report/literate.verb,v 1.5 2002/12/02 14:53:30 simonpj Exp $
%
%**
The Haskell 98 Report: Literate Comments
%**~header
\subsection{Literate comments}
\label{literate}
\index{literate comments}
The ``literate comment''
convention, first developed by Richard Bird and Philip Wadler for
Orwell, and inspired in turn by Donald Knuth's ``literate
programming'', is an alternative style for encoding \Haskell{} source
code.
The literate style encourages comments by making them the default.
There are multiple styles of literate Haskell,
[PROPOSED TEXT: which it is permissible but not advisable to mix]
[PROPOSED TEXT: which it is not advisable to mix; implementations MAY,
at their option, permit the combination of the two styles as specified here].
(Normal comments are still permissible in the program-code sections
of literate \Haskell{} code.)
The literate program text is treated as a series of lines.
Lines between @\begin{code}@$\ldots$@\end{code}@ delimiters (LaTeX style),
as well as lines in which ``@>@'' is the first character (Bird style),
are treated as part of the program; all other lines are comment.
The Bird-style program lines
must be translated by replacing the leading ``@>@'' with a space.
Layout and comments apply
exactly as described in Appendix~\ref{syntax} in the resulting program text.
To capture some cases where one omits an ``@>@'' by mistake, it is an
error for a program line to appear adjacent to a non-blank comment line,
where a line is taken as blank if it consists only of whitespace or is a @\begin{code}@ or @\end{code}@ line.
By convention, the style of comment is indicated by the file
extension, with ``@.hs@'' indicating a usual \Haskell{} file and
``@.lhs@'' indicating a literate \Haskell{} file.
Using the Bird style, a
simple factorial program would be:
\bprog
@
This literate program prompts the user for a number
and prints the factorial of that number:

> main :: IO ()
> main = do putStr "Enter a number: "
>           l <- getLine
>           putStr "n!= "
>           print (fact (read l))

This is the factorial function.

> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n-1)
@
\eprog
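As an illustration of the Bird-style rules above, here is a minimal sketch, not a normative definition. It handles only the Bird style and ignores @\begin{code}@ lines entirely. The names @LineClass@, @classify@, @translateLine@ and @unlitBird@ are hypothetical, and reporting the offending line number through @Either@ is a choice made for this example only.

```haskell
-- Sketch only: Bird style, no \begin{code} handling.
-- All names below are hypothetical and chosen for this example.

data LineClass = Program | Blank | Comment
  deriving (Eq, Show)

-- A line is program text if '>' is its first character, blank if it
-- consists only of spaces and tabs, and a comment otherwise.
classify :: String -> LineClass
classify ('>' : _) = Program
classify l
  | all (`elem` " \t") l = Blank
  | otherwise            = Comment

-- Replace the leading '>' by a space; every other line becomes empty,
-- so the output has as many lines as the input.
translateLine :: String -> String
translateLine ('>' : rest) = ' ' : rest
translateLine _            = ""

-- Left n: a program line is adjacent to a non-blank comment line, where
-- n is the 1-based number of the first line of the offending pair.
unlitBird :: String -> Either Int String
unlitBird src =
  case [ n | (n, a, b) <- zip3 [1 ..] cs (drop 1 cs), clash a b ] of
    (n : _) -> Left n
    []      -> Right (unlines (map translateLine ls))
  where
    ls = lines src
    cs = map classify ls
    clash a b = (a, b) `elem` [(Program, Comment), (Comment, Program)]
```

Applied to a file in which a comment line is immediately followed by a ``@>@'' line, @unlitBird@ returns @Left@ with the number of the comment line; inserting a blank line between the two makes the translation succeed.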
Another program, this one in the LaTeX style:
\bprog
@
\documentstyle{article}
\begin{document}
\section{Introduction}
This is a trivial program that prints the first 20 factorials.
\begin{code}
main :: IO ()
main = print [ (n, product [1..n]) | n <- [1..20]]
\end{code}
\end{document}
@
\eprog
The syntax of literate source files can be given declaratively.
(This syntax is overly permissive in that it permits non-blank comment
lines next to program lines.)
The productions that span multiple lines are:
@@@
file -> \{ literateCommentLine | birdProg | texBlock \}
texBlock -> beginCode \{texProg\} endCode
@@@
The productions for single lines are:
@@@
literateCommentLine -> \{any\}_{\langle{}birdProg | possibleLhsTexCmd\rangle}
blankLiterateCommentLine -> \{blankchar\}
nonBlankLiterateCommentLine -> \{any\}_{\langle{}blankLiterateCommentLine | birdProg | possibleLhsTexCmd\rangle}
birdProg -> @>@ \{any\}
texProg -> \{any\}_{\langle{}possibleLhsTexCmd\rangle}
possibleLhsTexCmd -> (@\begin{code}@ | @\end{code}@) \{any\}
beginCode -> @\begin{code}@ \{blankchar\}
endCode -> @\end{code}@ \{blankchar\}
@@@
[COMMENT: should bad encoding be forbidden? literateCommentLine's definition
seems to do so, but then so do comment and ncomment's current definitions]
[COMMENT: I didn't find an existing nonterminal for "non-line-breaking whitespace",
even though any's definition relies on it
(\{space | tab\} === \{any_{\langle{}graphic\rangle}\}),
so I'll add one to the lexical syntax (blankchar -> space | tab)
and change any to be defined as (any -> graphic | blankchar)]
[COMMENT: beginCode and endCode could have \{blankchar\}
deleted if we want to ban those lines ending in whitespace;
the code implementation would also be modified]
A corresponding \Haskell{} definition, in an inductive/imperative style, which also enforces the adjacency rule:
\bprog
@
import Maybe
type LhsLine = String
type HsLine = String
isBlank :: Char -> Bool
isBlank = (`elem` [' ','\t']) --[COMMENT: or maybe just use Char.isSpace?]
literateCommentLine, blankLiterateCommentLine, nonBlankLiterateCommentLine,
  possibleLhsTexCmd, beginCode, endCode :: LhsLine -> Bool
birdProg, texProg :: LhsLine -> Maybe HsLine
literateCommentLine l = not (isJust (birdProg l) || possibleLhsTexCmd l)
blankLiterateCommentLine = all isBlank
nonBlankLiterateCommentLine l =
  not (blankLiterateCommentLine l || isJust (birdProg l) || possibleLhsTexCmd l)
birdProg ('>':l') = Just (' ':l')
birdProg _ = Nothing
texProg l = if possibleLhsTexCmd l then Nothing else Just l
possibleLhsTexCmd l = isJust (takeBegin l) || isJust (takeEnd l)
beginCode l = fmap (all isBlank) (takeBegin l) == Just True
endCode l = fmap (all isBlank) (takeEnd l) == Just True
takeBegin, takeEnd :: LhsLine -> Maybe String
takeBegin = (`stripPrefix` "\\begin{code}")
takeEnd = (`stripPrefix` "\\end{code}")
-- I really need to get around to proposing this for the standard libraries
-- , says Ian Lynagh
stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
xs `stripPrefix` [] = Just xs
[] `stripPrefix` _ = Nothing
(x:xs) `stripPrefix` (y:ys)
  | x == y    = xs `stripPrefix` ys
  | otherwise = Nothing
badAdjacentLines :: LhsLine -> LhsLine -> Bool
badAdjacentLines l1 l2 = bad l1 l2 || bad l2 l1
  where bad a b = isJust (birdProg a) && nonBlankLiterateCommentLine b
--[COMMENT: should I mind using pattern guards?]
--[COMMENT: except for errors, I always return as many lines as input,
-- to preserve line numbering and such; is there a good
-- higher-order function to express the following pattern?
-- mapAccumL seems a little ugly.]
file, texBlock :: [LhsLine] -> [HsLine]
file []                   = []
file (l:ls)
  | fmap (badAdjacentLines l) (listToMaybe ls) == Just True
                          = error "literate comment adjacent to code"
  | Just l' <- birdProg l = l' : file ls
  | literateCommentLine l = "" : file ls
  | beginCode l           = "" : texBlock ls
  | endCode l             = error "\\end{code} not in code block"
  | otherwise             = error "{code} followed in line by visible text"
texBlock (l:ls)
  | Just l' <- texProg l  = l' : texBlock ls
  | beginCode l           = error "\\begin{code} inside code block"
  | endCode l             = "" : file ls
  | otherwise             = error "{code} followed in line by visible text"
texBlock []               = error "\\end{code} expected"
translateFile :: String -> String
translateFile = unlines . file . lines
@
\eprog
Note that although literate \Haskell{} files containing no code
are not expressly forbidden, they translate to the equivalent of
@module Main(main) where {}@, which is in error because @main@ is
exported but not defined.
Literate \Haskell{} is not advisable when translating its literate
comments to non-blank, non-literate comment lines would change the
meaning of the program. This happens when a non-blank literate
comment falls within a single lexical element, namely
a block comment or a string gap:
\bprog
@
> foo = "hello \

This is an odd place for a comment.

>       \world"

> {-

Literate comments may have all sorts of weird character
sequences in them, like "-}".

> -}
@
\eprog
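To see why the choice of translation matters for the string-gap case above, the following sketch compares two translations of that example. It is an illustration under stated assumptions, not part of the definition. The names @blankOut@, @dashOut@, @gapText@ and @validGap@ are hypothetical, the example input assumes blank lines between comment and program lines, and @Data.Char@ is the hierarchical-library name for the \Haskell{} 98 module @Char@.

```haskell
import Data.Char (isSpace)

-- The string-gap example, with blank lines separating comment and code.
example :: [String]
example =
  [ "> foo = \"hello \\"
  , ""
  , "This is an odd place for a comment."
  , ""
  , ">       \\world\""
  ]

-- Translation that eliminates literate comments: they become empty lines.
blankOut :: String -> String
blankOut ('>' : rest) = ' ' : rest
blankOut _            = ""

-- Hypothetical alternative: keep non-blank comments as "--" line comments.
dashOut :: String -> String
dashOut ('>' : rest) = ' ' : rest
dashOut l
  | all isSpace l = ""
  | otherwise     = "-- " ++ l

-- Text between the first backslash and the next one. For this example
-- that is the body of the candidate string gap.
gapText :: [String] -> String
gapText = takeWhile (/= '\\') . drop 1 . dropWhile (/= '\\') . unlines

-- A string gap may contain only whitespace between its two backslashes.
validGap :: [String] -> Bool
validGap = all isSpace . gapText
```

Here @validGap (map blankOut example)@ holds, because the gap contains only newlines and spaces, whereas @validGap (map dashOut example)@ does not, because the text of the comment ends up inside the gap.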
[PROPOSED TEXT: Implementations MAY, at their option, choose
a translation that eliminates all literate comments, or one that
will fail on such literate Haskell.]
[COMMENT: Without the proposed text, I think it is clear that
implementations must still support the "not advisable" behavior.]
Note that string literals are not recognized at the point of translation;
for example, in the literate snippet
\bprog
@
foo = "hello\
\end{code}"
@
\eprog
the @\end{code}@ line is interpreted as a possibleLhsTexCmd
(and in error due to the text @"@ appearing in the same line).
[COMMENT: Are we sure we want to require implementations to ban this?]
However, it is fine if the @\end{code}@ does not
appear at the beginning of the line:
\bprog
@
foo = "hello\
      \end{code}"
@
\eprog
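The distinction between these two snippets is purely positional, as the following sketch suggests. It mirrors @possibleLhsTexCmd@ but, as an assumption made for brevity, uses @isPrefixOf@ from @Data.List@ (the hierarchical-library name for the \Haskell{} 98 module @List@). The names @startsWithCodeCmd@, @atLineStart@ and @indented@ are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- True when the line begins with \begin{code} or \end{code}.
-- Only a prefix match is performed; nothing is known about string
-- literals at this stage.
startsWithCodeCmd :: String -> Bool
startsWithCodeCmd l = any (`isPrefixOf` l) ["\\begin{code}", "\\end{code}"]

-- The continuation line of the string literal in each of the two snippets.
atLineStart, indented :: String
atLineStart = "\\end{code}\""
indented    = "      \\end{code}\""
```

With these definitions, @startsWithCodeCmd atLineStart@ holds, so that line is treated as a code-block delimiter, while @startsWithCodeCmd indented@ does not, so the indented line remains ordinary program text.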
%**~footer