The "literate comment" convention, first developed by Richard Bird and Philip Wadler for Orwell, and inspired in turn by Donald Knuth's "literate programming", is an alternative style for encoding Haskell source code. The literate style encourages comments by making them the default. There are multiple styles of literate Haskell, [PROPOSED TEXT: which it is permissible but not advisable to mix] [PROPOSED TEXT: which it is not advisable to mix; implementations MAY, at their option, permit the combination of the two styles as specified here]. (Normal comments are still permissible in the program-code sections of literate Haskell code.)
The literate program text is treated as a series of lines. Lines between \begin{code}...\end{code} delimiters (LaTeX style), as well as lines in which ">" is the first character (Bird style), are treated as part of the program; all other lines are comment.
The Bird-style program lines must be translated by replacing the leading ">" with a space. Layout and comments apply exactly as described in Appendix ?? in the resulting program text.
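The translation rule above can be sketched for the Bird style alone. This is only an illustration: the name unlitBird is hypothetical, and the sketch assumes that comment lines become empty lines (so that line numbering is preserved) and ignores both the LaTeX style and the adjacency rule.

```haskell
-- Hypothetical helper: Bird-style translation only, with no error checking.
unlitBird :: String -> String
unlitBird = unlines . map translate . lines
  where
    -- Program line: the leading ">" is replaced by a space.
    translate ('>' : rest) = ' ' : rest
    -- Any other line is a comment and is blanked, keeping the line count.
    translate _ = ""
```

Replacing ">" with a space, rather than deleting it, keeps every program character in its original column, so layout is unaffected.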
To catch some cases where a ">" is omitted by mistake, it is an error for a program line to appear adjacent to a non-blank comment line, where a line is taken as blank if it consists only of whitespace or if it is a \begin{code} or \end{code} line.
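One possible way to check the adjacency rule is to classify each line and compare neighbours. The sketch below is not a definitive implementation: Kind, classify and adjacencyErrors are hypothetical names, only Bird-style program lines are considered, and whitespace is assumed to mean spaces and tabs.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical classification of a line for the adjacency check.
data Kind = Program | Blank | Comment
  deriving (Eq, Show)

classify :: String -> Kind
classify ('>' : _) = Program
classify l
  | all (`elem` " \t") l = Blank             -- whitespace only (or empty)
  | any (`isPrefixOf` l) delimiters = Blank  -- code delimiters count as blank
  | otherwise = Comment
  where
    delimiters = ["\\begin{code}", "\\end{code}"]

-- 1-based numbers of lines that clash with the line following them.
adjacencyErrors :: [String] -> [Int]
adjacencyErrors ls =
  [ n | (n, a, b) <- zip3 [1 ..] kinds (drop 1 kinds), clash a b ]
  where
    kinds = map classify ls
    clash Program Comment = True
    clash Comment Program = True
    clash _ _ = False
```

For example, a comment line directly followed by a "> " line is reported, while the same two lines separated by an empty line are accepted.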
By convention, the style of comment is indicated by the file extension, with ".hs" indicating a usual Haskell file and ".lhs" indicating a literate Haskell file.
Using the Bird style, a simple factorial program would be:
This literate program prompts the user for a number
and prints the factorial of that number:
> main :: IO ()
> main = do putStr "Enter a number: "
>           l <- getLine
>           putStr "n!= "
>           print (fact (read l))
This is the factorial function.
> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n-1)
Another program, this one in the LaTeX style:
\documentstyle{article}
\begin{document}
\section{Introduction}
This is a trivial program that prints the first 20 factorials.
\begin{code}
main :: IO ()
main = print [ (n, product [1..n]) | n <- [1..20]]
\end{code}
\end{document}
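For the LaTeX style, a lenient translation can be sketched as follows. This is an assumption-laden illustration, not the specified behaviour: unlitTex is a hypothetical name, delimiter lines are recognised by prefix only, and none of the error cases (nested or unmatched delimiters, trailing text after a delimiter) are reported.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: keep the lines between \begin{code} and \end{code},
-- blank everything else, and return exactly one output line per input line.
unlitTex :: [String] -> [String]
unlitTex = go False
  where
    go _ [] = []
    go inCode (l : ls)
      | "\\begin{code}" `isPrefixOf` l = "" : go True ls   -- enter a code block
      | "\\end{code}" `isPrefixOf` l = "" : go False ls    -- leave a code block
      | inCode = l : go True ls                            -- program line, kept as is
      | otherwise = "" : go False ls                       -- comment line, blanked
```

Applied to the lines of the document above, only the two lines defining main survive; every other line becomes empty.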
A declarative syntax of literate source files:
(this syntax is overly permissive in that it permits comment lines next to program lines)
Over multiple lines:
| file | -> | {literateCommentLine | birdProg | texBlock } |
| texBlock | -> | beginCode {texProg}endCode |
With the single-line elements being:
| literateCommentLine | -> | {any}<birdProg | possibleLhsTexCmd> |
| blankLiterateCommentLine | -> | {blankchar} |
| nonBlankLiterateCommentLine | -> | {any}<blankLiterateCommentLine | birdProg | possibleLhsTexCmd> |
| birdProg | -> | > {any} |
| texProg | -> | {any}<possibleLhsTexCmd> |
| possibleLhsTexCmd | -> | (\begin{code} | \end{code}) {any} |
| beginCode | -> | \begin{code} {blankchar} |
| endCode | -> | \end{code} {blankchar} |
[COMMENT: should bad encoding be forbidden? literateCommentLine's definition seems to do so, but then so do comment and ncomment's current definitions]
[COMMENT: I didn't see a good non-line-breaking whitespace even though any's definition relies on it ({space | tab}=== {any_<graphic>}) so I'll put in the lexical syntax (blankchar -> space | tab) and change any to be defined as (any -> graphic | blank)]
[COMMENT: beginCode and endCode could have {blankchar} deleted if we want to ban those lines ending in whitespace; the code implementation would also be modified]
A Haskell inductive/imperative-style definition:
import Data.Maybe
type LhsLine = String
type HsLine = String
isBlank :: Char -> Bool
isBlank = (`elem` [' ','\t']) --[COMMENT: or maybe just use Char.isSpace?]
literateCommentLine, blankLiterateCommentLine, nonBlankLiterateCommentLine,
  possibleLhsTexCmd, beginCode, endCode :: LhsLine -> Bool
birdProg, texProg :: LhsLine -> Maybe HsLine
literateCommentLine l = not (isJust (birdProg l) || possibleLhsTexCmd l)
blankLiterateCommentLine = all isBlank
nonBlankLiterateCommentLine l =
  not (blankLiterateCommentLine l || isJust (birdProg l) || possibleLhsTexCmd l)
birdProg ('>':l') = Just (' ':l')
birdProg _ = Nothing
texProg l = if possibleLhsTexCmd l then Nothing else Just l
possibleLhsTexCmd l = isJust (takeBegin l) || isJust (takeEnd l)
beginCode l = fmap (all isBlank) (takeBegin l) == Just True
endCode l = fmap (all isBlank) (takeEnd l) == Just True
takeBegin, takeEnd :: LhsLine -> Maybe String
takeBegin = (`stripPrefix` "\\begin{code}")
takeEnd = (`stripPrefix` "\\end{code}")
-- I really need to get around to proposing this for the standard libraries
-- , says Ian Lynagh
stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
xs `stripPrefix` [] = Just xs
[] `stripPrefix` _ = Nothing
(x:xs) `stripPrefix` (y:ys)
  | x == y = xs `stripPrefix` ys
  | otherwise = Nothing
badAdjacentLines :: LhsLine -> LhsLine -> Bool
badAdjacentLines l1 l2 = bad l1 l2 || bad l2 l1
  where bad a b = isJust (birdProg a) && nonBlankLiterateCommentLine b
--[COMMENT: should I mind using pattern guards?]
--[COMMENT: except for errors, I always return as many lines as input,
-- to preserve line numbering and such; is there a good
-- higher-order function to express the following pattern?
-- mapAccumL seems a little ugly.]
file, texBlock :: [LhsLine] -> [HsLine]
file [] = []
file (l:ls)
  | (fmap (badAdjacentLines l) (listToMaybe ls)) == Just True
      = error "literate comment adjacent to code"
  | Just l' <- birdProg l = l' : file ls
  | literateCommentLine l = "" : file ls
  | beginCode l = "" : texBlock ls
  | endCode l = error "\\end{code} not in code block"
  | otherwise = error "{code} followed in line by visible text"
texBlock (l:ls)
  | Just l' <- texProg l = l' : texBlock ls
  | beginCode l = error "\\begin{code} inside code block"
  | endCode l = "" : file ls
  | otherwise = error "{code} followed in line by visible text"
texBlock [] = error "\\end{code} expected"
translateFile :: String -> String
translateFile = unlines . file . lines
Note that although literate Haskell files containing no code are not expressly forbidden, they translate to the equivalent of module Main(main) where {}, which is an error because the exported main is not defined.
Literate Haskell is not advisable when translating its literate comments
to non-blank non-literate comment lines would not give an equivalent
program. This is the case for non-blank literate comments that occur
within a single lexical element, namely a block comment or a string gap:
> foo = "hello \
This is an odd place for a comment.
> \world"
> {-
Literate comments may have all sorts of weird character
sequences in them, like "-}".
> -}
[PROPOSED TEXT: Implementations MAY, at their option, choose
a translation that eliminates all literate comments, or one that
will fail on such literate Haskell.]
[COMMENT: Without the proposed text, I think it is clear that
implementations must still support the not advisable behavior.]
Note that string literals are not meaningful at the point of translation;
for example, in the literate snippet
foo = "hello\
\end{code}"
the \end{code} line is interpreted as a possibleLhsTexCmd
(and in error due to the text " appearing in the same line).
[COMMENT: Are we sure we want to require implementations to ban this?]
However, it is fine if the \end{code} does not
appear at the beginning of the line:
foo = "hello\
  \end{code}"
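The difference between the two snippets can be illustrated with a line-level test. This is a minimal sketch under the assumption that a delimiter is recognised purely by a prefix match on the raw line, with no knowledge of string literals; startsWithEndCode is a hypothetical name.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical check: does the raw line begin with \end{code}?
-- The surrounding lexical context (such as a string gap) is not consulted.
startsWithEndCode :: String -> Bool
startsWithEndCode = ("\\end{code}" `isPrefixOf`)
```

Under this test the line `\end{code}"` is treated as a delimiter candidate, whereas the same text preceded by whitespace is an ordinary program line.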