%
% $Header: /home/cvs/root/haskell-report/report/literate.verb,v 1.5 2002/12/02 14:53:30 simonpj Exp $
%
%**The Haskell 98 Report: Literate Comments
%**~header

\subsection{Literate comments}
\label{literate}
\index{literate comments}

The ``literate comment''
convention, first developed by Richard Bird and Philip Wadler for
Orwell, and inspired in turn by Donald Knuth's ``literate
programming'', is an alternative style for encoding \Haskell{} source
code.  The literate style encourages comments by making them the
default.  There are two styles of literate \Haskell{},
[PROPOSED TEXT: which it is permissible but not advisable to mix]
[PROPOSED TEXT: which it is not advisable to mix; implementations MAY,
at their option, permit the combination of the two styles as specified here].
(Normal comments are still permissible in the program-code sections of
literate \Haskell{} code.)

The literate program text is treated as a series of lines.  Lines
between @\begin{code}@$\ldots$@\end{code}@ delimiters (LaTeX style), as
well as lines in which ``@>@'' is the first character (Bird style), are
treated as part of the program; all other lines are comment.  Each
Bird-style program line must be translated by replacing its leading
``@>@'' with a space.  Layout and comments then apply to the resulting
program text exactly as described in Appendix~\ref{syntax}.

To catch some cases where an ``@>@'' has been omitted by mistake, it is
an error for a program line to appear adjacent to a non-blank comment
line.  A line is taken as blank if it consists only of whitespace, or
if it is a @\begin{code}@ or @\end{code}@ line.

By convention, the style of comment is indicated by the file extension,
with ``@.hs@'' indicating an ordinary \Haskell{} file and ``@.lhs@''
indicating a literate \Haskell{} file.
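
For a file written entirely in the Bird style, the translation and the
adjacency check described above can be sketched as follows.  The name
@unlitBird@ and the use of @Either@ for error reporting are
illustrative assumptions, not part of this report; LaTeX-style blocks,
and the treatment of @\begin{code}@ and @\end{code}@ lines as blank, are
deliberately omitted.
\bprog
@
-- Sketch only: Bird-style translation; LaTeX-style blocks are not handled.
-- The name unlitBird and the Either-based error reporting are hypothetical.
unlitBird :: String -> Either String String
unlitBird src
  | or (zipWith badPair ls (drop 1 ls)) =
      Left "program line adjacent to a non-blank comment line"
  | otherwise = Right (unlines (map translate ls))
  where
    ls = lines src

    isProg ('>':_) = True
    isProg _       = False

    -- Blank means spaces and tabs only.
    isBlank l   = all (`elem` " \t") l
    isComment l = not (isProg l) && not (isBlank l)

    -- A program line next to a non-blank comment line, in either order.
    badPair a b = (isProg a && isComment b) || (isComment a && isProg b)

    -- '>' becomes a space, so columns (and hence layout) are preserved;
    -- comment lines become blank, so line numbers are preserved.
    translate ('>':rest) = ' ' : rest
    translate _          = ""
@
\eprog
For example, @unlitBird "Comment\n> x = 1\n"@ yields a @Left@ error,
while separating the two lines with a blank line, as in
@unlitBird "Comment\n\n> x = 1\n"@, yields @Right "\n\n x = 1\n"@.
Each output line corresponds to one input line, so positions reported
for the translated program still refer to the literate source.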
Using the Bird style, a simple factorial program would be:
\bprog
@
   This literate program prompts the user for a number
   and prints the factorial of that number:

> main :: IO ()

> main = do putStr "Enter a number: "
>           l <- getLine
>           putStr "n!= "
>           print (fact (read l))

  This is the factorial function.

> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n-1)
@
\eprog
Another program, this one in the LaTeX style:
\bprog
@
\documentstyle{article}

\begin{document}

\section{Introduction}

This is a trivial program that prints the first 20 factorials.

\begin{code}
main :: IO ()
main =  print [ (n, product [1..n]) | n <- [1..20]]
\end{code}

\end{document}
@
\eprog

The syntax of literate source files can be given declaratively as
follows.  (This syntax is overly permissive, in that it permits comment
lines adjacent to program lines.)  The elements spanning multiple lines
are:
@@@
file      ->  \{ literateCommentLine | birdProg | texBlock \}
texBlock  ->  beginCode \{texProg\} endCode
@@@
The single-line elements are:
@@@
literateCommentLine          ->  \{any\}_{\langle{}birdProg | possibleLhsTexCmd\rangle}
blankLiterateCommentLine     ->  \{blankchar\}
nonBlankLiterateCommentLine  ->  \{any\}_{\langle{}blankLiterateCommentLine | birdProg | possibleLhsTexCmd\rangle}
birdProg                     ->  @>@ \{any\}
texProg                      ->  \{any\}_{\langle{}possibleLhsTexCmd\rangle}
possibleLhsTexCmd            ->  (@\begin{code}@ | @\end{code}@) \{any\}
beginCode                    ->  @\begin{code}@ \{blankchar\}
endCode                      ->  @\end{code}@ \{blankchar\}
@@@
[COMMENT: should bad encoding be forbidden?
literateCommentLine's definition seems to do so, but then so do the
current definitions of comment and ncomment.]

[COMMENT: I didn't see a good "non-line-breaking whitespace" production,
even though any's definition relies on it
(\{space | tab\} === \{any_{\langle{}graphic\rangle}\}), so I'll put
(blankchar -> space | tab) in the lexical syntax and change any to be
defined as (any -> graphic | blank)]

[COMMENT: beginCode and endCode could have \{blankchar\} deleted if we
want to ban those lines ending in whitespace; the code implementation
would also be modified]

A definition of the translation in \Haskell{}, written in an
inductive/imperative style:
\bprog
@
import Maybe

type LhsLine = String
type HsLine  = String

isBlank :: Char -> Bool
isBlank = (`elem` [' ','\t']) --[COMMENT: or maybe just use Char.isSpace?]

literateCommentLine, blankLiterateCommentLine, nonBlankLiterateCommentLine,
  possibleLhsTexCmd, beginCode, endCode :: LhsLine -> Bool
birdProg, texProg :: LhsLine -> Maybe HsLine

literateCommentLine l = not (isJust (birdProg l) || possibleLhsTexCmd l)
blankLiterateCommentLine = all isBlank
nonBlankLiterateCommentLine l =
  not (blankLiterateCommentLine l || isJust (birdProg l) || possibleLhsTexCmd l)

-- Replace the leading '>' with a space, preserving columns for layout.
birdProg ('>':l') = Just (' ':l')
birdProg _        = Nothing

texProg l = if possibleLhsTexCmd l then Nothing else Just l

possibleLhsTexCmd l = isJust (takeBegin l) || isJust (takeEnd l)
beginCode l = fmap (all isBlank) (takeBegin l) == Just True
endCode   l = fmap (all isBlank) (takeEnd l)   == Just True

takeBegin, takeEnd :: LhsLine -> Maybe String
takeBegin = (`stripPrefix` "\\begin{code}")
takeEnd   = (`stripPrefix` "\\end{code}")

-- I really need to get around to proposing this for the standard libraries
-- , says Ian Lynagh
-- (Here the list comes first and the prefix second.)
stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
xs     `stripPrefix` []     = Just xs
[]     `stripPrefix` _      = Nothing
(x:xs) `stripPrefix` (y:ys)
  | x == y    = xs `stripPrefix` ys
  | otherwise = Nothing

badAdjacentLines :: LhsLine -> LhsLine -> Bool
badAdjacentLines l1 l2 = bad l1 l2 || bad l2 l1
  where bad a b = isJust (birdProg a) && nonBlankLiterateCommentLine b

--[COMMENT: should I mind using pattern guards?]
--[COMMENT: except for errors, I always return as many lines as input,
--          to preserve line numbering and such; is there a good
--          higher-order function to express the following pattern?
--          mapAccumL seems a little ugly.]

file, texBlock :: [LhsLine] -> [HsLine]
file [] = []
file (l:ls)
  | fmap (badAdjacentLines l) (listToMaybe ls) == Just True
                          = error "literate comment adjacent to code"
  | Just l' <- birdProg l = l' : file ls
  | literateCommentLine l = "" : file ls
  | beginCode l           = "" : texBlock ls
  | endCode l             = error "\\end{code} not in code block"
  | otherwise             = error "{code} followed in line by visible text"

texBlock (l:ls)
  | Just l' <- texProg l  = l' : texBlock ls
  | beginCode l           = error "\\begin{code} inside code block"
  | endCode l             = "" : file ls
  | otherwise             = error "{code} followed in line by visible text"
texBlock [] = error "\\end{code} expected"

translateFile :: String -> String
translateFile = unlines . file . lines
@
\eprog

Note that although literate \Haskell{} files containing no code are not
expressly forbidden, they translate to the equivalent of
@module Main(main) where {}@, which is in error because @main@ is
exported but not defined.

It is not advisable to write literate \Haskell{} for which translating
the literate comments into non-blank, ordinary (non-literate) comment
lines would not be an equivalent translation.  The problematic cases are
non-blank literate comments that occur within a single lexical element,
namely a block comment or a string gap:
\bprog
@
> foo = "hello \

This is an odd place for a comment.

> \world"

> {-

Literate comments may have all sorts of weird character sequences in them, like "-}".

> -}
@
\eprog
[PROPOSED TEXT: Implementations MAY, at their option, choose a
translation that eliminates all literate comments, or one that will
fail on such literate Haskell.]
[COMMENT: Without the proposed text, I think it is clear that
implementations must still support the "not advisable" behavior.]
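
To make the distinction concrete, the following sketch compares the
blank-line translation of this section with a hypothetical alternative
that keeps each non-blank literate comment as an ordinary @--@ comment,
applied to the string-gap example.  The names @blankOut@,
@commentOut@, @gapIsWhitespace@ and @gapExample@ are illustrative only;
both translations handle only the Bird style, and @gapIsWhitespace@
naively assumes that the first backslash in its input opens a string
gap.
\bprog
@
import Data.Char (isSpace)

-- Hypothetical alternative translation (Bird style only): non-blank
-- literate comment lines become ordinary "--" comment lines.
commentOut :: String -> String
commentOut = unlines . map tr . lines
  where
    tr ('>':rest) = ' ' : rest
    tr l | all (`elem` " \t") l = l
         | otherwise            = "-- " ++ l

-- The translation used in this section: comment lines become blank.
blankOut :: String -> String
blankOut = unlines . map tr . lines
  where
    tr ('>':rest) = ' ' : rest
    tr _          = ""

-- Naive check, assuming the first backslash opens a string gap:
-- is everything up to the next backslash whitespace?
gapIsWhitespace :: String -> Bool
gapIsWhitespace s = case break (== '\\') s of
  (_, '\\' : rest) -> all isSpace (takeWhile (/= '\\') rest)
  _                -> False

-- The string-gap example from the text.
gapExample :: String
gapExample = unlines
  [ "> foo = \"hello \\"
  , ""
  , "This is an odd place for a comment."
  , ""
  , "> \\world\""
  ]
@
\eprog
With these definitions @gapIsWhitespace (blankOut gapExample)@ is
@True@, whereas @gapIsWhitespace (commentOut gapExample)@ is @False@:
the alternative translation places the line
@-- This is an odd place for a comment.@ inside the string gap, where
only whitespace is allowed.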
Note that string literals are not meaningful at the point of
translation.  For example, in the following lines of a LaTeX-style code
block
\bprog
@
foo = "hello\
\end{code}"
@
\eprog
the @\end{code}@ line is interpreted as a possibleLhsTexCmd, and is in
error because the text @"@ follows the @\end{code}@ on the same line.
[COMMENT: Are we sure we want to require implementations to ban this?]
However, it is fine if the @\end{code}@ does not appear at the beginning
of the line:
\bprog
@
foo = "hello\
    \end{code}"
@
\eprog

%**~footer