Kernighan & Ritchie word count example program in a functional language?

Basically, in a functional styly you'll want to divide the IO operation of getting your stream of data from the pure operation of some stateful transistion based on the current character and the current state.

Up vote 4 down vote favorite share g+ share fb share tw.

I have been reading a little bit about functional programming on the web lately and I think I got a basic idea about the concepts behind it. I'm curious how everyday programming problems which involve some kind of state are solved in a pure functional programing language. For example: how would the word count program from the book 'The C programming Language' be implemented in a pure functional language?

Any contributions are welcome as long as the solution is in a pure functional style. Here's the word count C code from the book: #include #define IN 1 /* inside a word */ #define OUT 0 /* outside a word */ /* count lines, words, and characters in input */ main() { int c, nl, nw, nc, state; state = OUT; nl = nw = nc = 0; while ((c = getchar())! = EOF) { ++nc; if (c == '\n') ++nl; if (c == ' ' || c == '\n' || c = '\t') state = OUT; else if (state == OUT) { state = IN; ++nw; } } printf("%d %d %d\n", nl, nw, nc); } haskell clojure functional-programming scheme common-lisp link|improve this question edited yesterday asked 2 days agouser402672.

3 Obviously, in Haskell you could just do main = interact $ show . Length . Words.

— Wait, it distinguishes between chars, words and lines... but the solution would still be along those lines. – leftaroundabout 2 days ago 1 Jeremy Gibbons works through implementations of wc here cs.ox.ac. Uk/jeremy.

Gibbons/publications/fission.pdf. The paper moves at a fast pace, so in some ways it is not ideal for beginners; but if you can spend the time on it, it is very worthwhile. – stephen tetley 2 days ago.

Basically, in a functional styly you'll want to divide the IO operation of getting your stream of data from the pure operation of some stateful transistion based on the current character and the current state. The Haskell solution from Tikhon is very clean but performs three passes on your input data and will result in the entire input being contained in memory until the result is computed. You can process data incrementally, which I do below using the Text package but no other advanced Haskell tools (which could clean this up at the expense of understandability by non-Haskellers).

First we have our preamble: {-# LANGUAGE BangPatterns #-} import Data.Text. Lazy as T import Data.Text.Lazy. IO as TIO Then we define our data structure to hold the state of the process (number of characters, words, and lines along with the State IN/OUT): data Counts = Cnt { nc, nl, nw ::!

Int , state :: State } deriving (Eq, Ord, Show) data State = IN | OUT deriving (Eq, Ord, Show) Now I define a "zero" state just for easy use. I'd normally make a number of helper functions or use a package like lense to make incrementing each field in the Counts structure simple, but will go without for this answer: zeros :: Counts zeros = Cnt 0 0 0 OUT And now I translate your series of if/else statements into a pure state machine: op :: Counts -> Char -> Counts op c '\n' = c { nc = nc c + 1, nw = nw c + 1, nl = nl c + 1, state=OUT } op c ch | ch == ' ' || ch == '\t' = c { nc = nc c + 1, state=OUT } | state c == OUT = c { nc = nc c + 1, nw = nw c + 1, state = IN } | otherwise = c { nc = nc c + 1 } Finally the main function just gets the input stream and folds our operation over the characters: main = do contents Foldl' op zeros contents EDIT: You mentioned not understanding the syntax. Here is an even simpler version that I will explain: import Data.Text.

Lazy as T import Data.Text.Lazy. IO as TIO op (nc, nw, nl, st) ch | ch == '\n' = (nc + 1, nw + 1 , nl + 1, True) | ch == ' ' || ch == '\t' = (nc + 1, nw , nl , True) | st = (nc + 1, nw + 1 , nl , False) | otherwise = (nc + 1, nw , nl , st) main = do contents The op function uses a bunch of guards - parts like | ch = '\n' - which is basically like a C if/elseif/else series. The tuples ( ... , ... , ... , ... ) contain all our state.

Haskell variables are immutable, so we make new tuples by adding one (or not) to the values of the previous variables.

Yes I assumed that Tikhon's solution requires 3 passes. This solution was more what I was expecting to see. Well I'm having a hard time to understand the syntax but from the way you explained it it looks like it's basically the same algorithm as the Clojure solution.

– user402672 2 days ago Too bad it's still operating in linear space due to lazy evaluation retaining 0 + 1 + 1 + 1 + 1 ... thunks in memory. You additionally need strictness annotations to force the calculations on each iteration step. No.

Foldl' is not sufficient because it only forces the tuple constructor. – Rotsor yesterday @rotsor that's why my first version used bang patterns. At anyrate, addressing lazyness and performance was more of an afterthought and never explicitly part of the question or the answer.

– Thomas M. DuBuisson 23 hours ago.

A simple way to do it would be to read in the input and then use some simple functions to get the line/word/character count. Something like this would work: count :: String -> (Int, Int, Int) count str = (length $ lines str, length $ words str, length str) main :: IO () main = fmap count getContents >>= print This isn't exactly the same, but it's close. This works really simply.

Given a string, we can turn it into a list of lines with the standard lines function and a list of words with the standard words function. Since String is just Char, length returns the number of characters. This is how we get the three counts.

(For reference, length $ lines str is the same as length (lines str). ) The important idea is how the IO--reading the input and printing it out--is separated from the actual logic. Also, instead of going through the input character by character keeping track of some state, we get the actual numbers by applying simple functions to the input.

These functions are all just compositions of standard library functions.

You need some kind of print. – leftaroundabout 2 days ago @leftaroundabout: Of course. Not paying attention :).

– Tikhon Jelvis 2 days ago Apparently... look at it once more! – leftaroundabout 2 days ago I hope it works this time. And now I'm going to miss my train :).

– Tikhon Jelvis 2 days ago Well I don't know Haskell but does your solution imply that the input is actually iterated over 3 times (i.e. To count the characters, words, and lines)? If that is the case would there be a better way which iterates over the input just once like the Clojure solution posted here?

– user402672 2 days ago.

In your loop there are four state variables, nc, nw, nl and state, plus the next character c. The loop remembers nc, nw, nl and state from the last time through the loop, and c changes each iteration through the loop. Imagine instead that you take those variables out of the loop and put them in a vector: state, nc, nw, nl.

Then you change your loop construct into a function that takes two arguments, the first being a vector state, nc, nw, nl, and the second being c, and returns a new vector with the updated values of nc, nw, nl and state. In C-ish pseudocode: f(state, nc, nw, nl, c) { ++nc; if (c == '\n') ++nl; if (c == ' ' || c == '\n' || c = '\t') state = OUT; else if (state == OUT) { state = IN; ++nw; } return state, nc, nw, nl; } Now you can call that function with the vector OUT, 0, 0, 0 and the first character in the string ("hello, world", say), and it will return a new vector IN, 1, 0, 0. Call f again with this new vector and the second character 'e', and it returns IN, 2, 0, 0.

Repeat for the rest of the characters in the string, and the last call will return IN, 12, 2, 0, identical to the values printed by the C code. The basic idea is that you take the state variables out of the loop, turn the guts of the loop into a function, and pass the vector of state variables and the next input in as arguments to that function, and return a new state vector as a result. There is a function called reduce that does this.

Here's how you would do it in Clojure (formatted to emphasize the vectors returned): (defn f state nc nw nl c (let nl (if (= c \n)(inc nl) nl) (cond (or (= c \space)(= c \n)(= c \t)) :out (inc nc) nw nl (= state :out) :in (inc nc) (inc nw) nl true state (inc nc) nw nl ))) (defn wc s (reduce f :out 0 0 0 s)) (wc "hello, world") which returns (and prints in the repl) :in 12 2 0.

I understand this solution and the nice thing is it can probably applied to any imperative language. – user402672 2 days ago Nice solution, just one suggestion you can replace (or (= c \space)(= c \n)(= c \t)) with (#{\n \space \t} c) – user499049 15 hours ago.

Here's a solution based on the Clojure example posted here but in CL using recursion. (defstruct (state (:constructor make-state (state chars words lines))) state chars words lines) (defun wc (state stream) (symbol-macrolet ((s (state-state state)) (c (state-chars state)) (w (state-words state)) (l (state-lines state))) (case (read-char stream nil :eof) (:eof state) (#\Newline (wc (make-state :out (1+ c) w (1+ l)) stream)) (#\Space (wc (make-state :out (1+ c) w l) stream)) (t (if (eq s :out) (wc (make-state :in (1+ c) (1+ w) l) stream) (wc (make-state :in (1+ c) w l) stream)))))) (with-input-from-string (stream "Hello Functional Programming World") (wc (make-state :out 0 0 0) stream)) ;;; => #S(STATE :STATE :IN :CHARS 34 :WORDS 4 :LINES 0).

Common Lisp is mentioned, but it is not a pure functional programming language and it does not support TCO in its standard. Individual implementations do. Tail recursive version, if the compiler supports it: (defun word-count (&optional (stream *standard-input*)) (labels ((word-count-aux (in-p chars words lines) (case (read-char stream nil :eof) (:eof (values chars words lines)) (#\newline (word-count-aux nil (1+ chars) words (1+ lines))) ((#\space #\tab) (word-count-aux nil (1+ chars) words lines)) (otherwise (word-count-aux t (1+ chars) (if in-p words (1+ words)) lines))))) (word-count-aux nil 0 0 0))) But since TCO is not in the standard, a portable version would look more like this: (defun word-count (&optional (stream *standard-input*) &aux (in-p nil) (chars 0) (words 0) (lines 0) char) (loop while (setf char (read-char stream nil nil)) do (case char (#\newline (setf in-p nil) (incf lines)) ((#\space #\tab) (setf in-p nil)) (otherwise (unless in-p (incf words)) (setf in-p t))) (incf chars)) (values chars words lines)) Above is no longer Functional.

We can replace the loop with a higher-order stream-map: (defun stream-map (function stream) (loop for char = (read-char stream nil nil) while char do (funcall function char))) (defun word-count (&optional (stream *standard-input*) &aux (in-p nil) (chars 0) (words 0) (lines 0) char) (stream-map (lambda (char) (incf chars) (when (eql char #\newline) (incf lines)) (if (member char '(#\space #\newline #\tab)) (setf in-p nil) (unless in-p (incf words) (setf in-p t)))) stream) (values chars words lines)) The state is modified by the closure. To get rid of that we can implement a stream-reduce. (defun stream-reduce (function stream &key initial-value) (let ((value initial-value)) (loop for char = (read-char stream nil nil) while char do (setf value (funcall function value char))) value)) (defun word-count (&optional (stream *standard-input*)) (rest (stream-reduce (lambda (state char) (destructuring-bind (in-p chars words lines) state (case char (#\newline (list nil (1+ chars) words (1+ lines))) ((#\space #\tab) (list nil (1+ chars) words lines)) (otherwise (list t (1+ chars) (if in-p words (1+ words)) lines))))) stream :initial-value (list nil 0 0 0)))).

State 'out) (loop (read-char input-port) nl (add1 nw) (add1 nc) 'in)) (else (loop (read-char input-port) nl nw (add1 nc) state))))) word-count receives an input port as a parameter; notice that no additional data structures are created (structs, tuples, vectors, etc.) instead, all state is kept in parameters. As an example, for counting the words in a file containing this: hello, world Call the procedure like this: (call-with-input-file "/path/to/file" word-count) > nl: 0, nw: 2, nc: 12.

Here is a Scheme version of the program, from my blog, which implements the entire Unix word count program, including argument- and file-handling. The key function is wc, which is purely functional. It moves all local variables into the arguments of a local function (defined via named-let), which is the standard idiom for converting an imperative loop to functional style.

The man page and code appear below: NAME wc -- word count SYNOPSIS wc -lwc name ... DESCRIPTION Wc counts lines, words and characters in the named files, or in the standard input if no name appears. A word is a maximal string of characters delimited by spaces, tabs or newlines. If the optional argument is present, just the specified counts (lines, words, or characters) are selected by the letters l, w or c.

#! /usr/bin/scheme --script (define l-flag #t) (define w-flag #t) (define c-flag #t) (define (update-flags fs) (if (not (member #\l fs)) (set! L-flag #f)) (if (not (member #\w fs)) (set!

W-flag #f)) (if (not (member #\c fs)) (set! C-flag #f))) (define (put-dec n width) (let* ((n-str (number->string n))) (display (make-string (- width (string-length n-str)) #\space)) (display n-str))) (define (wc) (let loop ((inword #f) (c (read-char)) (ls 0) (ws 0) (cs 0)) (cond ((eof-object? C) (values ls ws cs)) ((char=?

C #\newline) (loop #f (read-char) (add1 ls) ws (add1 cs))) ((not (member c '(#\space #\newline #\tab))) (if inword (loop #t (read-char) ls ws (add1 cs)) (loop #t (read-char) ls (add1 ws) (add1 cs)))) (else (loop #f (read-char) ls ws (add1 cs)))))) (define (main args) (when (and (pair? Args) (char=? (string-ref (car args) 0) #\-)) (update-flags (cdr (string->list (car args)))) (set!

Args (cdr args))) (if (null? Args) (let-values (((ls ws cs) (wc))) (when l-flag (display ls) (display " ")) (when w-flag (display ws) (display " ")) (when c-flag (display cs) (display " ")) (newline)) (let loop ((args args) (l-tot 0) (w-tot 0) (c-tot 0)) (if (null? Args) (begin (when l-flag (put-dec l-tot 12)) (when w-flag (put-dec w-tot 12)) (when c-flag (put-dec c-tot 12))) (with-input-from-file (car args) (lambda () (let-values (((ls ws cs) (wc))) (when l-flag (put-dec ls 12)) (when w-flag (put-dec ws 12)) (when c-flag (put-dec cs 12)) (display " ") (display (car args)) (newline) (loop (cdr args) (+ l-tot ls) (+ w-tot ws) (+ c-tot cs))))))))) (main (cdr (command-line))).

I believe you could write this somewhat more elegantly while still only iterating over the input once, but you'll need to make GHC do more work, certainly use -O2. I have not yet compiled this code, much less compared it's speed vs. Thomas DuBuisson's answer, but this should indicate the basic direction. {-# LANGUAGE BangPatterns #-} import Data.

List wordcount = snd . Foldl' go (False,0) where go (!b,! N)!

C = if elem c ' ','\t','\n' then (False,n) else (True, n + if be then 0 else 1) linecount = foldl' go 0 where go! N! C = n + if c == '\n' then 1 else 0 main = interact $ show .

Go where go x = (linecount x, wordcount x, foldl' (\! N _ ->n+1) 0 x) If I understand stream fusion correctly, then GHC should inline wordcount and linecount into main, merge the three foldl' commands into one, well hopefully, and start rearranging the if checks. I'd hope it'd inlined elem and foldl' too of course.

If not, you could certainly force inlining and probably create a simple fusion rule, but maybe the defaults suffice. Or maybe some simple alterations produce the desired effect. Btw, I have written foldl' (\n _ ->n+1) 0 x only because I've heard bad storied about length, but maybe length works fine, another change worth profiling.

In Haskell using strict IO rather than lazy. Does words only but you can easily implement characters and lines on top of this. Requires the text and conduit packages: module Main where import Data.

Conduit import qualified Data.Conduit. Binary as CB import qualified Data.Conduit. List as CL import qualified Data.Conduit.

Text as CT import qualified Data. Text as T import System. Environment main :: IO () main = do args acc + length words) 0 print n.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions