
Write a presentation ;)

master
Peter J. Jones 5 years ago
parent
commit
8319d27829
7 changed files with 399 additions and 0 deletions
  1. .gitignore (+3, -0)
  2. .gitmodules (+6, -0)
  3. GNUmakefile (+5, -0)
  4. README.md (+213, -0)
  5. slides.md (+170, -0)
  6. vendor/devalot-slides (+1, -0)
  7. vendor/wc-streams (+1, -0)

.gitignore (+3, -0)

@@ -0,0 +1,3 @@
+/README.html
+/slides.html
+/slides.pdf

.gitmodules (+6, -0)

@@ -0,0 +1,6 @@
+[submodule "vendor/devalot-slides"]
+	path = vendor/devalot-slides
+	url = git://pmade.com/devalot-slides.git
+[submodule "vendor/wc-streams"]
+	path = vendor/wc-streams
+	url = git://pmade.com/wc-streams.git

GNUmakefile (+5, -0)

@@ -0,0 +1,5 @@
+################################################################################
+# From git://pmade.com/devalot-slides (git submodule update --init)
+ASPECT_RATIO = 169
+OTHER_DEPS = vendor/wc-streams/src/Main.hs
+include vendor/devalot-slides/slides.mk

README.md (+213, -0)

@@ -0,0 +1,213 @@
+# Simple Introduction to io-streams
+
+The Basics of
+[io-streams](https://hackage.haskell.org/package/io-streams):
+
+-   `InputStream a`: read values of type `a`
+-   `OutputStream a`: write values of type `a`
+-   `Maybe a`: `Nothing` means EOF
+
+
+
+Values are read from an `InputStream`. You usually begin by creating an
+`InputStream ByteString` (an input stream which produces a `ByteString`
+each time it’s read from) and transform it with a function which
+produces the values you are interested in. We’ll see one such function
+(`decodeUtf8`) which transforms an `InputStream ByteString` into an
+`InputStream Text`.
+
+An `OutputStream` is similar, except that it represents a stream you can
+write values to. Input streams and output streams can be connected to
+one another as long as they share the same value type (the `a` type from
+above).
+
+When reading values from an `InputStream` with the `read` function, a
+`Maybe a` value is returned. `Nothing` signals the end of the stream. In
+a similar fashion, values written to an `OutputStream` are wrapped in a
+`Maybe` to signal downstream when no more values will be written.
+
+
+
+# io-streams is great because…
+
+-   It’s really easy to learn and use
+-   Stream processing is very fast
+-   It’s easy to compose stream functions together
+
+# io-streams is less desirable because…
+
+-   It makes heavy use of the `IO` type
+-   Limited functionality compared to Conduit and Pipes
+-   Error handling is done via exceptions
+
+# Writing `wc` with Haskell and io-streams
+
+The POSIX `wc` utility:
+
+-   Processes zero or more files
+-   When no files are given, processes STDIN
+-   Counts bytes, characters, words, and lines
+-   I’ve decided to omit bytes in my implementation
+
+# Keeping Track of Lines, Words, and Characters
+
+
+
+Let’s start with some types. First up is a type for tracking the
+counters we’ll report after processing all files.
+
+
+
+``` {.haskell include="vendor/wc-streams/src/Main.hs" token="counters"}
+data Counters = C
+  { lines :: Int -- ^ Number of lines.
+  , words :: Int -- ^ Number of words.
+  , chars :: Int -- ^ Number of characters.
+  }
+
+instance Monoid Counters where
+  mempty = C 0 0 0
+  mappend (C l1 w1 c1) (C l2 w2 c2) =
+    C (l1 + l2) (w1 + w2) (c1 + c2)
+```
+
+
+
+The `Counters` type is a `Monoid` so that an initial counter can be
+created (`mempty`) and a list of counters can be totaled (`mconcat`).
+
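Review note on the `Monoid` instance: since base 4.11 (GHC 8.4), `Semigroup` is a superclass of `Monoid`, so an instance that defines only `mempty` and `mappend` is rejected by newer compilers. Below is a minimal sketch of how the instance could be split, assuming nothing else in `Main.hs` needs to change. The `total` helper and the derived `Eq`/`Show` instances are hypothetical additions for illustration and are not part of this commit.

```haskell
import Prelude hiding (lines, words)

-- Same shape as the Counters type quoted in the README.
data Counters = C
  { lines :: Int -- ^ Number of lines.
  , words :: Int -- ^ Number of words.
  , chars :: Int -- ^ Number of characters.
  } deriving (Eq, Show)

-- The combining operation moves into Semigroup's (<>).
instance Semigroup Counters where
  C l1 w1 c1 <> C l2 w2 c2 = C (l1 + l2) (w1 + w2) (c1 + c2)

-- mappend defaults to (<>), so only mempty is needed here.
instance Monoid Counters where
  mempty = C 0 0 0

-- Hypothetical helper: total a list of per-file counters.
total :: [Counters] -> Counters
total = mconcat
```

With this split, `total [C 1 2 3, C 4 5 6]` gives `C 5 7 9`, and `total []` gives `mempty`.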
+
+
+# Maintaining a Bit of State
+
+
+
+We also need to maintain a little bit of state as we process files in
+chunks. Besides the previously shown counters, we also need to know if
+the last character processed was part of a word. This makes it easy to
+count words when encountering whitespace, even when there are several
+consecutive whitespace characters.
+
+
+
+``` {.haskell include="vendor/wc-streams/src/Main.hs" token="state"}
+data State = S
+  { inWord :: Bool
+    -- ^ Last character was part of a word.
+
+  , counters :: Counters
+    -- ^ Current set of counters.
+  }
+```
+
+# Counting Unicode Characters
+
+
+
+The `wc` function creates a new `State` value based on the previous
+state and one character from the input stream.
+
+
+
+``` {.haskell include="vendor/wc-streams/src/Main.hs" token="wc"}
+wc :: State -> Char -> State
+wc state char = case char of
+    '\n'               -> newline
+    _   | isSpace char -> whitespace
+        | otherwise    -> nonspace
+  where
+    …
+```
+
+
+
+A newline character modifies the state more than other characters
+because it should increment the number of characters, lines, and
+possibly the number of words (a newline might terminate a word).
+
+Space characters also increase the character count. If the previous
+character was part of a word, space characters also increment the number
+of words (and set `inWord` to `False`).
+
+All other characters increment the character count and update the state
+so that `inWord` is `True`.
+
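Review note: the `where` clause of `wc` is elided in the README. The sketch below is only a guess at how the three cases described above could be written. It is reconstructed from the prose and is not the code in `vendor/wc-streams`. The local name `w'`, the `countString` helper, the pattern match on the state argument, and the derived `Eq`/`Show` instances are all hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')
import Prelude hiding (lines, words)

data Counters = C
  { lines :: Int
  , words :: Int
  , chars :: Int
  } deriving (Eq, Show)

data State = S
  { inWord   :: Bool
  , counters :: Counters
  } deriving (Eq, Show)

wc :: State -> Char -> State
wc (S wasInWord (C l w c)) char = case char of
    '\n'               -> newline
    _   | isSpace char -> whitespace
        | otherwise    -> nonspace
  where
    -- A word ends only if the previous character was part of one.
    w' = if wasInWord then w + 1 else w

    -- Every case counts the character itself.
    newline    = S False (C (l + 1) w' (c + 1))
    whitespace = S False (C l       w' (c + 1))
    nonspace   = S True  (C l       w  (c + 1))

-- Hypothetical helper: fold wc over a plain String.
countString :: String -> State
countString = foldl' wc (S False (C 0 0 0))
```

Under these assumptions, `countString "hello  world\nfoo"` equals `S True (C 1 2 16)`: the two consecutive spaces end only one word, and the trailing `foo` is not yet counted because nothing has terminated it.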
+
+
+# Processing a Stream of Bytes
+
+
+
+The `stream` function takes an `InputStream ByteString`, converts it
+into an `InputStream Text`, and then processes all characters in the
+stream by continually reading from it until a `Nothing` is returned.
+
+
+
+``` {.haskell include="vendor/wc-streams/src/Main.hs" token="stream"}
+stream :: InputStream ByteString -> IO Counters
+stream = Streams.decodeUtf8  >=>
+         go (S False mempty) >=>
+         return . counters
+  where
+    go :: State -> InputStream Text -> IO State
+    go state upstream = do
+      textM <- Streams.read upstream
+      case textM of
+        Nothing   -> return (eof state)
+        Just text -> go (T.foldl' wc state text) upstream
+    …
+```
+
+
+
+The `eof` function isn’t shown here but plays an important role. After
+consuming all input from a stream we might need to increment the number
+of words if the last character in the stream was part of a word.
+
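Review note: since `eof` is not shown, here is one way it might look, assuming it only has to close a word that was still open when the input ended. This is a hypothetical reconstruction from the paragraph above, not the function in `vendor/wc-streams`. The derived `Eq`/`Show` instances are added only so the result can be inspected.

```haskell
import Prelude hiding (lines, words)

data Counters = C
  { lines :: Int
  , words :: Int
  , chars :: Int
  } deriving (Eq, Show)

data State = S
  { inWord   :: Bool
  , counters :: Counters
  } deriving (Eq, Show)

-- Hypothetical: finish the word in progress, if any, at end of input.
eof :: State -> State
eof (S True (C l w c)) = S False (C l (w + 1) c)
eof state              = state
```

For input that ends without trailing whitespace, such as a file whose last line has no newline, `eof (S True (C 1 2 16))` yields `S False (C 1 3 16)`. A state that is not inside a word passes through unchanged.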
+
+
+# Putting it All Together
+
+
+
+The only (important) thing left is the `main` function. There are
+basically two branches depending on how many files were listed on the
+command line. When no files are given, the `stream` function will be
+used with standard input. Otherwise the `stream` function will be used
+with each of the files listed on the command line.
+
+When more than one file is given to the `wc` utility, it will print the
+counters for each file and then a grand total.
+
+
+
+``` {.haskell include="vendor/wc-streams/src/Main.hs" token="main"}
+main :: IO ()
+main = do
+  args <- getArgs
+
+  case args of
+    -- No command line arguments…
+    [] -> report 1 "" =<< stream (Streams.stdin)
+
+    -- List of files…
+    fs -> do
+      cs <- mapM (flip Streams.withFileAsInput stream) fs
+      mapM_ (uncurry $ report (width cs)) (zip fs cs)
+      when (length cs > 1) $ report (width cs) "total" (mconcat cs)
+```
+
+# Getting the Code
+
+
+
+The source code for the Haskell implementation of `wc` can be found at
+the following URL:
+
+
+
+<https://github.com/boulder-haskell-programmers/wc-streams>
+
+<!-- Links -->
+

slides.md (+170, -0)

@@ -0,0 +1,170 @@
+---
+title: Simple Streaming with io-streams
+author: Peter J. Jones \<pjones@devalot.com\>
+...
+
+# Simple Introduction to io-streams
+
+The Basics of [io-streams][]:
+
+  * `InputStream a`: read values of type `a`
+  * `OutputStream a`: write values of type `a`
+  * `Maybe a`: `Nothing` means EOF
+
+<div class="notes">
+
+Values are read from an `InputStream`.  You usually begin by creating
+an `InputStream ByteString` (an input stream which produces a
+`ByteString` each time it's read from) and transform it with a
+function which produces the values you are interested in.  We'll see
+one such function (`decodeUtf8`) which transforms an `InputStream
+ByteString` into an `InputStream Text`.
+
+An `OutputStream` is similar, except that it represents a stream you
+can write values to.  Input streams and output streams can be
+connected to one another as long as they share the same value type
+(the `a` type from above).
+
+When reading values from an `InputStream` with the `read` function, a
+`Maybe a` value is returned.  `Nothing` signals the end of the stream.
+In a similar fashion, values written to an `OutputStream` are wrapped
+in a `Maybe` to signal downstream when no more values will be written.
+
+</div>
+
+# io-streams is great because...
+
+  * It's really easy to learn and use
+  * Stream processing is very fast
+  * It's easy to compose stream functions together
+
+# io-streams is less desirable because...
+
+  * It makes heavy use of the `IO` type
+  * Limited functionality compared to Conduit and Pipes
+  * Error handling is done via exceptions
+
+# Writing `wc` with Haskell and io-streams
+
+The POSIX `wc` utility:
+
+  * Processes zero or more files
+  * When no files are given, processes STDIN
+  * Counts bytes, characters, words, and lines
+  * I've decided to omit bytes in my implementation
+
+# Keeping Track of Lines, Words, and Characters
+
+<div class="notes">
+
+Let's start with some types.  First up is a type for tracking the
+counters we'll report after processing all files.
+
+</div>
+
+~~~ {.haskell include="vendor/wc-streams/src/Main.hs" token="counters"}
+~~~
+
+<div class="notes">
+
+The `Counters` type is a `Monoid` so that an initial counter can be
+created (`mempty`) and a list of counters can be totaled (`mconcat`).
+
+</div>
+
+# Maintaining a Bit of State
+
+<div class="notes">
+
+We also need to maintain a little bit of state as we process files in
+chunks.  Besides the previously shown counters, we also need to know
+if the last character processed was part of a word.  This makes it
+easy to count words when encountering whitespace, even when there are
+several consecutive whitespace characters.
+
+</div>
+
+~~~ {.haskell include="vendor/wc-streams/src/Main.hs" token="state"}
+~~~
+
+# Counting Unicode Characters
+
+<div class="notes">
+
+The `wc` function creates a new `State` value based on the previous
+state and one character from the input stream.
+
+</div>
+
+~~~ {.haskell include="vendor/wc-streams/src/Main.hs" token="wc"}
+~~~
+
+<div class="notes">
+
+A newline character modifies the state more than other characters
+because it should increment the number of characters, lines, and
+possibly the number of words (a newline might terminate a word).
+
+Space characters also increase the character count.  If the previous
+character was part of a word, space characters also increment the
+number of words (and set `inWord` to `False`).
+
+All other characters increment the character count and update the
+state so that `inWord` is `True`.
+
+</div>
+
+# Processing a Stream of Bytes
+
+<div class="notes">
+
+The `stream` function takes an `InputStream ByteString`, converts it
+into an `InputStream Text`, and then processes all characters in the
+stream by continually reading from it until a `Nothing` is returned.
+
+</div>
+
+~~~ {.haskell include="vendor/wc-streams/src/Main.hs" token="stream"}
+~~~
+
+<div class="notes">
+
+The `eof` function isn't shown here but plays an important role.
+After consuming all input from a stream we might need to increment the
+number of words if the last character in the stream was part of a
+word.
+
+</div>
+
+
+# Putting it All Together
+
+<div class="notes">
+
+The only (important) thing left is the `main` function.  There are
+basically two branches depending on how many files were listed on the
+command line.  When no files are given, the `stream` function will be
+used with standard input.  Otherwise the `stream` function will be
+used with each of the files listed on the command line.
+
+When more than one file is given to the `wc` utility, it will print
+the counters for each file and then a grand total.
+
+</div>
+
+~~~ {.haskell include="vendor/wc-streams/src/Main.hs" token="main"}
+~~~
+
+# Getting the Code
+
+<div class="notes">
+
+The source code for the Haskell implementation of `wc` can be found at
+the following URL:
+
+</div>
+
+<https://github.com/boulder-haskell-programmers/wc-streams>
+
+<!-- Links -->
+[io-streams]: https://hackage.haskell.org/package/io-streams

vendor/devalot-slides (+1, -0)

@@ -0,0 +1 @@
+Subproject commit 3f39ebe77ee14f3a5d22567bf4c98f3f858bfa0b

vendor/wc-streams (+1, -0)

@@ -0,0 +1 @@
+Subproject commit 402c023744cfb9e162dc644d4ea85dc01fd92a0e
