automata-rich/02_languages_and_strings.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
--- geometry: - lmargin=0.9in - rmargin=0.175in - tmargin=0.3in - bmargin=0.5in - twoside papersize: a4 classoption: - twocolumn ... ## Strings ### Definitions - **Alphabet ($\Sigma$)**\ A a **finite set** consiting of symbols or characters. - **String**\ A **sequence** of symbols drawn from some alphabet $\Sigma$. A string can be empty, denoted by $\epsilon$ - **$\boldsymbol\Sigma$\***\ Denotes the set of **all possible strings** over an alphabet $\Sigma$$\*$ ### Functions on a string - **length**\ $|\epsilon|=0$\ $|100111|=6$ - **character occurences**\ $\#_a$(`abbaa`) = 3 - **concatenation**\ $x = good$, $y = bye$\ $xy=goodbye$, $yx=byegood$\ Also, $x||y\ = goodbye$ (Another notation)\ \- The empty string $\epsilon$ is the identity of the concatenation. $$\forall x(x\epsilon = \epsilon x = x)$$ \- Concatentation is also associative. $$\forall s,t,w\ ((st)w = s(tw))$$ - **replication**\ $$w^0 = \epsilon$$ $$w^i = w^{i-1} w$$ $a^3 = aaa$ , $(bye)^2 = byebye$ , $a^0b^3 = bbb$ - **reversal**\ For a string $w$, the reversal $w^R$ is defined as: \begin{equation}\notag \text{if }|w| = 0 \text{ then } w^R = w = \epsilon \\[-0.3in] \end{equation} \begin{equation}\notag \begin{split} &\text{if }|w| \geq 1 \text{ then } \exists a \in \Sigma(\exists u \in \Sigma^\ast (w = ua))\\ &\text{then } w^R = au^R \end{split} \end{equation} ### Theorem 2.1 If $w$ and $x$ are strings, then $(wx)^R=x^Rw^r$ ### Relations on Strings - **substring**\ A string $s$ is a substring of a string $t$ iff $s$ occurs contiguously as a part of $t$\ A string $s$ is a **proper substring** of a string $t$ iff $s$ is a substring of $t$ and $\bold{s \neq t}$ - **prefix**\ A string *s* is a prefix of *t* iff $\exists x \in \Sigma^\ast(t = sx)$\ **proper prefix** again same as above ($\bold{s \neq t}$) - **suffix**\ A string *s* is a suffix of *t* iff $\exists x \in \Sigma^\ast(t = xs)$\ **proper suffix** again same as above ($\bold{s \neq t}$) **Empty string $\epsilon$ is a suffix, prefix and a substring of every string** ## Languages A **language** is a **finite or infinite set of strings** over a fine alphabet $\Sigma$.\ For a language *L* defined over alphabet $\Sigma$, $L\subset \Sigma^\ast$\ $\Sigma_L$ is used to denote the alphabet from which the strings in the language *L* are formed. ### Ways of defining languages - We can have a **language generator** that **enumerates** the elements of the language or have **language recognizer** which **decides** whether or not a candidate string is in the language and returns *True* if it is and *False* if it isn't. - **Go through examples on page 11-13 to see different ways languages are defined** - Empty language $L = \{\}$ = $\varnothing$ is a language that contains no strings but a language $L = \{\epsilon\}$ is not an empty language as it consists of one element i.e., an empty string. - **Lexigographic order** - Sometimes we may care about the order in which the elements of a language are generated in. - If there exists a total order *D* then we can use *D* to define on *L* a useful total order called **lexicographic order** (written $<_L$): $$ \forall x(\forall y((\ |x| < |y|\ )\to (x <_L y)))$$ and of the strings of the same length, sort them in dictionary order using *D*. ### Cardinality of a language - The smallest language over any alphabet is $\varnothing$ with cardinality of 0 - The largest language over any alphabet $\Sigma$ is $\Sigma^\ast$ - For $\Sigma = \varnothing,\ \Sigma^\ast = \{\epsilon\}$ and $|\Sigma^\ast|=1$ theorem 2.2 is for non-empty alphabets. ### Theorem 2.2 If $\Sigma \neq \varnothing$ then $\Sigma^\ast$ is countably infinite ($\aleph_0$) - Therefore, the cardinality of every language is at least 0 and at most ${\aleph_0}$ ### How many languages are there? - The set of languages defined on $\Sigma$ is $P(\Sigma^\ast)$ - For $\Sigma = \varnothing,\ \Sigma^\ast = \{\epsilon\}$ and $P(\Sigma^\ast)\ is\ \{\varnothing, {\epsilon}\}$ - But when $\Sigma \neq \varnothing$ theorem 2.3 applies. ### Theorem 2.3 If $\Sigma \neq \varnothing$ then the set of languages over $\Sigma$ is uncountably infinite. ### Functions on a language - Since languages are sets, all of the standard set operations are well-defined on languages. (refer Ex2.12 from page 15) - **concatenation**\ Let $L_1$ and $L_2$ be two languages defined over $\Sigma$ then: $$L_1L_2=\{w\in \Sigma^\ast : \exists s \in L_1(\exists t \in \L_2(w = st))\}$$\ Example\ Let $L_1 = \{\texttt{cat, dog, mouse}\}$ and $L_2 = \{\texttt{bone, food}\}$\ \vspace{-0.2in} \begin{multline}\notag L_1L_2 = \{\texttt{catbone, catfood, dogbone, dogfood,}\\ \texttt{mousebone, mousefood}\} \end{multline} \- The language {$\epsilon$} is the identity for concatenation of languages. For all languages *L*: $$L\{\epsilon\} = \{\epsilon\}L = L$$ \- The language $\varnothing$ is a zero for concatenation of languages. For all languages *L*: $$L\varnothing = \varnothing L = \varnothing$$ There are no ways of selecting a string from an empty set.\ \- It's also associative $$(L_1L_2) L_3 = L_1(L_2L_3)$$ - **Kleene star**\ Let *L* be a language defined over $\Sigma$ then: \begin{multline}\notag L^\ast = \{\epsilon\} \cup \{w \in \Sigma^\ast : \exists k \geq 1 (\exists w_1, w_2,\dots w_k \in L\\ (w = w_1w_2\dots w_k))\} \end{multline} note[^1]\ Example:\ \begin{equation}\notag \begin{split} \text{Let }L = & \{\texttt{dog, cat, fish}\}\text{. Then:}\\ L^\ast = & \{\epsilon\texttt{, dog, cat, fish, dogdog, dogcat,\dots,}\\ & \texttt{fishdog, fishcat, fishfish,\dots,}\\ & \texttt{fishcatfish,fishdogfishcat,\dots} \}\\[0.1in] \end{split} \end{equation} \- $L^\ast$ always contains an **infinite number** of strings as long as *L* is not $\varnothing$ or {$\epsilon$}. i.e., as long as there is one nonempty string in *L*.\ \- If *L* = $\varnothing$, then $L^\ast$ = {$\epsilon$} as there are no strings that could be concatenated to $\epsilon$ to make it longer. This means that the Kleene star of any language must have at least $\epsilon$ in it.\ \- If $L=\{\epsilon\}$, then $L = \{\epsilon\}$ as well. [^1]: The part $\exists w_1, w_2,\dots w_k \in L$ should mean 'select *k* elements from *L*'. Each time selecting *k* elements in a particular permutation and concatenating them. - **reverse** Reverse of a language *L* defined over $\Sigma$, written as $L^R$ is: $$L^R = \{w \in \Sigma^\ast : w = x^R \text{ for some }x \in L \}$$ ### Theorem 2.4 If $L_1$ and $L_2$ are languages, then $(L_1L_2)^R = L_2^RL_1^R$ ### Assigning Meaning to Languages To be able to use the language and form the framework for it's applications we need to assigning meaning to its strings. Working with formal languages require a precise way to assign meaning to strings (also called its **semantics**). \- A function that maps strings to its meanings is called a **semantic interpreation function**. But since languages are infinite, its not, in general possible to map each string with its meaning.\ \- Instead we define a function in such a way that it identifies units and can combine those units based on rules to build meanings for larger expression. We call such a function, **compositional semantic interpretation function**\ \- So this function "composes" the meanings of simpler constituents into a single meaning for a larger expression.\ \- When we define a formal function to fulfill a specific purpose, we design it so that there exists a compositional semantic interpretation function.\ \- For example there exists a compositional semantic interpretation function for the C programming language, a C compiler.\ \- These functions are generally *not* one-to-one. For example:\ \vspace{-0.2in} - "Chocolate, please", "I'd like chocolate", "I'll have chocolate" all mean the same - These also have to same meaning ``` int x = 4; int x = 4; int x = 4; x++; x = x --1; x = x + 1; ```