Lexer
[!question]- What is the input and output of a Lexer?
- Input: Stream of Characters
- Output: Stream of tokens
[!question]- Lexer is a $\underline{\hspace{3cm}}$ and $\underline{\hspace{3cm}}$ process Aggregation and Classification
[!question]- What is a microsyntax It is the lexical structure of a language It specifies how to group characters into words and how to split up words that run together
[!question]- Which is the only part of the compiler that manipulates every character of the input program? Lexer
[!question]- Differentiate token, lexeme and pattern
- Token is the smallest meaningful part of a program
- The can’t be broken down further
- The can be thought of as categories
- Lexeme is a sequence of characters that we have matched from the stream of characters
- For a Lexeme to be considered a valid token it has to match the token’s pattern
[!example]
main
is a lexeme For it to be considered aKEYWORD
it has to match[a-zA-Z_]{a-zA-Z0-9_}*
[!question]- What are the three operations supported by Regular expressions
- Alternation
- $R | S = {x | x \in R\ or\ y \in S}$
- Concatenation
- $RS = {xy |x \in R\ and\ y \in S}$
- Self concatenation can be denoted as $R^i$ where $i$ is the number of times $R$ is concatenated with itself
- Kleene Star Closure
- $R^* = \bigcup_{i=0}^{\infty}R^i$
[!question]- What is finite closure $R^3 = {R | RR |RRR}$
[!question]- What is positive closure $R^+ = RR^*$
[!question]- Draw the cycle of constructions
[!question]- Formally define a NFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times (\Sigma \cup \varepsilon) \rightarrow 2^Q$
[!question]- Formally define a DFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times \Sigma \rightarrow Q$
[!question]- Differentiate between tokenizer and recognizer
- Recognizer takes an input string and produces a binary result of Accept/Reject
- Tokenizer takes an input string, splits them into tokens and produces an output of the format
<Token Name, Token Value>
[!question]- What are the two rules used for disambiguation in a lexer?
- Longest Match Rule
- Rule Priority
[!question]- What is the Longest Match Rule and give an example? Suppose we have a number of rules that match a prefix of a given string, pick the rule that matches the most number of characters.
[!example]
for8
will match both anIDENTIFIER
andFOR
FOR
matchesfor
(3 char)IDENTIFIER
matchesfor8
(4 char) Hence we go withIDENTIFIER
[!question]- What is Rule Priority and give an example? Lexer rules are given in order of precedence. If there are multiple matching rules, pick the rule that is given first
[!example]
for
will match bothINDENTIFIER
andFOR
FOR
is given first So we go withFOR