2023-10-10

LL(1) Parsing

reference: https://learning.edx.org/course/course-v1:StanfordOnline+SOE.YCSCS1+3T2020/block-v1:StanfordOnline+SOE.YCSCS1+3T2020+type@sequential+block@77d8cb1649f14de5a5bea817ffcdcd27/block-v1:StanfordOnline+SOE.YCSCS1+3T2020+type@vertical+block@0b953e3c822f4ee29e963616d74a38db

Explanation of the name

The L’s in LL stand for

Left to right scanning
Leftmost derivation

LL(1) is a type of predictive parsing meaning it never backtracks. How does it accomplish this?

Restricted grammar
Lookahead

These parsers only allow grammars that are LL(k) and they theoretically have k tokens of lookahead

$A \rightarrow \alpha| \beta$, we want to be able to choose between $\alpha$ and $\beta$ by just looking at the next input character

LL(1) is a top-down parser meaning it starts with the start symbol and tries to expand to the given string

It reads the input from Left to right and produces the left most derivation

So it builds the tree in the same order as the input i.e the node that gets expanded to become a leaf will correspond to the first token and the next node will correspond to the next token

LL(1) uses a table from $N \times T$

T[A, t] means that we are currently at a A node in the parse tree and the next input is a t. Suppose this value is $\alpha$. That means that either

There is a derivation from $\alpha$ of the form $t\beta$ where $\beta \in (N \cup T)^*$ i.e the derivation from $\alpha$ as $t$ as it first symbol
Or if $\alpha \Rightarrow^* \varepsilon$ and $S \Rightarrow^* \beta A t \delta$

Or in layman’s terms We choose to expand $A$ to $\alpha$ on seeing that the next input is $t$ if

$\alpha$ expands to something which has $t$ as its first symbol
$\alpha$ expands to $\varepsilon$ and there is a derivation from $S$ such that $t$ comes after $A$. So when $A$ is expanded to $\alpha$ and $\alpha$ becomes $\varepsilon$ we still have $t$ at the start. This in effect is equivalent to $\alpha$ expanding to $t$.

We need a way to formalize these two things

Let $FIRST(\alpha)$ be defined as follows: $x \in FIRST(\alpha) \iff \alpha \Rightarrow^* t\beta$ where $\beta \in (N \cup T)^$ $FIRST(X) = {t | X \Rightarrow^ t\alpha} \cup {\varepsilon | X \rightarrow^* \varepsilon}; \alpha \in (N \cup T)^*$

[!note] We don’t usually include $\varepsilon$ as terminal. But we need to store if a NT goes to $\varepsilon$

This takes care of the first point - Can this NT expand to the input at hand at some point?

Let $FOLLOW(\alpha)$ be defined as follows: $x \in FOLLOW(\alpha) \iff \varepsilon \in FIRST(\alpha) S \Rightarrow^* \beta \alpha t \delta$ or $t \in FIRST(\beta)$ for any $\beta$ that appears after $\alpha$ in any production. There exists a production $A \rightarrow B\alpha\beta\gamma$ and $t \in FIRST(\beta)$; $B, \gamma \in (N \cup T)^*$