Cheat sheet

Part 3 · Tagging & Parsing — Cheat Sheet

POS basics, the HMM math, parsing types, the error-compounding cascade, and the common tag reference — all in one page.

Part 3 · Tagging & Parsing — Cheat Sheet — printable cheat sheet
Download PNG

Or read the searchable version below.

1

What is POS tagging?

POS = the role a word plays in a sentence (noun, verb, adjective, …).

Goal: determine the POS tag for a particular instance of a word — same word can be different POS in different sentences (book = noun or verb).

Uses:

  • Enhanced regex: (Det) Adj* N+ catches multiword expressions
  • Text-to-speech (how to pronounce lead)
  • Input for full syntax parsing
  • Backoff for downstream tasks (NER → NN, sentiment → JJ)
2

Two approaches

Rule-basedProbabilistic / ML
MethodHand-written rulesLearn from annotated corpus
DataNot neededTreebank required
ProsInterpretable, works small dataScales, adapts
ConsDoesn't scale, language-specificNeeds annotation
Use whenNo data, specific domainStandard case for English

Modern default: ML with CRF / BiLSTM-CRF / fine-tuned transformer.

3

The HMM math

For words w₁…wₙ and tags t₁…tₙ:

Emission: P(wᵢ | tᵢ)how often is this word tagged this way?

  • P(flies | VBZ) = times flies was a verb / total verb tags

Transition: P(tᵢ | tᵢ₋₁)how often does this tag follow that tag?

  • P(VBZ | NN) = how often a verb follows a noun

Pick the sequence that maximises:

argmax ∏ᵢ P(wᵢ | tᵢ) · P(tᵢ | tᵢ₋₁)

Solved efficiently by Viterbi (dynamic programming).

Punchline: deep learning is, at heart, learning to count.

4

The accuracy trap

State-of-the-art: 94–97% on clean English. Looks great, isn't.

5% error = one wrong tag every 20 words = one error per sentence.

CorpusAccuracy
English WSJ~97%
English Brown~95%
Spanish AnCora~93%
English Twitter~89%
Chinese~75%

Easy tokens (the, a, punctuation) inflate the number — real accuracy on hard cases is lower.

5

Sources of error

SourceExample
Lexicon gapnew words, slang, domain jargon
Unknown wordmisspellings, rare/unseen
Multi-word expressionstake off, by the way, out of control
Structural ambiguity"Visiting relatives can be boring."
Annotation standardguidelines disagree at edges
Punctuation / tokenisationupstream tokenizer fights you

Fix: richer features (suffix, capitalisation, prev/next word, embeddings) + sequential models (CRF, BiLSTM, BERT).

6

Error compounding

Each layer multiplies the previous one's error:

TOK (95%) × POS (93%) × DP (90%) × CLS (85%) ≈ 68%

Two takeaways:

  1. Don't trust headline accuracies on individual NLP tasks — the pipeline eats them.
  2. End-to-end transformers skip the explicit pipeline → no layer-wise compounding → real structural advantage.
7

Two types of parsing

ConstituencyDependency
WhatPhrase structure treeWord-to-word arrows
LevelsTerminal (words), pre-terminal (POS), non-terminal (S/NP/VP)Each word has one head, each arc labelled
Used byLinguistic theory, older NLPModern parsers (spaCy, Stanza), translation
import spacy
nlp = spacy.load("en_core_web_sm")
for tok in nlp("I saw the big dog"):
    print(tok.text, tok.dep_, "→", tok.head.text)
# I nsubj → saw
# saw ROOT → saw
# the det → dog
# big amod → dog
# dog dobj → saw
8

CFG → PCFG

CFG (Context-Free Grammar) = rules like S → NP VP, VP → V NP.

Problem: "Fed raises interest 0.5 percent" has 36 valid parses by typical English CFG rules. All grammatical, most absurd.

PCFG (Probabilistic CFG) = same rules + probabilities learned from a treebank. Pick the parse with the highest product of rule probabilities.

P(rule) = count(rule applied) / count(LHS expanded)

Same machinery as the HMM tagger, one level up. Replaced today by neural parsers (transition-based, graph-based, transformer-based).