S3Q2 · Simple Stemmer

⚡ Quick Reference

Type: File-in, stdout-out

Core idea: for each word, find the first suffix in the list that the word ends with, strip it, print the result. Preserve line structure.

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

with open(filename) as f:
    for line in f:
        words = line.split()
        if words:  # skip blank lines only, even if every word stems to ""
            print(" ".join(stem(w) for w in words))

Key rules:
- Try suffixes in order and remove the first match only
- If no suffix matches, keep the word unchanged
- Preserve line structure: words on the same line stay on the same line
- Skip blank lines


Problem Statement

Problem (File I/O → stdout)

Read a file of lowercase words (space/newline separated). For each word, remove the first matching suffix from the provided list. Print the result preserving original line structure.

Suffix order matters: "education" matches both "ation" (index 2) and "tion" (index 4). Since "ation" comes first → "educ".
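To make the effect of ordering concrete, here is a minimal sketch. It uses a shortened suffix list and two helper names (matching_suffixes, stem_with) that are illustrative assumptions, not part of the original problem.

```python
# Shortened list for illustration; relative order matches the full list.
SUFFIXES_IN_ORDER = ['ations', 'ation', 'tions', 'tion', 'ion']

def matching_suffixes(word, suffixes):
    """Return every suffix the word ends with, in list order."""
    return [s for s in suffixes if word.endswith(s)]

def stem_with(word, suffixes):
    """Strip the first suffix (in list order) that the word ends with."""
    for s in suffixes:
        if word.endswith(s):
            return word[:-len(s)]
    return word

print(matching_suffixes("education", SUFFIXES_IN_ORDER))  # ['ation', 'tion', 'ion']
print(stem_with("education", SUFFIXES_IN_ORDER))          # educ
# Hypothetical reversed order: the shorter suffix would win instead.
print(stem_with("education", ['tion', 'ation']))          # educa
```

Several suffixes can match the same word, so the position in the list, not the suffix length, decides which one is removed.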


Understanding the stemming rule

suffixes = ['wards', 'ations', 'ation', 'tions', 'tion', ...]

For a given word, scan the suffix list from left to right. Remove the first suffix found. Stop after one removal - don't chain.

"education"  → matches "ation" (index 2)  → "educ"
"nations"    → matches "ations" (index 1) → "n"
"running"    → matches "ing" (index 29)   → "runn"
"cat"        → no match                   → "cat"  (unchanged)
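The worked examples above can be checked with a short sketch that reports the index of the matching suffix. The helper first_match and the extra word "hopefully" are assumptions added for illustration; "hopefully" shows that only one suffix is removed even when the result still ends with another suffix.

```python
SUFFIXES = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def first_match(word):
    """Return (index, suffix) for the first suffix the word ends with, or None."""
    return next(((i, s) for i, s in enumerate(SUFFIXES) if word.endswith(s)), None)

def stem(word):
    """Remove exactly one suffix: the first match in list order."""
    hit = first_match(word)
    return word[:-len(hit[1])] if hit else word

for word in ["education", "nations", "running", "cat", "hopefully"]:
    print(f"{word!r:12} match={first_match(word)} -> {stem(word)!r}")

# "hopefully" loses "ly" and becomes "hopeful"; "ful" is NOT stripped afterwards.
```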

Solution approaches

import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

with open(filename) as f:
    for line in f:
        words   = line.split()
        stemmed = [stem(w) for w in words]
        if stemmed:
            print(" ".join(stemmed))

The same logic with an explicit loop and an up-front blank-line check:

import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]   # strip the suffix
    return word                           # no match - return unchanged

with open(filename) as f:
    for line in f:
        line  = line.strip()
        if not line:
            continue
        words   = line.split()
        result  = []
        for word in words:
            result.append(stem(word))
        print(" ".join(result))

A more compact stem built on next():

import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    match = next((s for s in suffixes if word.endswith(s)), None)
    return word[:-len(match)] if match else word

with open(filename) as f:
    for line in f:
        words = line.split()
        if words:
            print(" ".join(stem(w) for w in words))

next(generator, default) returns the first matching suffix, or None when nothing matches. This reduces the stem function to two lines without an explicit loop.

import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

stem = lambda w: next(
    (w[:-len(s)] for s in suffixes if w.endswith(s)), w
)

with open(filename) as f:
    for line in f:
        words = line.split()
        if words:
            print(" ".join(map(stem, words)))

Key takeaways

01

First match wins - stop after one removal

Scan the suffixes in order; return word[:-len(suffix)] exits the loop on the first match. Without that early return, the loop would keep going, and a shorter suffix later in the list could override the longer one that matched first (for example, "tion" replacing "ation" for "education").
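As a rough illustration of that failure mode, the sketch below compares the early-return version with a deliberately buggy variant. The name stem_last_match and the two-item suffix list are assumptions made for this example only.

```python
SUFFIXES = ['ation', 'tion']  # shortened list, same relative order as the full one

def stem_first_match(word):
    """Correct behaviour: return as soon as a suffix matches."""
    for s in SUFFIXES:
        if word.endswith(s):
            return word[:-len(s)]
    return word

def stem_last_match(word):
    """Buggy variant: no early exit, so a later match overwrites an earlier one."""
    result = word
    for s in SUFFIXES:
        if word.endswith(s):
            result = word[:-len(s)]
    return result

print(stem_first_match("education"))  # educ
print(stem_last_match("education"))   # educa
```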

02

line.split() handles multiple spaces and blank lines

line.split() without arguments splits on any whitespace and ignores leading/trailing spaces. It returns an empty list for blank lines - checking if words: before printing skips them cleanly.
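A small sketch of this behaviour, using io.StringIO as a stand-in for the input file. The sample contents are made up for illustration.

```python
import io

# Hypothetical file contents: extra spaces, an empty line, a whitespace-only line.
sample = io.StringIO("  walking   quickly \n\n   \nnation\n")

kept = []
for line in sample:
    words = line.split()   # no argument: split on runs of any whitespace
    if words:              # [] is falsy, so blank lines are skipped
        kept.append(" ".join(words))

print(kept)                # ['walking quickly', 'nation']

# For contrast, an explicit separator keeps empty strings between repeated spaces.
print("a  b".split(" "))   # ['a', '', 'b']
```

Each surviving input line becomes exactly one output line, which is how the line structure is preserved.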

03

next(generator, default) - clean first-match pattern

next((s for s in suffixes if word.endswith(s)), None) returns the first matching suffix or None. Paired with a conditional strip, it replaces an explicit loop with a single expression.
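The pattern can be tried in isolation with the sketch below. It uses a shortened suffix list, and the helper name first_suffix is an assumption for this example. The last part shows why the default argument matters.

```python
suffixes = ['ation', 'tion', 'ing']  # shortened list for illustration

def first_suffix(word):
    """Return the first suffix the word ends with, or None if there is none."""
    return next((s for s in suffixes if word.endswith(s)), None)

print(first_suffix("education"))  # ation
print(first_suffix("running"))    # ing
print(first_suffix("cat"))        # None

# Without a default, next() raises StopIteration when the generator is empty.
try:
    next(s for s in suffixes if "cat".endswith(s))
except StopIteration:
    print("no default -> StopIteration")
```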