S3Q2 · Simple Stemmer
⚡ Quick Reference
Type: File-in, stdout-out
Core idea: for each word, find the first suffix in the list that the word ends with, strip it, print the result. Preserve line structure.
def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

with open(filename) as f:
    for line in f:
        stemmed = " ".join(stem(w) for w in line.split())
        if stemmed:
            print(stemmed)
Key rules:
- Try suffixes in order and remove the first match only
- If no suffix matches, keep the word unchanged
- Preserve line structure: words on the same line stay on the same line
- Skip blank lines
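To see the key rules working together without touching the file system, here is a minimal sketch that reads from an in-memory handle. The shortened `SUFFIXES` list, the `stem_lines` helper, and the sample text are assumptions made for illustration; the exercise supplies its own, longer suffix list and a real file.

```python
import io

# Shortened, hypothetical suffix list (the exercise provides a longer one).
SUFFIXES = ["ations", "ation", "tion", "ing", "ly"]

def stem(word):
    # Strip the first suffix in list order that the word ends with.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word  # no suffix matched

def stem_lines(handle):
    """Return one stemmed output line per non-blank input line."""
    out = []
    for line in handle:
        words = line.split()
        if words:  # blank or whitespace-only lines split to [] and are skipped
            out.append(" ".join(stem(w) for w in words))
    return out

# Two blank-ish lines sit between the two lines of words.
sample = io.StringIO("running quickly\n\n   \neducation  cat\n")
result = stem_lines(sample)
print(result)  # ['runn quick', 'educ cat']
```

Words that shared an input line still share an output line, and the two empty lines in the middle produce no output.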
Problem Statement
Problem (File I/O → stdout)
Read a file of lowercase words (space/newline separated). For each word, remove the first matching suffix from the provided list. Print the result preserving original line structure.
Suffix order matters: "education" ends with both "ation" (index 2, counting from 0) and "tion" (index 4). Because "ation" comes first in the list, it is the one removed, giving "educ".
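The effect of list order can be checked directly. The sketch below passes the suffix list as a parameter, which is an assumption made here so two orderings can be compared; `swapped_order` is a hypothetical reordering and not part of the exercise.

```python
def stem(word, suffixes):
    # First suffix in list order that matches is the one removed.
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

given_order = ["ation", "tion"]    # same relative order as the exercise's list
swapped_order = ["tion", "ation"]  # hypothetical reordering, for contrast only

a = stem("education", given_order)
b = stem("education", swapped_order)
print(a)  # educ
print(b)  # educa
```

With the given order the longer suffix is tried first, so the result is "educ"; swapping the two entries would leave the stray "a" behind.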
Understanding the stemming rule
For a given word, scan the suffix list from left to right. Remove the first suffix found. Stop after one removal - don't chain.
"education" → matches "ation" (index 2) → "educ"
"nations" → matches "ations" (index 1) → "n"
"running" → matches "ing" (index 29) → "runn"
"cat" → no match → "cat" (unchanged)
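The four examples above can be reproduced with a short script. It uses the suffix list from the solutions on this page; the word "educational" is an extra example added here to illustrate the "don't chain" rule.

```python
suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]  # one removal only, then stop
    return word

for word in ["education", "nations", "running", "cat"]:
    print(word, "->", stem(word))

# Indexes quoted in the examples are 0-based positions in the list.
print(suffixes.index("ation"), suffixes.index("ing"))  # 2 29

# No chaining: "al" is stripped once, and the resulting "education"
# is not stemmed again.
print(stem("educational"))  # education
```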
Solution approaches
# Variant: helper function with an early return, list comprehension per line
import tempfile, sys

# Copy stdin into a temporary file so the solution can read from a filename
_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

with open(filename) as f:
    for line in f:
        words = line.split()
        stemmed = [stem(w) for w in words]
        if stemmed:
            print(" ".join(stemmed))
# Variant: fully explicit loops, with blank lines skipped up front
import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    for suffix in suffixes:
        if word.endswith(suffix):
            return word[:-len(suffix)]  # strip the suffix
    return word  # no match - return unchanged

with open(filename) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        words = line.split()
        result = []
        for word in words:
            result.append(stem(word))
        print(" ".join(result))
# Variant: next() with a generator expression picks the first matching suffix
import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

def stem(word):
    match = next((s for s in suffixes if word.endswith(s)), None)
    return word[:-len(match)] if match else word

with open(filename) as f:
    for line in f:
        words = line.split()
        if words:
            print(" ".join(stem(w) for w in words))
next(iterator, default) returns the first item the generator yields, here the first matching suffix, or the default (None) when nothing matches. This reduces the stem function to two lines.
# Variant: stem as a lambda; the generator yields the stripped word directly
import tempfile, sys

_, filename = tempfile.mkstemp(prefix="case")
with open(filename, 'w') as f:
    f.write(sys.stdin.read())

suffixes = [
    'wards', 'ations', 'ation', 'tions', 'tion', 'asions',
    'asion', 'sions', 'sion', 'ment', 'ness', 'ship',
    'hood', 'able', 'ible', 'less', 'ward', 'wise', 'ion', 'ity', 'age',
    'ize', 'ise', 'ify', 'ate', 'ful', 'ous', 'ish', 'ive', 'ing', 'ers', 'er',
    'or', 'ty', 'en', 'ic', 'al', 'ly'
]

# The default passed to next() is the word itself, so unmatched words pass through
stem = lambda w: next(
    (w[:-len(s)] for s in suffixes if w.endswith(s)), w
)

with open(filename) as f:
    for line in f:
        words = line.split()
        if words:
            print(" ".join(map(stem, words)))
Key takeaways
First match wins - stop after one removal
Scan suffixes in order; return word[:-len(suffix)] exits the function immediately on the first match. If the loop kept going after a match, a later, shorter suffix that also matches (for "education", "tion" or "ion") could override the earlier "ation" match and produce the wrong stem.
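A small contrast makes the point concrete. In this sketch `stem_last_match` is a deliberately wrong, hypothetical variant that keeps looping after a match, and `SUFFIXES` is a three-item subset of the full list kept in the same relative order.

```python
SUFFIXES = ["ation", "tion", "ion"]  # subset of the full list, same relative order

def stem_first(word):
    for s in SUFFIXES:
        if word.endswith(s):
            return word[:-len(s)]  # early return: first match wins
    return word

def stem_last_match(word):
    # Deliberately wrong: no early return, so each later match overwrites
    # the result of the earlier one.
    result = word
    for s in SUFFIXES:
        if word.endswith(s):
            result = word[:-len(s)]
    return result

print(stem_first("education"))       # educ
print(stem_last_match("education"))  # educat
```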
line.split() handles multiple spaces and blank lines
line.split() without arguments splits on any whitespace and ignores leading/trailing spaces. It returns an empty list for blank lines - checking if words: before printing skips them cleanly.
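The behaviour of `split()` with no arguments is easy to confirm in isolation. The sample strings below are made up for illustration; the last line contrasts it with `split(" ")`, which does not collapse repeated spaces.

```python
padded = "  big   cats \n".split()
blank = "\n".split()
whitespace_only = "   \t\n".split()
explicit_sep = "a  b".split(" ")

print(padded)           # ['big', 'cats']
print(blank)            # []
print(whitespace_only)  # []
print(explicit_sep)     # ['a', '', 'b']

# An empty list is falsy, so `if words:` skips blank lines.
print(bool(blank))      # False
```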
next(generator, default) - clean first-match pattern
next((s for s in suffixes if word.endswith(s)), None) returns the first matching suffix or None. Paired with a conditional strip, it replaces an explicit loop with a single expression.
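As a quick check of the pattern, the sketch below wraps it in a hypothetical helper named `first_suffix` and uses a shortened suffix list. It also shows why the default matters: without it, `next()` raises `StopIteration` when the generator is empty.

```python
suffixes = ["ation", "tion", "ing"]  # shortened list for illustration

def first_suffix(word):
    # The generator is lazy, so scanning stops at the first suffix that matches.
    return next((s for s in suffixes if word.endswith(s)), None)

print(first_suffix("education"))  # ation
print(first_suffix("cat"))        # None

# Without a default, an exhausted generator raises StopIteration.
raised = False
try:
    next(s for s in suffixes if "cat".endswith(s))
except StopIteration:
    raised = True
print(raised)  # True
```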