S3Q1 · Text Frequency Analysis
⚡ Quick Reference
Function: analyse_text(sentence: str, task: str)
Four tasks, non-alphabetic chars excluded from analysis:
from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    alpha_only = [c.lower() for c in sentence if c.isalpha()]
    clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]
    if task == "count_non_whitespace":
        return sum(1 for c in sentence if c not in (' ', '\n'))
    elif task == "most_frequent_char":
        freq = Counter(alpha_only)
        max_count = max(freq.values())
        return max(c for c in freq if freq[c] == max_count)
    elif task == "count_diverse_words":
        return sum(1 for w in clean_words if len(set(w)) > 5)
    elif task == "word_with_highest_char_frequency":
        def max_freq(w): return max(Counter(w).values()) if w else 0
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]
Key rules:
- most_frequent_char: case-insensitive; ties → largest letter alphabetically
- count_diverse_words: more than 5 distinct chars (case-insensitive), non-alpha excluded
- word_with_highest_char_frequency: ties → first word in sentence
- Non-alphabetic characters excluded from character analysis
Problem Statement
Problem
Write a function analyse_text(sentence, task) that dispatches to one of four text analysis sub-tasks.
Sample sentence: "Python is fun, programming is awesome."
Task 1: count_non_whitespace
Count every character that is not a space or newline:
"Python is fun, programming is awesome." → letters + punctuation, no spaces.
Count: Python (6) + is (2) + fun, (4) + programming (11) + is (2) + awesome. (8) = 33
Non-whitespace ≠ only letters
The task says "excluding spaces and newlines": commas, periods, and other punctuation are included in the count. Only ' ' and '\n' are excluded.
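The difference between the three possible readings of "non-whitespace" can be checked directly. This is a minimal sketch on the sample sentence; the variable names are illustrative, and the `str.isspace()` variant is shown only for contrast, since it is not the rule the task states.

```python
sentence = "Python is fun, programming is awesome."

# Rule from the task: exclude only the space character and the newline.
count_task_rule = sum(1 for c in sentence if c not in (" ", "\n"))

# Contrast (not the task's rule): str.isspace() also drops tabs and other whitespace.
count_isspace = sum(1 for c in sentence if not c.isspace())

# Contrast: letters only, which would wrongly drop the comma and the period.
count_letters = sum(1 for c in sentence if c.isalpha())

print(count_task_rule, count_isspace, count_letters)  # 33 33 31

# The first two rules only diverge when other whitespace, such as a tab, is present.
tabbed = "a\tb"
print(sum(1 for c in tabbed if c not in (" ", "\n")))  # 3 (the tab is counted)
print(sum(1 for c in tabbed if not c.isspace()))       # 2 (the tab is dropped)
```

Under the task's wording a tab counts as a character, so the `(' ', '\n')` membership test is the literal reading.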
Task 2: most_frequent_char
Case-insensitive frequency of alphabetic characters only. Ties → largest letter:
from collections import Counter
alpha_only = [c.lower() for c in sentence if c.isalpha()]
freq = Counter(alpha_only)
max_count = max(freq.values())
return max(c for c in freq if freq[c] == max_count)
For the sample: i, m, n, o, and s each appear 3 times, so there is a five-way tie at the maximum. The tie-break picks the largest letter → 's'
max(...) over the tied characters returns the largest one because strings compare lexicographically.
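To see the tie-break at work, the frequencies for the sample sentence can be inspected before taking the maximum. This is a small sketch; `tied` and `winner` are illustrative names, not part of the original solution.

```python
from collections import Counter

sentence = "Python is fun, programming is awesome."

# Case-insensitive counts of alphabetic characters only.
freq = Counter(c.lower() for c in sentence if c.isalpha())
max_count = max(freq.values())

# Every letter that reaches the maximum count, sorted for readability.
tied = sorted(c for c in freq if freq[c] == max_count)

# Ties are resolved by taking the alphabetically largest letter.
winner = max(tied)

print(max_count)  # 3
print(tied)       # ['i', 'm', 'n', 'o', 's']
print(winner)     # s
```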
Task 3: count_diverse_words
Count words with more than 5 distinct alphabetic characters (case insensitive):
clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]
return sum(1 for w in clean_words if len(set(w)) > 5)
| Word | Clean | Distinct chars | > 5? |
|---|---|---|---|
| Python | python | {p,y,t,h,o,n} = 6 | ✅ |
| is | is | 2 | ❌ |
| fun, | fun | 3 | ❌ |
| programming | programming | {p,r,o,g,a,m,i,n} = 8 | ✅ |
| is | is | 2 | ❌ |
| awesome. | awesome | {a,w,e,s,o,m} = 6 | ✅ |
Count: 3
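The table can be reproduced with a short script. This is a sketch only; `distinct_letters` is a hypothetical helper introduced for illustration, not a name from the original solution.

```python
sentence = "Python is fun, programming is awesome."

def distinct_letters(word: str) -> set:
    """Lowercased alphabetic characters of a word, as a set (hypothetical helper)."""
    return {c.lower() for c in word if c.isalpha()}

# Keep the original words whose cleaned form has more than 5 distinct letters.
diverse = [w for w in sentence.split() if len(distinct_letters(w)) > 5]

print(diverse)       # ['Python', 'programming', 'awesome.']
print(len(diverse))  # 3

# Case-insensitivity: upper and lower case collapse to the same letter.
print(len(distinct_letters("AaBbCc!")))  # 3
```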
Task 4: word_with_highest_char_frequency
For each word, take the count of its most repeated letter, then return the word with the highest such count. Ties → first word:
from collections import Counter
def max_freq(w): return max(Counter(w).values()) if w else 0
return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]
| Word | Clean | Max char freq |
|---|---|---|
| Python | python | 1 (all unique) |
| programming | programming | r=2, g=2, m=2 → 2 |
| awesome. | awesome | e=2 → 2 |
programming and awesome. tie at max frequency 2; programming comes first in the sentence → 'programming'
max() scans left to right and only updates on a strictly greater key, so ties preserve the first occurrence.
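The first-wins behaviour on ties can be checked on the sample sentence, where two words share the top score. This is a minimal sketch; `max_char_freq` is an illustrative helper that cleans and scores a word in one step.

```python
from collections import Counter

sentence = "Python is fun, programming is awesome."

def max_char_freq(word: str) -> int:
    """Count of the most repeated letter in a word, ignoring case and non-letters."""
    letters = [c.lower() for c in word if c.isalpha()]
    return max(Counter(letters).values()) if letters else 0

words = sentence.split()
scores = [(w, max_char_freq(w)) for w in words]
print(scores)
# [('Python', 1), ('is', 1), ('fun,', 1), ('programming', 2), ('is', 1), ('awesome.', 2)]

# 'programming' and 'awesome.' both score 2; max() returns the first one encountered.
best = max(words, key=max_char_freq)
print(best)  # programming
```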
Complete solution approaches
# Approach 1: comprehensions and Counter
from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    alpha_only = [c.lower() for c in sentence if c.isalpha()]
    clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]
    if task == "count_non_whitespace":
        return sum(1 for c in sentence if c not in (' ', '\n'))
    elif task == "most_frequent_char":
        freq = Counter(alpha_only)
        max_count = max(freq.values())
        return max(c for c in freq if freq[c] == max_count)
    elif task == "count_diverse_words":
        return sum(1 for w in clean_words if len(set(w)) > 5)
    elif task == "word_with_highest_char_frequency":
        def max_freq(w): return max(Counter(w).values()) if w else 0
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]

# Approach 2: explicit loops
from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    clean_words = []
    for w in words:
        clean_words.append(''.join(c.lower() for c in w if c.isalpha()))
    if task == "count_non_whitespace":
        count = 0
        for c in sentence:
            if c not in (' ', '\n'):
                count += 1
        return count
    elif task == "most_frequent_char":
        freq = {}
        for c in sentence:
            if c.isalpha():
                c = c.lower()
                freq[c] = freq.get(c, 0) + 1
        max_count = max(freq.values())
        candidates = [c for c in freq if freq[c] == max_count]
        return max(candidates)
    elif task == "count_diverse_words":
        count = 0
        for w in clean_words:
            if len(set(w)) > 5:
                count += 1
        return count
    elif task == "word_with_highest_char_frequency":
        best_word = words[0]
        best_freq = 0
        for orig, clean in zip(words, clean_words):
            if clean:
                mf = max(Counter(clean).values())
                # Strictly greater, so the first word wins on ties.
                if mf > best_freq:
                    best_freq = mf
                    best_word = orig
        return best_word

# Approach 3: functional style with filter, map, and lambda
from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    clean = lambda w: ''.join(filter(str.isalpha, w)).lower()
    clean_words = list(map(clean, words))
    if task == "count_non_whitespace":
        return len(list(filter(lambda c: c not in (' ', '\n'), sentence)))
    elif task == "most_frequent_char":
        freq = Counter(filter(str.isalpha, sentence.lower()))
        max_count = max(freq.values())
        return max(filter(lambda c: freq[c] == max_count, freq))
    elif task == "count_diverse_words":
        return len(list(filter(lambda w: len(set(w)) > 5, clean_words)))
    elif task == "word_with_highest_char_frequency":
        max_freq = lambda w: max(Counter(w).values()) if w else 0
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]
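All three versions call max() on a collection that is empty when the sentence has no letters (Task 2) or no words (Task 4), which raises ValueError (or IndexError for `words[0]` in the loop version). The problem statement does not say what to return in that case, so the sketch below assumes `None` is acceptable. The function names are hypothetical and exist only to show the guard.

```python
from collections import Counter

def most_frequent_char_or_none(sentence: str):
    """Task 2 with a guard: returns None when there are no letters (an assumption)."""
    freq = Counter(c.lower() for c in sentence if c.isalpha())
    if not freq:
        return None
    max_count = max(freq.values())
    return max(c for c in freq if freq[c] == max_count)

def best_word_or_none(sentence: str):
    """Task 4 with a guard: returns None when there are no words (an assumption)."""
    def max_freq(word: str) -> int:
        letters = [c.lower() for c in word if c.isalpha()]
        return max(Counter(letters).values()) if letters else 0
    # default= is returned by max() instead of raising ValueError on an empty list.
    return max(sentence.split(), key=max_freq, default=None)

print(most_frequent_char_or_none("123 !!"))  # None
print(best_word_or_none("   "))              # None
print(best_word_or_none("Python is fun, programming is awesome."))  # programming
```

Whether to return `None`, an empty string, or raise a clearer error depends on how the grader treats empty input, which the source does not specify.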
Key takeaways
max(candidates) for alphabetically largest tie-breaker
When multiple characters share the maximum frequency, max(candidates) returns the alphabetically largest one, since 'z' > 'a' in Python string comparison. No sorting needed.
Clean words once, reuse everywhere
Pre-compute clean_words (lowercase, alpha-only) before the task dispatch. Tasks 3 and 4 both need it, so computing it once avoids redundant work.
max() on zip() for paired data
max(zip(words, clean_words), key=lambda p: score(p[1]))[0] finds the best word using the cleaned version for scoring but returns the original. A clean pattern for "score one thing, return another".