S3Q1 · Text Frequency Analysis

⚡ Quick Reference

Function: analyse_text(sentence: str, task: str)

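Four tasks. Non-alphabetic characters are excluded from the character-frequency tasks (most_frequent_char, count_diverse_words, word_with_highest_char_frequency) but still count in count_non_whitespace: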

from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    # Lowercased letters of the whole sentence (used by most_frequent_char)
    alpha_only = [c.lower() for c in sentence if c.isalpha()]
    # Each word lowercased with non-letters stripped (used by tasks 3 and 4)
    clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]

    if task == "count_non_whitespace":
        return sum(1 for c in sentence if c not in (' ', '\n'))

    elif task == "most_frequent_char":
        freq = Counter(alpha_only)
        max_count = max(freq.values())
        # Among the tied letters, max() picks the alphabetically largest
        return max(c for c in freq if freq[c] == max_count)

    elif task == "count_diverse_words":
        return sum(1 for w in clean_words if len(set(w)) > 5)

    elif task == "word_with_highest_char_frequency":
        def max_freq(w): return max(Counter(w).values()) if w else 0
        # max() keeps the first word on ties; [0] returns the original spelling
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]

Key rules:

- most_frequent_char: case insensitive, ties → largest letter alphabetically
- count_diverse_words: more than 5 distinct chars (case insensitive), non-alpha excluded
- word_with_highest_char_frequency: ties → first word in sentence
- Non-alphabetic characters excluded from character analysis

Problem Statement

Problem

Write a function analyse_text(sentence, task) that dispatches to one of four text analysis sub-tasks.

Sample sentence:

sentence = "Python is fun, programming is awesome."

Task 1: `count_non_whitespace`

Count every character that is not a space or newline:

return sum(1 for c in sentence if c not in (' ', '\n'))

"Python is fun, programming is awesome." → letters + punctuation, no spaces.

Count per word: Python (6) + is (2) + fun, (4) + programming (11) + is (2) + awesome. (8) = 33

Non-whitespace ≠ only letters

The task says "excluding spaces and newlines", so commas, periods, and other punctuation are included in the count. Only ' ' and '\n' are excluded; any other whitespace character, such as a tab, is still counted.
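As a quick check of the rule above, the following sketch counts the sample sentence and then shows what happens to punctuation and to a tab. The standalone function name `count_non_whitespace` is only for illustration; in the solution this logic lives inside `analyse_text`.

```python
sentence = "Python is fun, programming is awesome."

def count_non_whitespace(text: str) -> int:
    # Only the space and newline characters are skipped; everything else counts.
    return sum(1 for c in text if c not in (' ', '\n'))

print(count_non_whitespace(sentence))      # 33
print(sum(c.isalpha() for c in sentence))  # 31 letters; the comma and period are the other 2
print(count_non_whitespace("a\tb"))        # 3: the tab is neither ' ' nor '\n'
```

If the intended rule were "all whitespace", `not c.isspace()` would be the usual test, but that is a different specification from the one stated here.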

Task 2: `most_frequent_char`

Case-insensitive frequency of alphabetic characters only. Ties → largest letter:

from collections import Counter
alpha_only = [c.lower() for c in sentence if c.isalpha()]
freq = Counter(alpha_only)
max_count = max(freq.values())
return max(c for c in freq if freq[c] == max_count)

For the sample, five letters tie at 3 occurrences each: i, m, n, o and s. The tie-break picks the alphabetically largest of them → 's'

Calling max() on the tied candidates returns the largest character, because strings compare lexicographically and lowercase letters sort in alphabetical order.
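Tallying the sample's letters directly makes the tie visible. This is a small sketch of the same steps, with the candidates sorted only so that the printed list is easy to read:

```python
from collections import Counter

sentence = "Python is fun, programming is awesome."

# Case-insensitive counts of alphabetic characters only
freq = Counter(c.lower() for c in sentence if c.isalpha())
max_count = max(freq.values())
candidates = sorted(c for c in freq if freq[c] == max_count)

print(max_count)        # 3
print(candidates)       # ['i', 'm', 'n', 'o', 's']
print(max(candidates))  # s
```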

Task 3: `count_diverse_words`

Count words with more than 5 distinct alphabetic characters (case insensitive):

clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]
return sum(1 for w in clean_words if len(set(w)) > 5)

Word	Clean	Distinct chars	> 5?
`Python`	`python`	{p,y,t,h,o,n} = 6	✅
`is`	`is`	2	❌
`fun,`	`fun`	3	❌
`programming`	`programming`	{p,r,o,g,a,m,i,n} = 8	✅
`is`	`is`	2	❌
`awesome.`	`awesome`	{a,w,e,s,o,m} = 6	✅

Count: 3
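The table can be reproduced with a few lines. `distinct_letters` is a hypothetical helper introduced here for illustration; it applies the same cleaning rule (lowercase, letters only) before counting.

```python
sentence = "Python is fun, programming is awesome."

def distinct_letters(word: str) -> int:
    """Number of different letters in a word, ignoring case and non-letters."""
    return len({c.lower() for c in word if c.isalpha()})

for word in sentence.split():
    n = distinct_letters(word)
    print(f"{word:<12} {n}  {'diverse' if n > 5 else '-'}")

diverse = sum(1 for w in sentence.split() if distinct_letters(w) > 5)
print(diverse)                      # 3
print(distinct_letters("Banana!"))  # 3: b, a, n
```

Note that a six-letter word with a repeated letter, such as "Banana", is not diverse: the threshold applies to distinct letters, not to word length.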

Task 4: `word_with_highest_char_frequency`

Find the word whose most-repeated character occurs the most times. Ties → first word:

from collections import Counter
def max_freq(w): return max(Counter(w).values()) if w else 0
return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]

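Word	Clean	Max char freq
`Python`	`python`	1 (all unique)
`is`	`is`	1
`fun,`	`fun`	1
`programming`	`programming`	`r`=2, `g`=2, `m`=2 → 2
`is`	`is`	1
`awesome.`	`awesome`	`e`=2 → 2

programming and awesome. tie at a maximum frequency of 2; programming comes first in the sentence, so the result is 'programming'.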

max() scans left to right and replaces its current best only when it finds a strictly greater key, so ties keep the first occurrence.
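The tie-breaking behaviour matters for the sample, because two words reach a score of 2. The sketch below scores every word and then runs `max()` in both directions to show that scan order decides the winner. For brevity it cleans each word inside `max_freq` instead of using a precomputed `clean_words` list.

```python
from collections import Counter

sentence = "Python is fun, programming is awesome."
words = sentence.split()

def max_freq(word: str) -> int:
    # Highest count of any single letter in the word (case-insensitive)
    letters = [c.lower() for c in word if c.isalpha()]
    return max(Counter(letters).values()) if letters else 0

print([(w, max_freq(w)) for w in words])
# [('Python', 1), ('is', 1), ('fun,', 1), ('programming', 2), ('is', 1), ('awesome.', 2)]

print(max(words, key=max_freq))            # programming (first word scoring 2)
print(max(reversed(words), key=max_freq))  # awesome. (first when scanning from the end)
```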

Complete solution approaches

Three variants follow, in this order: Pythonic (comprehensions), Explanatory (loops), and Using lambda + filter/map.

from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    alpha_only = [c.lower() for c in sentence if c.isalpha()]
    clean_words = [''.join(c.lower() for c in w if c.isalpha()) for w in words]

    if task == "count_non_whitespace":
        return sum(1 for c in sentence if c not in (' ', '\n'))

    elif task == "most_frequent_char":
        freq = Counter(alpha_only)
        max_count = max(freq.values())
        return max(c for c in freq if freq[c] == max_count)

    elif task == "count_diverse_words":
        return sum(1 for w in clean_words if len(set(w)) > 5)

    elif task == "word_with_highest_char_frequency":
        def max_freq(w): return max(Counter(w).values()) if w else 0
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]

from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    clean_words = []
    for w in words:
        clean_words.append(''.join(c.lower() for c in w if c.isalpha()))

    if task == "count_non_whitespace":
        count = 0
        for c in sentence:
            if c not in (' ', '\n'):
                count += 1
        return count

    elif task == "most_frequent_char":
        freq = {}
        for c in sentence:
            if c.isalpha():
                c = c.lower()
                freq[c] = freq.get(c, 0) + 1
        max_count = max(freq.values())
        candidates = [c for c in freq if freq[c] == max_count]
        return max(candidates)

    elif task == "count_diverse_words":
        count = 0
        for w in clean_words:
            if len(set(w)) > 5:
                count += 1
        return count

    elif task == "word_with_highest_char_frequency":
        best_word = words[0]
        best_freq = 0
        for orig, clean in zip(words, clean_words):
            if clean:
                mf = max(Counter(clean).values())
                if mf > best_freq:
                    best_freq = mf
                    best_word = orig
        return best_word

from collections import Counter

def analyse_text(sentence: str, task: str):
    words = sentence.split()
    clean = lambda w: ''.join(filter(str.isalpha, w)).lower()
    clean_words = list(map(clean, words))

    if task == "count_non_whitespace":
        return len(list(filter(lambda c: c not in (' ', '\n'), sentence)))

    elif task == "most_frequent_char":
        freq = Counter(filter(str.isalpha, sentence.lower()))
        max_count = max(freq.values())
        return max(filter(lambda c: freq[c] == max_count, freq))

    elif task == "count_diverse_words":
        return len(list(filter(lambda w: len(set(w)) > 5, clean_words)))

    elif task == "word_with_highest_char_frequency":
        max_freq = lambda w: max(Counter(w).values()) if w else 0
        return max(zip(words, clean_words), key=lambda p: max_freq(p[1]))[0]
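All three variants return None for an unrecognised task and raise an exception when the sentence has no letters or no words. The problem statement does not say what should happen in those cases, so the following is only a sketch of one possible policy, not part of the required solution: a dictionary dispatch that raises ValueError for an unknown task and returns None for empty input. The names `analyse_text_safe`, `TASKS`, and the underscore-prefixed helpers are hypothetical.

```python
from collections import Counter

def _clean(text: str) -> str:
    """Lowercase the text and drop every non-alphabetic character."""
    return ''.join(c.lower() for c in text if c.isalpha())

def _count_non_whitespace(sentence: str) -> int:
    return sum(1 for c in sentence if c not in (' ', '\n'))

def _most_frequent_char(sentence: str):
    freq = Counter(_clean(sentence))
    if not freq:  # assumption: no letters at all -> None
        return None
    top = max(freq.values())
    return max(c for c in freq if freq[c] == top)

def _count_diverse_words(sentence: str) -> int:
    return sum(1 for w in sentence.split() if len(set(_clean(w))) > 5)

def _word_with_highest_char_frequency(sentence: str):
    def max_freq(word: str) -> int:
        counts = Counter(_clean(word))
        return max(counts.values()) if counts else 0
    # default=None is an assumption for a sentence with no words
    return max(sentence.split(), key=max_freq, default=None)

# Task name -> handler; replaces the if/elif chain
TASKS = {
    "count_non_whitespace": _count_non_whitespace,
    "most_frequent_char": _most_frequent_char,
    "count_diverse_words": _count_diverse_words,
    "word_with_highest_char_frequency": _word_with_highest_char_frequency,
}

def analyse_text_safe(sentence: str, task: str):
    try:
        handler = TASKS[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return handler(sentence)

sample = "Python is fun, programming is awesome."
for name in TASKS:
    print(name, "->", analyse_text_safe(sample, name))
```

With a dispatch table, each task only does the preprocessing it needs, at the cost of a few more lines than the if/elif versions.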

Key takeaways

01

max(candidates) for alphabetically largest tie-breaker

When multiple characters share the maximum frequency, max(candidates) returns the alphabetically largest one, since 'z' > 'a' in Python string comparison. No sorting needed.

02

Clean words once, reuse everywhere

Pre-compute clean_words (lowercase, alpha-only) before the task dispatch. Tasks 3 and 4 both need it, so the cleaning rule is written once instead of being repeated in each branch.

03

max() on zip() for paired data

max(zip(words, clean_words), key=lambda p: score(p[1]))[0] finds the best word using the cleaned version for scoring but returns the original. A clean pattern for "score one thing, return another".
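To see the pattern outside the exercise, here is a minimal sketch on a different word list, next to an equivalent form that cleans inside the key function. `clean` and `score` are illustrative helpers, and the word list is made up for the example.

```python
from collections import Counter

def clean(word: str) -> str:
    # Lowercase, letters only
    return ''.join(c.lower() for c in word if c.isalpha())

def score(cleaned: str) -> int:
    # Highest count of any single character
    return max(Counter(cleaned).values()) if cleaned else 0

words = ["Hello,", "World!", "Mississippi?"]
clean_words = [clean(w) for w in words]

# Paired form: score the cleaned half, return the original half.
paired = max(zip(words, clean_words), key=lambda p: score(p[1]))[0]

# Inline form: clean inside the key function, no zip needed.
inline = max(words, key=lambda w: score(clean(w)))

print(paired, inline)  # Mississippi? Mississippi?
```

Both forms break ties the same way, since each scans the words in order. The paired form is the natural choice when clean_words already exists; the inline form avoids the extra list when it does not.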