S3Q1 · Sales Data Analysis

⚡ Quick Reference

Function: analyse_sales_data(sales_data: list, task: str)

```python
def analyse_sales_data(sales_data, task):
    # Build aggregation once - reused by multiple tasks
    agg = {}
    for record in sales_data:
        pid = record["product_id"]
        if pid not in agg:
            agg[pid] = [0, 0]    # [total_units, total_revenue]
        agg[pid][0] += record["units_sold"]
        agg[pid][1] += record["revenue"]

    if task == "total_revenue":
        return sum(record["revenue"] for record in sales_data)

    if task == "product_wise_total_units_and_revenue":
        return {pid: (v[0], v[1]) for pid, v in agg.items()}

    if task == "top_selling_product":
        return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))

    if task == "average_product_price":
        return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
```

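Key rules:
- Build one aggregation dict ({pid: [units, revenue]}); three of the four tasks use it, while total_revenue sums the raw records directly
- top_selling_product: highest total units wins; a tie on units goes to the higher total revenue (tuple key (units, revenue))
- average_product_price = the product's total revenue / its total units, rounded to 2 decimals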

Problem Statement

Problem

Implement analyse_sales_data(sales_data, task), which runs one of four analytics operations on a list of transaction records and returns the result of the operation named by task.

Sample data:

```python
sales_data = [
    {"product_id": "P101", "units_sold": 50,  "revenue": 400},
    {"product_id": "P102", "units_sold": 30,  "revenue": 900},
    {"product_id": "P101", "units_sold": 70,  "revenue": 600},
    {"product_id": "P103", "units_sold": 120, "revenue": 600},
]
```

Building the aggregation

Three of the four tasks (everything except total_revenue) need per-product totals, so build them once:

```python
agg = {}
for record in sales_data:
    pid = record["product_id"]
    if pid not in agg:
        agg[pid] = [0, 0]
    agg[pid][0] += record["units_sold"]
    agg[pid][1] += record["revenue"]

# agg = {"P101": [120, 1000], "P102": [30, 900], "P103": [120, 600]}
```
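The same aggregation can also be written with `dict.setdefault`, which returns the existing `[units, revenue]` list for a product, inserting `[0, 0]` first if the product is new. A minimal sketch, wrapped in a hypothetical helper `build_aggregation` that is not part of the original task:

```python
def build_aggregation(sales_data):
    """Hypothetical helper: map product_id -> [total_units, total_revenue]."""
    agg = {}
    for record in sales_data:
        # setdefault inserts [0, 0] the first time a product_id appears,
        # then returns the stored list so it can be updated in place
        totals = agg.setdefault(record["product_id"], [0, 0])
        totals[0] += record["units_sold"]
        totals[1] += record["revenue"]
    return agg


sales_data = [
    {"product_id": "P101", "units_sold": 50,  "revenue": 400},
    {"product_id": "P102", "units_sold": 30,  "revenue": 900},
    {"product_id": "P101", "units_sold": 70,  "revenue": 600},
    {"product_id": "P103", "units_sold": 120, "revenue": 600},
]

agg = build_aggregation(sales_data)
print(agg)  # {'P101': [120, 1000], 'P102': [30, 900], 'P103': [120, 600]}
```

One trade-off: the default `[0, 0]` is constructed on every call, even when the product already exists; the `if pid not in agg` check and `defaultdict` both avoid that.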

Operation 1 - `total_revenue`

Sum all revenue values across every transaction:

```python
return sum(record["revenue"] for record in sales_data)
```

400 + 900 + 600 + 600 = 2500 ✓
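Each transaction lands in exactly one product's bucket, so the revenue column of the aggregation sums to the same total. A small consistency check on the sample data (a sketch for understanding, not something the task requires):

```python
sales_data = [
    {"product_id": "P101", "units_sold": 50,  "revenue": 400},
    {"product_id": "P102", "units_sold": 30,  "revenue": 900},
    {"product_id": "P101", "units_sold": 70,  "revenue": 600},
    {"product_id": "P103", "units_sold": 120, "revenue": 600},
]

# Per-product totals, built as in the aggregation step: pid -> [units, revenue]
agg = {}
for record in sales_data:
    pid = record["product_id"]
    if pid not in agg:
        agg[pid] = [0, 0]
    agg[pid][0] += record["units_sold"]
    agg[pid][1] += record["revenue"]

direct = sum(record["revenue"] for record in sales_data)    # scan the raw records
via_agg = sum(revenue for _units, revenue in agg.values())  # sum the per-product totals
print(direct, via_agg)  # 2500 2500
```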

Operation 2 - `product_wise_total_units_and_revenue`

```python
return {pid: (v[0], v[1]) for pid, v in agg.items()}
```

→ {"P101": (120, 1000), "P102": (30, 900), "P103": (120, 600)} ✓
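The comprehension copies each internal `[units, revenue]` list into a new tuple, matching the tuple-valued output above. A side effect worth knowing (an observation about the code, not a stated requirement): the returned tuples do not share storage with the lists inside `agg`, so later changes to `agg` cannot leak into the result. A short sketch using the aggregated sample values:

```python
# Aggregated sample values: pid -> [total_units, total_revenue]
agg = {"P101": [120, 1000], "P102": [30, 900], "P103": [120, 600]}

# tuple(v) is equivalent to (v[0], v[1]) for two-element lists
result = {pid: tuple(v) for pid, v in agg.items()}
print(result)  # {'P101': (120, 1000), 'P102': (30, 900), 'P103': (120, 600)}

agg["P101"][0] += 5    # mutate the internal aggregation afterwards
print(result["P101"])  # (120, 1000): the returned tuple is unaffected
```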

Operation 3 - `top_selling_product`

Highest total units wins; a tie on units goes to the product with higher total revenue:

```python
return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
```

P101 and P103 both have 120 units. Tuple comparison: (120, 1000) > (120, 600) → P101 wins ✓
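A sketch of the tuple comparison behind the tie-break, plus one case the problem statement does not cover: a full tie on both units and revenue. The products "A" and "B" are hypothetical, and the result shown is simply what `max` does (it returns the first maximal item it encounters), not a rule taken from the task.

```python
# Aggregated sample values: pid -> [total_units, total_revenue]
agg = {"P101": [120, 1000], "P102": [30, 900], "P103": [120, 600]}

# Tuples compare element by element; revenue is only consulted when units tie
print((120, 1000) > (120, 600))  # True
print((30, 900) > (120, 600))    # False: 30 < 120 decides it, revenue is ignored

top = max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
print(top)  # P101

# Hypothetical full tie on units AND revenue: max keeps the first maximal
# item it sees, which for a dict is the earliest inserted product
tied = {"A": [10, 50], "B": [10, 50]}
first = max(tied, key=lambda pid: (tied[pid][0], tied[pid][1]))
print(first)  # A
```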

Operation 4 - `average_product_price`

```python
return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
```

P101: 1000 / 120 = 8.333… → 8.33 ✓
P102: 900 / 30 = 30.0 → 30.0 ✓
P103: 600 / 120 = 5.0 → 5.0 ✓
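The same averages computed on the aggregated sample values. Two Python details the worked numbers rely on: `/` always returns a float, so 900 / 30 gives `30.0` rather than `30`; and `round(x, 2)` operates on the binary float value, so some inputs that look like exact halves round down (the Python documentation's own example is `round(2.675, 2)` giving `2.67`). A minimal sketch:

```python
# Aggregated sample values: pid -> [total_units, total_revenue]
agg = {"P101": [120, 1000], "P102": [30, 900], "P103": [120, 600]}

# Unpack each [units, revenue] pair and divide the product's totals
averages = {pid: round(revenue / units, 2) for pid, (units, revenue) in agg.items()}
print(averages)  # {'P101': 8.33, 'P102': 30.0, 'P103': 5.0}

print(900 / 30)         # 30.0: true division always yields a float
print(round(2.675, 2))  # 2.67: 2.675 is stored as slightly less than 2.675
```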

Complete solution approaches

Pythonic (single aggregation): builds the aggregation up front for every task, as in the Quick Reference.

```python
def analyse_sales_data(sales_data, task):
    agg = {}
    for r in sales_data:
        pid = r["product_id"]
        if pid not in agg:
            agg[pid] = [0, 0]
        agg[pid][0] += r["units_sold"]
        agg[pid][1] += r["revenue"]

    if task == "total_revenue":
        return sum(r["revenue"] for r in sales_data)

    if task == "product_wise_total_units_and_revenue":
        return {pid: (v[0], v[1]) for pid, v in agg.items()}

    if task == "top_selling_product":
        return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))

    if task == "average_product_price":
        return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
```

Using defaultdict: `defaultdict(lambda: [0, 0])` creates the `[units, revenue]` entry the first time a product is accessed, so the `if pid not in agg` check disappears.

```python
from collections import defaultdict

def analyse_sales_data(sales_data, task):
    agg = defaultdict(lambda: [0, 0])
    for r in sales_data:
        agg[r["product_id"]][0] += r["units_sold"]
        agg[r["product_id"]][1] += r["revenue"]

    if task == "total_revenue":
        return sum(r["revenue"] for r in sales_data)

    if task == "product_wise_total_units_and_revenue":
        return {pid: (v[0], v[1]) for pid, v in agg.items()}

    if task == "top_selling_product":
        return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))

    if task == "average_product_price":
        return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
```

Explanatory (separate loops): answers total_revenue with its own loop before building any aggregation, and stores the totals under named keys ("units", "revenue") instead of list indices.

```python
def analyse_sales_data(sales_data, task):
    if task == "total_revenue":
        return sum(r["revenue"] for r in sales_data)

    # Build aggregation for remaining tasks
    agg = {}
    for r in sales_data:
        pid = r["product_id"]
        if pid not in agg:
            agg[pid] = {"units": 0, "revenue": 0}
        agg[pid]["units"]   += r["units_sold"]
        agg[pid]["revenue"] += r["revenue"]

    if task == "product_wise_total_units_and_revenue":
        return {pid: (v["units"], v["revenue"]) for pid, v in agg.items()}

    if task == "top_selling_product":
        best = max(agg, key=lambda pid: (agg[pid]["units"], agg[pid]["revenue"]))
        return best

    if task == "average_product_price":
        return {pid: round(v["revenue"] / v["units"], 2) for pid, v in agg.items()}
```
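A usage sketch that runs every task against the sample data. It repeats the defaultdict version so the snippet runs on its own; the other two implementations should produce the same results. Note that a task name none of the `if` branches recognises falls through and returns `None`; the problem statement does not say whether that case should raise instead.

```python
from collections import defaultdict


def analyse_sales_data(sales_data, task):
    # product_id -> [total_units, total_revenue], created on first access
    agg = defaultdict(lambda: [0, 0])
    for r in sales_data:
        agg[r["product_id"]][0] += r["units_sold"]
        agg[r["product_id"]][1] += r["revenue"]

    if task == "total_revenue":
        return sum(r["revenue"] for r in sales_data)
    if task == "product_wise_total_units_and_revenue":
        return {pid: (v[0], v[1]) for pid, v in agg.items()}
    if task == "top_selling_product":
        return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
    if task == "average_product_price":
        return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}


sales_data = [
    {"product_id": "P101", "units_sold": 50,  "revenue": 400},
    {"product_id": "P102", "units_sold": 30,  "revenue": 900},
    {"product_id": "P101", "units_sold": 70,  "revenue": 600},
    {"product_id": "P103", "units_sold": 120, "revenue": 600},
]

tasks = [
    "total_revenue",
    "product_wise_total_units_and_revenue",
    "top_selling_product",
    "average_product_price",
]
results = {task: analyse_sales_data(sales_data, task) for task in tasks}
for task, value in results.items():
    print(f"{task} -> {value}")
# total_revenue -> 2500
# product_wise_total_units_and_revenue -> {'P101': (120, 1000), 'P102': (30, 900), 'P103': (120, 600)}
# top_selling_product -> P101
# average_product_price -> {'P101': 8.33, 'P102': 30.0, 'P103': 5.0}

print(analyse_sales_data(sales_data, "unknown_task"))  # None
```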

Key takeaways

01

Build the aggregation once - three of the four tasks share it

Computing per-product totals takes a single O(n) pass, and three of the four tasks need those totals. Writing the loop once at the top, rather than repeating it inside each task branch, keeps each task to a single expression.

02

Tuple key for multi-criteria max

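`max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))` works because Python compares tuples lexicographically: units is the primary key and revenue the secondary key, so a tie on units falls through to the revenue comparison. If units and revenue both tie, `max` returns the first such product it encounters, i.e. the earliest in insertion order.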

03

average = total_revenue / total_units - not per-transaction average

The average price is computed across all transactions for the product, not the mean of per-transaction prices. For P101: `(400+600) / (50+70) = 8.33` - divide aggregated totals, not individual records.
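To see why the distinction matters, a sketch comparing the two calculations on P101's two transactions from the sample data. Dividing the totals weights each transaction by its units; averaging the per-transaction prices weights each transaction equally and gives a different answer:

```python
# P101's transactions from the sample data
p101 = [
    {"product_id": "P101", "units_sold": 50, "revenue": 400},
    {"product_id": "P101", "units_sold": 70, "revenue": 600},
]

# What the task asks for: aggregated revenue / aggregated units (units-weighted)
total_based = sum(r["revenue"] for r in p101) / sum(r["units_sold"] for r in p101)

# Not what the task asks for: mean of per-transaction prices (equal weights)
per_txn = [r["revenue"] / r["units_sold"] for r in p101]  # 8.0 and about 8.571
mean_of_prices = sum(per_txn) / len(per_txn)

print(round(total_based, 2), round(mean_of_prices, 2))  # 8.33 8.29
```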