S3Q1 · Sales Data Analysis¶
⚡ Quick Reference
Function: analyse_sales_data(sales_data: list, task: str)
def analyse_sales_data(sales_data, task):
# Build aggregation once - reused by multiple tasks
agg = {}
for record in sales_data:
pid = record["product_id"]
if pid not in agg:
agg[pid] = [0, 0] # [total_units, total_revenue]
agg[pid][0] += record["units_sold"]
agg[pid][1] += record["revenue"]
if task == "total_revenue":
return sum(record["revenue"] for record in sales_data)
if task == "product_wise_total_units_and_revenue":
return {pid: (v[0], v[1]) for pid, v in agg.items()}
if task == "top_selling_product":
return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
if task == "average_product_price":
return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
Key rules:
- Build one aggregation dict ({pid: [units, revenue]}) - all four tasks use it
- top_selling_product ties → higher revenue wins (tuple key (units, revenue))
- average_product_price = total_revenue / total_units, rounded to 2 decimals
Problem Statement¶
Problem
Implement analyse_sales_data(sales_data, task) - four analytics operations on transaction records.
Sample data:
sales_data = [
{"product_id": "P101", "units_sold": 50, "revenue": 400},
{"product_id": "P102", "units_sold": 30, "revenue": 900},
{"product_id": "P101", "units_sold": 70, "revenue": 600},
{"product_id": "P103", "units_sold": 120, "revenue": 600},
]
Building the aggregation¶
All four tasks require per-product totals. Build it once:
agg = {}
for record in sales_data:
pid = record["product_id"]
if pid not in agg:
agg[pid] = [0, 0]
agg[pid][0] += record["units_sold"]
agg[pid][1] += record["revenue"]
# agg = {"P101": [120, 1000], "P102": [30, 900], "P103": [120, 600]}
Operation 1 - total_revenue¶
Sum all revenue values across every transaction:
400 + 900 + 600 + 600 = 2500 ✓
Operation 2 - product_wise_total_units_and_revenue¶
→ {"P101": (120, 1000), "P102": (30, 900), "P103": (120, 600)} ✓
Operation 3 - top_selling_product¶
Highest units, ties broken by higher revenue:
P101 and P103 both have 120 units. Tuple comparison: (120, 1000) > (120, 600) → P101 wins ✓
Operation 4 - average_product_price¶
- P101: 1000 / 120 = 8.333… → 8.33 ✓
- P102: 900 / 30 = 30.0 → 30.0 ✓
- P103: 600 / 120 = 5.0 → 5.0 ✓
Complete solution approaches¶
def analyse_sales_data(sales_data, task):
agg = {}
for r in sales_data:
pid = r["product_id"]
if pid not in agg:
agg[pid] = [0, 0]
agg[pid][0] += r["units_sold"]
agg[pid][1] += r["revenue"]
if task == "total_revenue":
return sum(r["revenue"] for r in sales_data)
if task == "product_wise_total_units_and_revenue":
return {pid: (v[0], v[1]) for pid, v in agg.items()}
if task == "top_selling_product":
return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
if task == "average_product_price":
return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
from collections import defaultdict
def analyse_sales_data(sales_data, task):
agg = defaultdict(lambda: [0, 0])
for r in sales_data:
agg[r["product_id"]][0] += r["units_sold"]
agg[r["product_id"]][1] += r["revenue"]
if task == "total_revenue":
return sum(r["revenue"] for r in sales_data)
if task == "product_wise_total_units_and_revenue":
return {pid: (v[0], v[1]) for pid, v in agg.items()}
if task == "top_selling_product":
return max(agg, key=lambda pid: (agg[pid][0], agg[pid][1]))
if task == "average_product_price":
return {pid: round(v[1] / v[0], 2) for pid, v in agg.items()}
def analyse_sales_data(sales_data, task):
if task == "total_revenue":
return sum(r["revenue"] for r in sales_data)
# Build aggregation for remaining tasks
agg = {}
for r in sales_data:
pid = r["product_id"]
if pid not in agg:
agg[pid] = {"units": 0, "revenue": 0}
agg[pid]["units"] += r["units_sold"]
agg[pid]["revenue"] += r["revenue"]
if task == "product_wise_total_units_and_revenue":
return {pid: (v["units"], v["revenue"]) for pid, v in agg.items()}
if task == "top_selling_product":
best = max(agg, key=lambda pid: (agg[pid]["units"], agg[pid]["revenue"]))
return best
if task == "average_product_price":
return {pid: round(v["revenue"] / v["units"], 2) for pid, v in agg.items()}
Key takeaways¶
Build the aggregation once - all tasks share it
Computing per-product totals is O(n) and needed by three of the four tasks. Build it once at the start and reuse - avoids scanning the data multiple times and keeps each task's logic minimal.
Tuple key for multi-criteria max
max(agg, key=lambda pid: (agg[pid][0], agg[pid][1])) - Python compares tuples lexicographically. Primary key is units; secondary key is revenue. Ties on units automatically fall through to revenue comparison.
average = total_revenue / total_units - not per-transaction average
The average price is computed across all transactions for the product, not the mean of per-transaction prices. For P101: `(400+600) / (50+70) = 8.33` - divide aggregated totals, not individual records.