
1) 【一句话结论】
采用哈希表记录用户登录次数,结合时间戳队列维护7天滑动窗口,过滤登录行为后动态更新,最终按活跃次数降序输出Top 10用户。
2) 【原理/概念讲解】
老师会解释:这个问题本质是动态维护一个固定长度(7天)的时间窗口,并统计窗口内特定行为(登录)的次数。核心思路是结合时间窗口的滑动特性和哈希表的高效查询能力。具体来说:
3) 【对比与适用场景】
| 数据结构组合 | 定义 | 特性 | 使用场景 | 注意点 |
|---|---|---|---|---|
| 普通列表 + 哈希表 | 列表存储时间戳,哈希表存储用户状态 | 时间戳遍历需O(n),哈希表O(1) | 小数据量,简单场景 | 移除旧时间戳需遍历列表,效率低 |
| 哈希表 + 双端队列(时间戳队列) | 哈希表存储用户活跃次数,队列存储(用户ID,时间戳) | 队列O(1)入队/出队,哈希表O(1) | 大规模日志,高效维护时间窗口 | 需确保队列与哈希表同步,移除旧时间戳时同步更新用户状态 |
4) 【示例】
假设用户行为日志(时间戳为毫秒级,当前系统时间1702895200):
[
{"user_id": "u1", "action_type": "login", "timestamp": 1702895200},
{"user_id": "u2", "action_type": "click", "timestamp": 1702895200},
{"user_id": "u1", "action_type": "login", "timestamp": 1702902400},
{"user_id": "u3", "action_type": "login", "timestamp": 1702903000},
{"user_id": "u1", "action_type": "login", "timestamp": 1702909200},
{"user_id": "u2", "action_type": "login", "timestamp": 1702910800},
{"user_id": "u4", "action_type": "login", "timestamp": 1702916400},
{"user_id": "u1", "action_type": "login", "timestamp": 1702922600},
{"user_id": "u5", "action_type": "login", "timestamp": 1702928200},
{"user_id": "u2", "action_type": "login", "timestamp": 1702934800},
{"user_id": "u3", "action_type": "login", "timestamp": 1702935400},
{"user_id": "u1", "action_type": "login", "timestamp": 1702941000},
{"user_id": "u6", "action_type": "login", "timestamp": 1702941600},
{"user_id": "u2", "action_type": "login", "timestamp": 1702947600},
{"user_id": "u1", "action_type": "login", "timestamp": 1702954200},
{"user_id": "u3", "action_type": "login", "timestamp": 1702954800},
{"user_id": "u7", "action_type": "login", "timestamp": 1702961000},
{"user_id": "u1", "action_type": "login", "timestamp": 1702967600},
{"user_id": "u2", "action_type": "login", "timestamp": 1702974200},
{"user_id": "u4", "action_type": "login", "timestamp": 1702980800},
{"user_id": "u5", "action_type": "login", "timestamp": 1702987400},
{"user_id": "u1", "action_type": "login", "timestamp": 1702994000},
{"user_id": "u3", "action_type": "login", "timestamp": 1703000600},
{"user_id": "u6", "action_type": "login", "timestamp": 1703002200},
{"user_id": "u2", "action_type": "login", "timestamp": 1703008800},
{"user_id": "u1", "action_type": "login", "timestamp": 1703015400},
{"user_id": "u4", "action_type": "login", "timestamp": 1703022000},
{"user_id": "u7", "action_type": "login", "timestamp": 1703028600},
{"user_id": "u5", "action_type": "login", "timestamp": 1703035200},
{"user_id": "u1", "action_type": "login", "timestamp": 1703041800},
{"user_id": "u2", "action_type": "login", "timestamp": 1703042400},
{"user_id": "u3", "action_type": "login", "timestamp": 1703049000},
{"user_id": "u8", "action_type": "login", "timestamp": 1703055600}
]
处理流程伪代码:
action_type="login"的日志。curTime=1702895200,窗口长度7天=604800秒,队列q为空,哈希表userCount为空。diff=curTime - log.timestamp。若diff≤604800,则:
(log.user_id, log.timestamp)入队(队列按时间顺序排列)。userCount[user_id] +=1,否则初始化为1。userCount[user_id] -=1(因为记录已出窗口)。5) 【面试口播版答案】
好的,面试官,这个问题要高效统计最近7天登录次数并Top10。首先,只统计“登录”行为,因为活跃次数特指登录。然后,用哈希表记录用户活跃次数,用时间戳队列(存储用户ID和时间戳)维护7天滑动窗口。处理每条日志时,判断时间是否在窗口内(当前时间-日志时间≤7天),在的话更新哈希表(累加次数),并将(用户ID,时间戳)入队;队列长度超过7天就移除队头元素(最早的时间戳),同步更新哈希表(减少该用户活跃次数,因为记录已出窗口)。最后统计哈希表,按活跃次数降序取Top10。哈希表和队列操作都是O(1),适合大规模日志。
6) 【追问清单】
7) 【常见坑/雷区】