Topic Trends
Initial Problem Statement
You’re building a system that tracks trending topics. Each topic has a name and a numeric score. You must design a structure that supports these operations:
add(topic: str, score: int)- If topic doesn't exist, insert it with the given score
- If it already exists, increase it's score by the given amount
top(k: int)- Return top k topics with the highest score, in descending order
- If scores tie, return topics in lexicographical order
reset(topic: str)- Reset topics score to 0
- Topic exists, but shouldn't rank above any topics with positive score
You may assume:
- There can be up to operations
- Topic names are strings
top(k)may be called frequentlyaddandresetmay interleave arbitrarily with top
Example
Constraints
Initial Thoughts, Questions
- If we use priority queue, then
addis for insert, and worst case for update- If the score we need to update is at the "bottom", i.e. 0, we need to pop off everything before it into
helper_queue, , update the score, and then add all of that back into base priority queuebase_queue - After popping off each insertion into
helper_queue, which is work at worst, inserting each of the back intobase_queuetakes effort
- If the score we need to update is at the "bottom", i.e. 0, we need to pop off everything before it into
topwould be- We pop off terms into
helper_queue, inserting them back in is push of effort
- We pop off terms into
resetwould be similar to updating add as we need to go find the topic, update it's score, and then push everything back tobase_queue
- We can use a combination of priority queue and hash table here
- For each entry, we keep track of it's current score in hash table
curr_score: Dict[str, int] - Adding a net new topic means adding it to
curr_scoreand pushing ontobase_queuewhich would be time - Updating a topic would become
- updating
curr_score - pushing the updated topic into
base_queue - Means we need to push in
(score, topic_name)tuple intobase_queue- We may need to also hold onto some sort of version for each score
(score, topic_name, version) - Hold onto curr version in
curr_score: Dict[str, (int, int)]so that for each new update via add, we increment this version as well - this is all needed to move past items intop k, i.e. drain queue and invalidate, for top k calls
- We may need to also hold onto some sort of version for each score
- updating
- For each
topcall, we then need to continuously pop off ofbase_queueand check each item againstcurr_scoredict to check if it's the latest score or not- Time complexity here is where is number of items we pop off including invalid data points, checking against
curr_scoreis - After draining, we do need to hold onto valid entries in
helper_queueand push them back on for similar to previous time
- Time complexity here is where is number of items we pop off including invalid data points, checking against
resetis then just a specific type of update where we increment version and update score to 0
- For each entry, we keep track of it's current score in hash table
- Given that
topmay be called frequently, there's really no difference between pure priority queue and priority queue + hash map, but storing all of the operations and current state would certainly have some space effect- Worst case we'd store all operations in
base_queueuntil it's drained, buttopis called frequently so it'd drain frequently addandresetare intermingled withtopso it'd be ideal to not have those block nearly as much
- Worst case we'd store all operations in
Implementation