
Stream Join On UserID

Problem

You’re given two input streams:

click_stream(timestamp, user_id, page)

purchase_stream(timestamp, user_id, item)

Join them such that for every click, you return the click and any purchases by the same user within the next 5 seconds.
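A brute-force version is handy as a reference (an "oracle") for checking a streaming implementation on small inputs. This is a minimal sketch. It assumes "within the next 5 seconds" means the inclusive window [click_ts, click_ts + 5], and it uses a hypothetical function name, `brute_force_join`. It runs in O(clicks × purchases) time.

```python
from typing import List, Tuple

Click = Tuple[str, str]     # (user_id, page)
Purchase = Tuple[str, str]  # (user_id, item)


def brute_force_join(
    clicks: List[Tuple[int, str, str]],     # (timestamp, user_id, page)
    purchases: List[Tuple[int, str, str]],  # (timestamp, user_id, item)
    window: int = 5,
) -> List[Tuple[Click, Purchase]]:
    """Hypothetical reference join: pair each click with every purchase by the
    same user whose timestamp falls in [click_ts, click_ts + window] (inclusive
    bounds are an assumption)."""
    out: List[Tuple[Click, Purchase]] = []
    for c_ts, c_uid, page in clicks:
        for p_ts, p_uid, item in purchases:
            if c_uid == p_uid and c_ts <= p_ts <= c_ts + window:
                out.append(((c_uid, page), (p_uid, item)))
    return out


clicks = [(10, "u1", "home"), (15, "u2", "checkout")]
purchases = [(12, "u1", "shoes"), (18, "u1", "socks")]
print(brute_force_join(clicks, purchases))  # [(('u1', 'home'), ('u1', 'shoes'))]
```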

Constraints

You must keep a rolling buffer of clicks and purchases

Each process(ts) call should check the buffers for joinable data and emit the matching pairs
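One way to keep the buffer "rolling" is a time-bounded deque that drops entries once they are too old to matter. This is a minimal sketch. It assumes entries are appended in non-decreasing timestamp order, and it treats an entry as stale once its timestamp is at or before `now - horizon`. The class name `RollingBuffer` and its `horizon` and `evict` names are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Tuple


class RollingBuffer:
    """Hypothetical helper: holds (ts, payload) pairs and evicts stale ones.
    Assumes append() is called with non-decreasing timestamps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.items: Deque[Tuple[int, Any]] = deque()  # oldest entry on the left

    def append(self, ts: int, payload: Any) -> None:
        self.items.append((ts, payload))

    def evict(self, now: int) -> None:
        # Oldest entries sit on the left, so eviction stops at the first fresh one.
        while self.items and self.items[0][0] <= now - self.horizon:
            self.items.popleft()


buf = RollingBuffer(horizon=5)
buf.append(10, "home")
buf.append(12, "shoes")
buf.append(18, "socks")
buf.evict(20)
print(list(buf.items))  # [(18, 'socks')]
```

Because entries are ordered by time, each one is popped at most once, so eviction costs amortized O(1) per event.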

Expose:

add_click(ts, uid, page)

add_purchase(ts, uid, item)

process(ts) -> List[Tuple[click, purchase]]
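The three calls above could be implemented roughly as follows. This is a minimal sketch, not a definitive solution, and it rests on several assumptions:

- The class name `StreamJoiner` and its `window` parameter are hypothetical.
- The window is the inclusive range [click_ts, click_ts + 5].
- Each stream is fed in non-decreasing timestamp order.
- `process(ts)` acts as a watermark, so every event with timestamp ≤ `ts` has already been added.

Under those assumptions, a click can be finalized once `click_ts + window <= ts`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Click = Tuple[str, str]     # (user_id, page)
Purchase = Tuple[str, str]  # (user_id, item)


class StreamJoiner:
    """Hypothetical streaming join of clicks with same-user purchases in
    [click_ts, click_ts + window]. Assumes in-order input per stream and that
    process(ts) is only called once all events with timestamp <= ts are added."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        # Clicks waiting for their window to close: (ts, uid, page), oldest first.
        self.clicks: Deque[Tuple[int, str, str]] = deque()
        # Per-user purchase buffers: uid -> deque of (ts, item), oldest first.
        self.purchases: Dict[str, Deque[Tuple[int, str]]] = defaultdict(deque)

    def add_click(self, ts: int, uid: str, page: str) -> None:
        self.clicks.append((ts, uid, page))

    def add_purchase(self, ts: int, uid: str, item: str) -> None:
        self.purchases[uid].append((ts, item))

    def process(self, ts: int) -> List[Tuple[Click, Purchase]]:
        out: List[Tuple[Click, Purchase]] = []
        # Finalize every click whose window [c_ts, c_ts + window] has fully passed.
        while self.clicks and self.clicks[0][0] + self.window <= ts:
            c_ts, uid, page = self.clicks.popleft()
            for p_ts, item in self.purchases.get(uid, ()):
                if c_ts <= p_ts <= c_ts + self.window:
                    out.append(((uid, page), (uid, item)))
        # A purchase at or before ts - window cannot match any remaining click
        # (their ts is > ts - window) or any future click, so drop it.
        for uid in list(self.purchases):
            buf = self.purchases[uid]
            while buf and buf[0][0] <= ts - self.window:
                buf.popleft()
            if not buf:
                del self.purchases[uid]
        return out


joiner = StreamJoiner()
joiner.add_click(10, "u1", "home")
joiner.add_purchase(12, "u1", "shoes")
joiner.add_click(15, "u2", "checkout")
joiner.add_purchase(18, "u1", "socks")
print(joiner.process(20))  # [(('u1', 'home'), ('u1', 'shoes'))]
```

Emitting only when a click's window has closed means each click is reported exactly once, with all its purchases, at the cost of up to `window` seconds of latency. If the intended semantics are "emit as soon as a match is seen", you would instead need to remember which pairs were already emitted.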

Test Case 1

# joiner is an instance of the class you implement
joiner.add_click(10, "u1", "home")
joiner.add_purchase(12, "u1", "shoes")
joiner.add_click(15, "u2", "checkout")
joiner.add_purchase(18, "u1", "socks")
print(joiner.process(20))

Should return:

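[
    (("u1", "home"), ("u1", "shoes")),  # shoes (ts=12) is within 5s of the home click (ts=10)
]
# socks (ts=18) is 8s after the home click, so it is not joined; u2's checkout click has no purchases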

✈️ Bonus Setup for Offline Work

You can copy-paste this structure into your editor or notebook and implement each class with:

Internal data structures (min-heaps, deques, hash maps)

Print debug output as needed (you won’t have a debugger in the air)
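If clicks may arrive out of timestamp order, a min-heap keyed by timestamp can replace a plain deque for the click buffer. This is a minimal sketch of that idea using the standard-library `heapq` module. The class name `ClickHeap` and the `pop_closed` method are hypothetical, and the window is again assumed to be the inclusive range [ts, ts + window].

```python
import heapq
import itertools
from typing import List, Tuple


class ClickHeap:
    """Hypothetical click buffer ordered by timestamp, so add order does not matter."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        self._heap: List[Tuple[int, int, str, str]] = []  # (ts, seq, uid, page)
        self._seq = itertools.count()  # tie-breaker so equal timestamps keep insertion order

    def push(self, ts: int, uid: str, page: str) -> None:
        heapq.heappush(self._heap, (ts, next(self._seq), uid, page))

    def pop_closed(self, now: int) -> List[Tuple[int, str, str]]:
        """Pop every click whose window [ts, ts + window] ends at or before now."""
        closed = []
        while self._heap and self._heap[0][0] + self.window <= now:
            ts, _, uid, page = heapq.heappop(self._heap)
            closed.append((ts, uid, page))
        print(f"[debug] now={now} closed={closed} pending={len(self._heap)}")
        return closed


h = ClickHeap()
h.push(15, "u2", "checkout")
h.push(10, "u1", "home")   # arrives late but is still popped first
h.push(17, "u3", "cart")
closed = h.pop_closed(20)
```

The heap gives O(log n) insertion and removal of the oldest click regardless of arrival order. The `print` line follows the suggestion above to emit debug output instead of relying on a debugger.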