Breadth/depth-first traversal
Implementing breadth- and depth-first traversal and using them to fetch all comments from a reddit comment forest
PRAW is used to interact with the reddit api, a reddit account and a registered app is required for usage, to get started:
In [ ]:
!pip install -q praw
import praw
from collections import deque
client_id = "" #@param {type:"string"}
client_secret = "" #@param {type:"string"}
user_agent = "" #@param {type:"string"}
reddit = praw.Reddit(
client_id=client_id,
client_secret=client_secret,
user_agent=user_agent)
|████████████████████████████████| 153kB 2.6MB/s
|████████████████████████████████| 204kB 11.5MB/s
In [ ]:
def traverse_comments(comments, *, breadth_first=False):
queue = deque(comments[:])
result = []
while queue:
e = queue.pop()
if isinstance(e, praw.models.MoreComments):
if breadth_first:
queue.extendleft(e.comments())
else:
queue.extend(e.comments())
else:
if breadth_first:
queue.extendleft(e.replies)
else:
queue.extend(e.replies)
result.append(e.body)
return result
Supply a subreddit url
Preferably an archived one, so the comments will not change during operation
In [ ]:
submission_url = "https://www.reddit.com/r/aww/comments/fo6q11/his_favorite_place_is_his_bed/" #@param {type:"string"}
Our depth-first traversal
In [ ]:
%%time
comments_depthfirst = set(traverse_comments(
reddit.submission(url=submission_url).comments))
CPU times: user 610 ms, sys: 21.1 ms, total: 631 ms Wall time: 1min 10s
Our breadth-first traversal
In [ ]:
%%time
comments_breadthfirst = set(traverse_comments(
reddit.submission(url=submission_url).comments, breadth_first=True))
CPU times: user 588 ms, sys: 26.4 ms, total: 614 ms Wall time: 1min 1s
Built-in breadth-first traversal
In [ ]:
%%time
submission = reddit.submission(url=submission_url)
submission.comments.replace_more(limit=None)
comments_builtin = set([comment.body for comment in submission.comments.list()])
CPU times: user 666 ms, sys: 22.9 ms, total: 688 ms Wall time: 1min 7s
Result validation
In [ ]:
print("Results are equivalent:",
comments_depthfirst ==
comments_breadthfirst ==
comments_builtin)
Results are equivalent: True