Initial Design of Quora
Transform the Quora requirements into a high-level design.
Initial design#
The initial design of Quora will be composed of the following building blocks and components:
- Web and application servers: A typical Quora page is generated by various services. The web and application servers maintain various processes to generate a webpage. The web servers have manager processes and the application servers have worker processes for handling various requests. The manager processes distribute work among the worker processes using a router library. The router library is enqueued with tasks by the manager processes and dequeued by worker processes. Each application server maintains several in-memory queues to handle different user requests. The following illustration provides an abstract view of web and application servers:
- Data stores: Different types of data require storage in different data stores. We can use critical data like questions, answers, comments, and upvotes/downvotes in a relational database like MySQL because it offers a higher degree of consistency. NoSQL databases like HBase can be used to store the number of views of a page, scores used to rank answers, and the extracted features from data to be used for recommendations later on. Because recomputing features is an expensive operation, HBase can be a good option to store and retrieve data at high bandwidth. We require high read/write throughput because big data processing systems use high parallelism to efficiently get the required statistics. Also, blob storage is required to store videos and images posted in questions and answers.
Quora was founded in 2009, whereas HBase was developed in 2008 by Apache. Because it is open-source and modeled after Google’s BigTable, it is suitable for storing a large amount of small-sized data. Furthermore, it has high read/write throughput. Therefore, it was a natural choice for Quora to use in its inception.
-
Distributed cache: For performance improvement, two distributed cache systems are used: Memcached and Redis. Memcached is primarily used to store frequently accessed critical data that is otherwise stored in MySQL. On the other hand, Redis is mainly used to store an online view counter of answers because it allows in-store increments. Therefore, two cache systems are employed according to their use case. Apart from these two, CDNs serve frequently accessed videos and images.
-
Compute servers: A set of compute servers are required to facilitate features like recommendations and ranking based on a set of attributes. These features can be computed in online or offline mode. The compute servers use machine learning (ML) technology to provide effective recommendations. Naturally, these compute servers have a substantially high amount of RAM and processing power.
Of course, other basic building blocks like load balancers, monitoring services, and rate limiters will also be part of the design. A high-level design is provided below:
Workflow#
The design of Quora is complex because we have a large number of functional and non-functional requirements. Therefore, we’ll explain the workflow on the basis of each feature:
- Posting question, answers, comments: The web servers receive user requests through the load balancer and direct them to the application servers. Meanwhile, the web servers generate part of the web page and let the worker process in the application servers do the rest of the page generation. The questions and answers data is stored in a MySQL database, whereas any videos and images are stored in the blob storage. A similar approach is used to post comments and upvote or downvote answers. Task prioritization is performed by employing different queues for different tasks. We perform prioritization because certain tasks require immediate attention—for example, fetching data from the database for a user request—while others are not so urgent—for example, sending a weekly email digest. The worker processes will perform tasks by fetching from these queues.
Points to Ponder
Question 2
How is it possible for Quora to send notifications to users through the above-described mechanism?
Since tasks are added to different priority queues, it is possible to maintain a queue for notifications. While higher priority tasks are enqueued to high-priority queues, a notification task will be added to a medium or lower priority queue. The illustration below visualizes the described concept:
2 of 2
-
Answer ranking system: Answers to questions can be sorted based on date. Although it is convenient to develop a ranking system on the basis of date (using time stamps), users prefer to see the most appropriate answer at the top. Therefore, Quora uses ML to rank answers. Different features are extracted over time and stored in the HBase for each type of question. These features are forwarded to the ML engine to rank the most useful answer at the top. We cannot use the number of upvotes as the only metric for ranking answers because a good number of answers can be jokes—and such answers also get a lot of upvotes. It is good to implement the ranking system offline because good answers get upvotes and views over time. Also, the offline mode poses a lesser burden on the infrastructure. Implementing the ranking system offline and the need for special ML hardware makes it suitable to use some public cloud elastic services.
-
Recommendation system: The recommendation system is responsible for several features. For example, we might need to develop a user feed, find related questions and ads, recommend questions to potential respondents, and even highlight duplicate content and content in violation of the service’s terms of use. Unlike the answer ranking system, the recommendation system must provide both online and offline services. This system receives requests from the application server and forwards selected features to the ML engine.
-
Search feature: Over time, as questions and answers are fed to the Quora system, it is possible to build an index in the HBase. User search queries are matched against the index, and related content is suggested to the user. Frequently accessed indexes can be served from cache for low latency. The index can be constructed from questions, answers, topics labels, and usernames. Tokenization of the search index returns the same results for reordered words also (see Scaling Search and Indexing in Distributed Search chapter for more details).
- Quora is mostly built on Amazon Web Services (AWS).
- Initially, Quora used Amazon’s EC2 instances as their application servers used eight cores with an 8 MB cache.
- Quora used the search server called Sphinx. Later, because of its slow performance, Quora custom-built its search engine using Thrift and Python Unicode libraries only.
- The full-text search feature was launched by Quora in 2013. Before that, it was possible to get results for individual words only.
API design#
We’ll design the API calls for Quora in this section. We’ll define APIs for the following features only:
- Post a question
- Post an answer
- Upvote or downvote a question or answer
- Comment on an answer
- Search
Note: We don’t consider APIs for a recommendation system or ranking because they are not placed as an explicit request by the user. Instead, the web server coordinates with other components to ensure the service.
Post a question#
The POST method of HTTP is used to call the /postQuestion
API:
postQuestion(user_id, question, description, topic_label, video, image)
Let’s understand each parameter of the API call:
Parameter | Description |
| This is the unique identification of the user that posts the question. |
| This is the text of the question posed by the user. |
| This is the description of a question. This is an optional field. |
| This represents a list of domains to which the user’s question is related. |
| This is a video file embedded in a user question. |
| This is an image that is a part of a user question. |
The video and image parameters can be NULL
if no image or video is embedded within the question. Otherwise, it is uploaded as part of the question.
Post an answer#
For posting an answer, the POST method is a suitable choice for /postAnswer
API:
postAnswer(user_id, question_id, answer_text, video, image)
Parameter | Description |
| This refers to the question the answer is posted against. |
| This is the textual answer posted by the responder. |
The rest of the parameters are self-explanatory.
Upvote an answer#
The /upvote
API is below:
upvote(user_id, question_id, answer_id)
Parameter | Description |
| This represents the user upvoting the answer. |
| This represents the identity of the answer that is upvoted for a particular question, which is identified by the |
Note: The downvote API is the same as the upvote API because both are similar functionalities.
Comment on an answer#
The /comment
API has the following structure:
comment(user_id, answer_id, comment_text)
Parameter | Description |
| It represents the user commenting on the answer. |
| It represents the text a user posts against an answer identified by the |
Parameter | Description |
| This is the |
| This is the search query entered by a user. |
We use a sequencer to generate the different IDs mentioned in the API calls.
Point to Ponder
Question
Why is there a custom routing layer between the web and application servers instead of a load-balancing layer?