High-level Design of Twitter
Understand the high-level design of the Twitter service.
User-system interaction#
Let’s begin with the high-level design of our Twitter system. We’ll initially highlight and discuss the building blocks, as well as other components, in the context of the Twitter problem briefly. Later on, we’ll dive deep into a few components in this chapter.
-
Users post Tweets delivered to the server through the load balancer. Then, the system stores it in persistent storage.
-
DNS provides the specified IP address to the end user to start communication with the requested service.
-
CDN is situated near the users to provide requested data with low latency. When users search for a specified term or tag, the system first searches in the CDN proxy servers containing the most frequently requested content.
-
Load balancer chooses the operational application server based on traffic load on the available servers and the user requests.
-
Storage system represents the various types of storage (SQL-based and NoSQL-based) in the above illustration. We’ll discuss significant storage systems later in this chapter.
-
Application servers provide various services and have business logic to orchestrate between different components to meet our functional requirements.
We have detailed chapters on DNS, CDN, specified storage systems (Databases, Key-value store, Blob store), and Load balancers in our building blocks section. We’ll focus on further details specific to the Twitter service in the coming lessons. Let’s first understand the service API.
API design#
This section will focus on designing various APIs regarding the functionalities we are providing. We learn how users request various services through APIs. We’ll only concentrate on the significant parameters of the APIs that are relevant to our design. Although the front-end server can call another API or add more parameters in the API received from the end users to fulfill the given request, we consider all relevant arguments specified for the particular request in a single API. Let’s develop APIs for each of the following features:
- Post Tweet
- Like or dislike Tweet
- Reply to Tweet
- Search Tweet
- View user or home timeline
- Follow or unfollow the account
- Retweet a Tweet
Post Tweet#
The POST method is used to send the Tweet to the server from the user through the /postTweet
API.
postTweet(user_id, access_type, tweet_type, content, tweet_length, media_field, post_time, tweet_location, list_of_used_hashtags, list_of_tagged_people)
Let’s discuss a few of the parameters:
Parameter | Description |
| It indicates the unique ID of the user who posted the Tweet. |
| It tells us whether the Tweet is protected (that is, only visible to followers) or public. |
| It indicates whether the Tweet is text-based, video-clip based, image(s)-based, or consisting of different types. |
| It specifies the Tweet’s actual content (text). |
| It represents the text length in the Tweet. In the case of video, it tells us the duration and size of a video. |
| It specifies the type of media (image, video, GIF, and so on) delivered in each Tweet. |
The rest of the parameters are self-explanatory.
Note: Twitter uses the Snowflake service to generate unique IDs for Tweets. We have a detailed chapter (Sequencer) that explains this service.
Points to Ponder
Question 2
What is the need for the list_of_tagged_people
in the /postTweet
API?
The system has to notify the people tagged in the Tweet.
2 of 2
Like or dislike Tweet#
The /likeTweet
API is used when users like public Tweets.
likeTweet(user_id, tweet_id, tweeted_user_id, user_location)
Parameter | Description |
| It indicates the unique ID of the user who liked the Tweet. |
| It indicates the Tweet's unique ID. |
| This is the unique ID of the user who posted the Tweet. |
| It denotes the location of the user who liked the Tweet. |
The parameters above are also used in the /dislikeTweet
API when users dislike others’ Tweets.
Reply to Tweet#
The /replyTweet
API is used when users reply to public Tweets.
replyTweet(user_id, tweet_id, tweeted_user_id, reply_type, reply_length)
The reply_type
and reply_length
parameters are the same as tweet_type
and tweet_length
respectively.
Search Tweet#
When the user searches any keyword in the home timeline, the GET method is used. The following is the /searchTweet
API:
searchTweet(user_id, search_term, max_result, exclude, media_field, expansions, sort_order, next_token, user_location)
Some new parameters introduced in this case are:
Parameter | Description |
| It is a string containing the search keyword or phrase. |
| It is the number of Tweets returned per response page. By default, the Tweets per response is 10. |
| It specifies what to exclude from the returned Tweets, that is, replies and retweets. The maximum limit on returned Tweets is 3200, but when we exclude replies, the maximum limit is reduced to 800 Tweets. |
| It specifies the media (image, video, GIF) delivered in each returned Tweet. |
| It enables us to request additional data objects in the returned Tweets, such as any mentioned user, referenced Tweet, attached media, attached places objects, and so on. |
| It specifies the order in which Tweets are returned. By default, it will return the most recent Tweets first. |
| It is used to get the next page of results. For instance, if |
Response#
Let’s look at a sample response in JSON format. The id is the user’s unique ID who posted the Tweet and the text is the Tweet’s content. The result_count is the count of the returned Tweet, which we set in the max_result
in the request. Here, we’re displaying the default fields only.
{
"data": [
{
"id": "7300333948034441183",
"text": "This is a most matched Tweet"
},
{
"id": "6498431343154916376",
"text": "This is next to most matched Tweet"
},
:
:
:
{
"id": "6427456107642019844",
"text": "This is the last matched Tweet in the current page of the result.
}
],
"meta": {
"newest_id": "7300333948034441183",
"oldest_id": "6427456107642019844",
"result_count": 10
}
}
Note: Twitter performs various types of searches. The following are two of them:
- One search type returns the result of the last seven days, which all registered users usually use.
- The other type returns all matching results on all Tweets ever posted (remind that service does not delete a posted Tweet). Indeed, matches can contain the first Tweet on Twitter. This search is usually used for academic research.
View home_timeline#
The GET method is suitable when users view their home timelines through the /viewHome_timeline
API.
viewHome_timeline(user_id, tweets_count, max_result, exclude, next_token, user_location)
In the /viewUser_timeline
API, we’ll exclude the user_location
to get the user timeline.
The max_rsult
parameter determines the number of tweets a client application can show the user. The server sends the max_result
number of tweets in each response. Further, the server will also send a paginated list_of_followers
to reduce the client latency.
Point to Ponder
Question
Which parameter in the viewHome_timeline
method is the most relevant when deciding which promoted ads (Tweets) to be returned in response?
Follow the account#
The /followAccount
API is used when users follow someone’s account on Twitter.
followAccount(account_id, followed_account_id)
Parameter | Description |
| It specifies the unique ID of a user who follows that account on Twitter. |
| It indicates the unique ID of the account that the user follows. |
The /unfollowAccount
API will use the same parameters when a user unfollows someone’s account on Twitter.
Retweet a Tweet#
When a registered user Retweets (re-posts) someone’s Tweet on Twitter, the following /retweet
API is called:
retweet(user_id, tweet_id, retweet_user_id)
The same parameters will be required in the /undoRetweet
API when users undo a Retweet of someone’s Tweet.