Detailed Design of Google Maps

In this lesson, we’ll discuss our detailed design by answering the following questions:

How do user requests benefit from segments?
How do we improve the user experience by increasing the accuracy of ETAs?

Segment setup and request handling#

This section will describe how the segment data is stored in the database and how user requests are handled using the already stored data.

Starting with the storage schema, we discuss how the segments are added and hosted on the servers and also how the user requests are processed.

Storage schema#

We store the following information for each segment:

Key-value store:

The segment’s ID.
The serverID on which the segment is hosted.
In reality, each segment is a polygon, so we store boundary coordinates (latitude/longitude), possibly as a list.
A list of segment IDs of the neighbors segments.

Graph database

The road network inside the segment in the form of a graph.

Relational DB

We store the information to determine whether, at a particular hour of the day, the roads are congested. This later helps us decide whether or not to update the graph (weights) based on the live data.

edgeID identifies the edge.
hourRange tells us which hour of the day it is when there are typical road conditions (non-rush hour) on the road.
rush is a Boolean value that depicts whether there is congestion or not on a specific road at a specific time.

Note:segID, serverID, and edgeID are unique IDs generated by a uniqueID generator (see the Sequencer lessons for details).

Add segment

Each segment has its latitude/longitude boundary coordinates and the graph of its road network.
The segment adder processes the request to add the segment along with the segment information. The segment adder assigns a unique ID to each segment using a unique ID generator system.
After assigning the ID to the segment, the segment adder forwards the segment information to the server allocator.
The server allocator assigns a server to the segment, hosts that segment graph on that server, and returns the serverID to the segment adder.
After the segment is assigned to the server, the segment adder stores the segment to server mapping in the key-value store. It helps in finding the appropriate servers to process user requests. It also stores each segment’s boundary latitude/longitude coordinates in a separate key-value object.

Handle the user’s request

The user provides the source and the destination so that our service can find the path between them.
The latitude and longitude of the source and the destination are determined through a distributed search.
The latitude/longitude for the source and the destination are passed to the graph processing service that finds the segments in which the source and the destination latitude/longitude lie.
After finding the segment IDs, the graph processing service finds the servers that are hosting these segments from the key-value store.
The graph processing service connects to the relevant servers to find the shortest path between the source and the destination. If the source and the destination belong to the same segment, the graph processing service returns the shortest path by running the query only on a single segment. Otherwise, it will connect the segments from different servers, as we have seen in the previous lesson.

Improve estimations using live data#

This section describes how we can improve the ETA estimation accuracy using live data. If we have a sequence of location data for different devices, we can find movement patterns and perform analytics to predict various factors that may influence a user’s trip. Google Maps uses a combination of GPS, Wi-Fi, and cell towers to track users’ locations. To collect this data, our maps system servers need to have a persistent connection with all the devices that have their location turned on. Below, we discuss the tools, techniques, and components involved in the process of improving estimations using live data.

WebSocket is a communication protocol that allows users and servers to have a two-way, interactive communication session. This helps in the real-time transfer of data between user and server.
The load balancer balances the connection load between different servers since there is a limit on the number of WebSocket connections per server. It connects some devices to server 1, some to server 2, and so on.

A pub-sub system collects the location data streams (device, time, location) from all servers. The location data from pub-sub is read by a data analytics engine like Apache Spark. The data analytics engine uses data science techniques—such as machine learning, clustering, and so on—to measure and predict traffic on the roads, identify gatherings, hotspots, events, find out new roads, and so on. These analytics help our system improve ETAs.

Note: The amount of traffic, road conditions, and hotspots directly affect average travel speed, which ultimately affects users’ ETAs.

The data analytics engine publishes the analytics data to a new pub-sub topic.
The map update service listens to the updates from the pub-sub topic for the analytics. It updates the segment graphs if there is a new road identified or if there is a change in the weight (average speed (traffic, road condition)) on the edges of the graph. Depending on the location, we know which segment the update belongs to. We find the routing server on which that segment is placed from the key-value store and update the graph on that server.

Segment setup and request handling#

Storage schema#

Design#

Improve estimations using live data#

Question