# How Does It Work?

## Overview

PolyScale.ai proxies and caches using native database wire protocols. It transparently connects to your current database and requires no code or infrastructure changes; just update your database origin host address.

Queries are inspected, and reads (SQL SELECT and SHOW) are cached and served geographically close to the requesting client for accelerated performance. All other traffic (INSERT, UPDATE, DELETE, etc.) passes through seamlessly to the origin database, and the respective cached data is automatically invalidated globally (Smart Invalidation).
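Because PolyScale speaks the native wire protocol, adoption amounts to swapping the host in your connection settings. A minimal sketch (all hostnames and credentials below are hypothetical placeholders, not real endpoints):

```python
def connection_config(use_polyscale: bool) -> dict:
    """Return database connection settings; only the host differs."""
    host = (
        "cache.example-polyscale-endpoint.com"  # hypothetical PolyScale cache host
        if use_polyscale
        else "db.example.com"                   # original database origin host
    )
    return {
        "host": host,
        "port": 3306,          # native wire protocol: same port semantics
        "user": "app_user",
        "password": "secret",
        "database": "orders",
    }

before = connection_config(False)
after = connection_config(True)
# Everything except the host is unchanged -- no code or schema changes.
assert {k: v for k, v in before.items() if k != "host"} == \
       {k: v for k, v in after.items() if k != "host"}
assert before["host"] != after["host"]
```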

## Global Edge Network

PolyScale provides a global edge network for database caching. Connecting to PolyScale via a database client application will automatically route the request to the closest Point Of Presence (POP) where the query results are then cached for low latency access. Read more about our network here.

## Cached Data

PolyScale caches database responses, i.e. the payloads returned from the origin database when a query is executed. These are stored close to where the request originated.

## Caching Protocol

### AI-driven

PolyScale uses Artificial Intelligence to automate caching behavior. Caching operates at the level of unique SQL queries. Typically an application generates thousands of unique queries with complex temporal dynamics and inter-dependencies. Managing the caching of these queries requires automation, and optimizing it requires machine learning.

### Automated Caching

PolyScale's machine learning algorithms identify caching opportunities by recognizing and remembering patterns in query traffic. The algorithms run continuously in the background and update on every query, rapidly adapting to changes in traffic as they arise. At any moment, the most up-to-date optimal cache configurations are applied. The algorithms manage caching at a scale far beyond the abilities of a human.

#### Time To First Hit (TTFH)

The time to first hit is determined by multiple attributes of the query being processed: its arrival rate, the change rate of its result, and the number of related queries seen, among other things. If the cache is automatically managed (the default behavior) and the query being processed is entirely new, i.e. neither it nor any related queries have been seen by PolyScale before, a cache hit will occur on or about the 3rd query. This drops to as low as the 2nd query if the query being processed is similar to a previously processed query, i.e. it has the same query structure with different parameters.
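The notion of "same query structure with different parameters" can be illustrated with a toy fingerprinting function. This is a simplification for intuition only; PolyScale's actual query analysis is internal and far more sophisticated:

```python
import re

def fingerprint(sql: str) -> str:
    """Reduce a query to its template by replacing literal parameters.
    A simplified stand-in for PolyScale's internal analysis."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> placeholder
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> placeholder
    return re.sub(r"\s+", " ", sql).strip().lower()

a = fingerprint("SELECT * FROM users WHERE id = 42")
b = fingerprint("SELECT * FROM users WHERE id = 7")
assert a == b  # same structure, different parameters -> same template
```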

### Manual Caching

Sometimes the developer knows best. It is always possible to override automated caching with manual caching rules. Time To Live (TTL) values can be configured for query templates that span queries with common structure or entire tables. This gives fine grained control over what is cached and for how long.
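A manual TTL rule can be pictured as a lookup from query template (or table) to a lifetime, consulted before serving from the cache. The rule structure below is purely illustrative, not PolyScale's configuration format:

```python
import time

# Toy manual-caching rules: TTL (seconds) per query template or table.
ttl_rules = {
    "select * from products where category = ?": 300,  # query template rule
    "users": 30,                                       # table-level rule
}

cache: dict = {}

def get_cached(template: str, table: str, fetch):
    """Serve from cache while within the configured TTL, else refetch."""
    ttl = ttl_rules.get(template) or ttl_rules.get(table, 0)
    entry = cache.get(template)
    if entry and time.monotonic() - entry[0] < ttl:
        return entry[1]                    # fresh: serve from cache
    result = fetch()                       # stale or uncached: hit the origin
    cache[template] = (time.monotonic(), result)
    return result

calls = []
fetch = lambda: calls.append(1) or "rows"
get_cached("select * from products where category = ?", "products", fetch)
get_cached("select * from products where category = ?", "products", fetch)
assert len(calls) == 1   # second call served from cache within the TTL
```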

### Smart Invalidation

PolyScale automatically detects data changes and removes stale data from the cache, globally.

The automation deployed to invalidate stale cache data aims for 100% accurate eventual consistency. It achieves this through a multi-layer approach. The first layer involves parsing the SQL queries at a row level. It analyzes which rows are read (in the case of read queries) and which rows are written to (in the case of write queries). This analysis is performed on fully anonymized representations of the queries to maintain data security and privacy. From these query details targeted invalidations of cached data are performed.

The analysis performed by the first layer can sometimes be overwhelmed by complex queries. That is where the second layer of invalidation comes into play. When a query is deemed too complex to determine what specific cached data may have become invalidated, a fallback to a simple but effective table level invalidation occurs. In this case, all cached data coming from any affected tables is invalidated. This layer of invalidation errs on the side of caution, typically overreaching, but ensuring stale cache data is not served.
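The table-level fallback can be sketched as follows. This is a toy model: table extraction here uses a naive regex, where the real system parses SQL properly, and the table mapping is hard-coded for brevity:

```python
import re

cache = {
    "select * from orders where id = ?": "rows-a",
    "select sum(total) from orders": "rows-b",
    "select * from users where id = ?": "rows-c",
}

# Map each cached entry to the tables it reads from (normally derived
# by parsing the query; hard-coded here for brevity).
entry_tables = {
    "select * from orders where id = ?": {"orders"},
    "select sum(total) from orders": {"orders"},
    "select * from users where id = ?": {"users"},
}

def table_level_invalidate(write_sql: str) -> None:
    """Layer-2 fallback: drop every cached entry touching a written table."""
    m = re.search(r"(?:update|insert\s+into|delete\s+from)\s+(\w+)",
                  write_sql, re.IGNORECASE)
    if not m:
        return
    table = m.group(1).lower()
    for key in [k for k, tabs in entry_tables.items() if table in tabs]:
        cache.pop(key, None)

table_level_invalidate("UPDATE orders SET total = 0 WHERE id = 9")
assert "select * from users where id = ?" in cache      # untouched table kept
assert "select sum(total) from orders" not in cache     # affected table purged
```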

The first two layers of invalidation are very effective; however, they can be thwarted in some circumstances, for example by out-of-band data changes not visible to PolyScale, or by reads and writes that are tightly coupled in time. To address this, a third layer of invalidation exists within the cache automation itself. The automated caching layer monitors for and detects unexplained data changes; if such events are detected, it disables caching on the relevant queries. This provides an empirical safety net over the predictive actions of the first two layers.

PolyScale Smart Invalidation provides global eventual consistency through a multi-layered approach that resolves in a few tens of milliseconds.

### API Based Invalidation

Cached data can be purged globally and programmatically by calling `PUT /v1/caches/{cacheId}/purge`. See the API documentation for further details.
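A sketch of invoking the purge endpoint. The base URL and the bearer-token auth scheme are assumptions for illustration; consult the API documentation for the real values. The request is only constructed here, not sent:

```python
# Hypothetical base URL -- check the API documentation for the real one.
BASE_URL = "https://api.example-polyscale-host.com"

def purge_request(cache_id: str, api_key: str) -> dict:
    """Build the PUT request that purges a cache globally."""
    return {
        "method": "PUT",
        "url": f"{BASE_URL}/v1/caches/{cache_id}/purge",
        "headers": {"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    }

req = purge_request("abc123", "my-api-key")
assert req["url"].endswith("/v1/caches/abc123/purge")
```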

### Traffic Shaping

PolyScale performs various optimizations around connectivity and query behavior to reduce overall latency and increase performance, for example eliding repetitive `USE <database>` or `SET <setting>` commands.
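The eliding optimization can be pictured as a filter that tracks session state and drops statements that would not change it. A toy model, not PolyScale's implementation:

```python
def shape_traffic(commands):
    """Forward commands, eliding USE/SET statements that merely repeat
    the current session state."""
    current_db = None
    settings = {}
    forwarded = []
    for cmd in commands:
        head, _, rest = cmd.partition(" ")
        if head.upper() == "USE":
            if rest == current_db:
                continue                     # redundant: elide
            current_db = rest
        elif head.upper() == "SET":
            name, _, value = rest.partition("=")
            if settings.get(name.strip()) == value.strip():
                continue                     # redundant: elide
            settings[name.strip()] = value.strip()
        forwarded.append(cmd)
    return forwarded

out = shape_traffic(["USE shop", "SELECT 1", "USE shop",
                     "SET autocommit=1", "SET autocommit=1", "SELECT 2"])
assert out == ["USE shop", "SELECT 1", "SET autocommit=1", "SELECT 2"]
```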

### Database and Cache Permissions

A PolyScale cache identifies a single server, i.e. a server host. Within that cache, PolyScale guarantees uniqueness based on the combination of database name and database user, so cached data is never shared across databases or users.
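One way to picture this guarantee is a cache key that incorporates the database name and user alongside the query, so the same query issued by different users maps to distinct entries. A sketch of the idea, not PolyScale's internal keying scheme:

```python
import hashlib

def cache_key(host: str, database: str, user: str, query: str) -> str:
    """Compose a cache key unique per host, database, and user."""
    material = "\x00".join((host, database, user, query))
    return hashlib.sha256(material.encode()).hexdigest()

q = "SELECT * FROM accounts"
k_alice = cache_key("db.example.com", "shop", "alice", q)
k_bob = cache_key("db.example.com", "shop", "bob", q)
assert k_alice != k_bob  # same query, different user -> distinct entries
```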

## Connection Pooling

PolyScale provides inbuilt connection pooling to help scale TCP connections for massive concurrent and/or ephemeral workloads. Read more about connection pooling here.
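The benefit of pooling for ephemeral workloads is that many short-lived clients can share a small, fixed set of origin connections. A toy pool illustrating the principle (a sketch, not PolyScale's pooler):

```python
import queue

class ConnectionPool:
    """Hand out connections from a fixed set, recycling them on release."""
    def __init__(self, factory, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()       # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

made = []
pool = ConnectionPool(lambda: made.append(1) or object(), size=2)
for _ in range(10):                   # ten "ephemeral clients"
    conn = pool.acquire()
    pool.release(conn)
assert len(made) == 2                 # only two origin connections ever opened
```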

## Example Architectures

### Scale a Single Database/Region

PolyScale can be introduced to a single region to scale database reads and lower resource requirements on the origin database. Once cached, PolyScale can execute any database read query in single digit milliseconds, while also reducing performance variability.

Example single region architectures: before and after PolyScale is enabled.

### Scale Multiple Regions

PolyScale allows a single database to be scaled to multiple regions. Typically the application tier is scaled i.e. deployed to additional cloud or edge regions.

Multi-region Example Architecture

## Performance Considerations

The following diagram decomposes the path a typical query traverses, before and with PolyScale. On the left, before PolyScale, a query simply travels from the App-Tier (the application server, micro-service, etc.) to the database; the database takes time to process the query and the result is returned. On the right, using PolyScale, a query first travels to the PolyScale PoP. If the query result is present in the cache, the result is returned immediately to the App-Tier. If the result is not present, the query must travel to the database, where it is processed; the result is returned to PolyScale and then to the App-Tier.

Query paths before and with PolyScale.

Accounting for the time taken for each leg of these paths helps to determine how much PolyScale will improve overall performance. In the diagram the times taken for each leg are denoted:

$\begin{array}{ll} t_0 & \text{App Tier to Database latency} \\ s_1 & \text{App Tier to PolyScale latency} \\ s_2 & \text{PolyScale to Database latency} \\ \delta & \text{Query execution time} \\ \epsilon & \text{Cache retrieval time} \end{array}$

With these symbols in hand, the overall query processing times before and with PolyScale are:

$\begin{array}{ll} \text{Before} &= t_0 + \delta \\ \text{With PolyScale} &= s_1 + (1-h) (s_2 + \delta) \end{array}$

where $h$ is the cache hit rate. Here it is understood that the network hops ($t_0$, $s_1$, and $s_2$) are round trip times. It is also worth noting that the cache retrieval time, $\epsilon$, is effectively zero compared to all the other times and is ignored. In the diagram the dashed lines are suggestive of the fact that not all queries will need to make that trip to the database, only the cache misses will incur that cost (hence the $(1-h)$ factor in the total time with PolyScale).

These formulae can be simplified for some specific situations. For example, assume the App Tier and the PolyScale PoP exist within the same data center, and the database is located in another region. In that scenario $s_1 \approx 0$ and $s_2 \approx t_0$, which results in total times:

$\begin{array}{ll} \text{Before} &= t_0 + \delta \\ \text{With PolyScale} &= (1-h) (t_0 + \delta) \end{array}$

That is, latency is reduced in direct proportion to the hit rate.

Another possible scenario would be if the App Tier and database exist within the same datacenter, but the PolyScale PoP was (relatively) distant. In that case $t_0 \approx 0$ and $s_1 \approx s_2 = s$. If we further assume a modest hit rate of $\frac{1}{2}$, the total times become:

$\begin{array}{ll} \text{Before} &= \delta \\ \text{With PolyScale} &= \frac{3}{2} s + \frac{1}{2} \delta \end{array}$

If the query execution times are large relative to the latency $s$, significant latency savings are still achievable.
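These simplifications can be checked numerically against the general formula. The latency values below are illustrative, not measurements:

```python
def before(t0, delta):
    """Total query time without PolyScale: round trip plus execution."""
    return t0 + delta

def with_polyscale(s1, s2, delta, h):
    """Total query time with PolyScale at cache hit rate h."""
    return s1 + (1 - h) * (s2 + delta)

# Scenario 1: App Tier and PoP co-located (s1 ~ 0), remote database,
# so s2 ~ t0 and the total reduces to (1 - h)(t0 + delta).
t0, delta, h = 80.0, 20.0, 0.75       # milliseconds; illustrative values
assert with_polyscale(0.0, t0, delta, h) == (1 - h) * before(t0, delta)

# Scenario 2: App Tier and database co-located (t0 ~ 0), distant PoP
# (s1 ~ s2 = s), hit rate one half: total reduces to (3/2)s + (1/2)delta.
s, delta = 30.0, 200.0
assert with_polyscale(s, s, delta, 0.5) == 1.5 * s + 0.5 * delta
```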

In general, to yield the lowest latency, a PolyScale PoP should be as geographically close as possible to the application server(s), or any other database client. This can be seen in the formulae above: getting $s_1 \approx 0$ maximizes the impact of caching on performance. Geometry dictates that in that case $s_2 \approx t_0$, and the gains from using PolyScale become proportional to the hit rates achieved.