
How Does It Work?

Overview

PolyScale.ai proxies and caches using native database wire protocols. It connects transparently to your existing database and requires no code or infrastructure changes: just update your application's database host address to point at PolyScale.

Queries are inspected, and reads (SQL SELECT and SHOW) are cached and served geographically close to the requesting client for accelerated performance. All other traffic (INSERT, UPDATE, DELETE, etc.) passes through seamlessly to the origin database, while the corresponding cached data is automatically invalidated globally (Smart Invalidation).
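In practice, switching an application over usually means changing only the host (and possibly port) in its connection string; credentials, database name, and parameters stay the same. A minimal sketch (the PolyScale hostname shown is a placeholder; use the connection details from your cache's dashboard):

```python
from urllib.parse import urlsplit, urlunsplit

def reroute_to_polyscale(database_url: str, polyscale_host: str, port=None) -> str:
    """Return the same connection URL with only the host (and optionally port) swapped."""
    parts = urlsplit(database_url)
    userinfo = ""
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += f":{parts.password}"
        userinfo += "@"
    new_port = port or parts.port
    netloc = f"{userinfo}{polyscale_host}" + (f":{new_port}" if new_port else "")
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

# credentials, database name, and query parameters are unchanged; only the host differs
print(reroute_to_polyscale(
    "postgresql://app_user:secret@db.example.com:5432/orders",
    "psedge.global",  # placeholder host; use your cache's connection details
))
# → postgresql://app_user:secret@psedge.global:5432/orders
```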

Deployment Options

PolyScale Serverless

Plug-and-play database caching with a global edge network.

PolyScale provides a SaaS global edge network for database caching. Connecting to PolyScale Serverless via a database client application automatically routes the request to the closest Point of Presence (POP), where query results are then cached for low-latency access. Read more about our network and sign up for a free account here.

PolyScale Self Hosted

Run PolyScale anywhere: in the cloud, on your laptop, or in your data center. PolyScale can be downloaded immediately and run with a two-week free trial with unlimited usage. Read more to get started with Self Hosted.

Protocol support

PolyScale can be accessed using the native wire protocol of each supported database.

Cached Data

PolyScale caches database responses, i.e. the payloads returned from the origin database when a query is executed. These are stored close to where the request originated.

Caching Protocol

AI-driven

PolyScale uses Artificial Intelligence to automate caching behavior. Caching operates at the level of unique SQL queries. Typically an application generates thousands of unique queries with complex temporal dynamics and inter-dependencies. Managing the caching of these queries requires automation, and optimizing it requires machine learning.

Automated Caching

PolyScale's machine learning algorithms identify caching opportunities by recognizing and remembering patterns in query traffic. The algorithms run continuously in the background and update on every query, rapidly adapting to changes in traffic as they arise. At any moment, the most up-to-date optimal cache configurations are applied. The algorithms manage caching at a scale far beyond the abilities of a human.

Time To First Hit (TTFH)

The time to first hit is determined by multiple attributes of the particular query being processed: the arrival rate, the change rate of the query result, and the number of related queries seen, among other things. If the cache is automatically managed (the default behavior) and the query being processed is new, i.e. neither it nor any related query has been seen by PolyScale before, a cache hit will occur on or about the 3rd query. This drops to as low as the 2nd query if the query being processed is similar to a previously processed query, i.e. the same query structure with different parameters.
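The behavior described above can be mimicked with a toy model (purely illustrative; PolyScale's real decision is made by its machine learning algorithms, not a fixed counter):

```python
import re
from collections import defaultdict

seen_exact = defaultdict(int)  # sightings per exact query string
seen_shape = defaultdict(int)  # sightings per query shape (literals stripped)

def normalize(sql: str) -> str:
    """Crude query template: replace string and numeric literals with '?'."""
    sql = re.sub(r"'[^']*'", "?", sql)
    return re.sub(r"\b\d+\b", "?", sql)

def would_cache(sql: str) -> bool:
    """Toy rule: a brand-new query starts hitting around its 3rd arrival; a
    query similar to one seen before starts hitting around its 2nd arrival."""
    shape = normalize(sql)
    familiar = seen_shape[shape] > seen_exact[sql]  # structural siblings exist
    seen_exact[sql] += 1
    seen_shape[shape] += 1
    threshold = 1 if familiar else 2
    return seen_exact[sql] > threshold

q = "SELECT * FROM users WHERE id = 1"
print([would_cache(q) for _ in range(3)])  # [False, False, True]: hit on the 3rd query
```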

Manual Caching

Sometimes the developer knows best. It is always possible to override automated caching with manual caching rules. Time To Live (TTL) values can be configured for query templates that span queries with common structure, or for entire tables. This gives fine-grained control over what is cached and for how long.
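TTL semantics themselves are easy to illustrate. A minimal sketch of a TTL-bounded cache (not PolyScale's implementation; in PolyScale, rules are configured per query template or table in the UI):

```python
import time

class TTLCache:
    """Minimal TTL-bounded cache (illustrative only)."""

    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None  # miss: the query goes to the origin database
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # expired: treated as a miss
            return None
        return value
```

A table-level rule would, in effect, pin `ttl_seconds` for every query touching that table.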

Read more about Cache Configuration here.

Cache Invalidation

PolyScale automatically detects data changes and removes stale data from the cache, globally.

Smart Invalidation

Smart Invalidation is PolyScale's automated cache invalidation system. It is designed to provide 100% accurate, global eventual consistency for cached data.

At a high level, Smart Invalidation automatically detects and calculates when cached data may be stale and removes it from the cache.

Smart Invalidation has several out-of-the-box advantages:

  • Performance - Smart Invalidation operates in real time and invalidates the cache as soon as a change/mutation query is detected. Often the region where the change is detected invalidates its cache even before the change query has executed on the origin database.
  • Highly scalable - unlike caching systems that require access to the Write-Ahead Log (WAL) or similar, PolyScale inspects the wire protocol traffic to invalidate, so an unlimited number of cache regions (PoPs) can be supported without degrading the origin database.
  • 100% automatic - no code or configuration needs to be developed.
  • Global eventual consistency - cached data is guaranteed to converge with the origin database, globally.

Smart Invalidation achieves this through a multi-layer approach. The first layer involves parsing the SQL queries at a row level. It analyzes which rows are read (in the case of read queries) and which rows are written to (in the case of write queries). This analysis is performed on fully anonymized representations of the queries to maintain data security and privacy. From these query details, targeted invalidations of cached data are performed.

The analysis performed by the first layer can sometimes be overwhelmed by complex queries. That is where the second layer of invalidation comes into play. When a query is deemed too complex to determine what specific cached data may have become invalidated, a fallback to a simple but effective table level invalidation occurs. In this case, all cached data coming from any affected tables is invalidated. This layer of invalidation errs on the side of caution, typically overreaching, but ensuring stale cache data is not served.
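The table-level fallback can be sketched roughly as follows. The table extraction here is deliberately crude and hypothetical, whereas PolyScale's first layer performs full row-level analysis on anonymized queries:

```python
import re

cache = {}        # sql -> cached result rows
tables_read = {}  # sql -> set of tables the cached result depends on

def table_of(sql: str):
    """Very rough single-table extraction; the real first layer is row-level."""
    m = re.search(r"\b(?:FROM|INTO|UPDATE)\s+(\w+)", sql, re.IGNORECASE)
    return m.group(1).lower() if m else None

def invalidate_table(table: str) -> None:
    """Second-layer fallback: drop every cached result that touched the table."""
    stale = [q for q, deps in tables_read.items() if table in deps]
    for q in stale:
        cache.pop(q, None)
        tables_read.pop(q, None)

read = "SELECT name FROM users WHERE id = 7"
cache[read] = [("ada",)]
tables_read[read] = {table_of(read)}

write = "UPDATE users SET name = 'eve' WHERE id = 7"
invalidate_table(table_of(write))
print(read in cache)  # False: the write invalidated the cached read
```

Invalidating everything from an affected table overreaches, but guarantees no stale data from that table is served.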

The first two layers of invalidation are very effective, however, they can be thwarted in some circumstances. For example, out of band data changes not visible to PolyScale, or tightly coupled (in time) reads and writes. To address this, a third layer of invalidation exists within the cache automation itself. The automated caching layer monitors for and detects unexplained data changes. If these events are detected, it disables caching on the relevant queries. This provides an empirical safety net on the predictive actions of the first two layers.

CDC Based Invalidation

For use cases where Smart Invalidation cannot be effective, i.e. where out-of-band changes are occurring on the database, PolyScale can be configured to use Change Data Capture (CDC) to invalidate cached data. Read more about CDC invalidation here.

Storage Efficiency

The aforementioned layers invalidate entries because the underlying data is believed to have changed. Another layer of invalidation targets cached data that is believed to be superfluous. Empirically, most query results written to the cache will never be served as cache hits: for a cache that sees one million distinct queries in a day, perhaps only one hundred thousand of those will have cache hits. Removing entries that will never be needed is an important part of an efficient cache implementation. PolyScale uses automated machine learning algorithms (specifically, non-homogeneous Poisson process modeling) to eliminate the build-up of never-to-be-used data.
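To illustrate the idea with a simplified, homogeneous sketch (not PolyScale's actual non-homogeneous model): under Poisson arrivals at rate λ, the probability of at least one request within a window of length T is 1 − e^(−λT), so entries very unlikely to be hit again before expiry can be dropped:

```python
import math

def hit_probability(rate_per_s: float, window_s: float) -> float:
    """P(at least one arrival in the window) under a Poisson arrival model."""
    return 1.0 - math.exp(-rate_per_s * window_s)

def should_evict(rate_per_s: float, remaining_ttl_s: float, threshold: float = 0.05) -> bool:
    """Evict when another hit before expiry is sufficiently unlikely."""
    return hit_probability(rate_per_s, remaining_ttl_s) < threshold

# a query arriving ~once per hour with 60s of TTL left is very unlikely to hit again
print(should_evict(1 / 3600, 60))  # True
print(should_evict(1.0, 60))       # False: a hot query stays cached
```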

PolyScale Smart Invalidation provides global eventual consistency through a multi-layered approach that resolves in a few tens of milliseconds.

Multi-cache Invalidations

Smart Invalidations can be configured to trigger across two or more caches. Simply put, any invalidations to the data on cache A are repeated for cache B and vice versa.

This is useful where multiple cache configurations connect to the same database. For example, an application may use one cache for an admin interface or schema migrations (writes with uncached reads) and another as the application channel (cached reads only), while requiring that changes on either invalidate the other.

Each workspace supports a single invalidation group and any cache within a workspace can be part of the invalidation group.

Multi-cache invalidation can be configured within the user interface within the Smart Invalidation section for each cache.

Cache Revalidation

tip

This feature is in production for Postgres databases. It is currently in development for other supported databases.

Cache Revalidation increases the cache hit rate and puts an upper bound on time to eventual consistency. In order to maximize hit rates and minimize serving potentially stale data, PolyScale will periodically pass through a "hit" query to the origin database. While a response is served immediately from cache, in the background, PolyScale receives the response from the database and updates the cache.

In the case where the response payload has not changed, the current TTL clock is restarted, effectively lengthening the TTL and increasing the cache hit rate.

In the case where the response payload has changed, a subsequent request (within the TTL window) will be served from the now-updated cache, and the future TTL will be adjusted downward to account for the unexplained change in the payload. This ensures that even if PolyScale cannot see the underlying updates to your database, the amount of stale data served is limited.
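A synchronous sketch of the revalidation decision (in the real system the cached response is served immediately and this runs in the background; the halving is an illustrative adjustment, not PolyScale's actual policy):

```python
import time

def revalidate(entry: dict, fetch_origin, now=time.monotonic) -> dict:
    """Refresh one cache entry against the origin database."""
    fresh = fetch_origin()
    if fresh == entry["value"]:
        # payload unchanged: restart the TTL clock, lengthening the effective TTL
        entry["expires_at"] = now() + entry["ttl"]
    else:
        # unexplained change: update the payload and shorten future TTLs
        # (halving is an illustrative choice)
        entry["value"] = fresh
        entry["ttl"] /= 2
        entry["expires_at"] = now() + entry["ttl"]
    return entry

entry = {"value": [("ada",)], "ttl": 60.0, "expires_at": 0.0}
revalidate(entry, lambda: [("ada",)])
print(entry["ttl"])  # 60.0: unchanged payload, TTL clock restarted
revalidate(entry, lambda: [("eve",)])
print(entry["ttl"])  # 30.0: changed payload, TTL adjusted downward
```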

In short, Cache Revalidation increases the cache hit rate and puts an upper bound on the time to eventual consistency for use cases where PolyScale does not have access to invalidation data. Cache Revalidation also benefits users who want to maximize hit rates and are unconcerned about potentially serving a limited amount of stale data: turn off Smart Invalidation and rely on Cache Revalidation to catch changes in the query response.

API Based Invalidation

Cached data can be globally purged programmatically by calling PUT /v1/caches/{cacheId}/purge. See the API documentation for further details.
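For illustration, the purge call can be constructed as below. The API base URL and Bearer-token header are assumptions for this sketch; only the path and method come from the endpoint above, so check the API documentation for the real connection details:

```python
from urllib.request import Request

API_BASE = "https://api.polyscale.ai"  # assumed base URL; check the API docs

def purge_request(cache_id: str, api_token: str) -> Request:
    """Build the global purge call: PUT /v1/caches/{cacheId}/purge."""
    return Request(
        f"{API_BASE}/v1/caches/{cache_id}/purge",
        method="PUT",
        headers={"Authorization": f"Bearer {api_token}"},  # assumed auth scheme
    )

req = purge_request("my-cache-id", "YOUR_API_TOKEN")
print(req.get_method(), req.full_url)
# send with urllib.request.urlopen(req) once the token and cache id are real
```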

Traffic Shaping

PolyScale performs various optimizations around connectivity and query behavior to reduce overall latency and increase performance, for example eliding repetitive USE <database> or SET <setting> commands.
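A toy sketch of this kind of elision: forward a USE or SET command only when it would change session state. The command names come from the example above; the bookkeeping is hypothetical:

```python
def elide_repeats(commands):
    """Drop repeated USE/SET commands whose effect is already in place (sketch)."""
    state = {}  # session state, e.g. {"USE": "shop", "SET autocommit": "autocommit=1"}
    forwarded = []
    for cmd in commands:
        verb, _, rest = cmd.partition(" ")
        key = verb.upper()
        if key in ("USE", "SET"):
            if key == "SET":
                # key on the variable being set, e.g. "SET autocommit"
                var = rest.split("=")[0].strip()
                key = f"SET {var}"
            if state.get(key) == rest:
                continue  # no-op repeat: elide it
            state[key] = rest
        forwarded.append(cmd)
    return forwarded

print(elide_repeats([
    "USE shop", "SELECT 1", "USE shop",
    "SET autocommit=1", "SET autocommit=1",
]))
# ['USE shop', 'SELECT 1', 'SET autocommit=1']
```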

Database and Cache Permissions

A PolyScale cache identifies a single database server, i.e. a server host. PolyScale keys cached data on the combination of database name and database user, so cache uniqueness is guaranteed at the database and user level.

Connection Pooling

PolyScale provides inbuilt connection pooling to help scale TCP connections for massive concurrent and/or ephemeral workloads. Read more about connection pooling here.

Example Architectures

Scale a Single Database/Region

PolyScale can be introduced to a single region to scale database reads and lower resource requirements on the origin database. Once cached, PolyScale can execute any database read query in single digit milliseconds, while also reducing performance variability.

Example single-region architectures: before and after PolyScale is enabled.

Scale Multiple Regions

PolyScale allows a single database to be scaled to multiple regions. Typically the application tier is scaled i.e. deployed to additional cloud or edge regions.

Example multi-region architecture.

Performance Considerations

The following diagram decomposes the path a typical query traverses before and with PolyScale. On the left, before PolyScale, a query simply travels from the App-Tier (the application server, micro-service, etc.) to the database; the database takes time to process the query and the result is returned. On the right, using PolyScale, a query first travels to the PolyScale PoP. If the query result is present in the cache, the result is returned immediately to the App-Tier. If not, the query must travel on to the database, where it is processed; the result is returned to PolyScale and then to the App-Tier.

The latency triangle: query paths before and with PolyScale.

Accounting for the time taken for each leg of these paths helps to determine how much PolyScale will improve overall performance. In the diagram the times taken for each leg are denoted:

\begin{array}{ll} t_0 & \text{App Tier to Database latency} \\ s_1 & \text{App Tier to PolyScale latency} \\ s_2 & \text{PolyScale to Database latency} \\ \delta & \text{Query execution time} \\ \epsilon & \text{Cache retrieval time} \end{array}

With these symbols in hand, the overall query processing times before and with PolyScale are:

\begin{array}{ll} \text{Before} &= t_0 + \delta \\ \text{With PolyScale} &= s_1 + (1-h)(s_2 + \delta) \end{array}

where $h$ is the cache hit rate. Here it is understood that the network hops ($t_0$, $s_1$, and $s_2$) are round-trip times. It is also worth noting that the cache retrieval time, $\epsilon$, is effectively zero compared to all the other times and is ignored. In the diagram the dashed lines indicate that not all queries need to make the trip to the database; only the cache misses incur that cost (hence the $(1-h)$ factor in the total time with PolyScale).
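As a quick sanity check, the two totals can be computed directly. A minimal sketch (times in milliseconds; the example numbers are illustrative):

```python
def latency_before(t0: float, delta: float) -> float:
    """Round-trip time without PolyScale: network hop plus query execution."""
    return t0 + delta

def latency_with_polyscale(s1: float, s2: float, delta: float, h: float) -> float:
    """Expected time via PolyScale: every query pays s1; only the (1 - h)
    cache misses pay the extra hop to the database plus the execution time."""
    return s1 + (1 - h) * (s2 + delta)

# Example: app and PoP co-located (s1 ~ 0), database a region away (s2 ~ t0 = 80 ms),
# 10 ms query execution, 90% hit rate
print(latency_before(80, 10))                  # 90 ms
print(latency_with_polyscale(0, 80, 10, 0.9))  # ~9 ms
```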

These formulae can be simplified for some specific situations. For example, assume the App Tier and the PolyScale PoP exist within the same data center, and the database is located in another region. In that scenario $s_1 \approx 0$ and $s_2 \approx t_0$, which results in total times:

\begin{array}{ll} \text{Before} &= t_0 + \delta \\ \text{With PolyScale} &= (1-h)(t_0 + \delta) \end{array}

Latency is reduced in direct proportion to the hit rate.

Another possible scenario is one where the App Tier and database exist within the same data center, but the PolyScale PoP is (relatively) distant. In that case $t_0 \approx 0$ and $s_1 \approx s_2 = s$. If we further assume a modest hit rate of $\frac{1}{2}$, the total times become:

\begin{array}{ll} \text{Before} &= \delta \\ \text{With PolyScale} &= \frac{3}{2} s + \frac{1}{2} \delta \end{array}

If query execution times are large relative to the latency $s$, significant latency savings are still achievable.

In general, to yield the lowest latency, a PolyScale PoP should be as geographically close as possible to the application server(s) or any other database client. This can be seen in the formulae above: getting $s_1 \approx 0$ maximizes the impact of caching on performance. Geometry dictates that in that case $s_2 \approx t_0$, and the gains from using PolyScale become proportional to the hit rate achieved.