Postgres performance lessons - part 2: write amplification, B(ad)-Trees, and batching

September 26, 2024

This is the second part of the Postgres performance lessons we learned building a batch image editor. You can read part 1 here.

In this edition we'll continue our in-depth investigation of database-related performance bottlenecks. At the end we'll share a list of bite-sized takeaways for engineering and maintaining performant Postgres applications in a production environment - including the knowledge from part 1.

1. B(ad)-Trees and write amplification

Previously we explored why using completely random keys can be detrimental to performance. We resolved that issue by using sequential integers instead, but that was not the end of the story.

Consider the ramifications of changing the primary key of our image_links table from a hash to a generated integer: we lost uniqueness! By guaranteeing that the SHA256 of an image link must be unique, we had a de facto uniqueness guarantee on the image link itself (SHA256 collisions are improbable enough for our domain). This is obviously a fairly important property to maintain, otherwise we risk storing a lot of redundant data - and worse, performing a lot of superfluous downloads. As a solution we added a unique index on the image_link column:

CREATE UNIQUE INDEX unique_image_link ON image_links (image_link);

Whilst this solves one problem, it introduces another. Consider how much storage space we sacrificed for this index:

db => SELECT pg_size_pretty(pg_relation_size('unique_image_link'));
 pg_size_pretty
----------------
 1841 MB
(1 row)

Putting that number into perspective, the storage cost of the ~13M rows in the image_links table itself is currently:

db => SELECT pg_size_pretty(pg_relation_size('image_links'));
 pg_size_pretty
----------------
 2336 MB
(1 row)

Ouch! We've just nearly doubled our storage requirement in order to maintain uniqueness.

But that's not the only problem. We have yet to consider a phenomenon known as write amplification. As a result of Postgres' Multi-Version Concurrency Control (MVCC), every update on the table also requires an update on every index on that table. This becomes clear once you realize that an UPDATE is more closely related to a DELETE + INSERT, and an insert in the table also means an insert in every index on that table. In other words, every index you add increases the cost of writes. Luckily, Postgres does have an optimization, called Heap Only Tuples (or HOT updates), that can circumvent this overhead in some cases. Unfortunately, this optimization does not apply when updating a column that is covered by an index, which is exactly what we are doing when updating download_after or referenced_at. Therefore, there is no way to avoid paying the cost of write amplification. But how much are we paying exactly?
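
As an aside, you can check how often a table actually benefits from HOT, because Postgres tracks regular versus HOT updates per table in pg_stat_user_tables. A query along these lines shows the ratio:

SELECT relname,
       n_tup_upd     AS total_updates,
       n_tup_hot_upd AS hot_updates,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname = 'image_links';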

To experimentally quantify the cost, we can measure the time spent updating download_after on 1% of our 13M image links. Note that we ran the query multiple times before taking a measurement to eliminate any disadvantage resulting from a cold cache. First without the proposed unique index added:

db => \timing
Timing is on.
db => UPDATE image_links SET download_after = NOW() WHERE id % 100 = 0;
UPDATE 132071
Time: 1755,119 ms (00:01,755)

But the total execution time does not adequately measure what we want to know. There are effectively two steps to this query: first finding the records that match id % 100 = 0, and then updating them. Finding the records based on the id is a task that will not be affected by an index on image_link, so its cost is irrelevant to the experiment. Let's analyze the query and decompose the plan to discover how much time is truly spent on the update alone.

db => EXPLAIN (analyse, buffers, costs, verbose) UPDATE image_links SET download_after = NOW() WHERE id % 100 = 0;

update without any index
EXPLAIN ANALYZE result - visualization showing the decomposition of the cost of finding and updating the records before adding the proposed unique index on image_link.

From this we learn that the actual baseline number we are looking for is just the cost of updating, which is 898ms in this case. Now we can create the unique index and repeat the experiment:

db => CREATE UNIQUE INDEX unique_image_link ON image_links (image_link);

db => EXPLAIN (analyse, buffers, costs, verbose) UPDATE image_links SET download_after = NOW() WHERE id % 100 = 0;

update bad index
EXPLAIN ANALYZE result - visualization showing the cost of updating the records after adding a unique index on image_link. From this we learn that the new index has significantly increased the cost of updating.

So not only did we nearly double our storage requirements, we also increased the cost of updating fivefold. Just to reiterate how surprising this sounds: we created an index on the column image_link, which increased the cost of updating the value of download_after by a factor of 5!

But how can we do better? After all, we really need the uniqueness of image links. One possible reason for the slowdown is that the B-Tree (the default index type) copes poorly with our keys, because almost all values are long and share a prefix like https://www. But another issue that we already confirmed is that simply more memory needs to be read and written. We measured these numbers when analyzing the query; they can be found under the IO & Buffers tab of the update node in the query analyses above. Let's lay out the values in a table to compare:

          Read       Written
Before    0.990 GB   0.750 GB
After     2.7 GB     1.8 GB

Comparing the values side by side reveals that a lot more memory had to be read and written when the index was present. It goes without saying that touching more memory usually takes more time. So perhaps our first focus should be to reduce the amount of memory the update has to touch by reducing the size of the index.

One way to achieve that is by storing a smaller but still unique digest of the image link in the index - such as a hash. Image links are not particularly high entropy: not only because of the repeated https://www., but also because one client typically imports a large batch of image links with at most a handful of unique hostnames. Therefore, although we previously decided to use a SHA256 hash, we might be able to get away with using a smaller MD5 hash. We know that an MD5 hash is 128 bits in size, which gives us 2^128 possible values. Let's say that at some point in the future we find ourselves having to manage a ridiculous 1 trillion (10^12) image links; the probability of a hash collision can then be approximated using:
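
For k values drawn uniformly at random from N possible hash values, the classic birthday-problem bound puts the probability of at least one collision at approximately:

p ≈ 1 − e^(−k(k−1) / (2N)) ≈ k² / (2N)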

Substituting k = 10^12 and N = 2^128 gives us an approximate collision probability of just 1.4432 * 10^(-15). Another welcome serendipity is the fact that MD5 is supported natively by Postgres, whilst SHA256 requires an extension like pgcrypto. That said, the md5 function returns a hex-encoded string, which takes up much more space than the 128 bits of information it contains. Luckily, there is a native datatype in Postgres that is exactly 128 bits: UUID. Since UUID supports casting from such a hex-encoded string, we can define our unique index as:

CREATE UNIQUE INDEX unique_image_link ON image_links(CAST(md5(image_link) AS UUID));

This yields a much smaller index (the previous index was 1841 MB):

db => SELECT pg_size_pretty(pg_relation_size('unique_image_link'));
 pg_size_pretty
----------------
 398 MB
(1 row)

As expected, this results in a lot less memory touched, and thus much less time to update:

update good index
EXPLAIN ANALYZE result - visualization showing the cost of updating the records using the md5 as uuid index on image_link. From this we learn that this index leads to a much less severe write amplification effect than an index directly on image_link.

Of course the total time it costs to update is still significantly higher than back when we had no index at all, but there is simply no such thing as a free lunch. On the other hand, we gained something beyond a uniqueness guarantee: faster lookups by image_link. The B-Tree used to enforce uniqueness may just as well be used to speed up queries that filter on the same expression. As a result, whenever we want to obtain some information about an image link in some other part of our stack, we can query it efficiently using:

SELECT * FROM image_links WHERE md5(image_link)::UUID = md5('https://example.com')::UUID
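
As a sanity check, it is worth verifying with EXPLAIN that the planner actually uses the expression index for such lookups - the exact plan depends on your table statistics, but it should show an index scan on unique_image_link rather than a sequential scan:

EXPLAIN SELECT * FROM image_links
WHERE md5(image_link)::UUID = md5('https://example.com')::UUID;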

Bonus: hash based exclusion constraints

Before ending the chapter we wanted to address another option that we did not end up implementing but was fascinating to discover: hash based exclusion constraints. Hash indexes have gotten a bad reputation in the past because they supposedly performed very poorly. But these issues have - according to e.g. Hakibenita - since been resolved. That said, we can't just use a hash index as a unique index in Postgres, since only B-Trees can fulfill that role:

db => CREATE UNIQUE INDEX unique_image_link ON image_links USING HASH(image_link);
ERROR:  access method "hash" does not support unique indexes

Interestingly enough though, an exclusion constraint can also be used to enforce uniqueness:

ALTER TABLE image_links ADD CONSTRAINT unique_image_link EXCLUDE USING HASH (image_link WITH =);

Unlike a B-Tree over an MD5 hash, Postgres now has the freedom to choose whatever hash function and data structure it deems best. Although it performs slightly better on inserts and updates, it requires a bit more storage than the md5 as uuid index - suggesting that memory touched is not the full story.

db=> select pg_size_pretty(pg_relation_size('unique_image_link'));
 pg_size_pretty
----------------
 451 MB
(1 row)

update exclusion constraint
EXPLAIN ANALYZE result - visualization showing the cost of updating the records after adding an exclusion constraint on image_link. This shows a marginal improvement in speed over the md5 as uuid index, despite a regression in the amount of memory touched.

The linked Hakibenita article found that inserts on a hash index were about 10% faster than on a B-Tree. But if you compare it against our md5 as uuid index, the gain is a slightly less impressive 2.77s ÷ 2.62s ≈ 5.7%. Furthermore, they concluded that SELECT queries using the hash index performed significantly better. We used their benchmark script to compare the lookup speed between a plain B-Tree on the image link, our md5 as uuid B-Tree, and the hash index on our dataset:

Index kind                           Mean lookup benchmark time
B-Tree over image_link               2.286456 s
B-Tree over md5(image_link)::uuid    0.683087 s
Exclusion using hash(image_link)     0.555995 s

The speedup between the plain B-Tree and the hash index is even more dramatic than the article found. One possible explanation is that their URLs were randomly generated and relatively short compared to the URLs in our real world dataset, yielding a more optimal B-Tree. So if anything, our experiments very much confirm how badly a B-Tree over medium-to-large, low-entropy strings performs. On the other hand, the speedup between the md5 as uuid B-Tree and the hash index is not nearly as pronounced.

Although it is a positive result overall, we decided against adopting this method for one important reason: exclusion constraints cannot be created as NOT VALID, nor can they be built concurrently. This means that the constraint cannot be created without holding a full table lock for potentially minutes, which is not something we can afford in a running production environment. Given that the performance benefit is marginal, we concluded that downtime in order to build this constraint was not worth it for now. We do however keep this option in our toolbox - especially for new tables, where the exclusion constraint can be used from day 1.
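
By contrast, a regular unique index - like the md5 as uuid expression index above - can be built without blocking writes by using CREATE INDEX CONCURRENTLY (at the cost of a slower build that cannot run inside a transaction block), along these lines:

CREATE UNIQUE INDEX CONCURRENTLY unique_image_link
    ON image_links (CAST(md5(image_link) AS UUID));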

2. The batch that didn't

At Channable - and especially on the Infra team - we like to build things in Haskell. But that does not mean that everything is Haskell. There is also a significant body of Python code to maintain and interact with. As a result, it was more practical to write the extraction job - which ensures that image links from the customers' feeds are tracked in the database - in Python. For that same reason, we use SQLAlchemy for any database communication, just like the rest of our Python infrastructure.

The implementation is really rather simple: it boils down to upserting batches of image links to ensure that they exist and are kept alive. In pseudocode:

from more_itertools import batched
import sqlalchemy as sa

BATCH_SIZE = 1000
db = get_db_context()
all_image_links = get_stream()

for batch in batched(all_image_links, BATCH_SIZE):
    with db.begin() as conn:
        # `batch` is a sequence of parameter dicts, one per image link.
        conn.execute(
            sa.text(
                """
                INSERT INTO image_links (image_link, referenced_at)
                VALUES (:image_link, NOW())
                ON CONFLICT (image_link)
                DO UPDATE SET referenced_at = NOW()
                """
            ),
            batch,
        )

But something wasn't adding up. We were only seeing an average throughput of about 200 image links upserted per second. What's more, we had just taken the effort to tweak our indexes to minimize the cost of updating, so we knew things ought to be faster than that. Hoping that our trusted friend pgbadger could once again give us some insight, we opened the report:

query                                     duration
INSERT INTO image_links VALUES (...       24ms

That was surprising: if it takes 24ms to insert 1000 links, we should have been well above 200 per second. Perhaps the Python process was blocked on some other operation? After all, the input feed needs to be streamed from another service - but we recently improved its latency, so that seemed unlikely. Regardless, having learned from our mistakes, we stopped speculating and started measuring.

Using py-spy, we can attach to a running Python process in production and produce a flamegraph of the call stack to help us figure out what the process is spending its time on:

flame graph bad batching

Once again a surprising result. Pretty much all of its time is spent executing calls to SQLAlchemy, but the database itself reports that the inserts take no time at all. At this point we were rather puzzled as to what might be happening.

It turned out that SQLAlchemy had been rather disingenuous about its internal implementation. Yes, it supports an argument list when executing a query. But no, it is not actually batching the data. In fact, we might just as well have written:

for batch in batched(all_image_links, BATCH_SIZE):
    with db.begin() as conn:
        for single_link in batch:
            conn.execute(sa.text("..."), single_link)

Semantically this is not a problem, because everything is still upserted within the same transaction block, but performance-wise this is terrible. Every single image link that needs to be upserted requires a network round trip to the database server. Even if these machines live in the same datacenter, this adds up quickly. This also explains why the query only took 24ms: it wasn't inserting a batch of 1000 links, but just a single one.

Now solving this issue was a lot trickier than we expected. The SQLAlchemy documentation is overall fairly impressive, but the sheer surface that the API covers - including the different ways to accomplish the same thing - can confuse anyone. In the end, we were not able to solve the issue using plain SQLAlchemy and instead opted to take fate into our own hands by grabbing the underlying psycopg2 cursor in order to use a batching execute from psycopg2.extras:

from more_itertools import batched
from psycopg2.extras import execute_values

BATCH_SIZE = 1000
db = get_db_context()
all_image_links = get_stream()

for batch in batched(all_image_links, BATCH_SIZE):
    with db.begin() as conn:
        # Grab the raw psycopg2 cursor underlying the SQLAlchemy connection.
        cursor = conn.connection.cursor()
        # execute_values folds the whole batch into a single INSERT statement.
        execute_values(cursor, "...", batch, page_size=BATCH_SIZE)

This increased the throughput of the extraction job by 21x! In other words, we were now upserting a much more reasonable 4200 image links per second. To corroborate the accusation that SQLAlchemy incurs a network round trip for every item in the argument list, the following graph clearly shows a reduction in incoming traffic to the database server as a result of the described change:

network traffic down with batching
Network traffic volume over time - the moment the change was deployed is marked with an M, showing a clear and significant reduction in network traffic volume. This finding aligns with the hypothesis that fewer network roundtrips are being made.

Of course the total amount of useful data is unchanged, but sending each row separately will have an overhead in both time and total data. Unsurprisingly, the database server itself also benefitted significantly from not having to process so many individual requests, as seen by a drop in CPU usage:

cpu drops with batching
Database CPU usage over time - showing that the reduced number of statements and roundtrips has a positive effect on CPU usage.

3. UPDATE now, pay later

We already discussed the relative cost of updating in the presence of certain indexes. But an optimistic reader might rightfully point out that - especially after properly batching our updates - the time spent on updating is not worryingly high anymore. Yet, the next step of our optimization journey is to reduce the number of updates as much as we can. Why? Because updates have a much more insidious, delayed cost…

Vacuuming

Updates in Postgres are - in spite of the write-amplification issue - generally very fast. That's because the delete half of an update doesn't concern itself with moving memory around. In fact, deleting a row doesn't actually delete anything; it just marks the old row version as dead. Whenever another query reads such a row, it simply pretends it isn't there. It is then up to the autovacuum process to periodically verify that the row is no longer visible to any transaction, and reclaim the memory for new rows. But this has a few major consequences:

  • Vacuuming is not free - and the more rows have to be cleaned the more frequently autovacuum kicks in.
  • When new rows are inserted into these reclaimed pieces of memory, they might end up on completely different pages, which - as we know by now - can increase page fetching and is detrimental to overall query performance.
  • When querying, Postgres has to skip over these dead rows while looking for actual live data, slowing it down and thrashing caches.
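
These effects are visible in Postgres' statistics: pg_stat_user_tables keeps track of dead tuples and autovacuum activity per table, so a quick look there tells you how much cleanup work a table is generating, for example:

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'image_links';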

So knowing that updates should be avoided as much as possible, which ones can we eliminate in our case? If you remember the very first optimization we did, building a composite index on (download_after, referenced_at), we clarified that we use referenced_at to track the lifetime of images. In order to do that, we have to update referenced_at = NOW() every time a customer imports their feed. Problematically, this can often be multiple times per day with up to millions of image links per feed. However, tracking the lifetime of each image individually is much more granular than it needs to be. After all, customers will import their entire feed at once, so all those images can share the same referenced_at timestamp.

Concretely, this part of our schema used to look like the following:

CREATE TABLE image_links (
  id INT8 NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  image_link TEXT NOT NULL,
  download_after TIMESTAMPTZ,
  referenced_at TIMESTAMPTZ NOT NULL,
  queued_at TIMESTAMPTZ
);

CREATE TABLE link_extractions (
  id INT8 NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  project_id INT8 NOT NULL
);

CREATE TABLE link_extraction_x_image_link (
  image_link_id INT8 NOT NULL REFERENCES image_links(id),
  link_extraction_id INT8 NOT NULL REFERENCES link_extractions(id),
  PRIMARY KEY (image_link_id, link_extraction_id)
);

But what if we drop referenced_at from image_links, and instead track when an extraction last ran:

ALTER TABLE image_links DROP COLUMN referenced_at;
ALTER TABLE link_extractions ADD COLUMN last_ran TIMESTAMPTZ NOT NULL;

We would only have to update a single row in link_extractions instead of potentially millions of entries in image_links. Managing the lifetime of images can then be done by periodically deleting link_extractions whose last_ran is older than some specified threshold, after which we can clean up any image_links that are no longer referenced by any link_extraction (sketched below). Finally, what used to be a batch upsert of referenced_at can now simply be an insert with an ON CONFLICT DO NOTHING clause:

INSERT INTO image_links
VALUES %s
ON CONFLICT DO NOTHING
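
The periodic cleanup itself might look roughly like the following sketch, which assumes a hypothetical 30-day retention window, removes the junction rows before their extractions, and ignores any other tables that may still reference image_links - the actual retention policy and surrounding bookkeeping are up to you:

-- Remove junction rows belonging to stale extractions first,
-- so the foreign key to link_extractions is not violated.
DELETE FROM link_extraction_x_image_link x
USING link_extractions e
WHERE x.link_extraction_id = e.id
  AND e.last_ran < NOW() - INTERVAL '30 days';

DELETE FROM link_extractions
WHERE last_ran < NOW() - INTERVAL '30 days';

-- Finally, remove image links that are no longer referenced by any extraction.
DELETE FROM image_links il
WHERE NOT EXISTS (
  SELECT 1
  FROM link_extraction_x_image_link x
  WHERE x.image_link_id = il.id
);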

Dropping the per-image referenced_at update drastically reduces the number of dead tuples that have to be removed, as shown by the data in pgbadger:

          Tuples removed from image_links per day
Before    75_301_695
After        872_892

As concluded earlier, the query executing the updates was already not that slow. It was therefore not surprising that this change did not make the extraction job - which executed the updates - measurably faster. Yet, the payoff on the system overall was undeniable. Note that in the graph below we are seeing the benefits of a simplified polling query as well as reduced updates. The approximate moment of both these changes is marked with an M:

cpu drop after referenced at gone
Database CPU usage over time - showing a reduction as a result of fewer updates and a simplified polling query that does not consider referenced_at anymore. Note that this was the optimization alluded to in chapter 4 of part 1.

Bonus: note on redundant updates

On a final note, if you do need to upsert you should still take care to avoid redundant updates. For example, we store the content hash of an image to indicate a successful download. So whenever we (re)download an image we could upsert the download_result:

INSERT INTO download_result (image_link_id, content_hash)
VALUES ($1, $2)
ON CONFLICT (image_link_id)
DO UPDATE SET content_hash = $2

But updates are never free in Postgres, even if the old and new value are the same!
Precisely because we expect most images to remain unchanged between downloads, most of these updates do not actually change the value. Instead, we can make the update conditional on the data actually having changed:

INSERT INTO download_result (image_link_id, content_hash)
VALUES ($1, $2)
ON CONFLICT (image_link_id)
DO UPDATE SET content_hash = $2
-- Keep in mind that NULL != x evaluates to NULL rather than TRUE for any x,
-- so transitions to or from NULL need their own check.
WHERE (download_result.content_hash IS NULL) != (EXCLUDED.content_hash IS NULL)
OR    download_result.content_hash != EXCLUDED.content_hash
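
The same NULL-aware comparison can also be written more compactly with IS DISTINCT FROM, which treats NULLs as comparable values and therefore needs no special case:

INSERT INTO download_result (image_link_id, content_hash)
VALUES ($1, $2)
ON CONFLICT (image_link_id)
DO UPDATE SET content_hash = $2
WHERE download_result.content_hash IS DISTINCT FROM EXCLUDED.content_hash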

This relatively simple change further reduces updates. Do note that this optimization is only safe if you know that such a row is not updated concurrently; otherwise it is prone to a race condition, because the WHERE clause of a query is evaluated before any locks are taken. In our case we already know that we have exclusive access, as we made sure that the image links are unique and queued at most once at any given moment.

4. Death by a thousand papercuts queries

At this point in our journey we have arrived in a pretty good place. The stack is running stably, and the database is not that busy most of the time. But there was still something eating away at us: occasional spikes to 100% database CPU. After discussing these gripes with a colleague, he kindly helped us create a bespoke dashboard that allowed us to compare various application and database metrics. It was through this that we discovered that the database's CPU usage correlates very strongly with the number of queries per second, as well as with the number of active threads of the rendering service.

piet causes spike loads
From top to bottom: queries/s, db cpu, active rendering threads - showing a strong correlation between these 3 metrics, suggesting an inefficiency in the logic of the rendering task.

This rendering service takes the template created by our customers in the editor - containing references to images from their feed which have been downloaded by the downloader service - and renders it. To obtain these images from our private storage, we have to check the database to see if the downloader service has already successfully processed each image. The flow of that service - which is written in Haskell - roughly looks like the following:

data ImageRep = CustomerUrl Text | PrivateUrl Text

data Template (imageRep :: ImageRep)
    = Group [Template imageRep]
    | TextBox Text
    | ImageFromFeed imageRep

renderTemplate :: Template PrivateUrl -> IO Image
renderTemplate (ImageFromFeed path) = fetchFromPrivateStorage path
renderTemplate (TextBox text)       = renderText text
renderTemplate (Group xs)           = mergeImages =<< mapM renderTemplate xs

-- | converts customer urls to urls pointing at our self hosted cache.
resolveSingleImage :: CustomerUrl -> IO PrivateUrl
resolveSingleImage url = do
    let query = [str|
        SELECT to_private_url(download_result.content_hash)
        FROM image_links
        LEFT JOIN download_result ON image_link_id = image_links.id
        WHERE md5(image_link)::uuid = md5($1)::uuid
    |]

    result :: Maybe PrivateUrl <- selectOne query [url]
    case result of
        Nothing -> error "image was not yet downloaded"
        Just internalUrl -> pure internalUrl

-- | Processes a chunk of items
processChunkOfWork :: [Template CustomerUrl] -> IO [Image]
processChunkOfWork = mapM $ \template -> do
    -- transform the representation of urls from the customer's original url
    -- to an object reference in our private bucket.
    resolvedTemplate <- traverse resolveSingleImage template
    -- render the template, without having to worry about downloading from
    -- customer's servers.
    renderTemplate resolvedTemplate

But this causes us to do a lot of small queries, which is problematic because each statement first has to be parsed, compiled, optimized, scheduled, etc. Conveniently, we already introduced the notion of a 'chunk of work' in our pseudocode, which we could leverage to batch lookups and amortize the overhead. Moreover, this also allows us to de-duplicate lookups for images that occur more than once within the batch.

-- | Some pure fold that collects all the referenced images in a template
uniqueLinksUsedInTemplate :: Template CustomerUrl -> [CustomerUrl]

-- Instead of transforming the template beforehand,
-- we provide a lookup table to convert in place
renderTemplate :: Map CustomerUrl (Maybe PrivateUrl) -> Template CustomerUrl -> IO Image
renderTemplate lookupTable (ImageFromFeed url) =
    fetchFromPrivateStorage $
        case lookupTable Map.! url of
            Nothing -> error "image not yet downloaded"
            Just internalUrl -> internalUrl
renderTemplate _           (TextBox text)      =
    renderText text
renderTemplate lookupTable (Group xs)          =
    mergeImages =<< mapM (renderTemplate lookupTable) xs

resolveBatchOfImages :: [CustomerUrl] -> IO (Map CustomerUrl (Maybe PrivateUrl))
resolveBatchOfImages urls = do
    let query = [str|
        WITH batch_of_links AS (
            SELECT link FROM UNNEST($1) as link
        )
        SELECT batch_of_links.link, to_private_url(download_result.content_hash)
        FROM batch_of_links
        INNER JOIN image_links ON md5(image_links.image_link)::uuid = md5(batch_of_links.link)::uuid
        LEFT JOIN download_result ON image_link_id = image_links.id
    |]

    Map.fromList <$> selectAll query [urls]

processChunkOfWork :: [Template CustomerUrl] -> IO [Image]
processChunkOfWork templates = do
    lookupTable <- resolveBatchOfImages (concatMap uniqueLinksUsedInTemplate templates)
    mapM (renderTemplate lookupTable) templates

This simple change had a profound effect on our peak loads:

n plus 1 query resolved
From top to bottom: queries/s, db cpu, active rendering threads - marking the moment lookups were grouped in small batches with M1, and in even larger batches with M2. The decorrelation of the first two metrics from the third suggests that the amortized cost of these lookups is significantly reduced.

What's more, processing jobs experienced less latency because they amortized network roundtrips to the database server. It is important to note, however, that this optimization is a trade-off between performance and memory. After all, all those image links that we fetched in a batch have to be kept in memory for the entire duration of processing that chunk. Therefore, it is important to make sure that the size of your batches is bounded by some acceptable maximum. In this particular case, we did not observe any significant increase in memory consumption.

5. Conclusion

At the time of writing, the average database CPU usage is around 4%, with rare spikes to 25%. As a result, our services run extremely stably and have plenty of headroom to scale in the future. We can now comfortably focus on improving the customer experience and adding new features.

Let us now take a moment to look back on our accomplishments:

cpu alltime
Database CPU usage over time - covering the entire timespan of the optimization project and showing the cumulative, positive impact of our efforts.

Now there is a graph that tells a long story of hypothesizing, measuring, and implementing various optimizations. Amusingly, if you want a graph that shows how much we learned, you simply have to invert the y-axis.

But in all seriousness, it is also important to realize that implementing further optimizations right now is almost certainly a bad idea. When there is no bottleneck, it is very hard to measure whether your changes actually had a positive effect in the long run. Worse, your change might actually be making things a lot slower, but you won't notice until your usage increases. A great analogy is that of lock picking, where it only makes sense to pry on the specific pin that is currently experiencing friction.

Finally as promised, here is everything that we learned - distilled into a list of takeaways to keep in mind when it comes to performance engineering with Postgres.

  1. Make a hypothesis backed by measurements before optimizing anything. Ensure that you have useful metrics and intuitive dashboards to obtain and archive these measurements.
  2. Profile, profile, profile! After fixing one bottleneck, all previous measurements should be regarded as invalid.
  3. If something is faster locally, it doesn't always mean it is faster in production; always validate your changes in production.
  4. Whenever you change a query, make sure that the query plan is still sensible. This will also allow you to be precise about which indexes you actually do need and are willing to pay overhead for.
  5. Make sure that you have a database specific monitoring tool like pgbadger to identify problems like slow queries or excessive amounts of dead tuples.
  6. Be wary of building a B-Tree index over low-entropy text columns; consider using hash indexes if possible.
  7. Updates incur a delayed cost; keep that in mind when deciding whether frequent updates are necessary. Also remember that even SET x = x is not free.
  8. Try to batch statements if possible; many small queries are much more costly than one large one.
  9. Avoid random keys if you can; prefer sequential INT8 keys for the best performance.
  10. Database abstractions are lossy. If you are building a performance-sensitive application, consider staying as close to the database as is practical.
    If you are worried about the continued validity of raw queries, you can verify them by running an EXPLAIN for each of them in your test suite.
  11. Place markers on dashboards to label potentially performance changing deploys. This allows everyone to verify that changes have had the desired effect, or perhaps caused problems elsewhere.
  12. Do not optimize parts that aren't a bottleneck, you won't be able to verify it and might accidentally make things worse for the future.
Hugo Peters - Software Development
