Tech
February 5, 2024
By default, Haskell does not provide call stacks when errors occur. To get call stacks, one can add the HasCallStack
constraint to any function to request it. However, did you know that doing this carelessly can cause memory usage to explode? We didn't either! In fact, we had this very issue in production after sprinkling some HasCallStack
constraints over our codebase. It doesn't have to be this way, though. In this post, we will show you why this happens, and how to have complete call stacks without your application's memory usage going kaboom.
Let's take a look at the following minimal code example:
import GHC.Stack
import System.Environment
main :: (HasCallStack) => IO ()
main = do
[v] <- getArgs
loopThenError $ read v
loopThenError :: (HasCallStack) => Integer -> IO ()
loopThenError 0 = logError
loopThenError n = loopThenError (n - 1)
logError :: (HasCallStack) => IO ()
logError = do
putStrLn "Look at my callstack!"
putStrLn $ prettyCallStack callStack
Let's run it and tell it to loop 5 times before logging an error, and then look at the output to see what happens:
$ ghc --make -O2 Callstack.hs && ./Callstack 5
Look at my callstack!
CallStack (from HasCallStack):
logError, called at Callstack.hs:10:19 in main:Main
loopThenError, called at Callstack.hs:11:19 in main:Main
loopThenError, called at Callstack.hs:11:19 in main:Main
loopThenError, called at Callstack.hs:11:19 in main:Main
loopThenError, called at Callstack.hs:11:19 in main:Main
loopThenError, called at Callstack.hs:11:19 in main:Main
loopThenError, called at Callstack.hs:7:3 in main:Main
main, called at Callstack.hs:5:1 in main:Main
Note that the call stack has an entry for every time loopThenError
is called. This means that the size of the call stack is proportional to the number of times loopThenError
recurses! That also explains what happened in our production application: it was running in some loop for quite a long time. That then built up a huge call stack. Then, when some exception was thrown, it tried to render the huge call stack as a string. That in turn blew up the memory and got the application OOM killed.
Calling the above example with ./Callstack <count> +RTS -s
, and playing with the value for count
, will show that the application indeed uses O(count)
memory.
Once we realized that this was the cause of our memory explosion, our first thought was "oh yeah of course". Nevertheless, we would argue that this is a pitfall. Ideally we want errors to have a call stack from problem-site all the way up to main
, as that gives us the most useful information. However, we have to remember that we cannot carelessly go around the code and sprinkle HasCallStack
onto every relevant function. Having to watch out for memory explosion pitfalls is a barrier to getting proper call stacks in production.
That said, let's take a look at what we can do about it.
One rather trivial solution here is to just remove HasCallStack
from loopThenError
. This cuts the chain of HasCallStack
constraints, thus preventing a big call stack from ever being built up. Sadly, though, this means that your call stacks will no longer go all the way to main
. Perhaps in your application this is still enough information to debug problems, perhaps it is not.
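Concretely, the modified program would look like the following sketch (the loop count is hardcoded here so the snippet is self-contained):

```haskell
import GHC.Stack

-- No HasCallStack constraint on loopThenError: the constraint chain is
-- cut here, so the call stack that reaches logError starts fresh.
loopThenError :: Integer -> IO ()
loopThenError 0 = logError
loopThenError n = loopThenError (n - 1)

logError :: HasCallStack => IO ()
logError = do
  putStrLn "Look at my callstack!"
  putStrLn $ prettyCallStack callStack

main :: IO ()
main = loopThenError 5
```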
After removing the HasCallStack
constraint from loopThenError
, the printed call stack looks like this:
Look at my callstack!
CallStack (from HasCallStack):
logError, called at CallstackAvoid.hs:10:19 in main:Main
One idea here, regardless of other solutions, is to add a warning to the Haddock page of HasCallStack
telling people to "mind the pit": be careful where you add HasCallStack
, and do not mindlessly add it to recursive functions. The exact wording of the warning is debatable, but it would be valuable to have it there. In a way, this blog post is already one such warning sign, though it is not placed next to the pit.
In defense of HasCallStack
, the behavior makes perfect sense. The call stack mechanism is conceptually simple: it is just a stack that is pushed to whenever a function is called. It has no intelligent behavior that prevents cycles. Arguably, adding such intelligence would also make it more complicated, slower, and harder to reason about.
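A quick way to convince yourself of this push-only behavior is the following self-contained sketch (our own example, not from the post): getCallStack exposes the raw list of entries, and its length grows by exactly one per constrained call.

```haskell
import GHC.Stack

-- Every call to a function with a HasCallStack constraint pushes exactly
-- one (name, SrcLoc) entry onto the implicit stack; there is no
-- deduplication and no cycle detection.
depth :: HasCallStack => Int
depth = length (getCallStack callStack)

nested :: HasCallStack => Int
nested = depth  -- adds one entry for this call site of 'depth'

main :: IO ()
main = do
  print depth   -- 1: only the call site of 'depth' in 'main'
  print nested  -- 2: 'depth' in 'nested', plus 'nested' in 'main'
```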
Because we now know that this is a risk, we can prevent it from happening. Let's start by grabbing a plank we have lying in the basement, and use popCallStack
to prevent the stack trace from growing indefinitely:
loopThenError :: (HasCallStack) => Integer -> IO ()
loopThenError 0 = logError
loopThenError n =
let ?callStack = popCallStack callStack
in ?callStack `seq` loopThenError (n - 1)
This requires the ImplicitParams language extension to work. It grabs the call stack, removes the top loopThenError entry from it, and then passes the popped call stack on to the recursive call of loopThenError.
Note: The seq
is there to prevent the thunk of the popped call stack from building up. Since ?callStack
is not actually used by loopThenError
, there is no demand for it. So that would build O(count)
thunks until logError
finally evaluates it for printing.
Anyway, let's run it and see what it does:
$ ghc --make -O2 CallstackPop.hs && ./CallstackPop 10000000 +RTS -s
Look at my callstack!
CallStack (from HasCallStack):
logError, called at CallstackPop.hs:11:19 in main:Main
loopThenError, called at CallstackPop.hs:14:23 in main:Main
main, called at CallstackPop.hs:6:1 in main:Main
720,091,712 bytes allocated in the heap
12,936 bytes copied during GC
45,072 bytes maximum residency (1 sample(s))
32,752 bytes maximum slop
6 MiB total memory in use (0 MB lost due to fragmentation)
...
Sweet! Only 6 MiB of total memory, the call stack is short, and it still goes all the way from logError to main! Problem solved! There are still some downsides, though: it requires the ImplicitParams extension, the stack has to be manipulated by hand, and if you forget the seq, the memory still builds up. This introduces a new thing to be very careful of. Metaphorically, this is a slippery plank.

In the previous section, we used popCallStack
. Let's see how we can use Matt's thawCallStack
function. This function undoes the effect of freezeCallStack
. freezeCallStack
places a marker on top of the call stack, one that prevents new entries from being added. thawCallStack
simply pops that marker off the call stack:
-- | Reverses the effect of 'freezeCallStack' by popping the 'FreezeCallStack' marker off the call stack.
{-# INLINE thawCallStack #-}
thawCallStack :: CallStack -> CallStack
thawCallStack callStack = case callStack of
FreezeCallStack stack -> stack
alreadyThawedStack -> alreadyThawedStack
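To see the freeze marker in action before wiring it into the loop, here is a small self-contained sketch (our own example) using withFrozenCallStack from GHC.Stack, which freezes the current stack before running its argument:

```haskell
import GHC.Stack

inner :: HasCallStack => Int
inner = length (getCallStack callStack)

main :: IO ()
main = do
  print inner                        -- 1: the call site of 'inner' is pushed
  print (withFrozenCallStack inner)  -- 0: a frozen stack rejects new entries
```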
With that function ready, let's rewrite loopThenError
to strategically freeze and thaw:
loopThenError :: (HasCallStack) => Integer -> IO ()
loopThenError n = let ?callStack = thawCallStack callStack in case n of
0 -> logError
n -> withFrozenCallStack $ loopThenError $ n - 1
This shares all but one of the downsides of the previous plank: it does not require seq. Instead, the INLINE pragma ensures that the call to thawCallStack itself does not become an unevaluated thunk.
The output is the same as the solution above, and the performance looks similar:
$ ghc --make -O2 CallstackThawed.hs && ./CallstackThawed 10000000 +RTS -s
[1 of 2] Compiling Main ( CallstackThawed.hs, CallstackThawed.o )
[2 of 2] Linking CallstackThawed
Look at my callstack!
CallStack (from HasCallStack):
logError, called at CallstackThawed.hs:15:8 in main:Main
loopThenError, called at CallstackThawed.hs:11:3 in main:Main
main, called at CallstackThawed.hs:9:1 in main:Main
160,092,208 bytes allocated in the heap
9,176 bytes copied during GC
45,072 bytes maximum residency (1 sample(s))
32,752 bytes maximum slop
6 MiB total memory in use (0 MB lost due to fragmentation)
...
One trivial but effective and clean way to solve this problem is to walk around the pit and avoid it altogether. The indefinite growth of the call stack comes from the recursion of a function with HasCallStack
. This buildup can be prevented by delegating the recursion itself to some go
function:
loopThenError :: (HasCallStack) => Integer -> IO ()
loopThenError = go
where
go n = case n of
0 -> logError
_ -> go (n - 1)
Because the call stack is an implicit parameter, it is in scope for the entire definition of loopThenError
. That includes the go
function. This means that the stack trace will not grow when go
calls itself. This method works really well to solve the problem after you have found it. It also has a benefit over popCallStack
and thawCallStack
in that it requires no explicit stack manipulation. One still needs to be careful, though, as one small refactor would reintroduce the exact same problem.
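To illustrate how fragile this is, here is a sketch of such a refactor (our own, hypothetical example): a well-meaning colleague gives go an explicit signature with HasCallStack, which silently brings back the unbounded growth.

```haskell
import GHC.Stack

loopThenError :: HasCallStack => Integer -> IO ()
loopThenError = go
  where
    -- This innocent-looking signature reintroduces the problem: each
    -- recursive call of 'go' now pushes a fresh entry onto the stack.
    go :: HasCallStack => Integer -> IO ()
    go 0 = putStrLn (prettyCallStack callStack)
    go n = go (n - 1)

main :: IO ()
main = loopThenError 3
```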
This solution was the simplest for our use case, so here is where we take a step back from staring at the pit. Let's see what we have learned so far.
The first and foremost lesson is that one cannot carelessly sprinkle HasCallStack
without checking for recursive functions. The second lesson is in understanding why it happened, and how to work around it. Importantly, none of the above solutions are perfect. Each solution has its own tradeoff. The trick is in deciding which solution is good enough. That may just be avoiding the pit altogether by not bothering with call stacks, or indeed coming up with a solution that outshines the ones above.
We would like to invite the reader to play with the different examples above, and answer the following question: What does the call stack look like at different points, and why does it end up the way it does? Hint: you can use trace (prettyCallStack callStack) $ ...
at various points to inspect the call stack.
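As a starting point, here is one sketch of that hint, using trace from Debug.Trace to dump the stack on every recursion step (note that trace writes to stderr):

```haskell
import Debug.Trace (trace)
import GHC.Stack

loopThenError :: HasCallStack => Integer -> IO ()
loopThenError 0 = putStrLn "done"
loopThenError n =
  -- Dump the current call stack before each recursive step.
  trace (prettyCallStack callStack) $ loopThenError (n - 1)

main :: IO ()
main = loopThenError 2
```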
Are you interested in working at Channable? Check out our vacancy page to see if we have an open position that suits you!