Modal cuts inference cold start times by 40x, pushing serverless GPU limits
Modal details its engineering approach, which combines cloud buffers, custom filesystems, process checkpointing, and CUDA checkpointing to cut inference cold starts from minutes to tens of seconds.