Long conversations in a single thread can exceed the model's context window, preventing you from continuing with that thread. Compaction solves this by condensing older messages in a thread into a high-density summary. This immediately frees up a portion of the context window, letting you continue the thread for longer. It also lets the Agent focus fully on your most recent queries while maintaining a clear understanding of the thread history.

When is compaction triggered

Kombai monitors your context usage in a thread and dynamically decides the best points to compact context. Below are two common scenarios when Kombai will trigger compaction automatically:
  • Context window saturation: When the thread consumes the majority (typically above 90%) of the available context window.
  • Cache expiry: To optimize costs, Kombai auto-compacts the thread when the cache is close to expiring and context usage is at 50–60%. Actions such as toggling Agent Variant or Extended Thinking can cause the cache to expire.
If you edit or restore a message, the previous compaction is reset, which may temporarily increase context usage. As you continue the conversation, Kombai will re-evaluate and trigger compaction again if necessary.
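The two automatic triggers above can be illustrated with a small sketch. This is not Kombai's actual implementation; the function name, parameters, and exact thresholds are illustrative assumptions based on the ranges described here:

```python
def should_compact(tokens_used: int, window_size: int, cache_expiring: bool = False) -> bool:
    """Illustrative sketch of the two auto-compaction triggers (not Kombai's real logic)."""
    usage = tokens_used / window_size
    # Trigger 1: context window saturation (typically above 90%).
    if usage > 0.90:
        return True
    # Trigger 2: cache close to expiring while usage is around 50-60%.
    if cache_expiring and 0.50 <= usage <= 0.60:
        return True
    return False
```

For example, a thread at 95% usage compacts immediately, a thread at 55% usage compacts only if the cache is about to expire, and a thread at 40% usage is left alone in both cases.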

How context is managed

The context window is shared across several elements beyond your message history:
  • Infrastructure overhead: Tools, system prompts, skills, and RAG attachments typically occupy 15–35% of the window, depending on your tech stack.
  • Persistent rules: Project and User Rules are never compacted. They remain in their full form to ensure the Agent strictly adheres to your constraints at all times.
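The budget breakdown above can be sketched as simple arithmetic. The window size, overhead fraction, and rules size below are hypothetical placeholder values, chosen only to show how the pieces compete for the same window:

```python
# Hypothetical numbers for illustration only.
window_size = 200_000                      # total context window, in tokens
overhead = int(window_size * 0.25)         # infrastructure overhead, 15-35% depending on stack
persistent_rules = 3_000                   # Project/User Rules, never compacted

# Only message history is eligible for compaction; the rest is fixed.
available_for_history = window_size - overhead - persistent_rules
```

Because the overhead and rules portions are fixed, compaction can only reclaim tokens from the message-history portion of the window.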

Toggle auto-compaction

Compaction is enabled by default. If you prefer to manage the context window manually, you can disable the Auto compact threads toggle under the icon.

Manually compact a thread

You can trigger compaction at any time by clicking the icon at the end of your thread and selecting Compact thread.

Why compact a thread

  1. Extended conversations: Restore available tokens to maintain a single, continuous thread for long-term projects.
  2. Optimized performance: Experience faster response times, since compaction removes the overhead of processing long, uncompressed history.
  3. Preserved context: Retain critical information and decision points through intelligent summarization while discarding redundant data.
  4. Cost efficiency: Reduce token consumption in API calls by compressing long history into a compact summary.