Long conversations in a single thread can exceed the model’s context window, leaving you unable to continue that thread. Compaction solves this by condensing older messages in a thread into a high-density summary. This immediately frees up part of the context window, letting you continue the thread for longer. It also allows the Agent to focus on your most recent queries while maintaining a clear understanding of the thread history.
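As a rough illustration of the idea, here is a minimal Python sketch of compaction. It assumes a thread is a list of messages and that everything except the most recent messages is folded into a single summary message. `Message`, `compact`, `naive_summary`, the `keep_recent` parameter, and the word-count token estimate are all hypothetical. Kombai's real summary is model-generated, and its internals are not described on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str    # "user", "assistant", or "summary"
    text: str
    tokens: int  # hypothetical: approximated by word count


def make_message(role: str, text: str) -> Message:
    # Hypothetical token estimate: one token per whitespace-separated word.
    return Message(role, text, len(text.split()))


def compact(history: list[Message], keep_recent: int,
            summarize: Callable[[list[Message]], str]) -> list[Message]:
    """Replace all but the last `keep_recent` messages with one summary."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return list(history)  # nothing old enough to compact
    summary = make_message("summary", summarize(older))
    return [summary] + recent


def naive_summary(msgs: list[Message]) -> str:
    # Stand-in for a model-written summary: keep each message's first sentence.
    return " ".join(m.text.split(".")[0] + "." for m in msgs)


history = [
    make_message("user", "Build a login page. Use React and Tailwind."),
    make_message("assistant", "Created the LoginPage component. Added form validation and styles."),
    make_message("user", "Add a forgot-password link."),
]
compacted = compact(history, keep_recent=1, summarize=naive_summary)
print([m.role for m in compacted])  # ['summary', 'user']
print(sum(m.tokens for m in history), sum(m.tokens for m in compacted))  # 21 12
```

The most recent message survives verbatim while older turns collapse into one shorter message, which is what frees up room in the window.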

When is compaction triggered

Kombai monitors your context usage in a thread and dynamically decides the best points to compact context. Kombai triggers compaction automatically in two common scenarios:
  • Context window saturation: When the thread consumes the majority (typically above 90%) of the available context window.
  • Cache expiry: To optimize costs, Kombai auto-compacts the thread when the cache is close to expiring and context usage is around 50–60%. Actions such as switching the model or changing the thinking effort also cause the cache to expire.
If you edit or restore a message, the previous compaction is reset, which may temporarily increase context usage. As you continue the conversation, Kombai will re-evaluate and trigger compaction again if necessary.
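The two automatic triggers can be approximated as a simple decision function. This is a hypothetical sketch: the thresholds come from the figures quoted above (above 90% for saturation, from about 50% when the cache is near expiry), while `should_auto_compact`, its parameters, and the 200,000-token window are illustrative assumptions. Kombai decides dynamically and may weigh other signals.

```python
# Hypothetical thresholds taken from the figures quoted on this page.
SATURATION_THRESHOLD = 0.90  # compact when usage exceeds ~90% of the window
CACHE_THRESHOLD = 0.50       # on cache expiry, compact once usage reaches ~50%


def should_auto_compact(used_tokens: int, window_tokens: int,
                        cache_near_expiry: bool) -> bool:
    """Return True if either trigger condition is met (sketch only)."""
    usage = used_tokens / window_tokens
    if usage > SATURATION_THRESHOLD:
        return True  # context window saturation
    if cache_near_expiry and usage >= CACHE_THRESHOLD:
        return True  # cache about to expire with substantial context in use
    return False


WINDOW = 200_000  # hypothetical context window size
print(should_auto_compact(185_000, WINDOW, cache_near_expiry=False))  # True
print(should_auto_compact(110_000, WINDOW, cache_near_expiry=True))   # True
print(should_auto_compact(110_000, WINDOW, cache_near_expiry=False))  # False
print(should_auto_compact(60_000, WINDOW, cache_near_expiry=True))    # False
```

At 55% usage the same thread compacts only if the cache is about to expire, reflecting the cost trade-off described in the cache expiry scenario.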

How context is managed

The context window is shared across several elements beyond your message history:
  • Infrastructure overhead: Tools, system prompts, skills, and RAG attachments typically occupy 15–35% of the window, depending on your tech stack.
  • Persistent rules: Project and User Rules are never compacted. They remain in their full form to ensure the Agent strictly adheres to your constraints at all times.
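To see how these fixed costs reduce the room left for conversation, here is a small budget calculation. The function name `history_budget`, the 200,000-token window, and the 4,000-token rules size are hypothetical; only the 15–35% overhead range comes from this page. Because rules are never compacted, they are subtracted in full before any message history fits.

```python
def history_budget(window_tokens: int, overhead_percent: int, rules_tokens: int) -> int:
    """Tokens left for message history after fixed overhead and persistent rules."""
    if not 0 <= overhead_percent < 100:
        raise ValueError("overhead_percent must be in [0, 100)")
    overhead = window_tokens * overhead_percent // 100  # tools, system prompts, skills, RAG
    return max(window_tokens - overhead - rules_tokens, 0)


# Hypothetical 200,000-token window with 4,000 tokens of Project and User Rules.
for percent in (15, 35):
    print(percent, history_budget(200_000, percent, rules_tokens=4_000))
# 15 166000
# 35 126000
```

Compaction can only shrink the history portion; the overhead and rules occupy the window regardless.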

Toggle auto-compaction

Compaction is enabled by default. If you prefer to manage the context window manually, you can disable the Auto compact threads toggle under the General settings tab of the Settings page.

Manually compact a thread

You can trigger compaction at any time by clicking the icon at the end of your thread and selecting Compact thread.

Why compact a thread

  1. Extended conversations: Restore available tokens to maintain a single, continuous thread for long-term projects.
  2. Optimized performance: Get faster responses, since the Agent no longer has to process the full uncompressed history.
  3. Preserved context: Retain critical information and decision points through intelligent summarization while discarding redundant data.
  4. Cost efficiency: Reduce token consumption in API calls by sending a compact summary instead of the full history.