If your bot can’t stay in character, breaks formatting, or seems forgetful, you might be dealing with token bloat. In this article, I’ll explain how context memory works, what causes token bloat, and how to avoid it so your chats stay on track.

Common Symptoms of Token Bloat:

But let’s start with the basics.

[Image: common symptoms of token bloat]

📏 Context Size

The context size is the amount of information, measured in tokens (we’ll get to those next), that an LLM can keep in mind at one time while chatting or roleplaying.
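A rough sketch of what a fixed context size means in practice: once the chat outgrows the budget, the oldest messages simply fall out. The `count_tokens` helper here is a crude stand-in (roughly one token per word); real frontends use the model’s own tokenizer.

```python
def count_tokens(text: str) -> int:
    # Crude approximation for illustration only; real tokenizers
    # split text into subword pieces, not whitespace words.
    return len(text.split())

def fit_to_context(messages: list[str], context_size: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > context_size:
            break                        # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

chat = ["hello there", "how are you today", "tell me a long story please"]
print(fit_to_context(chat, 10))  # the oldest message no longer fits
```

This is exactly why a bot “forgets” early events: they were not deleted by the model, they just stopped fitting inside the window.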

📜 Structure And Types Of Memory

I already posted about how chat context is structured. It’s hard to sum up in a couple of words, so it’s better to study 👇 this diagram. Now we need to understand the priority and permanence of memory.

[Diagram: chat context structure]

Priority

🌟 High: The most recent messages at the bottom of the context. They heavily influence the AI’s behavior, driving it in the here and now.

💭 Low: Content at the top (system prompt, bot description). It’s treated as background information: less immediate, but still impactful.
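Priority mostly comes down to position in the assembled prompt. A minimal sketch, assuming a typical frontend layout (the function and field names here are hypothetical): background material goes at the top, and the newest messages land at the bottom, closest to the model’s “now”.

```python
def build_context(system_prompt: str, description: str,
                  history: list[str]) -> str:
    # Low-priority background first, high-priority recent chat last.
    parts = [system_prompt, description]
    parts.extend(history)
    return "\n".join(parts)

ctx = build_context("You are Ava.", "Ava is a pirate.",
                    ["User: hi", "Ava: Ahoy!"])
print(ctx.splitlines()[-1])  # the newest message sits at the bottom
```

Because the most recent lines are the last thing the model reads, they dominate its next reply, while the description at the top acts more like standing instructions.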

Memory

💎 Permanent: Always present in the context (constant lorebook entries, scenario, bot’s description, your persona, character note, etc.).

⌛ Temporary: Eventually pushed out as the chat continues (the greeting, older chat messages, examples).

🎡 Dynamic: Standard lorebook entries triggered by keywords.
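The dynamic type can be sketched as a simple keyword scan over the recent chat; this is an assumption-level illustration, not any specific frontend’s implementation. An entry is injected only when one of its keys appears, while a constant entry would skip the scan and always be included.

```python
# Hypothetical lorebook: each entry is guarded by trigger keywords.
LOREBOOK = {
    ("dragon", "wyrm"): "Dragons in this world breathe frost, not fire.",
    ("tavern",): "The Gilded Goose tavern never closes.",
}

def triggered_entries(recent_text: str) -> list[str]:
    # Inject an entry only if one of its keywords appears
    # in the recent chat (case-insensitive).
    recent = recent_text.lower()
    return [entry for keys, entry in LOREBOOK.items()
            if any(k in recent for k in keys)]

print(triggered_entries("We walked toward the tavern at dusk."))
```

This is why dynamic entries are cheap on tokens: they only occupy context while their topic is actually in play.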

🧩 TOKENS