The Markovian Thinker
Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.
This model is based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and was trained with the Delethink RL paradigm. See the paper for full details.
The example prompt asks the model to reason step by step and put its final answer within \boxed{}.
import sglang as sgl


def main():
    # Launch an offline SGLang engine for the checkpoint.
    llm = sgl.Engine(
        model_path="McGill-NLP/longcot-8k-1.5b",
        dtype="bfloat16",
        attention_backend="flashinfer",
        mem_fraction_static=0.8,
        log_level="WARNING",
    )

    prompt = (
        r"There exist real numbers $x$ and $y$, both greater than 1, such that "
        r"$\log_x\left(y^x\right)=\log_y\left(x^{4y}\right)=10$. Find $xy$."
        "\n\nPlease reason step by step, and put your final answer within \\boxed{}."
    )

    # Tokenize the prompt with the model's chat template.
    tok = llm.tokenizer_manager.tokenizer
    query_ids = tok.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=True,
        add_generation_prompt=True,
    )

    # Keep special tokens in the decoded text, as in the original decode call.
    params = {"temperature": 0.6, "max_new_tokens": 8192, "skip_special_tokens": False}

    # generate() returns a dict rather than token ids; the completion is under "text".
    output = llm.generate(input_ids=query_ids, sampling_params=params)
    print(output["text"])

    # Release the engine's GPU memory and worker processes.
    llm.shutdown()


if __name__ == "__main__":
    main()
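Because the prompt asks for the final answer inside \boxed{}, it can be handy to pull that answer out of the generated text. The following is a minimal sketch, not part of the model's tooling: `extract_boxed` is a hypothetical helper name, and it assumes the answer of interest is the last \boxed{...} in the output, with balanced braces inside it.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    # Walk forward from the opening brace, tracking nesting depth so that
    # inner braces such as \frac{1}{2} do not end the match early.
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)

    # The closing brace was never found (for example, a truncated generation).
    return None
```

Applied to the script above, this would be called as `extract_boxed(output["text"])`. A `None` result usually means the generation hit `max_new_tokens` before the model committed to an answer.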
@misc{Aghajohari2025:TheMarkovianThinker,
  title={The Markovian Thinker},
  author={Milad Aghajohari and Kamran Chitsaz and Amirhossein Kazemnejad and Sarath Chandar and Alessandro Sordoni and Aaron Courville and Siva Reddy},
  year={2025},
  eprint={2510.06557},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.06557},
}
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B