I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face!
What's new: I've expanded the Personal Finance Reasoning dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. I've also streamlined the previous generation pipeline with better psychological context and response validation.
I've also released new Kuvera models trained on this improved dataset:
- Kuvera-4B & 8B: my upgraded non-reasoning models, fine-tuned to provide practical financial advice. I've specifically trained the 8B model to better understand the user's emotional context.
- Kuvera-12B: a first experimental reasoning model focused on query resolution.
As the sole person working on this project, I see this release as a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI.
I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out!
P.S. The paper on the framework used to train these models, along with a detailed evaluation of the main 8B model's responses, will be released soon!
The world's first Intermediate Thinking Model is now available to everyone!
Dhanishtha 2.0 Preview brings revolutionary intermediate thinking capabilities to the open-source community. Unlike traditional reasoning models that think once, Dhanishtha can think, answer, rethink, answer again, and continue rethinking as needed, using multiple thinking blocks between responses.
🚀 Key Features
- Intermediate thinking: Think → Answer → Rethink → Answer → Rethink if needed...
- Token efficient: uses up to 79% fewer tokens than DeepSeek R1 on similar queries
- Transparent thinking: see the model's reasoning process in real time
- Open source: freely available for research and development
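If you want to work with the interleaved output programmatically, here is a minimal sketch for splitting a response into thinking and answer segments. It assumes the reasoning is wrapped in `<think>...</think>` tags; that tag format is an assumption about the output, not something confirmed above.

```python
import re

def split_intermediate_thinking(response: str):
    """Split a response into an ordered list of ("think" | "answer", text) segments."""
    segments = []
    pos = 0
    # Assumed format: reasoning wrapped in <think>...</think>, answers in between.
    for match in re.finditer(r"<think>(.*?)</think>", response, re.DOTALL):
        answer = response[pos:match.start()].strip()
        if answer:
            segments.append(("answer", answer))
        segments.append(("think", match.group(1).strip()))
        pos = match.end()
    tail = response[pos:].strip()
    if tail:
        segments.append(("answer", tail))
    return segments

demo = "<think>Estimate first.</think>Roughly 42.<think>Check units.</think>Confirmed: 42."
for kind, text in split_intermediate_thinking(demo):
    print(kind, "->", text)
```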
If you find this cool, please like the spaces and videos ❤️. Now that the deadline has been extended by 2 days, I will polish the custom component a little more and update the submission.
Let's refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):
1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196) + history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
Trained on massive text datasets to understand and generate human language, LLMs are mostly built on the Transformer architecture and work by predicting the next token (see the sketch after item 2). They scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.)

2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) -> A Survey of Small Language Models (2410.20011)
A lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work on the same principles as LLMs.
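To make "predicting the next token" concrete, here is a minimal runnable sketch with Hugging Face transformers; the small gpt2 checkpoint is chosen purely for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Large language models predict the next", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # (batch, seq_len, vocab_size)

# The last position's logits score every vocabulary token as the next token.
next_token_id = logits[0, -1].argmax()   # greedy pick
print(tokenizer.decode(next_token_id))
```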
3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247)
Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both.
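As a concrete example of the shared embedding space, here is a short sketch using the openai/clip-vit-base-patch32 checkpoint; the image path is a placeholder you would replace with your own file:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image and texts live in the same embedding space, so similarities rank the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```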
4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549)
A large-scale model that can understand and process multiple types of data (modalities): usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.
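A toy illustration of the modality-adapter idea: a learned projection maps vision-encoder features into the LLM's token embedding space, so image patches can be consumed like text tokens. All dimensions here are made up for the sketch:

```python
import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Projects vision features into the LLM embedding space (toy dimensions)."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

adapter = VisionAdapter()
fake_patches = torch.randn(1, 256, 1024)  # stand-in for a vision encoder's output
soft_tokens = adapter(fake_patches)       # prepend these to the text embeddings
print(soft_tokens.shape)                  # torch.Size([1, 256, 4096])
```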
5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047)
Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities: video, sensor data, etc.
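And a hedged sketch of what "action tokens" can look like: continuous actions binned into a discrete vocabulary so a language model can predict them like text. The 256-bin scheme is an assumption, loosely inspired by RT-2-style discretization:

```python
import numpy as np

NUM_BINS = 256  # assumed vocabulary size per action dimension

def action_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0) -> list[int]:
    """Map each continuous action dimension to an integer token id."""
    clipped = np.clip(action, low, high)
    bins = ((clipped - low) / (high - low) * (NUM_BINS - 1)).round().astype(int)
    return bins.tolist()

def tokens_to_action(tokens: list[int], low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Invert the binning back to (approximate) continuous values."""
    return low + np.array(tokens) / (NUM_BINS - 1) * (high - low)

action = np.array([0.12, -0.5, 0.98])  # e.g. an end-effector delta in x/y/z
tokens = action_to_tokens(action)
print(tokens, tokens_to_action(tokens))
```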
Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇
We (@osma, @MonaLehtinen & me, i.e. the Annif team at the National Library of Finland) recently took part in the LLMs4Subjects challenge at the SemEval-2025 workshop. The task was to use large language models (LLMs) to generate good quality subject indexing for bibliographic records, i.e. titles and abstracts.
We are glad to report that our system performed well; it was ranked:
🥇 1st in the category where the full vocabulary was used 🥈 2nd in the smaller vocabulary category 🏅 4th in the qualitative evaluations.
14 participating teams developed their own solutions for generating subject headings and the output of each system was assessed using both quantitative and qualitative evaluations. Research papers about most of the systems are going to be published around the time of the workshop in late July, and many pre-prints are already available.
We applied Annif together with several LLMs that we used to preprocess the datasets: translating the GND vocabulary terms to English, translating bibliographic records into English and German as required, and generating additional synthetic training data. After the preprocessing, we used the traditional machine learning algorithms in Annif as well as the experimental XTransformer algorithm, which is based on language models. We also combined the subject suggestions generated from English- and German-language records in a novel way.
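For readers curious what combining bilingual suggestions could look like in code, here is an illustrative-only sketch of a weighted score merge. Our actual combination method is novel and will be described in the paper, so everything below (weights, scores, identifiers) is assumed for the example:

```python
from collections import defaultdict

def merge_suggestions(en: dict[str, float], de: dict[str, float],
                      w_en: float = 0.5, w_de: float = 0.5, top_k: int = 5):
    """Combine per-subject scores from two language-specific models, return top-k."""
    combined: dict[str, float] = defaultdict(float)
    for uri, score in en.items():
        combined[uri] += w_en * score
    for uri, score in de.items():
        combined[uri] += w_de * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical scores from an English-record model and a German-record model.
en_scores = {"gnd:4123037-1": 0.9, "gnd:4038971-6": 0.4}
de_scores = {"gnd:4123037-1": 0.7, "gnd:4011882-4": 0.5}
print(merge_suggestions(en_scores, de_scores))
```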