LFM2-ColBERT-350M

LFM2-ColBERT-350M is a late interaction retriever with excellent multilingual performance. It allows you to store documents in one language (for example, a product description in English) and retrieve them in many languages with high accuracy.

  • LFM2-ColBERT-350M offers best-in-class accuracy across different languages.
  • Inference speed is on par with models 2.3 times smaller, thanks to the efficient LFM2 backbone.
  • You can use it as a drop-in replacement in your current RAG pipelines to improve performance.

Find more information about LFM2-ColBERT-350M in our blog post.

🚀 Try our demo: https://huggingface.co/spaces/LiquidAI/LFM2-ColBERT

📄 Model details

Late interaction retrievers like LFM2-ColBERT-350M are particularly interesting because they preserve much of the expressivity of re-rankers while retaining the efficiency of bi-encoders. In practice, they can both retrieve documents at scale (like bi-encoders) and rank them in the same pass (like re-rankers).

[Figure: late interaction architecture]

We recommend using this model for various RAG use cases, such as:

  • E-commerce: Find products across many languages with semantic search at scale.
  • On-device semantic search: Ask questions to your phone in natural language to retrieve files, emails, and notes.
  • Enterprise knowledge assistants: Retrieve internal legal, financial, and technical documents in different languages.

Property            LFM2-ColBERT-350M
Total parameters    353,322,752
Layers              25 (18 conv + 6 attn + 1 dense)
Context length      32,768 tokens
Vocabulary size     65,536
Training precision  BF16
License             LFM Open License v1.0

Document length: 512 tokens

Query length: 32 tokens

Output dimensionality: 128 dimensions per token

Similarity function: MaxSim

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

ColBERT(
  (0): Transformer({'max_seq_length': 511, 'do_lower_case': False}) with Transformer model: Lfm2Model 
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
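
For intuition, the MaxSim similarity listed above compares every query token embedding against every document token embedding: each query token keeps the similarity of its best-matching document token, and these maxima are summed into the final score. A minimal sketch with made-up tensors (the shapes below are arbitrary, not taken from the model):

import torch

# Toy example: 4 query tokens and 9 document tokens, each a 128-dim embedding
query_embeddings = torch.randn(4, 128)
document_embeddings = torch.randn(9, 128)

# Token-level similarity matrix of shape (num_query_tokens, num_doc_tokens)
similarity = query_embeddings @ document_embeddings.T

# MaxSim: best-matching document token per query token, summed over the query
score = similarity.max(dim=1).values.sum()
print(score)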

🏃 How to run

Colab link

First, install the PyLate library:

pip install -U pylate

Retrieval

Use this model with PyLate to index and retrieve documents. The index uses FastPLAID for efficient similarity search.

Indexing documents

Load LFM2-ColBERT-350M and initialize the PLAID index, then encode and index your documents:

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="LiquidAI/LFM2-ColBERT-350M",
)
model.tokenizer.pad_token = model.tokenizer.eos_token

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:

# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)

Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search, encode the queries, and retrieve the top-k documents to get the matching ids and relevance scores:

# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
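
Each entry of scores corresponds to one query and holds its top-k matches. A minimal sketch of reading the results (in recent PyLate versions each match is a dict with "id" and "score" fields; verify against your installed release):

# Inspect the results: one ranked list per query
queries = ["query for document 3", "query for document 1"]
for query, matches in zip(queries, scores):
    print(query)
    for match in matches:
        print(f"  document {match['id']}: {match['score']:.2f}")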

Reranking

If you only want to use LFM2-ColBERT-350M to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the rank.rerank function and pass the queries and documents to rerank:

from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="LiquidAI/LFM2-ColBERT-350M",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
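
As with retrieval, rank.rerank returns one ranked list per query, ordered by late-interaction score. A quick way to inspect the output (again assuming the "id"/"score" fields returned by recent PyLate versions):

for query, ranking in zip(queries, reranked_documents):
    print(query)
    for document in ranking:
        print(f"  document {document['id']}: {document['score']:.2f}")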

📈 Performance

Accuracy

We extended the NanoBEIR benchmark to include Japanese and Korean languages. We open-sourced this dataset on Hugging Face at LiquidAI/nanobeir-multilingual-extended for reproducibility. On this NanoBEIR benchmark, LFM2-ColBERT-350M displays significantly stronger multilingual capabilities (especially in German, Arabic, Korean, and Japanese) while maintaining English performance.

[Figure: NanoBEIR NDCG@10 across languages]

Even more interestingly, LFM2-ColBERT-350M is an excellent cross-lingual retriever. This means that it is capable of retrieving documents based on queries from other languages. This is ideal for client-facing applications, like in e-commerce, where a description might be in English but the query is in another language.

LFM2-ColBERT-350M works especially well for English, French, Spanish, Italian, Portuguese, and German, as shown with these NDCG@10 scores on NanoBEIR:

Doc / Query  AR      DE      EN      ES      FR      IT      JA      KO      PT      AVG
AR           0.490   0.288   0.339   0.303   0.304   0.286   0.357   0.338   0.291   33.30%
DE           0.383   0.563   0.547   0.498   0.502   0.489   0.424   0.368   0.486   47.33%
EN           0.416   0.554   0.661   0.553   0.551   0.522   0.477   0.395   0.535   51.82%
ES           0.412   0.514   0.578   0.563   0.547   0.529   0.436   0.394   0.547   50.21%
FR           0.408   0.527   0.573   0.552   0.564   0.537   0.450   0.388   0.549   50.53%
IT           0.395   0.512   0.554   0.535   0.535   0.543   0.439   0.386   0.529   49.20%
JA           0.375   0.365   0.409   0.358   0.345   0.337   0.557   0.491   0.330   39.63%
KO           0.326   0.274   0.310   0.282   0.265   0.266   0.440   0.527   0.271   32.89%
PT           0.402   0.499   0.558   0.545   0.528   0.529   0.436   0.382   0.547   49.17%
AVG          40.07%  45.51%  50.32%  46.54%  46.00%  44.86%  44.62%  40.78%  45.38%

In comparison, GTE-ModernColBERT-v1 consistently gets lower scores when documents and queries are not in the same language:

Doc / Query  AR      DE      EN      ES      FR      IT      JA      KO      PT      AVG
AR           0.309   0.089   0.107   0.089   0.094   0.092   0.070   0.049   0.087   10.96%
DE           0.039   0.499   0.454   0.362   0.393   0.367   0.133   0.061   0.361   29.65%
EN           0.042   0.408   0.680   0.446   0.484   0.420   0.167   0.073   0.438   35.08%
ES           0.044   0.360   0.485   0.525   0.465   0.437   0.149   0.061   0.487   33.48%
FR           0.044   0.381   0.505   0.455   0.546   0.428   0.136   0.057   0.467   33.35%
IT           0.043   0.369   0.449   0.446   0.451   0.516   0.143   0.054   0.448   32.36%
JA           0.031   0.169   0.250   0.172   0.177   0.169   0.459   0.059   0.165   18.35%
KO           0.030   0.134   0.169   0.127   0.133   0.125   0.090   0.368   0.124   14.45%
PT           0.043   0.368   0.479   0.492   0.467   0.448   0.138   0.062   0.530   33.63%
AVG          6.94%   30.84%  39.75%  34.59%  35.68%  33.35%  16.53%  9.37%   34.24%

This makes retrieval much more reliable and allows a single, unified retriever to replace architectures that combine multiple language-specific models.
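
As a quick illustration of cross-lingual scoring, the rank.rerank API shown earlier can score an English document against a non-English query directly; the strings below are made-up examples:

from pylate import models, rank

model = models.ColBERT(model_name_or_path="LiquidAI/LFM2-ColBERT-350M")

# English product description, queried in German (illustrative strings)
queries = ["kabellose Kopfhörer mit Geräuschunterdrückung"]
documents = [["Wireless noise-cancelling headphones with 30-hour battery life."]]
documents_ids = [[1]]

queries_embeddings = model.encode(queries, is_query=True)
documents_embeddings = model.encode(documents, is_query=False)

scores = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)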

Inference speed

Despite being more than twice as large, LFM2-ColBERT-350M matches the query and document encoding throughput of GTE-ModernColBERT-v1 across various batch sizes.

Query encoding was evaluated using realistic query patterns from datasets like MS MARCO and Natural Questions.

[Figure: query encoding throughput across batch sizes]

Document encoding was measured on realistic documents with varying lengths and domains.

[Figure: document encoding throughput across batch sizes]
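
If you want to sanity-check throughput on your own hardware, a rough timing sketch (the document text, count, and batch size are arbitrary; model is the ColBERT model loaded earlier):

import time

documents = ["A reasonably long product description used for benchmarking."] * 256

start = time.perf_counter()
model.encode(documents, batch_size=32, is_query=False, show_progress_bar=False)
elapsed = time.perf_counter() - start

print(f"{len(documents) / elapsed:.1f} documents/sec")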

📬 Contact

If you are interested in custom solutions with edge deployment, please contact our sales team.

Please cite the PyLate library if you use it for inference or training:

@misc{PyLate,
  title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
  author={Chaffin, Antoine and Sourty, Raphaël},
  url={https://github.com/lightonai/pylate},
  year={2024}
}