NOXTUA VOYAGE EMBED Benchmarking Report
NOXTUA VOYAGE EMBED is an embedding model fine-tuned on legal documents that Xayn shared with Voyage. It delivers an average retrieval-quality improvement of 25.3% over OpenAI's text-embedding-3-large.
NOXTUA VOYAGE EMBED is a model customized for retrieval tasks on legal documents that Xayn shared with Voyage. The model is based on voyage-multilingual-2, Voyage's latest and most powerful embedding model tailored to multilingual retrieval, and is fine-tuned on EU_GER_Xayn_DeJure_Laws_Decisions, a proprietary dataset provided by Xayn. The context length is 32K tokens and the embedding dimension is 1024. The rest of this document describes the evaluation results of NOXTUA VOYAGE EMBED against baseline models, including voyage-multilingual-2 and OpenAI's text-embedding-3-large.
Evaluation Datasets
EU_GER_Xayn_DeJure_Laws_Decisions is an extensive collection of German and EU law books and court cases, comprising decisions from various courts across Germany and from the EU courts. The dataset is meticulously organized, with each case document containing essential metadata such as the date of decision, court hierarchy, and a summary of the key facts and outcomes. The cases cover a broad range of legal domains, including civil, criminal, administrative, and labor law, ensuring a comprehensive representation of the German and EU legal systems.
The dataset contains more than 20B tokens in total. Voyage generated 47k queries from the documents in the dataset to form an evaluation dataset named xayn-syn-pairs-eval.
Example of pairs in xayn-syn-pairs:
Query
Relevant Doc
Evaluation Results
We compare NOXTUA VOYAGE EMBED against the following embedding models on xayn-syn-pairs-eval:
text-embedding-3-large, OpenAI embedding model
voyage-law-2, Voyage AI embedding model optimized for legal retrieval quality
voyage-multilingual-2, Voyage AI embedding model optimized for multilingual retrieval quality
Given a query, we retrieve the top-100 documents by cosine similarity. We report NDCG@10 and Recall@100. Both are standard metrics for retrieval quality; higher is better. The table below presents the results.
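The retrieval and scoring procedure described above can be illustrated with a minimal sketch. It is not the evaluation code used for this report: it assumes binary relevance labels (a document is either relevant to a query or not) and in-memory embeddings, and the function names cosine, retrieve, recall_at_k, and ndcg_at_k are hypothetical names chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, doc_vecs, k=100):
    """Return the ids of the k documents most similar to the query.

    doc_vecs maps a document id to its embedding (assumed layout).
    """
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary gains (assumption: relevance is 0 or 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg

# Toy data for illustration only; these are not values from the report.
toy_docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_ranking = retrieve([1.0, 0.1], toy_docs, k=100)
toy_recall = recall_at_k(toy_ranking, {"c"}, k=100)
toy_ndcg = ndcg_at_k(toy_ranking, {"c"}, k=10)
```

In a full evaluation, these per-query scores would typically be averaged over all queries in the evaluation set to give one NDCG@10 and one Recall@100 figure per model.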
NOXTUA VOYAGE EMBED significantly outperforms the other embedding models, achieving an average improvement of 25.3% over OpenAI text-embedding-3-large. Compared with voyage-multilingual-2 and voyage-law-2, NOXTUA VOYAGE EMBED achieves a 10.7% improvement in Recall@100, which validates the effectiveness of fine-tuning.
We also evaluate NOXTUA VOYAGE EMBED and the baselines on several common public legal retrieval benchmarks: legal_summarization, legalbench_consumer_contracts_qa, GerDaLIRSmall, and LegalQuAD. On these datasets, NOXTUA VOYAGE EMBED likewise significantly outperforms text-embedding-3-large and the other Voyage models.