{"id":7884,"date":"2026-01-28T11:02:36","date_gmt":"2026-01-28T11:02:36","guid":{"rendered":"https:\/\/gurukulgalaxy.com\/blog\/?p=7884"},"modified":"2026-03-01T05:28:00","modified_gmt":"2026-03-01T05:28:00","slug":"top-10-relevance-evaluation-toolkits-features-pros-cons-comparison","status":"publish","type":"post","link":"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/","title":{"rendered":"Top 10 Relevance Evaluation Toolkits: Features, Pros, Cons &amp; Comparison"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"559\" src=\"https:\/\/gurukulgalaxy.com\/blog\/wp-content\/uploads\/2026\/01\/920.jpg\" alt=\"\" class=\"wp-image-7904\" srcset=\"https:\/\/gurukulgalaxy.com\/blog\/wp-content\/uploads\/2026\/01\/920.jpg 1024w, https:\/\/gurukulgalaxy.com\/blog\/wp-content\/uploads\/2026\/01\/920-300x164.jpg 300w, https:\/\/gurukulgalaxy.com\/blog\/wp-content\/uploads\/2026\/01\/920-768x419.jpg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Top_10_Relevance_Evaluation_Toolkits\" >Top 10 Relevance Evaluation Toolkits<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#1_%E2%80%94_Ragas\" >1 \u2014 Ragas<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" 
href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#2_%E2%80%94_ranx\" >2 \u2014 ranx<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#3_%E2%80%94_Arize_Phoenix\" >3 \u2014 Arize Phoenix<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#4_%E2%80%94_BEIR_Benchmarking_Information_Retrieval\" >4 \u2014 BEIR (Benchmarking Information Retrieval)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#5_%E2%80%94_DeepEval\" >5 \u2014 DeepEval<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#6_%E2%80%94_Microsoft_RELEVANCE_Framework\" >6 \u2014 Microsoft RELEVANCE Framework<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#7_%E2%80%94_Elasticsearch_Ranking_Evaluation_API\" >7 \u2014 Elasticsearch Ranking Evaluation API<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#8_%E2%80%94_PyTerrier\" >8 \u2014 PyTerrier<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#9_%E2%80%94_OpenSearch_Relevance_Evaluation\" >9 \u2014 OpenSearch Relevance Evaluation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#10_%E2%80%94_TREC_Evaluation_Toolkit_trec-eval\" >10 \u2014 TREC Evaluation Toolkit (trec-eval)<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Comparison_Table\" >Comparison Table<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Evaluation_Scoring_of_Relevance_Evaluation_Toolkits\" >Evaluation &amp; Scoring of Relevance Evaluation Toolkits<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Which_Relevance_Evaluation_Toolkit_Is_Right_for_You\" >Which Relevance Evaluation Toolkit Is Right for You?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" 
href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Frequently_Asked_Questions_FAQs\" >Frequently Asked Questions (FAQs)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/gurukulgalaxy.com\/blog\/top-10-relevance-evaluation-toolkits-features-pros-cons-comparison\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Relevance evaluation toolkits are specialized software environments designed to assess the effectiveness of ranking algorithms. These tools allow developers and data scientists to move beyond &#8220;vibes-based&#8221; testing\u2014where an engineer manually checks a few queries\u2014and transition toward rigorous, metric-driven optimization. By using standardized metrics such as Normalized Discounted Cumulative Gain (<em>N<\/em><em>D<\/em><em>CG<\/em>), Mean Reciprocal Rank (<em>MRR<\/em>), and Precision at K (<em>P<\/em>@<em>K<\/em>), these toolkits provide a mathematical foundation for measuring search success.<\/p>\n\n\n\n<p>The importance of these toolkits has surged with the rise of Large Language Models (LLMs). Today, relevance isn&#8217;t just about matching words; it\u2019s about context, factual grounding, and semantic intent. Key real-world use cases include A\/B testing search ranking changes in e-commerce, validating the &#8220;retrieval&#8221; half of a RAG pipeline to prevent hallucinations, and benchmarking new neural ranking models against established baselines like BM25. When selecting a toolkit, users should prioritize support for both &#8220;offline&#8221; (using pre-labeled datasets) and &#8220;online&#8221; (using user clicks or LLM-as-a-judge) evaluation, as well as the ability to handle large-scale document collections.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p><strong>Best for:<\/strong>&nbsp;Search engineers, ML researchers, and AI developers at organizations of all sizes. They are indispensable for companies where search quality directly impacts revenue (e.g., e-commerce, streaming services) or where data accuracy is a legal requirement (e.g., legal-tech, medical research).<\/p>\n\n\n\n<p><strong>Not ideal for:<\/strong>&nbsp;Basic website owners using &#8220;out-of-the-box&#8221; WordPress or Shopify search modules that do not allow for custom ranking adjustments. If you cannot change the underlying algorithm, an evaluation toolkit will only tell you what you already know without giving you the power to fix it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Top_10_Relevance_Evaluation_Toolkits\"><\/span>Top 10 Relevance Evaluation Toolkits<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_%E2%80%94_Ragas\"><\/span>1 \u2014 Ragas<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Ragas is the industry leader for evaluating Retrieval-Augmented Generation (RAG) systems. 
The importance of these toolkits has surged with the rise of Large Language Models (LLMs). Today, relevance isn't just about matching words; it's about context, factual grounding, and semantic intent. Key real-world use cases include A/B testing search ranking changes in e-commerce, validating the "retrieval" half of a RAG pipeline to prevent hallucinations, and benchmarking new neural ranking models against established baselines like BM25. When selecting a toolkit, prioritize support for both "offline" evaluation (using pre-labeled datasets) and "online" evaluation (using user clicks or LLM-as-a-judge), as well as the ability to handle large-scale document collections.

---

**Best for:** Search engineers, ML researchers, and AI developers at organizations of all sizes. These toolkits are indispensable for companies where search quality directly impacts revenue (e.g., e-commerce, streaming services) or where data accuracy is a legal requirement (e.g., legal-tech, medical research).

**Not ideal for:** Basic website owners using "out-of-the-box" WordPress or Shopify search modules that do not allow custom ranking adjustments. If you cannot change the underlying algorithm, an evaluation toolkit will only tell you what you already know without giving you the power to fix it.

---

## Top 10 Relevance Evaluation Toolkits

### 1 — Ragas

Ragas is the industry leader for evaluating Retrieval-Augmented Generation (RAG) systems. Unlike traditional IR tools that rely solely on human-labeled "ground truth," Ragas pioneered the use of "LLM-as-a-judge" to evaluate the relationship between questions, retrieved contexts, and generated answers.

- **Key features:**
  - "Faithfulness" metric to detect hallucinations by checking answers against retrieved context.
  - "Answer Relevancy" to ensure the response directly addresses the user's query.
  - "Context Recall" to measure whether the retriever found all necessary information.
  - Synthetic test data generation to create evaluation sets without human labeling.
  - Seamless integration with LangChain and LlamaIndex.
  - Support for multiple LLM backends (OpenAI, Anthropic, local models).
- **Pros:**
  - Eliminates the need for expensive and slow human annotation in early-stage development.
  - Specifically designed for the nuances of generative AI rather than just document ranking.
- **Cons:**
  - Evaluating thousands of queries can become expensive due to LLM API costs.
  - The metrics can be "noisy" if the judge LLM is not powerful enough (e.g., GPT-3.5 vs. GPT-4).
- **Security & compliance:** Inherits the security posture of the underlying LLM provider (e.g., SOC 2 for OpenAI). Support for local LLMs via Ollama allows for air-gapped evaluation.
- **Support & community:** Very active GitHub community; widely documented by popular AI educators and framework maintainers.
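A minimal sketch of what a Ragas run looks like, assuming the classic `ragas` 0.1-era API (newer releases rename some entry points) and a judge-LLM API key in the environment:

```python
# pip install ragas datasets  (sketch; assumes the ragas 0.1-era API)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall

# One evaluation row: question, retrieved contexts, generated answer, reference.
data = {
    "question": ["Who wrote The Hobbit?"],
    "contexts": [["The Hobbit is a 1937 fantasy novel by J. R. R. Tolkien."]],
    "answer": ["The Hobbit was written by J. R. R. Tolkien."],
    "ground_truth": ["J. R. R. Tolkien"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_recall],
)
print(result)  # per-metric scores, e.g. {'faithfulness': 1.0, ...}
```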
### 2 — ranx

If Ragas is the choice for AI, ranx is the choice for performance. It is a blazing-fast Python library designed for ranking evaluation at industrial scale. Built on top of Numba, it performs high-speed vector operations that outperform traditional tools like trec_eval.

- **Key features:**
  - Fast computation of 20+ metrics including *NDCG*, *MAP*, and *MRR*.
  - Support for "Runs" and "Qrels" (query relevance judgments) in multiple formats (JSON, Pandas, TREC).
  - Automated statistical testing (t-test, Wilcoxon) to compare different models (sketched below).
  - Built-in LaTeX table generation for scientific reporting.
  - High-speed parallelization for handling millions of query-document pairs.
- **Pros:**
  - Orders of magnitude faster than other Python-based evaluation libraries.
  - Extremely easy to integrate into existing ML pipelines thanks to its "plug-and-play" philosophy.
- **Cons:**
  - Lacks the generative-AI metrics (like faithfulness) found in Ragas.
  - Focused strictly on ranking; it does not help with data labeling or dataset creation.
- **Security & compliance:** Open-source and runs entirely locally; cloud compliance is not applicable since it is a library, not a service.
- **Support & community:** Solid documentation; primarily used in academic research and high-performance search engineering.
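A sketch of the core ranx workflow. The toy qrels and runs below are illustrative; real statistical comparisons need many queries, not one:

```python
from ranx import Qrels, Run, compare, evaluate

# Graded relevance judgments and two retrieval runs for a single query.
qrels = Qrels({"q_1": {"doc_a": 3, "doc_b": 1}})
run_bm25 = Run({"q_1": {"doc_b": 0.9, "doc_a": 0.8, "doc_c": 0.7}}, name="bm25")
run_dense = Run({"q_1": {"doc_a": 0.9, "doc_b": 0.8}}, name="dense")

# Score one run on several metrics at once.
print(evaluate(qrels, run_bm25, ["ndcg@10", "mrr"]))

# Compare runs with a paired statistical test; the report can be
# rendered as a LaTeX table for papers.
report = compare(qrels=qrels, runs=[run_bm25, run_dense],
                 metrics=["ndcg@10", "map"], stat_test="student")
print(report)
```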
### 3 — Arize Phoenix

Arize Phoenix is an open-source observability and evaluation toolkit built for AI engineers. It focuses on the experimental and development stages of LLM applications, providing deep tracing and RAG-specific evaluation metrics.

- **Key features:**
  - OpenTelemetry-based tracing to visualize how a query moves through a RAG pipeline.
  - LLM-assisted evaluation for context relevance and QA correctness.
  - Embedding visualization to detect "clusters" of failed queries.
  - Side-by-side comparison of different retrieval strategies (e.g., dense vs. hybrid).
  - Integration with major vector databases like Qdrant, Pinecone, and Weaviate.
- **Pros:**
  - Provides a visual, interactive dashboard that makes debugging much easier than reading CLI logs.
  - Vendor-neutral instrumentation means you aren't locked into a specific AI platform.
- **Cons:**
  - Can be complex to set up for production-level tracing.
  - The local-first approach might require significant RAM for very large trace datasets.
- **Security & compliance:** SOC 2 Type II (Enterprise version), GDPR compliant. The open-source version allows for complete local data control.
- **Support & community:** Strong backing from Arize AI; extensive tutorials and a professional community on Slack.
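A minimal sketch of starting Phoenix locally (entry points per recent Phoenix docs; exact APIs vary across versions):

```python
# pip install arize-phoenix  (sketch; API names vary by Phoenix version)
import phoenix as px

# Launch the local Phoenix server and UI in the background.
session = px.launch_app()
print(f"Phoenix UI available at: {session.url}")

# From here, instrumenting your framework (e.g., via the OpenInference
# instrumentors for LlamaIndex or LangChain) streams traces into this UI,
# where each retrieval span can be inspected and evaluated.
```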
### 4 — BEIR (Benchmarking Information Retrieval)

BEIR is a standardized framework for evaluating "zero-shot" retrieval. It is a collection of 18+ datasets across different domains (medical, legal, financial) that researchers use to see how well their search models generalize to new data.

- **Key features:**
  - Diverse dataset collection covering BioASQ, TREC-COVID, HotpotQA, and more.
  - Modular data format (Corpus, Queries, Qrels) that has become an industry standard.
  - Support for multiple retrieval paradigms (lexical, dense, sparse, late-interaction).
  - Standardized evaluation scripts to ensure results are comparable across different papers.
- **Pros:**
  - The "gold standard" for academic research in information retrieval.
  - Essential for testing whether a model trained on Wikipedia will actually work in a specialized field like law.
- **Cons:**
  - Primarily a benchmarking suite; not a tool for real-time production monitoring.
  - Handling the massive raw datasets (tens of gigabytes) requires significant storage and compute.
- **Security & compliance:** N/A (dataset collection and benchmarking scripts).
- **Support & community:** Massive academic following; maintained by the creators of the Sentence-Transformers library.
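A sketch of loading one BEIR dataset, following the pattern in the BEIR README (SciFact is one of the smaller collections; the download URL is the one the project publishes):

```python
# pip install beir  (sketch following the BEIR README)
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unpack SciFact, then load its standard
# corpus / queries / qrels triplet for the test split.
url = ("https://public.ukp.informatik.tu-darmstadt.de/thakur/"
       "BEIR/datasets/scifact.zip")
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

print(len(corpus), "documents,", len(queries), "queries")
```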
class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>For the millions of developers already using Elasticsearch, the built-in Ranking Evaluation API provides a native way to measure search quality without exporting data to external tools.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Key features:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Native integration with the Elasticsearch engine.<\/li>\n\n\n\n<li>Support for\u00a0<em>N<\/em><em>D<\/em><em>CG<\/em>,\u00a0<em>M<\/em><em>A<\/em><em>P<\/em>,\u00a0<em>MRR<\/em>, and Precision\/Recall.<\/li>\n\n\n\n<li>&#8220;Rank Eval&#8221; API that takes a set of queries and expected document IDs.<\/li>\n\n\n\n<li>Allows for testing different &#8220;Query DSL&#8221; configurations against a ground truth set.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Pros:<\/strong>\n<ul class=\"wp-block-list\">\n<li>No need for additional infrastructure or external libraries.<\/li>\n\n\n\n<li>Uses the exact same ranking logic and indices that serve your production users.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Cons:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Lacks modern neural\/LLM metrics; strictly evaluates document ranking.<\/li>\n\n\n\n<li>No built-in visualization; you must build your own dashboards to see trends.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Security &amp; compliance:<\/strong>\u00a0Inherits Elasticsearch\u2019s robust security (RBAC, TLS, SOC 2, HIPAA, etc.).<\/li>\n\n\n\n<li><strong>Support &amp; community:<\/strong>\u00a0Massive enterprise support from Elastic; endless community tutorials.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"8_%E2%80%94_PyTerrier\"><\/span>8 \u2014 PyTerrier<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>PyTerrier is a Python-based wrapper for the Terrier IR platform. 
### 6 — Microsoft RELEVANCE Framework

Microsoft's RELEVANCE (Relevance and Entropy-based Evaluation with Longitudinal Inversion Metrics) is an enterprise-grade framework designed for the longitudinal evaluation of generative AI.

- **Key features:**
  - Permutation Entropy (*PEN*) to quantify how much a model's ranking deviates from human standards.
  - Count Inversions (*CIN*) to measure disorder in ranked lists.
  - Longest Increasing Subsequence (*LIS*) to identify patterns of consistent relevance.
  - Designed for automated detection of "model slip" or model hallucination over time.
- **Pros:**
  - Uses sophisticated mathematical metrics that go beyond simple accuracy scores.
  - Ideal for monitoring how a model's performance changes as it is updated or fine-tuned.
- **Cons:**
  - More academic in its terminology; requires a strong background in statistics to interpret.
  - Not as plug-and-play as tools like Ragas or DeepEval for simple RAG apps.
- **Security & compliance:** Enterprise-grade security integrated into the Microsoft Azure AI ecosystem.
- **Support & community:** Backed by Microsoft Research; documentation is thorough but geared toward researchers.
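The statistics themselves are easy to illustrate. The sketch below is our own re-implementation of two of them from their textbook definitions, not Microsoft's code: it scores a model's ranking expressed in terms of the human-assigned positions, where a perfectly aligned model yields zero inversions and a maximal increasing subsequence.

```python
# Illustrative re-implementation of two RELEVANCE-style statistics
# (textbook definitions, not Microsoft's code).
from bisect import bisect_left

def count_inversions(ranks):
    """Number of out-of-order pairs: 0 = perfect agreement with the reference."""
    return sum(1 for i in range(len(ranks))
                 for j in range(i + 1, len(ranks)) if ranks[i] > ranks[j])

def lis_length(ranks):
    """Length of the longest increasing subsequence (patience sorting)."""
    tails = []
    for r in ranks:
        i = bisect_left(tails, r)
        if i == len(tails):
            tails.append(r)
        else:
            tails[i] = r
    return len(tails)

# The model's ranking written as human-assigned positions; a perfectly
# aligned model would produce [1, 2, 3, 4, 5].
model_order = [1, 3, 2, 5, 4]
print(count_inversions(model_order))  # 2 inversions
print(lis_length(model_order))        # LIS of length 3 (e.g., 1, 2, 4)
```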
### 7 — Elasticsearch Ranking Evaluation API

For the millions of developers already using Elasticsearch, the built-in Ranking Evaluation API provides a native way to measure search quality without exporting data to external tools.

- **Key features:**
  - Native integration with the Elasticsearch engine.
  - Support for *NDCG*, *MAP*, *MRR*, and Precision/Recall.
  - A "Rank Eval" API that takes a set of queries and expected document IDs (sketched below).
  - Allows testing different Query DSL configurations against a ground-truth set.
- **Pros:**
  - No need for additional infrastructure or external libraries.
  - Uses the exact same ranking logic and indices that serve your production users.
- **Cons:**
  - Lacks modern neural/LLM metrics; strictly evaluates document ranking.
  - No built-in visualization; you must build your own dashboards to see trends.
- **Security & compliance:** Inherits Elasticsearch's robust security (RBAC, TLS, SOC 2, HIPAA, etc.).
- **Support & community:** Massive enterprise support from Elastic; endless community tutorials.
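A sketch of a Rank Eval request (request shape per the Elasticsearch docs; the index name, query, and ratings here are invented for illustration):

```python
# Sketch of the _rank_eval API; assumes a local cluster and a
# hypothetical "products" index with rated documents.
import requests

body = {
    "requests": [{
        "id": "shoes_query",
        "request": {"query": {"match": {"title": "running shoes"}}},
        "ratings": [
            {"_index": "products", "_id": "doc_1", "rating": 3},
            {"_index": "products", "_id": "doc_7", "rating": 0},
        ],
    }],
    # "dcg" with normalize=True yields NDCG@10.
    "metric": {"dcg": {"k": 10, "normalize": True}},
}

resp = requests.post("http://localhost:9200/products/_rank_eval", json=body)
print(resp.json()["metric_score"])
```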
### 8 — PyTerrier

PyTerrier is a Python-based wrapper for the Terrier IR platform. It is a highly declarative framework that allows users to build "transformer pipelines" for search, similar to how Scikit-Learn works for ML.

- **Key features:**
  - Declarative syntax using operator notation (e.g., `bm25 >> reranker`).
  - Built-in `pt.Experiment()` function to evaluate multiple pipelines simultaneously (see the sketch below).
  - Native support for learned sparse retrieval (SPLADE) and dense retrieval (ColBERT).
  - Deep integration with the BEIR benchmark and MS MARCO datasets.
- **Pros:**
  - The most flexible tool for building and comparing complex multi-stage search architectures.
  - Makes it very easy to swap components (e.g., trying a different reranker) and see the immediate impact.
- **Cons:**
  - Requires a Java installation, as it runs on the Terrier Java core.
  - The learning curve for its unique operator syntax can be a bit steep.
- **Security & compliance:** Local-first, open-source library.
- **Support & community:** Widely used in IR labs at universities worldwide; very strong academic documentation.
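A sketch of a PyTerrier experiment, close to the example in the PyTerrier documentation (requires a local JDK; `vaswani` is a small test collection that ships with topics, qrels, and a prebuilt index):

```python
# pip install python-terrier  (sketch per the PyTerrier docs; needs a JDK)
import pyterrier as pt
if not pt.started():
    pt.init()

# Load a small built-in test collection with topics and qrels included.
dataset = pt.get_dataset("vaswani")
bm25 = pt.BatchRetrieve(dataset.get_index(), wmodel="BM25")
tf_idf = pt.BatchRetrieve(dataset.get_index(), wmodel="TF_IDF")

# Evaluate both pipelines against the same qrels in one call.
results = pt.Experiment(
    [bm25, tf_idf],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg_cut_10"],
)
print(results)
```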
class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The following scoring rubric is based on the requirements of modern AI and Search teams in 2026.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td>Category<\/td><td>Weight<\/td><td>Evaluation Criteria<\/td><\/tr><\/thead><tbody><tr><td><strong>Core Features<\/strong><\/td><td>25%<\/td><td>Metric variety (NDCG, MAP) and support for generative AI (RAG) metrics.<\/td><\/tr><tr><td><strong>Ease of Use<\/strong><\/td><td>15%<\/td><td>API quality, documentation, and ease of integration into existing stacks.<\/td><\/tr><tr><td><strong>Integrations<\/strong><\/td><td>15%<\/td><td>Compatibility with vector DBs, LLMs, and frameworks like LangChain\/LlamaIndex.<\/td><\/tr><tr><td><strong>Security &amp; Compliance<\/strong><\/td><td>10%<\/td><td>Support for local execution, data privacy, and cloud compliance standards.<\/td><\/tr><tr><td><strong>Performance<\/strong><\/td><td>10%<\/td><td>Execution speed and ability to handle millions of results.<\/td><\/tr><tr><td><strong>Support &amp; Community<\/strong><\/td><td>10%<\/td><td>Frequency of updates, quality of tutorials, and active forum support.<\/td><\/tr><tr><td><strong>Price \/ Value<\/strong><\/td><td>15%<\/td><td>Cost of evaluation (especially LLM costs) vs. the insights gained.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Which_Relevance_Evaluation_Toolkit_Is_Right_for_You\"><\/span>Which Relevance Evaluation Toolkit Is Right for You?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The right tool depends on where you are in the search development lifecycle.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Solo Researchers &amp; Students:<\/strong>\u00a0If you are publishing a paper,\u00a0<strong>ranx<\/strong>\u00a0and\u00a0<strong>PyTerrier<\/strong>\u00a0are your best friends. They give you the speed and the &#8220;academic&#8221; credibility you need. Pair them with\u00a0<strong>BEIR<\/strong>\u00a0for your datasets.<\/li>\n\n\n\n<li><strong>Early-stage RAG Developers:<\/strong>\u00a0If you just built your first chatbot and want to know if it&#8217;s hallucinating, start with\u00a0<strong>Ragas<\/strong>\u00a0or\u00a0<strong>Arize Phoenix<\/strong>. They provide the quickest &#8220;human-like&#8221; feedback without requiring you to label thousands of rows.<\/li>\n\n\n\n<li><strong>SMBs &amp; Startups:<\/strong>\u00a0<strong>DeepEval<\/strong>\u00a0is perfect for teams that already use Pytest and want to ensure their search quality doesn&#8217;t break every time they update their prompt.<\/li>\n\n\n\n<li><strong>Search Engine Optimizers:<\/strong>\u00a0If you are using a traditional engine, the built-in\u00a0<strong>Elasticsearch<\/strong>\u00a0or\u00a0<strong>OpenSearch<\/strong>\u00a0APIs are the most practical choice. 
---

## Comparison Table

| Tool Name | Best For | Platform(s) Supported | Standout Feature | Rating (Gartner/Peer) |
|---|---|---|---|---|
| **Ragas** | RAG / Generative AI | Python, Cloud | Faithfulness Metric | 4.8 / 5 |
| **ranx** | High-performance IR | Python (Numba) | Statistical Testing | 4.7 / 5 |
| **Arize Phoenix** | Observability / Tracing | Python, OSS | Embedding Visualization | 4.6 / 5 |
| **BEIR** | Zero-shot Benchmarking | Python, Datasets | Cross-domain Datasets | 4.9 / 5 |
| **DeepEval** | CI/CD Unit Testing | Python (Pytest) | Automated Test Generation | 4.7 / 5 |
| **MS RELEVANCE** | Enterprise AI Monitoring | Azure, Python | Entropy-based Metrics | N/A |
| **Elasticsearch** | Built-in Engine Eval | Elasticsearch API | Native Search Integration | 4.5 / 5 |
| **PyTerrier** | Search Pipeline Design | Python (Java Core) | Declarative Operator Syntax | 4.4 / 5 |
| **OpenSearch** | Open Source Search | OpenSearch API | Side-by-Side Comparison UI | 4.5 / 5 |
| **trec_eval** | Academic Rigor | C / CLI | Industry Standard Math | N/A |

---

## Evaluation & Scoring of Relevance Evaluation Toolkits

The following scoring rubric is based on the requirements of modern AI and search teams in 2026.

| Category | Weight | Evaluation Criteria |
|---|---|---|
| **Core Features** | 25% | Metric variety (*NDCG*, *MAP*) and support for generative AI (RAG) metrics. |
| **Ease of Use** | 15% | API quality, documentation, and ease of integration into existing stacks. |
| **Integrations** | 15% | Compatibility with vector DBs, LLMs, and frameworks like LangChain/LlamaIndex. |
| **Security & Compliance** | 10% | Support for local execution, data privacy, and cloud compliance standards. |
| **Performance** | 10% | Execution speed and ability to handle millions of results. |
| **Support & Community** | 10% | Frequency of updates, quality of tutorials, and active forum support. |
| **Price / Value** | 15% | Cost of evaluation (especially LLM costs) vs. the insights gained. |

---

## Which Relevance Evaluation Toolkit Is Right for You?

The right tool depends on where you are in the search development lifecycle.

- **Solo researchers & students:** If you are publishing a paper, **ranx** and **PyTerrier** are your best friends. They give you the speed and the academic credibility you need. Pair them with **BEIR** for your datasets.
- **Early-stage RAG developers:** If you just built your first chatbot and want to know whether it's hallucinating, start with **Ragas** or **Arize Phoenix**. They provide the quickest "human-like" feedback without requiring you to label thousands of rows.
- **SMBs & startups:** **DeepEval** is perfect for teams that already use Pytest and want to ensure their search quality doesn't break every time they update a prompt.
- **Search engine optimizers:** If you are using a traditional engine, the built-in **Elasticsearch** or **OpenSearch** APIs are the most practical choice. They are "free" and require zero new infrastructure.
- **Enterprise & regulated industries:** Organizations with high security requirements should look at **Arize Phoenix** (for local hosting) or the **Microsoft RELEVANCE** framework for longitudinal monitoring of AI safety and performance.

---

## Frequently Asked Questions (FAQs)

**1. What is the difference between *NDCG* and *MAP*?** *NDCG* (Normalized Discounted Cumulative Gain) accounts for *graded* relevance (e.g., 0-3 stars) and penalizes relevant results that appear lower in the list. *MAP* (Mean Average Precision) is generally used for *binary* relevance (relevant vs. not relevant).
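For reference, the standard definitions behind this distinction. For a single query with documents ranked $i = 1, \dots, k$ and graded labels $rel_i$:

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},
$$

where $\mathrm{IDCG@}k$ is the DCG of the ideal reordering (some tools, including the sketch in the introduction, use the linear gain $rel_i$ instead of $2^{rel_i}-1$). For binary labels, with $R$ relevant documents and $P(i)$ the precision at rank $i$:

$$
\mathrm{AP} = \frac{1}{R} \sum_{i=1}^{k} P(i)\,\mathbb{1}[rel_i = 1],
\qquad
\mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}_q .
$$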
**2. Can I evaluate a search engine without a "ground truth" set?** Historically, no. However, tools like **Ragas** now allow for "LLM-as-a-judge," where a powerful model like GPT-4 evaluates the results, effectively acting as a synthetic ground truth.

**3. Why do I need a toolkit instead of just checking a few queries?** Human intuition is biased. You might check 5 queries and think the search is great, while the other 5,000 queries have completely broken rankings. Toolkits allow for statistical significance testing.

**4. Why does LLM-based evaluation cost money?** When you use a tool like **Ragas** or **DeepEval**, the toolkit sends your search results to an LLM for grading. You are charged for every token the "judge" model processes.

**5. What is "zero-shot" evaluation?** Zero-shot evaluation, popularized by the **BEIR** benchmark, tests how a search model performs on data it has never seen during training, which is the most realistic test of a general-purpose search engine.

**6. Is trec-eval still relevant in 2026?** Yes. While the interface is dated, it is the mathematical reference. Most modern Python tools like **ranx** include tests to ensure their output matches `trec_eval` exactly.

**7. What is "model slip"?** Model slip occurs when a generative AI model's performance degrades over time or across different data distributions. The **Microsoft RELEVANCE** framework is specifically built to detect this.

**8. Can these tools handle image or video search?** Most can. As long as you can provide a ranked list of results and a ground truth, tools like **ranx** and **PyTerrier** don't care whether the documents are text, images, or products.

**9. How do I choose between Arize Phoenix and Ragas?** Choose **Arize Phoenix** if you need a visual dashboard and full tracing of your AI's internal logic. Choose **Ragas** if you want a lightweight Python library focused strictly on producing metrics.

**10. What is "retrieval-aware" evaluation?** It evaluates the *retrieval* step of a RAG pipeline independently from the *generation* step, ensuring that the LLM is actually being given the right information to answer the question.

---

## Conclusion

Relevance evaluation is the bridge between a "working" search engine and a "great" search engine. In 2026, the focus has shifted from simple keyword matching to complex AI-driven retrieval, making toolkits like **Ragas**, **Arize Phoenix**, and **ranx** more vital than ever. By choosing a tool that aligns with your technical stack and your evaluation goals, you can ensure that your users always find exactly what they are looking for, and that your AI remains grounded in reality.