Microsoft has recently introduced significant enhancements to Bing search, integrating advanced language models to improve both speed and accuracy while reducing operational costs. By optimising Bing’s underlying technology, Microsoft is positioning its search engine to deliver a more efficient experience without compromising the quality of the results.

The key to these improvements lies in the combination of large and small language models, which work in tandem to better understand and respond to user queries. This dual-model approach allows Bing to provide more relevant and accurate search results, improving the overall user experience. In addition, Microsoft has partnered with NVIDIA to leverage its cutting-edge technology, which has enabled the company to cut operational costs and reduce the search engine’s latency.

This new technology is expected to deliver quicker response times when users search for information, while maintaining the integrity of the results. Bing’s ability to balance speed and accuracy is central to the success of this update. With these advancements, Microsoft aims to ensure that Bing remains competitive in the ever-evolving search engine landscape, offering users a faster, smarter, and more cost-effective platform.

Microsoft has unveiled a major update to Bing’s search infrastructure, incorporating both large language models (LLMs) and small language models (SLMs) along with new optimisation techniques. The goal of this update is to improve the performance of Bing’s search results while simultaneously reducing operational costs.

In a recent announcement, the company highlighted its commitment to advancing search technology. Microsoft stated, “At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities.” The company further explained that although transformer models had proven effective in the past, the increasing complexity of user queries demanded more powerful models to deliver more accurate and relevant results.

This shift in Bing’s infrastructure represents a significant step forward in refining search capabilities, with the promise of faster and more precise search results, catering to the growing needs of users and businesses alike.


Performance Gains

The use of large language models (LLMs) in search systems has undeniably brought about significant advancements in search technology, but it has also introduced a set of challenges. Specifically, LLMs can be expensive to run, and they can slow down response times due to their complexity and size. These issues can lead to inefficiencies in search result delivery, making it difficult to scale the system effectively while keeping costs under control.

In response to these challenges, Bing has introduced a novel solution: small language models (SLMs). According to Microsoft, SLMs offer a substantial improvement in both speed and efficiency, delivering roughly 100 times the throughput of their LLM counterparts. As stated in the announcement, “LLMs can be expensive to serve and slow. To improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely.” This shift to smaller models allows Bing to speed up query processing without compromising on accuracy or relevance.

Furthermore, to ensure that the SLMs perform optimally, Bing has partnered with NVIDIA and integrated TensorRT-LLM into its infrastructure. TensorRT-LLM is an open-source library for accelerating large language model inference on NVIDIA GPUs. By optimising how these models are executed, it reduces both the time and the cost of running them. This combination of SLMs and TensorRT-LLM ensures that Bing’s search system can handle queries more quickly and efficiently, delivering faster, more precise results for users.
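The gains Microsoft cites are expressed as 95th-percentile batch latency and queries per second per instance. As an illustration only, here is a minimal Python sketch of how those two metrics can be measured for any serving function; `run_inference` is a hypothetical stand-in, not Bing’s or TensorRT-LLM’s API:

```python
import time

def run_inference(batch):
    # Hypothetical stand-in for a real model-serving call
    # (e.g. an SLM endpoint); here it just simulates work.
    time.sleep(0.001 * len(batch))
    return [f"result for {query}" for query in batch]

def benchmark(batches):
    """Measure 95th-percentile per-batch latency and overall
    throughput (queries per second) for a list of query batches."""
    latencies = []
    total_queries = 0
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        run_inference(batch)
        latencies.append(time.perf_counter() - t0)
        total_queries += len(batch)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"p95_latency_s": p95, "qps": total_queries / elapsed}

# 40 batches of 20 queries each, mirroring the batch size in Microsoft's report
stats = benchmark([[f"q{i}"] * 20 for i in range(40)])
print(stats)
```

Whether measuring a transformer, an SLM, or a TensorRT-LLM-optimised engine, holding the workload fixed like this is what makes the before/after comparison meaningful.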

This combination of advanced small language models and cutting-edge optimisation technology positions Bing as a leader in the search space, allowing it to scale more effectively and continue providing high-quality search experiences at a lower operational cost. By adopting these innovations, Bing demonstrates its commitment to improving the user experience while keeping up with the ever-evolving demands of search technology.


Impact On “Deep Search”


According to a technical report from Microsoft, the integration of NVIDIA’s TensorRT-LLM technology has significantly enhanced the company’s “Deep Search” feature, improving the overall performance of Bing’s search infrastructure. Deep Search is designed to leverage small language models (SLMs) in real-time, providing users with more relevant and accurate web results by processing queries more efficiently.

Before the optimisation, Bing’s original transformer model was operating at a 95th percentile latency of 4.76 seconds per batch of 20 queries, with a throughput of just 4.2 queries per second per instance. After implementing TensorRT-LLM, both latency and throughput saw substantial improvements: the new setup reduced latency to 3.03 seconds per batch and increased throughput to 6.6 queries per second per instance. That amounts to a 36% reduction in latency and a 57% increase in throughput, which Microsoft reports as a corresponding decrease in operational costs.
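Those percentages follow directly from the reported figures; a quick sanity check in Python (the numbers are taken from the report, nothing is assumed):

```python
# Figures reported for Bing's transformer model before and after
# the TensorRT-LLM optimisation
old_latency_s = 4.76  # 95th-percentile latency per 20-query batch
new_latency_s = 3.03
old_qps = 4.2         # queries per second per instance
new_qps = 6.6

latency_reduction = (old_latency_s - new_latency_s) / old_latency_s
throughput_gain = (new_qps - old_qps) / old_qps

print(f"Latency reduction: {latency_reduction:.0%}")  # 36%
print(f"Throughput gain:   {throughput_gain:.0%}")    # 57%
```

The 57% figure matches the throughput improvement, which Microsoft translates into an equivalent saving in serving cost per query.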

Microsoft emphasised that this upgrade does not come at the cost of quality. The company explained, “Our product is built on the foundation of providing the best results, and we will not compromise on quality for speed. This is where TensorRT-LLM comes into play, reducing model inference time and, consequently, the end-to-end experience latency without sacrificing result quality.” By incorporating TensorRT-LLM, Microsoft has been able to enhance the performance of its search system, ensuring faster query responses while maintaining the relevance and accuracy of the search results, a crucial aspect of user satisfaction.


Benefits For Bing Users

This update brings several potential benefits to Bing users, with the primary improvement being faster search results. By optimising inference and enhancing response times, Bing aims to deliver quicker results, ensuring users get the information they need without unnecessary delays.

In addition to speed, the update also boosts the accuracy of search results. With the enhanced capabilities of Small Language Models (SLMs), Bing can now offer more contextualised and relevant results, improving the overall user experience.

Furthermore, this update brings cost efficiency to the platform, allowing Bing to reduce operational costs. This efficiency frees up resources, enabling the company to invest in further innovations and improvements, ensuring continuous enhancement of its search capabilities.


Why Bing’s Move to LLM/SLM Models Matters

Bing’s decision to switch to Large Language Models (LLMs) and Small Language Models (SLMs), alongside the integration of TensorRT optimisation, could significantly shape the future of search engines.

As users begin to ask more complex questions, search engines must evolve to better understand these queries and deliver relevant results in a timely manner. Bing aims to tackle this challenge by employing smaller language models and advanced optimisation techniques to improve both speed and accuracy.

While the full impact of these changes remains to be seen, Bing’s move marks the beginning of a new era in search, potentially setting the stage for more efficient and intelligent search experiences in the future.

More Digital Marketing BLOGS here: 

Local SEO 2024 – How To Get More Local Business Calls

3 Strategies To Grow Your Business

Is Google Effective for Lead Generation?

What is SEO and How It Works?

How To Get More Customers On Facebook Without Spending Money

How Do I Get Clients Fast On Facebook?

How Do I Retarget Customers?

How Do You Use Retargeting In Marketing?

How To Get Clients From Facebook Groups

What Is The Best Way To Generate Leads On Facebook?

How Do I Get Leads From A Facebook Group?

How To Generate Leads On Facebook For FREE

How Do I Choose A Good SEO Agency?

How Much Should I Pay For Local SEO?
