Faster without Risk
Using Feature Flags to Cut ChatGPT Operating Costs
ChatGPT, a renowned transformer-based large language model, has revolutionized human-machine interaction by offering intelligent and natural conversational experiences. It has led to the development of numerous innovative software applications. However, the operational costs associated with ChatGPT remain a significant challenge in many scenarios. In this article, we will explore strategies to reduce the costs of using ChatGPT through feature flags.
There are two principal methods for reducing ChatGPT's operating costs:
- Limiting the number of input tokens and requests made to ChatGPT.
- Utilizing smaller models in place of ChatGPT.
Each method, however, could potentially degrade the user experience. Reducing input tokens or switching to smaller models might result in inadequate responses to user queries. Therefore, a blend of smaller models and ChatGPT might be an optimal approach for balancing cost reduction with user experience. The choice depends on specific scenarios and user requirements. For instance, a 7B model could suffice for summarizing a blog, while GPT-3.5-Turbo with 16k tokens might be more appropriate for blog writing, offering a lower-cost solution.
Feature flags are a development technique used to manage the rollout of new features or code segments. They enable control over request traffic and determine which language model will handle a specific request. The diagram below illustrates how feature flags can be used to control the deployment of new functionalities or code sections.
When a request comes to your API interface, you simply need to call a function to determine which model to use. This can be implemented with a very simple switch statement.
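The dispatch described above can be sketched in a few lines. This is a minimal illustration, not a real SDK: `get_flag` stands in for your feature flag service's evaluation call, and the `call_*` helpers stand in for your model clients.

```python
# Minimal sketch of flag-driven model dispatch. `get_flag` and the
# call_* helpers are illustrative stand-ins, not a real SDK's API.

def get_flag(name: str, default: str) -> str:
    # A real feature flag SDK would query the flag service here.
    return default

def call_gpt4_32k(prompt: str) -> str:
    return f"[gpt-4-32k] {prompt}"

def call_gpt35(prompt: str) -> str:
    return f"[gpt-3.5-turbo] {prompt}"

def call_local_model(prompt: str) -> str:
    return f"[open-source-7b] {prompt}"

def handle_request(prompt: str) -> str:
    # The flag value names the model that should serve this request.
    model = get_flag("llm-model", default="gpt-3.5-turbo")
    if model == "gpt-4-32k":
        return call_gpt4_32k(prompt)
    if model == "open-source-7b":
        return call_local_model(prompt)
    return call_gpt35(prompt)
```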
Imagine you have a chatbot capable of summarizing articles submitted by users. You've observed some patterns:
- Users seldom use this feature during the night.
- For lengthy articles, the optimal user experience is achieved using the GPT-4 32K model.
- You are developing an agent to handle long articles using cost-effective models like OpenChat, but it's still in the testing phase.
Here's how you can utilize feature flags to minimize costs:
- From midnight to 8 AM, shut down the servers hosting your open-source large language models. This way, you incur no costs when they are not in use. During this period, rely solely on GPT models.
- For articles exceeding 3,000 words, use GPT-3.5 Turbo and above. For those over 8,000 words, employ GPT-4 32K.
- For articles with a higher word count, you can experiment with your agent on a small user segment to test and refine. Continue this until your agent can effectively handle most long-text articles.
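The rules above can be expressed as a single selection function. This is a sketch under assumptions: the thresholds, model names, and the `in_agent_test_group` parameter mirror the bullets but are not a prescribed configuration; in practice these rules would live in the feature flag service, not in code.

```python
from datetime import time

def choose_model(now: time, word_count: int, in_agent_test_group: bool) -> str:
    """Illustrative encoding of the cost rules above; names and
    thresholds are assumptions, not a real configuration."""
    night = time(0, 0) <= now < time(8, 0)  # self-hosted servers are off
    if in_agent_test_group and word_count > 8000:
        return "experimental-agent"          # small test segment only
    if word_count > 8000:
        return "gpt-4-32k"
    if word_count > 3000:
        return "gpt-3.5-turbo-16k"
    if night:
        return "gpt-3.5-turbo"               # no open-source servers at night
    return "open-source-7b"
```

Keeping this logic in the flag service instead of hard-coding it means the thresholds can be tuned remotely without a redeploy.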
To achieve this, a free feature flag tool is essential. Only a slight modification to the code we discussed in the 'Code Example' chapter is needed.
We need to provide the feature flag service with some information about the request, such as the time, the article's length, and the user ID, so it can identify which users to include when testing your own agent. Below is a code sample:
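A minimal sketch of how that context might be assembled and passed along. The shape of the context dictionary and the `evaluate` signature vary by SDK; everything here (field names, flag key, function names) is an assumption for illustration.

```python
from datetime import datetime, timezone

def build_context(user_id: str, article: str) -> dict:
    """Collect the request attributes the flag service's rules can
    match on: user identity, article length, and time of day."""
    return {
        "user_id": user_id,
        "word_count": len(article.split()),
        "hour_utc": datetime.now(timezone.utc).hour,
    }

def evaluate(flag_key: str, context: dict, default: str) -> str:
    # Stand-in for your SDK's evaluation call; a real client would
    # send the context to the flag service and return its decision.
    return default

context = build_context("user-42", "some long article text")
model = evaluate("llm-model", context, default="gpt-3.5-turbo")
```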
A feature flag controls the result of the switch variable. You use the feature flag service's control center to configure and update the strategy. As shown in the image below, two rules are added to control the switch variable remotely and in real time.
- From midnight to 8 AM & less than 3,000 words, don't use open-source LLMs.
- From 8 AM to midnight & less than 3,000 words, use open-source LLMs.
The image below shows how you can modify the second rule from the above image to release your testing agent to a small segment of users (5% of all users).
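One common way flag services implement a percentage rollout like this is deterministic bucketing: hash the user ID into a stable value in [0, 100) and compare it with the rollout percentage, so the same user always lands in the same group. This is a sketch of the idea, not any particular service's implementation; the salt and function names are illustrative.

```python
import hashlib

def in_rollout(user_id: str, percent: float, salt: str = "agent-test") -> bool:
    """Deterministically place a user in a rollout bucket.

    Hashing the salted user ID gives a stable pseudo-random value in
    [0, 100); users below the threshold see the experimental agent.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = (int(digest[:8], 16) % 10000) / 100.0  # stable value in [0, 100)
    return bucket < percent
```

Because the bucket depends only on the salted user ID, raising the percentage from 5% to 10% keeps the original 5% of users in the test group and only adds new ones.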
If you want to learn more about how to use feature flags effectively, you can follow the tutorials provided by the many open-source feature flag tools.
This article has demonstrated the strategic use of feature flags to effectively manage ChatGPT operational costs, focusing on switching between different models to find the optimal one for each situation. By selectively using various model sizes at the most appropriate times, such as employing smaller models for simpler requests or during low-usage periods, we achieve a cost-effective compromise. This approach ensures that while we maintain high-quality user experiences, we also keep operational expenses in check. Feature flags offer a dynamic and adaptable solution, crucial for maximizing ChatGPT's potential in a financially sustainable way.