As large language model (LLM) technology matures, its range of applications keeps expanding. From intelligent writing to search engines, the potential of LLM applications is gradually being unlocked.
Recently, Microsoft Research Asia proposed that LLMs can be used for industrial control, achieving better results than traditional reinforcement learning methods with only a small number of demonstration samples. The study used GPT-4 to control heating, ventilation, and air conditioning (HVAC) systems, with quite positive results.
Paper: http://export.arxiv.org/abs/2308.03028
In the field of intelligent control, reinforcement learning (RL) is one of the most popular decision-making methods, but it suffers from sample inefficiency and the resulting high training cost when an agent must learn a task from scratch. These problems are fundamentally difficult to solve within the traditional reinforcement learning paradigm. After all, even humans typically need thousands of hours of learning to become domain experts, which presumably corresponds to millions of interactions.
However, for many control tasks in industrial scenarios, such as inventory management, quantitative trading, and HVAC control, people would prefer a high-performance controller that can handle different tasks at low cost, which poses a great challenge to traditional control methods.
For example, we might want to control the HVAC systems of different buildings with minimal fine-tuning and a limited number of reference demonstrations. The basic principles of HVAC control may be similar across tasks, but the transition dynamics and even the state/action spaces may differ from building to building.
Moreover, there are often not enough demonstrations to train reinforcement learning agents from scratch. It is therefore difficult to use reinforcement learning or other traditional control methods to train agents that generalize across such scenarios.
Using the prior knowledge embedded in foundation models is a promising approach. These foundation models are pre-trained on diverse, Internet-scale datasets and can therefore serve as a source of rich prior knowledge for a variety of industrial control tasks. Foundation models such as GPT-4, Bard, DALL-E, and CLIP have demonstrated strong emergent capabilities and rapid adaptation to a wide range of downstream tasks. The first two are representative large language models (LLMs), while the latter two handle both text and images.
The recent success of foundation models has given rise to a number of ways of using LLMs to make decisions. These approaches fall broadly into three categories: fine-tuning the LLM for specific downstream tasks, combining the LLM with trainable components, and directly using the pre-trained LLM.
While previous studies have used foundation models for control experiments, they typically selected tasks such as robot control, home assistants, or gaming environments; the team at Microsoft Research Asia instead focused on industrial control tasks. For traditional reinforcement learning methods, such tasks pose three difficulties:
1) Decision agents typically face a heterogeneous set of tasks, with, for example, different state and action spaces or different transition dynamics. Reinforcement learning methods require training a separate model for each heterogeneous task, which is expensive.
2) The development of decision agents must incur very low technical debt, which implies that few samples are provided (or none may be available at all), while traditional reinforcement learning algorithms require large amounts of data to train; designing models for specific tasks may therefore be infeasible.
3) Decision agents need to quickly adapt to new scenarios or changing dynamics in an online manner, for example by relying entirely on new online interaction experiences without further training.
To address these challenges, researchers at Microsoft Research Asia, including Lei Song, have proposed directly using pre-trained LLMs to control HVAC systems. This method needs only a small number of samples to solve heterogeneous tasks, and its process involves no training at all: the samples are used only as few-shot examples for in-context learning.
The goal of the study was to explore the potential of using pre-trained LLMs directly for industrial control tasks. Specifically, the researchers designed a mechanism to select examples from expert demonstrations and historical interactions, as well as a prompt generator that converts the goal, instructions, demonstrations, and current state into a prompt. The generated prompt is then fed to the LLM, which outputs the control action.
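To make the workflow concrete, here is a minimal sketch of such a pipeline. This is not the paper's actual implementation: the state fields, the nearest-neighbor example selector, and the `query_llm` placeholder are all illustrative assumptions; only the overall structure (select demonstrations, assemble a prompt from goal, instructions, examples, and current state, then ask the LLM for an action) follows the description above.

```python
# Illustrative sketch of a few-shot prompt-based control loop (not the
# paper's code). State keys, the selector, and query_llm are assumptions.
from dataclasses import dataclass

@dataclass
class Demonstration:
    state: dict      # e.g. {"indoor_temp": 26.5, "outdoor_temp": 31.0}
    action: float    # setpoint (in deg C) chosen by the expert

def select_examples(demos, current_state, k=4):
    """Pick the k demonstrations whose states are closest to the current one."""
    def distance(demo):
        return sum((demo.state[key] - current_state[key]) ** 2
                   for key in current_state)
    return sorted(demos, key=distance)[:k]

def build_prompt(goal, instructions, examples, current_state):
    """Turn goal, instructions, demonstrations, and state into one prompt."""
    lines = [f"Goal: {goal}", f"Instructions: {instructions}", ""]
    for ex in examples:
        lines.append(f"State: {ex.state} -> Action: {ex.action}")
    lines.append(f"State: {current_state} -> Action:")
    return "\n".join(lines)

# Usage: assemble a prompt from two hypothetical expert demonstrations.
demos = [
    Demonstration({"indoor_temp": 27.0, "outdoor_temp": 32.0}, 24.0),
    Demonstration({"indoor_temp": 22.0, "outdoor_temp": 15.0}, 23.0),
]
state = {"indoor_temp": 26.5, "outdoor_temp": 31.0}
prompt = build_prompt(
    "Keep the indoor temperature near 23 deg C while minimizing energy use.",
    "Reply with a single setpoint in deg C.",
    select_examples(demos, state, k=2),
    state,
)
# action = float(query_llm(prompt))  # query_llm: any chat-completion API
```

Because no model weights are updated, adapting to a new building only means swapping in that building's demonstrations and state description, which is precisely what makes the approach cheap for heterogeneous tasks.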