The researchers say their goal is to explore how different design choices affect LLM performance on industrial control tasks, where many aspects of the approach are still poorly understood.
First, although the method is conceptually simple, it is unclear how its performance compares with that of traditional decision-making methods.
Second, how well the foundation model generalizes to different tasks (e.g., different contexts or action spaces) remains to be studied.
Third, it is worth investigating how sensitive the approach is to the design of the language wrapper (for example, which part of the prompt has the greatest impact on performance).
By answering these questions, the researchers hope to highlight the potential of these approaches and show how solutions can be designed for industrial control tasks with low technical debt.
The main contributions of this paper include:
A training-free method for applying foundation models to industrial control, which can be used for a variety of heterogeneous tasks with low technical debt.
Experiments using GPT-4 to control HVAC (heating, ventilation, and air conditioning) equipment, with positive results that demonstrate the potential of this approach.
Extensive ablation studies (covering generalization, sample selection, and prompt design) to inform future work in this direction.
Method
The study used GPT-4 to optimize the control of HVAC equipment, and the workflow is shown in Figure 1 below:
Figure 1: Schematic diagram of the workflow for controlling HVAC using GPT-4
The LLM and environment components in this workflow are as follows:
LLM: A pre-trained large language model used as the decision maker. It generates a response to the given prompt. The prompt should contain a description of the current state, brief HVAC control instructions, demonstrations from relevant states, and so on.
Environment: An interactive environment or simulator that executes the actions recommended by the LLM and provides feedback. The evaluation environment used in the experiments was BEAR (Zhang et al., 2022a). To create an environment in BEAR, two parameters must be provided: the building type (such as large office, small office, or hospital) and the weather condition (such as hot dry, hot humid, or warm dry). Each weather condition corresponds to a specific city; for example, the hot dry condition is associated with Buffalo.
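To make the interaction in Figure 1 more concrete, here is a minimal Python sketch of one LLM-in-the-loop episode. It is not the researchers' code. `ToyRoomEnv` is a hypothetical single-room stand-in for a BEAR environment whose dynamics and reward are invented for illustration, and `stub_llm` is a hypothetical rule-based placeholder for a GPT-4 call. The loop shows the general pattern: build a prompt from the current state, ask the model for an action, clamp it to [-1, 1], step the environment, and record the outcome in an online buffer.

```python
import re
from typing import Callable, List, Tuple


class ToyRoomEnv:
    """Hypothetical one-room toy environment (NOT BEAR; dynamics are made up)."""

    def __init__(self, outdoor: float = 30.0, start: float = 26.0) -> None:
        self.outdoor = outdoor
        self.temp = start

    def step(self, action: float) -> Tuple[float, float]:
        # Negative actions cool, positive actions heat, as in BEAR's encoding
        action = max(-1.0, min(1.0, action))
        # Room drifts toward the outdoor temperature and responds to the action
        self.temp += 0.1 * (self.outdoor - self.temp) + 2.0 * action
        # Toy reward: penalize distance from 22 °C and valve opening
        reward = -abs(self.temp - 22.0) - 0.5 * abs(action)
        return self.temp, reward


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM: a proportional rule read off the prompt."""
    match = re.search(r"temperature is (-?\d+(?:\.\d+)?)", prompt)
    if match is None:
        return "0"
    temp = float(match.group(1))
    return f"{max(-1.0, min(1.0, (22.0 - temp) / 5.0)):.2f}"


def run_episode(env: ToyRoomEnv, decide: Callable[[str], str],
                steps: int) -> List[Tuple[str, float, float]]:
    """Prompt -> decision -> action -> feedback, stored in an online buffer."""
    buffer: List[Tuple[str, float, float]] = []
    temp = env.temp
    for _ in range(steps):
        prompt = (f"The room temperature is {temp:.1f} °C and the target is 22 °C. "
                  "Reply with a single action in [-1, 1]: negative cools, positive heats.")
        try:
            action = float(decide(prompt).strip())
        except ValueError:
            action = 0.0  # fall back to "off" if the reply is not a number
        action = max(-1.0, min(1.0, action))
        temp, reward = env.step(action)
        buffer.append((prompt, action, reward))
    return buffer


env = ToyRoomEnv()
history = run_episode(env, stub_llm, steps=10)
```

In a real setup, `decide` would wrap an actual LLM API call; the parse-and-clamp step guards against replies that are not a valid number in range.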
In BEAR, each state is represented by a numerical vector in which every dimension except the last four corresponds to the current temperature of one room in the building. The last four dimensions represent the outdoor temperature, global horizontal irradiance (GHI), ground temperature, and occupant power. In all environments, the primary goal is to keep room temperatures around 22 °C while minimizing energy consumption.
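The state layout described above can be split and verbalized roughly as follows. This is a hedged sketch of what a translator (numerical state to text) might look like; the `BearState` dataclass, the `parse_state` and `describe_state` helpers, and the wording of the generated text are all hypothetical, not BEAR's API or the paper's actual prompt.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BearState:
    """Hypothetical container for one BEAR-style observation."""
    room_temps: List[float]  # one temperature per room, in °C
    outdoor_temp: float
    ghi: float               # global horizontal irradiance
    ground_temp: float
    occupant_power: float


def parse_state(vec: Sequence[float]) -> BearState:
    """Split a state vector: all but the last four entries are room temperatures."""
    if len(vec) < 5:
        raise ValueError("expected at least one room plus four exogenous features")
    *rooms, outdoor, ghi, ground, occupant = vec
    return BearState(list(rooms), outdoor, ghi, ground, occupant)


def describe_state(state: BearState, target: float = 22.0) -> str:
    """Hypothetical translator: verbalize a parsed state for inclusion in a prompt."""
    lines = [
        f"Outdoor temperature: {state.outdoor_temp:.1f} °C.",
        f"Global horizontal irradiance: {state.ghi:.1f}.",
        f"Ground temperature: {state.ground_temp:.1f} °C.",
        f"Occupant power: {state.occupant_power:.1f}.",
    ]
    for i, temp in enumerate(state.room_temps):
        # Report each room relative to the comfort target so the error is explicit
        lines.append(f"Room {i}: {temp:.1f} °C ({temp - target:+.1f} from target).")
    return "\n".join(lines)


state = parse_state([21.0, 23.5, 30.0, 500.0, 18.0, 0.2])
print(describe_state(state))
```

Stating each room's deviation from the target is one possible design choice; it saves the model from doing the subtraction itself.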
Actions in BEAR are encoded as real numbers ranging from -1 to 1. Negative values indicate cooling mode and positive values indicate heating mode. The absolute value of an action corresponds to how far the valve is opened and therefore reflects energy consumption: the greater the absolute value, the greater the energy consumption. To account for both comfort and energy consumption, the researchers used the following reward function in the experiments:
Here, n is the number of rooms, T = 22 °C is the target temperature, and t_i is the temperature of the i-th room. The hyperparameter α balances energy consumption against comfort.
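Because the reward equation itself is not reproduced in this text, the sketch below assumes one plausible form purely for illustration: r = -[α · (1/n) Σ|a_i| + (1 - α) · (1/n) Σ(t_i - T)²], with one action a_i per room. Both the formula and the one-action-per-room layout are assumptions, not the researchers' definition. The hypothetical `decode_action` helper shows how an action's sign selects cooling or heating and how its magnitude gives the valve opening.

```python
from typing import Sequence, Tuple


def decode_action(a: float) -> Tuple[str, float]:
    """Map a BEAR-style action in [-1, 1] to (mode, valve opening)."""
    if not -1.0 <= a <= 1.0:
        raise ValueError("action must lie in [-1, 1]")
    if a < 0:
        mode = "cooling"
    elif a > 0:
        mode = "heating"
    else:
        mode = "off"
    # The magnitude is the valve opening, a proxy for energy use
    return mode, abs(a)


def assumed_reward(actions: Sequence[float], temps: Sequence[float],
                   alpha: float = 0.5, target: float = 22.0) -> float:
    """ASSUMED reward form for illustration only (not the paper's equation)."""
    n = len(temps)
    if n == 0 or len(actions) != n:
        raise ValueError("need one action per room and at least one room")
    energy = sum(abs(a) for a in actions) / n               # mean valve opening
    discomfort = sum((t - target) ** 2 for t in temps) / n  # mean squared deviation
    return -(alpha * energy + (1.0 - alpha) * discomfort)
```

Under this assumed form, raising α weights the energy term more heavily, pushing the controller toward smaller valve openings at the cost of larger temperature deviations.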
In addition, the workflow includes an online buffer, a translator, an embedding model, an expert demonstration dataset, a KNN model, a clustering model, a prompt generator, and other components. The execution of the prompt generator is shown in Figure 2, where the purple text is for illustration only and is not part of the prompt.
Figure 2: How the new method generates a prompt
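The retrieval step behind the prompt generator can be sketched as a k-nearest-neighbor search over stored demonstrations. This minimal version assumes raw state vectors stand in for the embedding model's output and that each demonstration is a hypothetical `(state, action, reward)` tuple; `knn_demos` and `build_prompt` are illustrative names, and the clustering model is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical demonstration record: (state vector, action, reward)
Demo = Tuple[List[float], float, float]


def knn_demos(query: Sequence[float], pool: Sequence[Demo], k: int = 2) -> List[Demo]:
    """Return the k demonstrations whose states are closest to the query state.

    Euclidean distance on raw states stands in for embedding similarity (an assumption).
    """
    return sorted(pool, key=lambda demo: math.dist(query, demo[0]))[:k]


def build_prompt(instruction: str, state_text: str, demos: Sequence[Demo]) -> str:
    """Hypothetical prompt generator: instruction, retrieved examples, current state."""
    parts = [instruction, "Examples:"]
    for state, action, reward in demos:
        state_str = ", ".join(f"{x:.1f}" for x in state)
        parts.append(f"State [{state_str}] -> action {action:+.2f} (reward {reward:.2f})")
    parts.append(f"Current state: {state_text}")
    parts.append("Action:")
    return "\n".join(parts)


expert = [([22.0, 30.0], -0.2, -0.1), ([18.0, 5.0], 0.7, -0.8)]
online = [([26.0, 35.0], -0.9, -1.5)]
top = knn_demos([25.0, 33.0], expert + online, k=2)
print(build_prompt("Keep every room near 22 °C.", "room at 25.0 °C, outdoor 33.0 °C", top))
```

Here demonstrations from the expert dataset and the online buffer are simply concatenated into one pool before retrieval; a real system might weight or cluster them differently.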
Experiment
Through experiments spanning different building types and weather conditions, the study demonstrated the effectiveness of using GPT-4 to control HVAC equipment. As long as appropriate instructions and demonstrations are provided (not necessarily related to the target building and weather conditions), GPT-4 can outperform reinforcement learning policies carefully trained for specific building and weather conditions. In addition, the researchers conducted a comprehensive ablation study to determine the contribution of each component of the prompt.