The boom in large AI models is also fueling a boom in humanoid robots.
“Simply put, for today’s humanoid robots, including their AI applications, transplanting or scaling down existing large-model technology is already enough.” Wang Xingxing, founder and CEO of Yushu Technology, recently offered this judgment to reporters.
In his view, humanoid robots have been under development at universities and research institutes for decades, with highs and lows along the way. But with AI iterating faster and faster, today’s control technology finally holds out hope of mastering a robot form as complex as the humanoid, and the pace of AI progress this year has far exceeded what robots require.
In fact, many people in the industry hold this view. At the General Humanoid Robot Technology and Industrial Innovation Forum of the 2023 World Robot Conference, held on the afternoon of August 18, Yao Qizhi, academician of the Chinese Academy of Sciences and dean of the Institute for Interdisciplinary Information Sciences at Tsinghua University, said that ChatGPT’s capabilities are so far mainly reflected in language processing; for general artificial intelligence to truly deliver its value, AGI will need embodied entities that interact with the real physical world to accomplish various tasks. The humanoid robot is one of the most ideal forms for landing such intelligence.
Hardware, algorithms and models
In the view of many industry experts, a humanoid robot basically consists of three parts: the body, the cerebellum, and the brain. The body must have sufficient hardware, such as sensors and actuators; the cerebellum governs visual and tactile perception, controlling the body to complete complex tasks; and the brain handles upper-level logical reasoning, decision-making, long-term planning, and natural language communication with other agents and the environment.
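As a rough illustration of this division of labor, here is a minimal Python sketch of the three-layer split. All class and method names are hypothetical, chosen only to mirror the body/cerebellum/brain structure described above, not any vendor’s actual API.

```python
from dataclasses import dataclass

@dataclass
class Body:
    """Hardware layer: sensors in, actuator commands out."""

    def read_sensors(self) -> dict:
        # Placeholder readings for cameras, touch, joint encoders.
        return {"camera": ..., "touch": ..., "joint_angles": ...}

    def drive_actuators(self, joint_commands: list[float]) -> None:
        pass  # send torque/position commands to the motors

class Cerebellum:
    """Perception and low-level motion control."""

    def control(self, sensors: dict, subgoal: str) -> list[float]:
        # Fuse visual/tactile input and compute joint commands
        # that move the body toward the current subgoal.
        return [0.0] * 30  # placeholder joint command vector

class Brain:
    """Upper-level reasoning, planning, and language."""

    def plan(self, instruction: str, sensors: dict) -> str:
        # e.g. query a large language model for the next subgoal.
        return "walk_to_table"

def step(body: Body, cerebellum: Cerebellum, brain: Brain, instruction: str):
    sensors = body.read_sensors()
    subgoal = brain.plan(instruction, sensors)       # slow, deliberate
    commands = cerebellum.control(sensors, subgoal)  # fast, reactive
    body.drive_actuators(commands)
```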
On body shape, Yao Qizhi said that because the human form can adapt to a wide variety of environments, and because the human social environment is largely customized for humans, from the structure of stairs to the height of doorknobs to the shape of quilts, the human form is the most appropriate one for a truly general-purpose robot.
As for the humanoid robot’s cerebellum, Yao Qizhi pointed out that a set of motion control algorithms built on the robot body plays this role. The upper layer is a state-planning layer, while the lower layer performs real-time, dynamics-based whole-body motion control, computing the precise commands sent to the motor joints so that the body matches the corresponding planned state.
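A minimal sketch of this two-layer structure in Python: a planner emits reference joint states, and an inner loop tracks them. Here a simple PD law stands in for a full dynamics-based whole-body controller; the gains, joint count, and interfaces are illustrative assumptions, not any particular robot’s API.

```python
import numpy as np

KP, KD = 80.0, 2.0  # illustrative PD gains

def plan_reference(t: float, n_joints: int = 12) -> np.ndarray:
    """Upper layer: produce a reference joint state.

    A real planner would output gait/whole-body states; here a
    simple sinusoid stands in for a periodic walking pattern.
    """
    phase = np.linspace(0, np.pi, n_joints)
    return 0.3 * np.sin(2.0 * t + phase)

def whole_body_control(q: np.ndarray, dq: np.ndarray,
                       q_ref: np.ndarray) -> np.ndarray:
    """Lower layer: real-time control computing joint commands.

    A PD law approximates what a dynamics-based whole-body
    controller would do (a real one would also account for contact
    forces, balance, and the full rigid-body model).
    """
    return KP * (q_ref - q) - KD * dq

# One control tick: track the planner's reference state.
q, dq = np.zeros(12), np.zeros(12)
tau = whole_body_control(q, dq, plan_reference(t=0.1))
```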
However, he also said that on the algorithmic side of building the cerebellum, the industry has not yet achieved sufficiently good control of robots, so it is also turning to artificial intelligence and reinforcement learning to study more flexible control strategies.
Among these, the advantage of a reinforcement learning framework is that it is not limited by a model, so it shows stronger adaptability in complex and uncertain environments, and real data on human movement can be used to better guide the deep learning.
“Through reinforcement learning, robots can imitate natural human walking, which also lets them consume less energy. But the problem with applying reinforcement learning is that it requires a huge number of samples, which has long been a challenge for the industry,” Yao Qizhi said.
Therefore, how to achieve more sample-efficient learning through innovative algorithm architectures is a direction the industry needs to keep exploring.
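To make the model-free point concrete, here is a minimal REINFORCE-style policy-gradient sketch in Python. It learns purely from sampled rollouts, with no model of the environment’s dynamics, which is exactly why it adapts to uncertainty and also why it is sample-hungry. The Gym-style `env_reset`/`env_step` interface and the tiny linear policy are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros((4, 2))  # linear policy: 4-dim state -> 2 actions

def policy(state: np.ndarray) -> np.ndarray:
    """Softmax over action logits."""
    logits = state @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def rollout(env_reset, env_step, horizon: int = 200) -> list:
    """Sample one episode. Model-free: we only observe
    (state, action, reward) tuples, never the dynamics."""
    s, traj = env_reset(), []
    for _ in range(horizon):
        a = rng.choice(2, p=policy(s))
        s2, r, done = env_step(s, a)
        traj.append((s, a, r))
        s = s2
        if done:
            break
    return traj

def reinforce_update(traj: list, lr: float = 0.01) -> None:
    """REINFORCE: scale the log-probability gradient of each
    taken action by the episode return. Typically thousands of
    such rollouts are needed, the sample-efficiency problem
    Yao describes."""
    global theta
    G = sum(r for _, _, r in traj)  # episode return
    for s, a, _ in traj:
        grad = -np.outer(s, policy(s))  # d log pi / d theta
        grad[:, a] += s
        theta += lr * G * grad
```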
In addition, in Yao’s view, another problem that troubles reinforcement learning is generalization: whether a humanoid robot’s system can generalize well across the uncertainty and interference found in these tasks and their environments.
“PaLM-E, launched by Google, is a very important technical route for the industry, but this framework also has a problem: the lower layer may fail to carry out the upper layer’s plan well, especially when unexpected interruptions occur partway through. The solution is to first have a large language model describe the task the robot is required to do, such as moving boxes, and then have the robot execute it. A camera on the right side of the robot gives its visual language model a viewpoint from which to detect whether an accident has occurred and how to correct it. Seeing a box fall to the ground, the robot can work out a way to pick it back up and finally complete the task,” Yao Qizhi explained.
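The failure-recovery loop Yao describes can be sketched as a simple perceive-plan-act cycle. In the Python sketch below, `llm_plan` and `vlm_check` are hypothetical stand-ins for a language-model planner and a visual-language-model monitor; they are not PaLM-E’s actual API.

```python
def llm_plan(task: str) -> list[str]:
    """Stand-in for an LLM planner: task -> ordered subtask list."""
    return ["walk_to_box", "grasp_box", "carry_box_to_shelf"]

def vlm_check(camera_image, subtask: str) -> str | None:
    """Stand-in for a VLM monitor: returns a description of any
    anomaly seen through the camera (e.g. 'box fell'), else None."""
    return None  # placeholder: no anomaly detected

def execute(subtask: str, camera) -> None:
    pass  # low-level skill execution (cerebellum territory)

def run_task(task: str, camera) -> None:
    plan = llm_plan(task)
    while plan:
        subtask = plan.pop(0)
        execute(subtask, camera)
        anomaly = vlm_check(camera, subtask)
        if anomaly:
            # e.g. the box fell: prepend recovery steps and continue,
            # instead of blindly executing the rest of the stale plan.
            plan = llm_plan(f"recover from '{anomaly}', then {task}") + plan

run_task("move the boxes to the shelf", camera=None)
```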