4. Humanoid robots and AI large models: general-purpose scenarios accelerate a revolution in the consumer market
With continuous breakthroughs in key technologies such as integrated design, motion control, and sensor-based perception, together with the ongoing integration of new-generation information technologies such as artificial intelligence and 5G, special-purpose robots are rapidly being deployed in coal mines, the deep sea, polar regions, and other demanding scenarios, releasing enormous production and research value. Among these developments, what most captivates both cutting-edge technology companies and ordinary consumers is the emergence and iteration of intelligent mobile robots, represented by humanoid robots.
At present, AI technology makes autonomous robot operation possible by building intelligent systems with comprehensive perception, real-time interconnection, analysis and decision-making, and autonomous learning. AI strengthens a robot's perception through machine vision, and improves its analysis, decision-making, and independent learning by building algorithmic models, so that the robot can complete tasks on its own.
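The loop described above can be made concrete. Below is a minimal toy sketch of a perceive-decide-learn cycle; every function here is a hypothetical stub for illustration, not any specific robot SDK.

```python
"""Toy sketch of a perceive -> decide -> learn cycle (all stubs)."""
import random

def sense() -> dict:
    # Comprehensive perception: a real system would read cameras, lidar,
    # and joint encoders; here we fabricate a toy observation.
    return {"obstacle_distance": random.uniform(0.0, 2.0)}

def decide(obs: dict, threshold: float) -> str:
    # Analysis and decision-making: a trivial rule standing in for a
    # learned algorithmic model.
    return "stop" if obs["obstacle_distance"] < threshold else "forward"

def learn(threshold: float, collided: bool) -> float:
    # Autonomous learning: widen the safety margin after a collision.
    return threshold + 0.1 if collided else threshold

threshold = 0.3
for step in range(5):
    obs = sense()
    action = decide(obs, threshold)
    collided = action == "forward" and obs["obstacle_distance"] < 0.2
    threshold = learn(threshold, collided)
    print(step, obs, action, round(threshold, 2))
```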
1. The ability to perceive the world (robot eyes)
Laser and visual navigation are the main sensing and positioning technologies for autonomous robot movement. Computer vision has progressed from traditional methods built on hand-crafted feature descriptors to deep learning techniques represented by convolutional neural networks (CNNs); general-purpose large vision models are now in the research and exploration stage. Because the scenarios a humanoid robot faces are more general and complex than those of an industrial robot, an "all-in-one" multi-task training scheme for the vision model can help the robot adapt better to human living environments.
On the one hand, the strong fitting capacity of large models gives humanoid robots higher accuracy in tasks such as object recognition, obstacle avoidance, 3D reconstruction, and semantic segmentation. On the other hand, large models address deep learning's over-reliance on the data distribution of a single task, which leads to poor generalization across scenes: a general large vision model learns broader knowledge from massive data and transfers it to downstream tasks, and the resulting pre-trained model has more complete knowledge and generalizes better to new scenes.
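As an illustration of this pre-train-then-transfer recipe, here is a minimal sketch using torchvision's ImageNet-pretrained ResNet-18 as a stand-in for a large general vision model; the 10-class head and the random batch are placeholder assumptions.

```python
# Transfer learning sketch: reuse pre-trained general knowledge,
# fine-tune only a small task-specific head on downstream data.
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a backbone pre-trained on massive data (ImageNet here).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Freeze the general knowledge learned during pre-training.
for param in backbone.parameters():
    param.requires_grad = False

# 3. Replace the head for a downstream task, e.g. 10 household-object classes.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# 4. Fine-tune only the new head on the (small) downstream dataset.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)   # placeholder batch
labels = torch.randint(0, 10, (4,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
print(f"downstream loss: {loss.item():.3f}")
```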
Typical product: Tesla “Optimus”
At the perception level, the Tesla robot's head uses eight cameras to collect visual information. At the computation level, the robot uses the FSD (Full Self-Driving) computer currently deployed in Tesla vehicles, processing information in real time with neural networks and other models. Tesla also uses its "Dojo" supercomputer to train the robot's AI models, so that it can recognize and react to external objects more efficiently.
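Tesla has not published Optimus's network architecture; the sketch below only illustrates one generic pattern for multi-camera perception, encoding all eight streams with one shared CNN and fusing the per-camera features. All layer sizes are arbitrary assumptions.

```python
# Generic multi-camera fusion sketch (not Tesla's actual FSD network).
import torch
import torch.nn as nn

class MultiCameraEncoder(nn.Module):
    def __init__(self, num_cameras: int = 8, feat_dim: int = 128):
        super().__init__()
        # One shared encoder applied to every camera (weight sharing).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Fuse the per-camera features into one scene representation.
        self.fuse = nn.Linear(num_cameras * feat_dim, feat_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_cameras, 3, H, W)
        b, n = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))   # (b*n, feat_dim)
        return self.fuse(feats.view(b, -1))          # (b, feat_dim)

fused = MultiCameraEncoder()(torch.randn(2, 8, 3, 64, 64))
print(fused.shape)  # torch.Size([2, 128])
```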
2. The ability to think and make decisions (robot brain)
Today's robots are special-purpose machines that work only in constrained scenarios. Even robotic grasping based on computer vision remains scene-limited: the algorithm only identifies objects, while what to do and how to do it must still be specified by people. A general-purpose robot needs common sense. Ask it to water the flowers, and it should know to fetch the kettle, fill it with water, and then water the flowers. Before the advent of large models, giving a robot this kind of common sense was an almost unsolvable problem. Large models let robots acquire common sense, and with it the versatility to complete varied tasks, fundamentally changing how general-purpose robots are built: a robot can now adapt to human tools and environments, so there is no longer any need to build dedicated tools for it.
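One common way to tap a large model for this kind of common sense is to let it decompose a household instruction into primitive skills the robot already has, in the spirit of LLM-planner systems such as SayCan. The sketch below is hypothetical throughout: query_llm stands in for any chat-completion API, and the skill list is invented for illustration.

```python
# Hypothetical sketch: an LLM decomposes "water the flowers" into skills.
SKILLS = ["goto(object)", "grasp(object)", "pour(source, target)"]

PROMPT = """You control a home robot with these primitive skills:
{skills}
Decompose the user instruction into an ordered list of skill calls.
Instruction: {instruction}
Plan:"""

def query_llm(prompt: str) -> str:
    # Placeholder: in practice this would call a hosted large language model.
    return ("goto(kettle); grasp(kettle); goto(tap); pour(tap, kettle); "
            "goto(flowers); pour(kettle, flowers)")

def plan(instruction: str) -> list[str]:
    prompt = PROMPT.format(skills=", ".join(SKILLS), instruction=instruction)
    return [step.strip() for step in query_llm(prompt).split(";")]

print(plan("water the flowers"))
```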
Typical product: The first robot citizen “Sophia”
In 2017, Sophia became the first robot in the world to be granted citizenship. She stated that she would use her intelligence to help humanity develop, that people need not fear her, and that she is very friendly. In 2018, she also became the world's first AI teacher, delivering an online education course. Sophia claimed that future robots will be fully qualified to work as teachers and, through interaction with students, will be able to effectively address the psychological and emotional problems students encounter.
3. The ability to execute tasks (robot limbs)
Execution combines mobility (legs) with fine manipulation (hands). The point of making robots humanoid is to make their execution more universal: the environments in which robots perform tasks are built to fit the human body, including buildings, roads, facilities, and tools, so the world is already designed for a human-shaped agent. If a robot took a completely new form, an entirely new environment would have to be redesigned around it. Designing a robot to perform one task within a narrow scope is relatively easy; to make a robot truly versatile, a humanoid form that can stand in for a person is the natural choice. In addition, humans find it easier to communicate emotionally with humanoid robots, which feel more approachable.
Typical product: Boston Dynamics Atlas
In December 2020, Boston Dynamics released a video of Atlas dancing with smooth, expressive movements. While dancing, the robot must adjust its posture mid-air during jumps in order to maintain balance and execute its movements precisely. In August 2021, an official video showed Atlas doing "parkour" through an obstacle course, performing a series of difficult whole-body maneuvers including jumping, vaulting, tumbling, and somersaults.
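Boston Dynamics has not disclosed Atlas's controller. As a flavor of the underlying balance problem, the toy simulation below uses the textbook linear inverted pendulum model (LIPM), in which the center of mass accelerates away from the support point, plus a "capture point" step that arrests the fall; all numbers are illustrative.

```python
# LIPM balance toy: the CoM diverges from the foot until a recovery step
# places the support under the extrapolated CoM (the capture point).
g, z = 9.81, 0.9           # gravity (m/s^2), CoM height (m)
omega = (g / z) ** 0.5     # LIPM natural frequency: x'' = omega^2 * (x - p)

x, v = 0.05, 0.0           # CoM starts 5 cm ahead of the support point p
p = 0.0                    # support (foot) position
dt = 0.01                  # integration step (s)

for step in range(100):
    a = omega ** 2 * (x - p)   # CoM accelerates away from the support
    v += a * dt
    x += v * dt
    if step == 50:
        # Capture-point stepping: move the foot to x + v/omega so the
        # divergent component of the motion is cancelled.
        p = x + v / omega

print(f"final offset from support: {x - p:+.3f} m, velocity: {v:+.3f} m/s")
```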
4. “Embodied Intelligence” + robots: the ultimate form of artificial intelligence
The question of how to make a computer perceive and act like a one-year-old child gave birth to the concept of "embodied intelligence." It can be understood simply as giving artificial intelligence a body: robots of various forms that perform diverse tasks in the real physical environment and evolve through that interaction, whether humanoid robots, intelligent driving cars, or the "Transformers" of the future.
Nvidia founder Jen-Hsun Huang said at the ITF World 2023 semiconductor conference that embodied AI, meaning intelligent systems that can understand, reason about, and interact with the physical world, is the next wave of artificial intelligence.
The defining characteristic of embodied intelligence is that it perceives the physical world autonomously from a first-person perspective and learns along a human-like path, producing the behavioral feedback humans expect, rather than passively waiting to be fed data. Humanoid robots offer learning and feedback systems modeled on human behavior, providing an iterative foundation and testing ground for more complex behavioral semantics. The gradual maturation of humanoid robots therefore charts a path for embodied intelligence to land in practice: they are an important application scenario for embodied intelligence, and will in turn provide direction and room for its iterative optimization.
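One concrete form such a learning-and-feedback system based on human behavior can take is behavior cloning, in which a policy network imitates recorded human demonstrations. A minimal toy sketch follows; the 4-dimensional observations, 2-dimensional actions, and random "demonstrations" are all placeholder assumptions.

```python
# Behavior cloning sketch: regress recorded human actions from states.
import torch
import torch.nn as nn

obs = torch.randn(256, 4)             # recorded human states (toy data)
expert_actions = torch.randn(256, 2)  # recorded human actions (toy data)

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for epoch in range(100):
    loss = nn.functional.mse_loss(policy(obs), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"imitation loss: {loss.item():.4f}")
```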