From Tool to Teammate: Opening the Aperture on Human-Robot Teaming

Imagine a world where medics and soldiers are partnered with robots that can not only assist with complex tasks, such as transporting casualties to safety or maneuvering quickly through cities or rough terrain, but can also advise on problems and adapt to new information without human intervention.

To achieve this future, robots need a series of complex skills. Those skills include the ability to understand natural language and their surrounding environment in real time, to create and execute plans, and to evaluate their progress and replan as needed.

Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are using generative artificial intelligence (GenAI) and cutting-edge scene-mapping technology to elevate robots from simple tools to full teammates capable of providing aid in disaster and battlefield scenarios. The team’s work is funded by the Army Research Laboratory through the Army Artificial Intelligence Innovation Institute (A2I2) and through APL’s Independent Research and Development (IRAD) program.

“This research takes advantage of cutting-edge AI technology for a significant step forward in robotics,” said Chris Korpela, the Robotics and Autonomy program manager within APL’s Research and Exploratory Development Mission Area. “Fully autonomous robots will provide new and exciting capabilities in austere environments.”

The Current State of Robotics

Today, it’s possible to buy fairly advanced robots on the open market, although they cost about the same as a luxury car. Out of the box, these robots do not operate on their own and must receive commands via a controller. There is no option to control them using spoken language, as is possible with tasks on cell phones and tablets. Humans must perform the basic tasks of understanding the robots’ surroundings, creating an execution plan for the tasks the robot will perform, evaluating progress and replanning as needed, which severely limits how meaningfully people can team with a robot.

“While they’re useful for many scenarios, current robots are closer to a remote-controlled car than an autonomous vehicle,” said Corban Rivera, a senior AI researcher at APL and principal investigator for this research. “They can be used as a great tool to enable certain operations, but humans can’t take their hands off the wheel, so to speak.”

Opening Robotic Eyes

An international research team with members from APL, Johns Hopkins University, University of Toronto, Université de Montréal, Massachusetts Institute of Technology, Army Research Laboratory and University of Massachusetts Amherst created a technology that enhances robots’ perception and understanding of their surrounding environment, enabling a more beneficial partnership with humans. This technology, ConceptGraphs, enables robots to have a near-human understanding of a 3D environment.

Using the technology, robots create 3D scene graphs that compactly and efficiently represent an environment. Through training on image-caption pairs from large language models (LLMs) and large visual language models, objects in the scene are assigned tags. These tags help robots understand the meanings and uses of objects as well as the relationships between them.
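To make the idea of a scene graph concrete, here is a minimal sketch of the kind of data structure described above: objects are nodes that carry tags, and labeled edges record relationships between them. The class names, fields and example objects are all hypothetical and chosen for illustration; this is not the ConceptGraphs implementation, in which tags come from learned models rather than being typed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One object node in the scene graph (all fields are illustrative)."""
    name: str
    position: tuple[float, float, float]  # x, y, z in meters
    tags: set[str] = field(default_factory=set)


@dataclass
class SceneGraph:
    """Objects are nodes; labeled edges record relationships between them."""
    objects: dict[str, SceneObject] = field(default_factory=dict)
    # Each edge is (subject, relation, target), e.g. ("cup", "on", "table").
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subject: str, relation: str, target: str) -> None:
        # Refuse edges that point at objects the graph does not know about.
        if subject not in self.objects or target not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.edges.append((subject, relation, target))

    def with_tag(self, tag: str) -> list[str]:
        """Names of all objects carrying the given tag, sorted."""
        return sorted(n for n, o in self.objects.items() if tag in o.tags)

    def related_to(self, target: str, relation: str) -> list[str]:
        """Names of all objects that stand in `relation` to `target`, sorted."""
        return sorted(s for s, r, t in self.edges if t == target and r == relation)


# Hand-written example scene; a real system would build this from sensor data.
graph = SceneGraph()
graph.add_object(SceneObject("table", (2.0, 0.0, 0.0), {"furniture", "surface"}))
graph.add_object(SceneObject("first_aid_kit", (2.0, 0.0, 0.8), {"medical", "portable"}))
graph.add_object(SceneObject("stretcher", (4.5, 1.0, 0.0), {"medical", "carries people"}))
graph.relate("first_aid_kit", "on", "table")

print(graph.with_tag("medical"))        # ['first_aid_kit', 'stretcher']
print(graph.related_to("table", "on"))  # ['first_aid_kit']
```

Storing a handful of tagged nodes and edges, rather than every point in a 3D scan, is what makes this kind of representation compact.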

“Many robots in commercial industry are created to work in factories or distribution centers, which are pristine and predictable environments,” Korpela said. “There are very different needs when robots walk through the woods, for example, where there are numerous and unpredictable obstacles in the way, from rocks on the ground to trees in their path.”

ConceptGraphs is open-vocabulary, meaning it is not limited to the language in its training set, which enables humans to give robots instructions in plain language, either in text or voice, rather than through fixed commands. Robots can even support multimodal queries, which combine an image and a question or instruction. For example, when given an image of Michael Jordan and asked to find “something he would play with,” a robot was able to identify and find a basketball in the environment because of its training on image-caption pairs that provide context to images and objects.

“Now, not only can the robot build up a semantic description of the world, but you can query it in natural language,” said Dave Handelman, a senior roboticist at APL and a collaborator on the project. “You don’t have to ask it if it sees a car; you can say, ‘Show me everything with four wheels,’ or ‘Show me everything that can carry me places.’”
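As a rough illustration of querying a scene by description rather than by a fixed label, the sketch below ranks objects by how closely their text descriptions match a question. It deliberately swaps in plain word overlap (bag-of-words cosine similarity) for the learned language and vision models a system like ConceptGraphs relies on, so it can only match shared words, not meaning. The function names, the threshold and the example scene are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def query_scene(scene: dict[str, str], question: str, threshold: float = 0.2) -> list[str]:
    """Return object names whose description matches the question, best first.

    `threshold` is an arbitrary cutoff chosen for this toy example.
    """
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(desc)), name) for name, desc in scene.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical scene: object name -> free-text description of the object.
scene = {
    "car": "vehicle with four wheels that can carry people places",
    "wagon": "cart with four wheels for hauling supplies",
    "tree": "tall plant that has a trunk and branches",
}

print(query_scene(scene, "show me everything with four wheels"))  # ['wagon', 'car']
```

A question such as "show me everything that can carry me places" is where word overlap breaks down, since nothing in the wagon's description shares those words; handling that kind of query is what the learned models in an open-vocabulary system are for.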

In a real-world scenario, this might translate to a medic asking a robot to locate casualties on a battlefield and transport them to safety until the medic can attend to them. The robot would be able to not only identify casualties but also determine what “safety” means and how to achieve it.

While ConceptGraphs resolved several challenges in human-robot teaming, significant obstacles remained. Scanning and developing an understanding of the environment took a robot several minutes, and as the robot moved through the environment, more time was needed for additional scanning.

Related Work