Fusing Cobots with the Power of Large Language Models
Today, the task of integrating robots into a production environment involves mapping out spaces, defining robot paths, integrating them with software systems, and training human operators on what to expect. But if we want truly collaborative robots, or cobots for short, they should respond dynamically to the environment around them, and to us. Ideally, we should be able to talk to them, provide feedback, give guidance, or ask them to take care of a task for us. The combination of large language models (LLMs), voice recognition, and speech synthesis is bringing this goal closer by redefining how we interact with machines.
Instead of customized rules and complex integrations, imagine a new robot showing up able to immediately wander around, observe the space, ask questions, and then begin to take on tasks. This is what we believe a true cobot should be able to do. We have this working in our labs. Let's dig a little deeper into how we made it happen.
The breakthrough in our labs didn't happen overnight. In fact, it didn't exactly start "in our lab." Instead, we started in a digital twin of our lab that we created in NVIDIA Isaac Sim. The combination of Isaac Sim's photorealistic environments, PhysX physics engine, and high-fidelity sensor simulation allowed us to develop our perception and planning entirely in simulation. The magic came when we combined that perception and planning stack with large language models like GPT-3.5 and GPT-4.
These models, known for their ability to process and generate human-like text, are now at the core of our robots' decision-making processes. They enable robots to understand instructions in natural language, process them, and respond appropriately. This capability marks a significant departure from traditional, command-driven robotics, opening a new chapter in human-robot collaboration. It's not just about robots understanding us; it's about them becoming an active, learning part of our teams.
Next, we'll explore the specific role of an intermediary robot programming language, which acts as the crucial link between human speech and robotic action, ensuring this collaboration is as seamless and effective as possible.
The Role of an Intermediary Robot Programming Language
The core innovation enabling this new level of human-robot collaboration is our Auditable Collaboration and Planning Framework ("ACoP" for short). This intermediary language serves as a crucial bridge, translating the complexities of human speech and text into precise commands that robots can understand and execute. ACoP's role in this ecosystem is pivotal, as it ensures that the fluidity of human communication is accurately and safely rendered into robotic actions.
At its essence, ACoP functions as a translator and mediator. When a person speaks to a robot, an LLM processes this input, interpreting the intent and context. However, the leap from understanding a command to executing it involves several layers of complexity. This is where ACoP comes in. It takes the interpreted command and converts it into a structured format that the robot's control systems can understand and act upon. This process involves breaking down natural language inputs into discrete, actionable tasks while maintaining the nuances of the original instruction. The ACoP plan is generated by the LLM in the cloud using OpenAI's API endpoints.
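To make this concrete, the sketch below shows what validating an LLM-generated plan against an action whitelist might look like. The JSON schema, action names, and example goal are all invented for illustration; they are not the actual ACoP format, only a minimal stand-in for the "structured format the robot's control systems can act upon."

```python
import json

# Hypothetical action whitelist: the robot refuses any plan step whose
# action is not in this set, no matter what the language model emitted.
ALLOWED_ACTIONS = {"navigate", "pick", "place", "speak", "wait"}

def validate_plan(plan_json: str) -> list[dict]:
    """Parse an LLM-generated plan and reject anything outside the whitelist."""
    plan = json.loads(plan_json)
    steps = plan.get("steps", [])
    for i, step in enumerate(steps):
        action = step.get("action")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i}: unknown action {action!r}")
        if "params" not in step:
            raise ValueError(f"step {i}: missing params")
    return steps

# Example: the kind of plan the cloud LLM might return for a fetch request
llm_output = """
{
  "goal": "deliver toolkit to workstation 3",
  "steps": [
    {"action": "navigate", "params": {"target": "tool_crib"}},
    {"action": "pick",     "params": {"item": "toolkit"}},
    {"action": "navigate", "params": {"target": "workstation_3"}},
    {"action": "place",    "params": {"item": "toolkit"}}
  ]
}
"""
steps = validate_plan(llm_output)
```

Validating before executing is what lets the free-form language model sit safely upstream of the motion controllers: the model proposes, the whitelist disposes.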
Why can't we directly feed natural language inputs into robotic systems? The challenge lies in the need for precision and reliability, especially in environments where safety is paramount. ACoP addresses this by adding a layer of predictability and control. It ensures that every command is not only understood but also executed within the bounds of safety and operational parameters. For instance, a request to "move faster" is interpreted by the language model, and ACoP translates it into specific speed parameters that align with the robot's capabilities and safety protocols.
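The "move faster" example above can be sketched as a simple clamped mapping. The specific limits and step size here are assumptions made up for this example; in a real deployment they would come from the robot's certified safety envelope rather than from the language model.

```python
# Assumed bounds, standing in for the robot's real safety parameters
SPEED_MIN = 0.1    # m/s, minimum useful travel speed
SPEED_MAX = 1.5    # m/s, safety-certified ceiling
SPEED_STEP = 0.25  # m/s, change applied per "faster"/"slower" request

def adjust_speed(current: float, request: str) -> float:
    """Translate a relative speed request into a clamped target speed."""
    if request == "faster":
        target = current + SPEED_STEP
    elif request == "slower":
        target = current - SPEED_STEP
    else:
        target = current
    # Never leave the safety envelope, regardless of what was asked for.
    return max(SPEED_MIN, min(SPEED_MAX, target))
```

The key property is that the clamp lives outside the language model: even a misinterpreted "faster" can never push the robot past its ceiling.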
Moreover, ACoP is designed to be auditable. In an environment where robots are making decisions based on human interactions, having a traceable and reviewable process is essential. This aspect of ACoP provides transparency and accountability, allowing operators to review and understand how each command was interpreted and executed. This feature is not just about ensuring operational efficiency; it's about building trust in a system where machines are making increasingly autonomous decisions based on human input.
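A minimal sketch of such an audit trail might record, for every interpreted command, the raw utterance, the plan derived from it, and the execution outcome. The field names and export format below are illustrative, not the actual ACoP log schema.

```python
import json
import time

class AuditLog:
    """Traceable record of how each spoken command was interpreted and handled."""

    def __init__(self):
        self.entries = []

    def record(self, utterance: str, plan: list[dict], outcome: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "utterance": utterance,  # what the human actually said
            "plan": plan,            # the structured plan derived from it
            "outcome": outcome,      # e.g. "executed", "rejected: unsafe"
        }
        self.entries.append(entry)
        return entry

    def export(self) -> str:
        # JSON export so records can be reviewed or archived off-robot
        return json.dumps(self.entries, indent=2)

log = AuditLog()
log.record("move faster",
           [{"action": "set_speed", "params": {"delta": "+0.25"}}],
           "executed")
```

Because rejected commands are logged alongside executed ones, an operator can review not only what the robot did, but what it declined to do and why.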
In summary, ACoP is not just a technical solution; it's a gateway to a future where robots understand and interact with us in a manner that is both intuitive and safe. Its development marks a significant step in our journey towards creating truly collaborative robots that can work alongside humans as intelligent, responsive partners. In the next section, we'll delve into a practical application of this technology, showcasing its transformative potential in a real-world setting.
Practical Application: Healthcare
To illustrate the practical application of ACoP and LLMs in human-robot collaboration, let’s turn to a healthcare setting, a domain where precision and reliability are not just ideals but necessities. In this environment, our robots are not just performing tasks; they are becoming an integral part of the healthcare team, enhancing efficiency and supporting patient care.
Imagine a busy hospital ward, where nurses and doctors are often stretched thin. Here, a robot equipped with ACoP can make a significant difference. For example, a nurse might say to the robot, "We're running low on surgical masks in Room 5." The LLM processes this request, understanding the context and urgency, and ACoP translates it into a specific set of instructions. The robot then navigates to the supply room, retrieves the masks, and delivers them to Room 5, all while maneuvering safely through the bustling corridors of the hospital.
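The decomposition in this scenario can be sketched as below. The task vocabulary, the priority field, and the idea that urgency is inferred upstream by the LLM are all assumptions for illustration; they stand in for the ordered task list ACoP would derive from the model's interpretation.

```python
def plan_restock(item: str, room: str, urgent: bool) -> list[dict]:
    """Turn an interpreted restock request into an ordered task list."""
    priority = "high" if urgent else "normal"
    return [
        {"task": "navigate", "target": "supply_room", "priority": priority},
        {"task": "fetch",    "item": item,            "priority": priority},
        {"task": "navigate", "target": room,          "priority": priority},
        {"task": "deliver",  "item": item,            "priority": priority},
    ]

# "We're running low on surgical masks in Room 5"
# -> the LLM infers the item, the destination, and the urgency from context
tasks = plan_restock("surgical_masks", "room_5", urgent=True)
```

Note that nothing in the original utterance says "go to the supply room first"; supplying that implicit step is exactly the gap between understanding a request and executing it.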
This scenario demonstrates more than the robot’s ability to understand and execute tasks. It shows how ACoP enables the robot to integrate into dynamic environments, where conditions and requirements can change rapidly. In this case, the robot needs to understand not only explicit commands but also the implicit priorities and safety protocols of a hospital. It must navigate crowded spaces, recognize and avoid obstacles, and interact with people in a manner that is both efficient and non-intrusive.
Beyond task execution, the adaptability of these robots is crucial. In a healthcare setting, new situations and unexpected challenges are common. ACoP’s flexibility allows the robot to learn from each interaction and improve its performance over time. For instance, if the robot finds a certain corridor frequently congested, it can learn to choose an alternate route, thereby optimizing its efficiency and reducing the risk of disruption.
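The corridor example above can be sketched as a simple congestion-weighted route choice. The route names, segment structure, and scoring rule are invented for this illustration; a real planner would fold such statistics into its cost map rather than use a standalone class like this.

```python
from collections import defaultdict

class RoutePlanner:
    """Prefer the route whose segments have been blocked least often."""

    def __init__(self, routes: dict[str, list[str]]):
        self.routes = routes                # route name -> corridor segments
        self.congestion = defaultdict(int)  # corridor -> observed blockages

    def report_congestion(self, corridor: str) -> None:
        self.congestion[corridor] += 1

    def best_route(self) -> str:
        # Score each route by total blockages observed along its segments
        def score(name: str) -> int:
            return sum(self.congestion[c] for c in self.routes[name])
        return min(self.routes, key=score)

planner = RoutePlanner({
    "main":   ["corridor_a", "corridor_b"],
    "detour": ["corridor_c", "corridor_d"],
})
planner.report_congestion("corridor_a")
planner.report_congestion("corridor_a")
# After repeated blockages on corridor_a, the detour scores better than main.
```

Each congestion report is exactly the kind of per-interaction feedback the text describes: no retraining, just accumulated experience shifting future decisions.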
This example is just a glimpse into the potential of ACoP and LLMs in transforming how robots can be used in real-world settings. By enabling robots to understand and adapt to their environment in a human-like manner, we are not just automating tasks; we are creating intelligent assistants that can enhance the capabilities of human teams. As we move forward, the possibilities for such technology extend far beyond healthcare, opening doors to innovative applications in various industries.
Building Toward a Trustworthy Future
The future potential of this technology extends beyond individual sectors like healthcare. In logistics, for example, robots could autonomously manage inventory, using ACoP to interpret and execute complex instructions in real time. In customer service, robots equipped with ACoP could interact with customers more effectively, understanding and responding to queries with human-like fluency.
The effectiveness of ACoP in our lab settings, for instance, is just a first step. As we move forward, the emphasis must remain on developing these systems with a focus on trustworthy interactions: interactions that are fundamentally safe, reliable, and transparent. Today, we perform the perception and planning on the robot using an NVIDIA Jetson AGX Orin module to fuse our sensor data and run our perception and planning algorithms, while we run the speech and language-model processing in the cloud. In the future, we look to bring all of this onboard as we move the LLM processing to the edge. NVIDIA has already demonstrated this is possible by quantizing open-source LLMs to run on an NVIDIA RTX A6000 GPU attached to a Jetson AGX Orin through their Holoscan SDK. This approach will be crucial in ensuring that the adoption of AI-driven robotics serves the needs of the human operators rather than requiring humans to adapt to robots, even when network connectivity isn't robust.
Our journey towards advanced human-robot collaboration is marked by a commitment to developing that trustworthy future. As we explore the boundaries of AI and robotics, our focus remains on creating systems that enhance human capabilities while adhering to the highest standards of operational safety. This balanced approach paves the way for a future where robots are not just tools, but trusted partners in our daily lives and work environments.