A team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University and Microsoft Research has developed a tool that monitors large language models (LLMs) for potentially harmful outputs and prevents them from executing.
The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs, such as code attacks, before they happen.