AI Agents from a product perspective

Jun 24, 2024

AI Agents represent a significant advancement in artificial intelligence, leveraging natural language processing (NLP) to understand queries, generate relevant responses, and interact effectively with users. They excel in perceiving their environment, making informed decisions, and executing actions to achieve specific objectives. If this concept is new to you, let me share my insights and product perspectives on the fascinating world of AI agents and multi-agent systems.

What Is an AI Agent?

First, a quick example to illustrate the difference between an LLM, RAG, and an AI agent:

LLM: Provide general advice on financial management.

RAG: Find relevant blogs and articles about financial planning.

AI Agent:
  >> Create a personalized budget based on your income and expenses.
  >> Monitor your account balances and transaction activity.
  >> Recommend suitable investment opportunities.
  >> Automatically pay bills and send reminders for upcoming payments.
  >> Prioritize payments with high late fees when your cash flow is tight.
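The last item is a small optimization problem in disguise. Below is a minimal sketch, assuming a hypothetical `Bill` record with an amount and a late fee. It uses a simple greedy heuristic (pay the bills with the highest late fees first while cash allows), which is easy to explain to a user but is not guaranteed to be optimal; an exact answer would treat this as a 0/1 knapsack problem.

```python
from dataclasses import dataclass


@dataclass
class Bill:
    # Hypothetical bill record: name, amount due, and fee charged if paid late.
    name: str
    amount: float
    late_fee: float


def prioritize_bills(bills: list[Bill], cash: float) -> tuple[list[Bill], list[Bill]]:
    """Greedy heuristic: pay bills with the highest late fee first while cash lasts.

    Returns (bills_to_pay_now, bills_to_defer). Not guaranteed optimal.
    """
    pay_now: list[Bill] = []
    defer: list[Bill] = []
    remaining = cash
    for bill in sorted(bills, key=lambda b: b.late_fee, reverse=True):
        if bill.amount <= remaining:
            pay_now.append(bill)
            remaining -= bill.amount
        else:
            defer.append(bill)  # cannot afford it now; remind the user instead
    return pay_now, defer


if __name__ == "__main__":
    bills = [
        Bill("credit card", amount=300.0, late_fee=40.0),
        Bill("streaming", amount=15.0, late_fee=0.0),
        Bill("utilities", amount=120.0, late_fee=25.0),
        Bill("rent", amount=1200.0, late_fee=100.0),
    ]
    pay_now, defer = prioritize_bills(bills, cash=1500.0)
    print("Pay now:", [b.name for b in pay_now])  # → Pay now: ['rent', 'credit card']
    print("Defer:", [b.name for b in defer])  # → Defer: ['utilities', 'streaming']
```

Notice that the greedy rule defers the cheap streaming bill even though it has no late fee; that is fine here, but it shows why a production agent would likely want a proper optimizer plus the user's own preferences.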

AI agents are distinguished by the following abilities:

Tool-Use: AI agents utilize APIs or directly interact with external systems, enabling data retrieval, action execution, and smooth integration with other software.
Reasoning: They employ logical processes to interpret data, make decisions, understand context, infer meaning, and predict outcomes.
Memory: Storing information allows these agents to learn from past experiences and improve their performance over time.
Self-Reflection: Advanced AI agents analyze their own actions and outcomes, refining their strategies and behaviors for continuous improvement.
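To make these four abilities concrete, here is a toy agent loop. It is a sketch, not any real framework: the `plan` function is a hard-coded stand-in for an LLM's reasoning step, the tools `get_balance` and `pay_bill` are hypothetical, memory is just a list of past steps, and the reflection step only reports whether the last action succeeded (a real agent would feed that back into its next plan).

```python
from typing import Callable, Optional


# Hypothetical tools (stand-ins for real banking APIs); each takes the account state.
def get_balance(account: dict) -> str:
    return f"balance={account['balance']}"


def pay_bill(account: dict, amount: float) -> str:
    if amount > account["balance"]:
        raise ValueError("insufficient funds")
    account["balance"] -= amount
    return f"paid {amount}"


TOOLS: dict[str, Callable[..., str]] = {"get_balance": get_balance, "pay_bill": pay_bill}


def plan(goal_amount: float, memory: list[dict]) -> Optional[tuple[str, dict]]:
    """Reasoning stand-in: a real agent would ask an LLM to pick the next tool call."""
    if not memory:
        return "get_balance", {}  # first, observe the environment
    last = memory[-1]
    if last["tool"] == "get_balance" and last["ok"]:
        return "pay_bill", {"amount": goal_amount}
    return None  # stop: the payment has been attempted


def reflect(memory: list[dict]) -> str:
    """Self-reflection stand-in: evaluate the outcome of the last action."""
    if not memory:
        return "no actions taken"
    last = memory[-1]
    return "success" if last["ok"] else f"failed: {last['result']}"


def run_agent(account: dict, goal_amount: float, max_steps: int = 5) -> tuple[list[dict], str]:
    memory: list[dict] = []  # memory: log of past tool calls and their outcomes
    for _ in range(max_steps):
        step = plan(goal_amount, memory)  # reasoning
        if step is None:
            break
        tool_name, args = step
        try:
            result, ok = TOOLS[tool_name](account, **args), True  # tool use
        except ValueError as err:
            result, ok = str(err), False
        memory.append({"tool": tool_name, "args": args, "result": result, "ok": ok})
    return memory, reflect(memory)


if __name__ == "__main__":
    steps, verdict = run_agent({"balance": 500.0}, goal_amount=200.0)
    for step in steps:
        print(step)
    print("reflection:", verdict)
```

Keeping the tools in a name-to-function registry mirrors how tool-calling agents typically work: the model only chooses a tool name and arguments, and ordinary code executes the call.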

However, Reasoning and Self-Reflection are extremely challenging, because the data these models are trained on are usually snapshots of conclusions rather than of the thinking process that produced them. If large quantities of data labeled with the whole perception-contemplation-decision-reflection-iteration process were available, AI agents might become much more successful at reasoning.

If you have heard of Reinforcement Learning (RL), you might find “interaction with humans and the environment” familiar. RL agents function through interactions with their environment, performing actions, receiving rewards or penalties, and adjusting their strategies based on the states of the environment. This trial-and-error approach helps them learn optimal behaviors to maximize cumulative rewards over time. In contrast, Generative AI agents use pre-trained models to infer patterns and structures from given inputs to generate coherent and relevant outputs. They excel at content creation and problem-solving by leveraging learned patterns, rather than through continuous interaction and adaptation within an environment.
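The RL trial-and-error loop can be illustrated with one of the simplest textbook settings: a multi-armed bandit solved with an epsilon-greedy strategy. This is a generic sketch, not how any particular agent product works, and the arm payout probabilities are made up. The agent never sees those probabilities; it only learns from the rewards the environment returns.

```python
import random


def epsilon_greedy_bandit(
    true_probs: list[float], steps: int = 5000, epsilon: float = 0.1, seed: int = 0
) -> tuple[list[float], float]:
    """Learn which arm pays best purely by trial and error.

    Each step, explore a random arm with probability epsilon, otherwise exploit
    the arm with the highest estimated reward. Rewards are Bernoulli draws from
    true_probs (hidden from the agent). Returns (estimates per arm, total reward).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # environment feedback
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return values, total_reward


if __name__ == "__main__":
    # Hypothetical environment: three actions with different hidden success rates.
    values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    best = max(range(len(values)), key=lambda a: values[a])
    print(f"best arm: {best}, estimates: {[round(v, 2) for v in values]}, reward: {total}")
```

Nothing here is pre-trained: everything the agent knows comes from accumulated rewards, which is exactly the contrast with Generative AI agents drawn above.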

Gen-AI Agents Landscape

Many companies are leveraging Gen-AI agents to transform industries such as code and software development, customer service, data analysis, and marketing and sales. These agents utilize advanced machine learning models to automate time-consuming tasks such as code writing, debugging, optimization, and generating creative content. By streamlining these processes, Gen-AI agents enhance efficiency and productivity, allowing professionals to focus on more strategic and innovative aspects of their work and driving significant advancements across multiple sectors.

Coding: GitHub Copilot, TabNine, DeepCode, Devin
Customer Service: Ada, Forethought, Intercom (Resolution Bot), Sierra
Data Analytics: Hex, Sisu Data, Tellius
Sales & Marketing: Drift, Conversica, Clari
Financial Services: Kasisto, Zest AI, Sigma Ratings
Human Resources: Pymetrics, HireVue, Humu
Legal Services: Luminance, LawGeex, Evisort
Education: Squirrel AI, Gradescope, Duolingo

HireVue Leads Interviewing Transformation: HireVue Builder. A product demo in which HireVue's AI automatically analyzes candidate responses and gives ratings.

We see a trend of AI solving long-tail problems everywhere: issues that occur infrequently but collectively represent a significant portion of problems or opportunities. For example, YouTube uses AI to detect and flag inappropriate or non-compliant content, even in less popular videos.

Paradigms of AI Agents

Fully Autonomous - Operate independently without human intervention, making decisions and executing tasks autonomously.
Semi-Automation - Perform tasks with some level of human input or supervision, operating autonomously under human guidance.
Fixed Workflow - Follow predefined instructions or workflows, operating within their programming constraints.
No-Code (Black Box) - Focus on ease of use and integration, operating without exposing internal workings.
High Code (White Box) - Fully customizable, allowing developers to understand and modify internal processes.
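In practice, the difference between the Fully Autonomous and Semi-Automation paradigms often comes down to where a human approval step sits. The sketch below assumes a hypothetical risk score on each proposed action and a hypothetical threshold: low-risk actions run automatically, while higher-risk ones wait for a human decision (simulated here by a callback).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    # Hypothetical action proposed by an agent, with a risk score in [0, 1].
    description: str
    risk: float


def execute_with_oversight(
    actions: list[Action],
    approve: Callable[[Action], bool],
    risk_threshold: float = 0.5,
    fully_autonomous: bool = False,
) -> list[str]:
    """Run actions, routing risky ones to a human approver unless fully autonomous."""
    log: list[str] = []
    for action in actions:
        needs_review = not fully_autonomous and action.risk >= risk_threshold
        if needs_review and not approve(action):
            log.append(f"BLOCKED: {action.description}")
            continue
        status = "APPROVED" if needs_review else "AUTO"
        log.append(f"{status}: {action.description}")
    return log


if __name__ == "__main__":
    proposed = [Action("send payment reminder", 0.1), Action("transfer $5,000", 0.9)]
    # Simulated human reviewer who rejects every high-risk action.
    print(execute_with_oversight(proposed, approve=lambda a: False))
    # → ['AUTO: send payment reminder', 'BLOCKED: transfer $5,000']
    print(execute_with_oversight(proposed, approve=lambda a: False, fully_autonomous=True))
    # → ['AUTO: send payment reminder', 'AUTO: transfer $5,000']
```

Moving `risk_threshold` up or down is one simple way to slide a product along the spectrum from semi-automated to fully autonomous.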

There are also many tools and frameworks for building AI-driven chatbots and virtual assistants: Rasa, Hugging Face, Dialogflow, Microsoft Bot Framework, Amazon Lex, Wit.ai, IBM Watson Assistant, Chatbot.com, SnatchBot, Landbot.io, Kore.ai, Pandorabots, and Coze.

Benchmarks for AI Agents

How do you decide how good an AI agent is? Research groups and companies have introduced a number of AI agent benchmarks, such as WebArena, SWE-bench, and AgentBench. These benchmarks measure overall performance across various tasks and scenarios, assessing capabilities and limitations. You can find a full list here.

Most recently, Sierra's AI research team released 𝜏-bench (TAU-bench), a sophisticated approach to assessing AI agents' reliability and performance in real-world settings involving dynamic user and tool interactions. Unlike existing benchmarks, 𝜏-bench measures agents' ability to complete complex tasks over multiple exchanges with simulated users and APIs while consistently adhering to domain-specific policies. To provide a comprehensive framework for developing and assessing AI agents, it measures three abilities:

Interact seamlessly with both humans and programmatic APIs over long horizons
Accurately follow complex policies or rules specific to the task or domain
Maintain consistency and reliability at scale, across millions of interactions
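For the reliability point, the 𝜏-bench paper reports a pass^k metric: the probability that an agent succeeds on all k independent attempts at the same task, which is much stricter than succeeding at least once. Below is a sketch of the estimator as I understand it from the paper, assuming each task was run n times with c successes: per task it computes C(c, k) / C(n, k), then averages across tasks. The trial counts in the example are made up.

```python
from math import comb


def pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Estimate pass^k: the chance that all k i.i.d. trials of a task succeed.

    results holds (n_trials, n_successes) per task. Per task the estimate is
    C(c, k) / C(n, k) (zero when c < k); the benchmark score is the mean over tasks.
    """
    scores = []
    for n, c in results:
        if k > n:
            raise ValueError(f"k={k} exceeds the {n} trials recorded for a task")
        scores.append(comb(c, k) / comb(n, k))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical results: three tasks, four trials each, with 4, 3, and 1 successes.
    results = [(4, 4), (4, 3), (4, 1)]
    for k in (1, 2, 4):
        print(f"pass^{k} = {pass_hat_k(results, k):.2f}")
```

On this toy data the score drops from about 0.67 at k=1 to 0.50 at k=2 and about 0.33 at k=4, which illustrates why consistency across repeated interactions is a harder bar than one-off success.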

I find that mock data APIs allow for controlled and reproducible testing conditions, but they may not capture the full variability and unpredictability of real-world data, which often includes noise, inconsistencies, and unexpected user behaviors that are difficult to simulate accurately. I would love to see a hybrid approach: combining mock data APIs with real-world data in the testing process could provide a more comprehensive evaluation.
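One way to sketch such a hybrid approach: build a test suite that mixes replayed real-world records with mock cases, and inject noise (dropped fields, typos) into the mock cases to approximate messy inputs. Everything here is a hypothetical illustration, including the dictionary record format, the `real_fraction` parameter, and the noise rules.

```python
import random


def inject_noise(case: dict, rng: random.Random, drop_prob: float = 0.2) -> dict:
    """Hypothetical noise rules: randomly drop fields and swap adjacent characters."""
    noisy = {}
    for key, value in case.items():
        if rng.random() < drop_prob:
            continue  # simulate a missing field
        if isinstance(value, str) and len(value) > 1 and rng.random() < 0.5:
            i = rng.randrange(len(value) - 1)
            value = value[:i] + value[i + 1] + value[i] + value[i + 2:]  # typo: swap two chars
        noisy[key] = value
    return noisy


def build_hybrid_suite(
    mock_cases: list[dict],
    real_cases: list[dict],
    size: int,
    real_fraction: float = 0.3,
    seed: int = 0,
) -> list[dict]:
    """Sample a suite: about real_fraction replayed real records, the rest noisy mocks."""
    rng = random.Random(seed)  # fixed seed keeps the suite reproducible
    n_real = min(round(size * real_fraction), len(real_cases))
    suite = [{**c, "source": "real"} for c in rng.sample(real_cases, n_real)]
    for _ in range(size - n_real):
        base = rng.choice(mock_cases)
        suite.append({**inject_noise(base, rng), "source": "mock"})
    rng.shuffle(suite)
    return suite


if __name__ == "__main__":
    mocks = [{"user": "alice", "request": "refund order 42"}]
    reals = [{"user": "bob", "request": "where is my package??"}, {"user": "carol", "request": "cancel"}]
    for case in build_hybrid_suite(mocks, reals, size=6, real_fraction=0.3):
        print(case)
```

Seeding the generator keeps the main advantage of mock APIs (the same suite every run) while still exposing the agent to some real-world messiness.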

Reflections from a Product Perspective

Why are most AI-Agent products B2B instead of B2C?

From the supply side: we all know AI can make mistakes. When algorithms mess up, the nearest human needs to be there to bear the blame. High-risk industries such as healthcare and finance demand highly reliable and accurate solutions, because mistakes can have significant consequences, so they often require an additional human layer between the AI and users to keep it in check.

From the demand side: a survey shows consumers prefer talking to a real human rather than to an AI. B2B organizations are also significantly more likely to already be using AI, both for creating content and within the organization itself.

Are startups doomed without proprietary data?

Although large, established software providers hold significant advantages thanks to their comprehensive one-stop-shop solutions and datasets, opportunities remain in integrated software solutions, because startups can synthesize multiple platforms in a more lightweight way.

For example, Microsoft is in the perfect position to enable AI synergy across all its products, but it finds it hard to fully integrate them given its complex organizational structure and the “who leads whom” problem. This opens opportunities for startups to innovate by offering niche solutions that bridge these integration gaps, enhancing functionality and user experience.

Otter AI is a great example of integrating tech giants' platforms: it connects Zoom, Google Meet, Microsoft Teams, Slack, and even Dropbox to create a closed-loop workplace experience.

Imperfect technology can lead to successful products.

Even though agent technology is not yet fully mature, imperfect products can still achieve success. In some cases, AI products do not need to be flawless to provide significant value, especially in markets with higher error tolerance. Functionality and convenience often take precedence over perfection.

For instance, AI-powered PowerPoint generators may produce mediocre presentations and below-average content, but users value their accessibility and cost-effectiveness and can always adjust the content themselves.

In emerging markets, there is typically a higher flexibility regarding quality, as long as the products meet basic needs. This adaptability allows AI technologies to gain traction and deliver practical benefits despite their imperfections, highlighting the potential for widespread adoption and continuous improvement.

Innovative Data Collection

Data is gold, and startups usually don't have it. However, companies can creatively gather and refine data, creating closed-loop systems that make their AI tools more effective.

For example, suppose you are building an AI meeting assistant. Instead of relying solely on traditional data sources, you can collect valuable data from internal meetings, note-taking, and project-tracking activities. An assistant that records minutes, tracks action items, and monitors progress can create a self-sustaining data loop: the more it is used, the more data it gathers to improve its own outputs, leading to more accurate and efficient results.
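A minimal sketch of such a loop, assuming a hypothetical assistant that drafts meeting minutes and lets users edit them: each edit is stored as a (draft, corrected) pair that could later serve as fine-tuning or evaluation data, and action-item completion adds an outcome signal. The class and field names are illustrative, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingRecord:
    # Hypothetical record format for one meeting.
    meeting_id: str
    draft_minutes: str  # what the AI assistant generated
    final_minutes: str  # what the user kept after editing
    action_items: dict[str, bool] = field(default_factory=dict)  # item -> completed?


class MeetingDataLoop:
    """Hypothetical store that turns everyday usage into reusable training data."""

    def __init__(self) -> None:
        self.records: list[MeetingRecord] = []

    def log_meeting(self, record: MeetingRecord) -> None:
        self.records.append(record)

    def mark_done(self, meeting_id: str, item: str) -> None:
        # Progress tracking: later project activity updates earlier meetings.
        for record in self.records:
            if record.meeting_id == meeting_id and item in record.action_items:
                record.action_items[item] = True

    def training_pairs(self) -> list[tuple[str, str]]:
        # Only meetings the user actually edited carry a correction signal.
        return [(r.draft_minutes, r.final_minutes) for r in self.records
                if r.draft_minutes != r.final_minutes]

    def completion_rate(self) -> float:
        items = [done for r in self.records for done in r.action_items.values()]
        return sum(items) / len(items) if items else 0.0


if __name__ == "__main__":
    loop = MeetingDataLoop()
    loop.log_meeting(MeetingRecord("m1", "Discussed Q3.", "Discussed Q3 budget.", {"send deck": False}))
    loop.mark_done("m1", "send deck")
    print(loop.training_pairs(), loop.completion_rate())
```

The design choice worth noting is that the user's normal workflow (editing notes, closing tasks) produces the labels, so no separate annotation effort is needed.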

AI Agent in Fintech

As a product manager in fintech, I see vast potential for AI agents to revolutionize the financial industry. Imagine not needing to call customer service or talk to a representative for many tasks. There are a few capabilities I am particularly optimistic about:

Merchant Onboarding: during merchant onboarding to platforms like marketplaces and payment processors, generative AI can automate numerous processes such as underwriting, anti-money laundering (AML), and know your customer (KYC) checks.
Identity Management and Fraud Detection: Real-time Identity Verification, Behavioral Biometrics, Device Recognition, Cybersecurity network analysis
Payment Reconciliation and Reporting: automated payment reconciliation and reporting tools with built-in accounting capabilities, ensuring financial compliance.
Flexible Fund Management: provide flexible liquidity management solutions, especially for high-risk industries like sports betting, ensuring smooth fund flow during weekends and peak times.
Algorithmic Trading: AI Agents can interact with each other and with the market data to analyze trends, make trading decisions, execute trades, monitor performance, and ensure compliance with trading regulations.
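To make the reconciliation item concrete, its core matching step can be sketched as pairing ledger entries with bank transactions by reference and amount within a small date window, then flagging whatever is left over for review. The record fields, the 3-day tolerance, and the matching rule are assumptions for illustration; real reconciliation engines also handle partial payments, fees, and currency conversion.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    # Hypothetical transaction record shared by the ledger and the bank feed.
    ref: str
    amount_cents: int  # integer cents avoid floating-point rounding issues
    day: date


def reconcile(ledger: list[Txn], bank: list[Txn], max_days: int = 3):
    """Match ledger entries to bank transactions on (ref, amount) within a date window.

    Returns (matched pairs, unmatched ledger entries, unmatched bank transactions).
    """
    unmatched_bank = list(bank)
    matched, unmatched_ledger = [], []
    for entry in ledger:
        candidate = next(
            (t for t in unmatched_bank
             if t.ref == entry.ref
             and t.amount_cents == entry.amount_cents
             and abs((t.day - entry.day).days) <= max_days),
            None,
        )
        if candidate is None:
            unmatched_ledger.append(entry)  # flag for human review
        else:
            matched.append((entry, candidate))
            unmatched_bank.remove(candidate)  # each bank txn can match only once
    return matched, unmatched_ledger, unmatched_bank


if __name__ == "__main__":
    ledger = [Txn("INV-1", 10000, date(2024, 6, 1)), Txn("INV-2", 2550, date(2024, 6, 2))]
    bank = [Txn("INV-1", 10000, date(2024, 6, 3)), Txn("INV-3", 999, date(2024, 6, 2))]
    matched, open_ledger, open_bank = reconcile(ledger, bank)
    print(len(matched), [t.ref for t in open_ledger], [t.ref for t in open_bank])
    # → 1 ['INV-2'] ['INV-3']
```

In an agent setting, the deterministic matcher does the bulk of the work and the AI's role is closer to explaining and resolving the leftover exceptions.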

These shifts allow human workers to focus on value-added services, making their roles more significant and skilled, while generative AI handles administrative tasks. However, fintech applications are as challenging as healthcare ones, since both are subject to data privacy and security scrutiny. Challenges such as deepfakes and fraud remain, but they also present an opportunity for leading companies to develop robust AI protocols to combat them.

Conclusion

AI agents are transforming the landscape of technology with their ability to understand, reason, and act autonomously across various domains. From enhancing efficiency in software development and customer service to revolutionizing fintech with automated processes and intelligent trading, AI agents are paving the way for innovative solutions and streamlined operations.

As these technologies continue to evolve, integrating advanced reasoning, memory, and self-reflection capabilities, they will further bridge the gap between human potential and machine intelligence. Embracing the potential of AI agents while addressing challenges with robust protocols and ethical considerations will be crucial in harnessing their full benefits. The journey ahead promises exciting advancements and opportunities, reshaping industries and enhancing our interaction with technology.


Luciana’s Substack
