agent-eval-runner

v0.3.1 safe
4.0
Medium Risk

Run the AI Agent QA Eval Pack against your tool-using LLM agent + grade it with a shareable scorecard badge. Deterministic, OWASP Agentic Top 10 aligned, no LLM-judge.

🤖 AI Analysis

Final verdict: SAFE

The package appears safe with no direct evidence of malicious activities. However, the metadata suggests it may be under-maintained.

  • No network calls or shell executions detected
  • Low activity and maintenance effort indicated in metadata
Per-check LLM notes
  • Network: No network calls detected, which is normal unless the package requires external services.
  • Shell: No shell execution patterns detected, indicating no immediate signs of malicious shell command execution.
  • Obfuscation: No obfuscation patterns detected, indicating low risk of malicious intent.
  • Credentials: No credential harvesting patterns detected, suggesting no immediate threat to secrets or credentials.
  • Metadata: The package shows signs of low activity and maintenance effort, raising some suspicion but not definitive evidence of malice.

🔬 Heuristic Checks

✓ Outbound Network Calls

No suspicious network call patterns found

✓ Code Obfuscation

No obfuscation patterns detected

✓ Shell / Subprocess Execution

No shell execution patterns detected

✓ Credential Harvesting

No credential harvesting patterns detected

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

No author email provided

✓ Suspicious Page Links

All external links appear legitimate

⚠ Git Repository History score 2.5

Git history flags:

  • Repository has zero stars and zero forks
⚠ Maintainer History score 4.0

2 maintainer concern(s) found

  • Author "Weiseer" appears to have only 1 package on PyPI (new or inactive account)
  • Package has no PyPI classifiers (low effort / metadata quality)
✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agent-eval-runner
Create a Python-based mini-application named 'AgentEvalSuite' that integrates the 'agent-eval-runner' package to evaluate the security and functionality of AI agents that interact with external tools or APIs. This application will serve as a comprehensive evaluation suite for developers to test their agents under various scenarios, ensuring they comply with security standards and perform tasks accurately. Here's a detailed breakdown of what the application should accomplish:

1. **Setup and Configuration**: Begin by setting up a Python environment with virtualenv or venv. Install necessary packages including 'agent-eval-runner'. Configure the application to accept user input for specifying the path to the agent script and any required API keys or configurations.

2. **Agent Integration**: Design a modular system within 'AgentEvalSuite' to integrate different types of agents. Agents can range from simple chatbots to complex systems interacting with multiple APIs. Ensure the application can dynamically load and run these agents based on user inputs or predefined configurations.

3. **Evaluation Scenarios**: Implement a series of evaluation scenarios that simulate real-world interactions. These scenarios should cover a wide range of use cases, such as handling sensitive data, dealing with unexpected inputs, and performing complex tasks involving multiple steps. Use 'agent-eval-runner' to run these scenarios against the integrated agents.

4. **Security Testing**: Utilize 'agent-eval-runner' to conduct security-focused evaluations aligned with the OWASP Agentic Top 10 guidelines. This includes testing for vulnerabilities like injection flaws, broken authentication, and other common security issues specific to agentic applications.

5. **Scoring and Reporting**: After running the evaluation scenarios, use 'agent-eval-runner' to generate a detailed scorecard that grades the agent's performance and security posture. This scorecard should be easily readable and include a shareable badge indicating the agent's overall score. Users should have the option to customize the scoring criteria if needed.

6. **User Interface**: Develop a simple command-line interface (CLI) for users to interact with 'AgentEvalSuite'. The CLI should allow users to select which agent to evaluate, choose specific scenarios, and view the results of the evaluation.

7. **Documentation and Support**: Provide thorough documentation explaining how to set up and use 'AgentEvalSuite', including examples of different agents and scenarios. Also, include support for troubleshooting common issues and extending the functionality of the application.

By following these steps and utilizing the 'agent-eval-runner' package effectively, you'll create a powerful tool for developers to ensure their AI agents meet both functional and security requirements before deployment.
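
As a starting point, steps 2, 5, and 6 above could be sketched roughly as follows. This is a minimal skeleton using only the Python standard library, not the package's own interface: the names `load_agent`, `pass_rate`, `build_parser`, the default entry point `run_agent`, and the program name `agentevalsuite` are all hypothetical. The sketch deliberately makes no call into 'agent-eval-runner', because its API is not described on this page; consult the package's documentation for the actual invocation.

```python
import argparse
import importlib.util
from pathlib import Path


def load_agent(script_path, attr="run_agent"):
    """Import a Python file by path and return its callable entry point.

    `attr` is an assumed naming convention for the agent callable.
    """
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load agent script: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    agent = getattr(module, attr, None)
    if not callable(agent):
        raise AttributeError(f"{path} does not define a callable {attr!r}")
    return agent


def pass_rate(results):
    """Deterministic score: percentage of scenarios whose value is truthy."""
    if not results:
        return 0.0
    passed = sum(1 for ok in results.values() if ok)
    return 100.0 * passed / len(results)


def build_parser():
    """CLI definition: one agent script, optional entry name and scenarios."""
    parser = argparse.ArgumentParser(prog="agentevalsuite")
    parser.add_argument("agent_script", help="path to the agent's Python file")
    parser.add_argument("--entry", default="run_agent",
                        help="name of the agent callable in the script")
    parser.add_argument("--scenario", action="append", default=[],
                        help="scenario name to run; may be repeated")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    agent = load_agent(args.agent_script, args.entry)
    # Placeholder: pass `agent` and `args.scenario` to agent-eval-runner here,
    # following that package's documentation.
    selected = ", ".join(args.scenario) if args.scenario else "all"
    print(f"loaded {args.entry!r} from {args.agent_script}; scenarios: {selected}")
    return agent
```

Calling `main(["my_agent.py", "--scenario", "example-scenario"])` would load `run_agent` from `my_agent.py` and report the selected scenario; both the file name and the scenario name here are placeholders. Keeping the score a plain pass percentage mirrors the package description's emphasis on deterministic grading without an LLM judge, though the package's real scoring rules may differ.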