agent-deepeval

v0.2.0 · Suspicious
Risk score 4.0 (Medium Risk)

Local-first Agent batch evaluation and failure analysis CLI

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package shows low risk on the network, shell, obfuscation, and credential checks. However, low maintainer activity and poor metadata quality raise concerns about its reliability and ongoing maintenance.

  • Low maintainer activity
  • Poor metadata quality

Per-check LLM notes
  • Network: No network calls detected, which is normal and does not indicate any risk.
  • Shell: Subprocess executions are present but appear to run only the package's own modules, which is expected behavior as long as no suspicious commands or paths are involved.
  • Obfuscation: No obfuscation patterns detected, indicating low risk.
  • Credentials: No credential harvesting patterns detected, indicating low risk.
  • Metadata: The package shows low maintainer activity and poor metadata quality, raising some suspicion but not conclusive evidence of malice.

🔬 Heuristic Checks

Outbound Network Calls

No suspicious network call patterns found

Code Obfuscation

No obfuscation patterns detected

Shell / Subprocess Execution score 10.0

Found 5 shell execution pattern(s)

  • dProcess[str]: return subprocess.run( args, shell=False, cwd=
  • ("PYTHONPATH", "") return subprocess.run([sys.executable, "-m", "agent_eval", *args], cwd=tmp_path, t
  • ("PYTHONPATH", "") proc = subprocess.run([sys.executable, "-m", "agent_eval", "run", "--run-name", ".
  • ", "inspect"): proc = subprocess.run([sys.executable, "-m", "agent_eval", command, "--run", "../o
  • tr(no_python_bin) proc = subprocess.run([sys.executable, "-m", "agent_eval", "run"], cwd=tmp_path, t
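
The matches above share one shape: an argument list starting with sys.executable and "-m", passed to subprocess.run without a shell. The following minimal sketch illustrates that pattern and why it is usually considered lower risk than a shell string. The helper name run_module, the timeout default, and the use of the standard-library platform module as a stand-in target are assumptions for illustration. They are not taken from the package's source.

```python
from __future__ import annotations

import subprocess
import sys


def run_module(module: str, *args: str, timeout: float = 30.0) -> subprocess.CompletedProcess[str]:
    """Run `python -m <module> <args...>` with the current interpreter.

    Hypothetical helper for illustration; not part of agent-deepeval.
    """
    # An argument list with shell=False is handed to the OS directly, so no
    # shell interprets metacharacters such as ';' or '&&' in the arguments.
    return subprocess.run(
        [sys.executable, "-m", module, *args],
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # `platform` is a standard-library module used here as a stand-in target.
    proc = run_module("platform")
    print(proc.returncode, proc.stdout.strip())
```

A reviewer checking these findings by hand would mainly confirm that the list elements are fixed strings or validated values, and that shell=True does not appear elsewhere.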

Credential Harvesting

No credential harvesting patterns detected

Typosquatting

No typosquatting candidates detected

Registered Email Domain

No author email provided

Suspicious Page Links

All external links appear legitimate

Git Repository History

No GitHub repository linked

  • No GitHub repository link found

Maintainer History score 6.0

3 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)
  • Package has no PyPI classifiers (low effort / metadata quality)
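
To make the three concerns above concrete, here is a rough sketch of how such metadata checks could be expressed. It is not the scanner's actual implementation, which this report does not show. The dictionary keys, the function name maintainer_concerns, and the length threshold for a "very short" author name are all assumptions.

```python
from __future__ import annotations

# Assumed threshold for "very short"; the report does not state one.
MIN_AUTHOR_LENGTH = 3


def maintainer_concerns(metadata: dict, packages_by_author: int) -> list[str]:
    """Return human-readable concerns for one package's metadata.

    `metadata` is a hypothetical dict with "author" and "classifiers" keys.
    """
    concerns = []
    author = (metadata.get("author") or "").strip()
    if len(author) < MIN_AUTHOR_LENGTH:
        concerns.append("Author name is missing or very short")
    if packages_by_author <= 1:
        concerns.append(f'Author "{author}" appears to have only 1 package on PyPI')
    if not metadata.get("classifiers"):
        concerns.append("Package has no PyPI classifiers")
    return concerns


if __name__ == "__main__":
    example = {"name": "agent-deepeval", "author": "", "classifiers": []}
    for concern in maintainer_concerns(example, packages_by_author=1):
        print(concern)
```

None of these signals indicates malice on its own. They only measure how much effort went into the package's published metadata.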

Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agent-deepeval

Create a Python-based mini-application named 'AgentEvalMaster' that leverages the 'agent-deepeval' package to evaluate the performance of multiple AI agents against various tasks and analyze any failures they encounter. This tool should allow users to define different test scenarios, run batches of evaluations, and generate comprehensive reports on the agents' performance.

Key Features:
1. Define Test Scenarios: Users should be able to input or select predefined test scenarios which include specific tasks for the agents to perform.
2. Batch Evaluation: The application should support running these tests in batches, allowing multiple agents to be evaluated simultaneously.
3. Failure Analysis: After each batch, the application must analyze any failures encountered during the evaluations, providing insights into why certain tasks failed and suggesting potential improvements.
4. Comprehensive Reporting: Generate detailed reports summarizing the overall performance of each agent across all test scenarios, including success rates, common failure points, and suggestions for improvement.
5. User Interface: Implement a simple command-line interface (CLI) for ease of use.

How to Use 'agent-deepeval':
- Utilize 'agent-deepeval' to set up and manage the batch evaluation process.
- Leverage its capabilities for local-first evaluation to ensure that all tests can be conducted without requiring internet access.
- Employ its failure analysis tools to automatically identify patterns in failures and suggest areas for agent improvement.
- Integrate its reporting functionalities to produce user-friendly summaries of evaluation results.
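
As one possible starting point for the reporting feature described in the prompt, the sketch below aggregates per-agent success rates and the most frequent failure reasons. It deliberately does not call agent-deepeval, whose API is not documented in this report. The EvalResult record and the summarize function are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical result record; not an agent-deepeval type."""
    agent: str
    scenario: str
    passed: bool
    failure_reason: str | None = None


def summarize(results: list[EvalResult]) -> dict[str, dict]:
    """Group results by agent and compute success rate and failure counts."""
    by_agent: dict[str, list[EvalResult]] = {}
    for r in results:
        by_agent.setdefault(r.agent, []).append(r)

    report = {}
    for agent, rows in by_agent.items():
        # Count failure reasons only for runs that did not pass.
        failures = Counter(r.failure_reason or "unknown" for r in rows if not r.passed)
        report[agent] = {
            "runs": len(rows),
            "success_rate": sum(r.passed for r in rows) / len(rows),
            "common_failures": failures.most_common(3),
        }
    return report


if __name__ == "__main__":
    sample = [
        EvalResult("agent-a", "lookup", True),
        EvalResult("agent-a", "planning", False, "timeout"),
    ]
    print(summarize(sample))
```

In a real project, the records would be built from whatever output format the evaluation tool actually writes.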