agentevals-cli

v0.9.3 · suspicious · 7.0 (High Risk)

Standalone framework to evaluate agent correctness based on portable OpenTelemetry traces

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package exhibits several concerning behaviors including high credential risk, moderate shell and obfuscation risks, and poor metadata quality, which collectively raise suspicion but do not conclusively indicate malicious intent.

  • High credential risk due to direct access of environment variables
  • Moderate shell execution risk without proper documentation

Per-check LLM notes
  • Network: Network calls are common and may be necessary for fetching data or communicating with external services, but the absence of clear documentation or purpose raises minor concern.
  • Shell: Shell executions can be part of package installation or dependency management processes, but uncontrolled or undocumented use might pose risks.
  • Obfuscation: The base64 decoding followed by hex conversion could indicate data obfuscation, but it could equally be an ordinary encoding conversion, such as rendering binary values as hex strings.
  • Credentials: Directly accessing environment variables for tokens indicates potential risk of credential harvesting unless explicitly documented as necessary functionality.
  • Metadata: The package shows low engagement and poor metadata quality, but there's no direct evidence of malicious intent.

🔬 Heuristic Checks

Outbound Network Calls score 6.0

Found 4 network call pattern(s)

  • try: async with httpx.AsyncClient(timeout=30) as client: r = await client.get(
  • try: async with httpx.AsyncClient(timeout=60) as client: r = await client.post
  • try: async with httpx.AsyncClient() as client: resp = await client.get(url, he
  • %s", url) async with httpx.AsyncClient() as client: resp = await client.get(url, header

Code Obfuscation score 2.0

Found 1 obfuscation pattern(s)

  • ry: raw = base64.b64decode(data[key]) data[key] = raw.hex()
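
For context, here is a minimal sketch of one benign reading of the flagged pattern. It assumes `data` is a decoded JSON object whose binary fields (for example trace and span IDs) arrive base64-encoded and are normalized to hex strings; the scan output does not confirm this. The function name and the field names `traceId` and `spanId` are hypothetical.

```python
import base64
import binascii


def base64_fields_to_hex(data: dict, keys: tuple) -> dict:
    """Return a copy of `data` with the named base64 fields re-encoded as hex."""
    out = dict(data)
    for key in keys:
        if key not in out:
            continue
        try:
            # validate=True rejects characters outside the base64 alphabet.
            raw = base64.b64decode(out[key], validate=True)
        except (binascii.Error, ValueError, TypeError):
            # Not valid base64: leave the original value untouched.
            continue
        out[key] = raw.hex()
    return out


# Hypothetical span record with a base64-encoded 16-byte trace ID.
span = {
    "traceId": base64.b64encode(bytes(range(16))).decode("ascii"),
    "name": "llm.call",
}
converted = base64_fields_to_hex(span, ("traceId", "spanId"))
```

A conversion like this changes only the text representation of the bytes. Nothing is hidden or executed, which is consistent with the low score for this check.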

Shell / Subprocess Execution score 6.0

Found 3 shell execution pattern(s)

  • nv", str(venv_dir)] ) subprocess.run(cmd, check=True, capture_output=True) def _install_deps(ve
  • "-m", "pip", "install"] subprocess.run(base + [sdk_spec], check=True, capture_output=True) logg
  • ...", requirements.name) subprocess.run(base + ["-r", str(requirements)], check=True, capture_output
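
The three fragments above look like a create-a-virtualenv-then-pip-install sequence. The sketch below is a guess at that shape, not the package's code: the function names, the use of `sys.executable -m venv`, and the interpreter path inside the virtualenv are all assumptions. Building the commands is kept separate from running them so each command can be inspected first.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_setup_commands(
    venv_dir: Path, sdk_spec: str, requirements: Optional[Path] = None
) -> List[List[str]]:
    """Build (but do not run) the venv creation and pip install commands."""
    # Assumed layout: the venv's interpreter lives in bin/ (Scripts/ on Windows).
    scripts = "Scripts" if sys.platform == "win32" else "bin"
    venv_python = venv_dir / scripts / "python"
    base = [str(venv_python), "-m", "pip", "install"]

    commands = [
        [sys.executable, "-m", "venv", str(venv_dir)],
        base + [sdk_spec],
    ]
    if requirements is not None:
        commands.append(base + ["-r", str(requirements)])
    return commands


def run_setup(commands: List[List[str]]) -> None:
    """Run each command, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True, capture_output=True)


# Hypothetical inputs, for illustration only.
cmds = build_setup_commands(Path("demo-venv"), "example-sdk==1.0", Path("reqs.txt"))
```

Because the arguments are passed as a list and `shell=True` is not used, no shell interprets them. The remaining risk is in what gets installed: `pip install` can run arbitrary build code from the packages it fetches.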

Credential Harvesting score 2.5

Found 1 credential access pattern(s)

  • "AGENTEVALS_GITHUB_TOKEN") or os.environ.get("GITHUB_TOKEN") @property def source_name(self) -> str:
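
The flagged expression is a common fallback lookup: prefer a package-specific variable, then a generic one. Here is a minimal sketch of that lookup in isolation. The function name is hypothetical, and the environment is passed in as a mapping so the behavior can be checked without touching real credentials.

```python
from typing import Mapping, Optional


def resolve_github_token(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty token, or None if neither variable is set."""
    # The package-specific variable wins; GITHUB_TOKEN is the fallback.
    return env.get("AGENTEVALS_GITHUB_TOKEN") or env.get("GITHUB_TOKEN") or None


# Example lookups against plain dictionaries standing in for os.environ.
specific = resolve_github_token({"AGENTEVALS_GITHUB_TOKEN": "a", "GITHUB_TOKEN": "b"})
fallback = resolve_github_token({"GITHUB_TOKEN": "b"})
missing = resolve_github_token({})
```

Reading a token this way is not harmful in itself. What matters for a review is where the value is sent afterwards, for example whether it only appears in request headers to the GitHub API.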

Typosquatting

No typosquatting candidates detected

Registered Email Domain

No author email provided

Suspicious Page Links

All external links appear legitimate

Git Repository History

No GitHub repository linked

  • No GitHub repository link found

Maintainer History score 6.0

3 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)
  • Package has no PyPI classifiers (low effort / metadata quality)

Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agentevals-cli
Your task is to create a Python-based mini-application that leverages the 'agentevals-cli' package to evaluate the performance of various AI agents in a simulated environment. This application will help developers understand how well their agents perform under different conditions and against various benchmarks. Here’s a detailed breakdown of the steps and features your application should include:

1. **Setup Environment**: Ensure you have Python installed along with the 'agentevals-cli' package. You may need to install additional dependencies such as OpenTelemetry.
2. **Simulated Environments**: Create at least three distinct environments where the agents will operate. Each environment should present unique challenges that test different aspects of the agent's capabilities.
3. **Agent Integration**: Integrate two or more pre-existing AI agents into your application. These could be simple decision-making algorithms or more complex machine learning models.
4. **Evaluation Criteria**: Define a set of evaluation criteria using 'agentevals-cli'. These criteria should measure the agents' performance based on factors like accuracy, speed, and resource usage.
5. **Tracing and Analysis**: Capture portable OpenTelemetry traces for each agent as it operates within the environments, then use 'agentevals-cli' to analyze these traces and assess how well each agent performs against the defined criteria.
6. **Visualization**: Implement a visualization component to display the results of the evaluations. This could be through graphs, charts, or any other form of visual representation that clearly communicates the data.
7. **User Interface**: Develop a basic user interface that allows users to select which agents and environments they want to evaluate, and view the results of those evaluations.
8. **Documentation**: Provide comprehensive documentation detailing how to set up and use the application, including explanations of the 'agentevals-cli' integration and the significance of the evaluation metrics.
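
To make steps 4 and 5 concrete, the sketch below scores a single trace in plain Python. It deliberately does not call 'agentevals-cli', whose API is not shown on this page. The `Span` type, the `tool.` name prefix, and the two metrics (tool-call trajectory match and wall-clock time) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """A simplified stand-in for an OpenTelemetry span."""
    name: str
    start_ns: int
    end_ns: int


def tool_call_names(spans: List[Span], prefix: str = "tool.") -> List[str]:
    """Return tool names in start-time order, assuming spans named 'tool.<name>'."""
    ordered = sorted(spans, key=lambda s: s.start_ns)
    return [s.name[len(prefix):] for s in ordered if s.name.startswith(prefix)]


def evaluate_trace(spans: List[Span], expected_tools: List[str]) -> Dict[str, object]:
    """Score one trace on trajectory correctness and total wall-clock time."""
    if not spans:
        raise ValueError("trace contains no spans")
    actual = tool_call_names(spans)
    # Wall time runs from the earliest span start to the latest span end.
    wall_ns = max(s.end_ns for s in spans) - min(s.start_ns for s in spans)
    return {
        "trajectory_match": actual == expected_tools,
        "wall_time_ms": wall_ns / 1_000_000,
        "tool_calls": actual,
    }


# Hypothetical trace: one agent run that calls two tools in sequence.
trace = [
    Span("agent.run", 0, 5_000_000),
    Span("tool.search", 1_000_000, 2_000_000),
    Span("tool.summarize", 2_500_000, 4_000_000),
]
result = evaluate_trace(trace, ["search", "summarize"])
```

Running the same function over the traces from each agent and environment yields a table of comparable scores, which can feed the visualization in step 6.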

By completing this project, you'll gain hands-on experience with 'agentevals-cli', learn how to integrate and evaluate AI agents, and develop valuable skills in Python programming and data visualization.