agenttester

v1.4.5 suspicious
6.0
Medium Risk

Run a prompt against multiple coding agents in parallel and compare results

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package carries moderate risk: it can execute arbitrary commands and send data to external servers, though there is no concrete evidence of malicious intent.

  • High shell risk due to use of subprocess.run
  • Potential network exfiltration
Per-check LLM notes
  • Network: The network calls suggest the package is designed to send results or data to an external server, which could be benign but also indicates potential for data exfiltration.
  • Shell: The package uses subprocess.run to execute commands such as 'rsync' and 'git', and in some places runs a command passed in as a variable. This allows arbitrary command execution, which could serve as a backdoor capability.
  • Obfuscation: No obfuscation patterns detected.
  • Credentials: The flagged text refers to the AWS credential chain (environment variables, ~/.aws/credentials, IAM instance role), which is consistent with legitimate AWS access; there is no clear evidence of credential harvesting.
  • Metadata: The package shows signs of low maintainer activity and poor metadata quality, which raises some suspicion but is not definitive evidence of malicious intent.

📦 Package Quality Overall: Low (4.4/10)

✦ High Test Suite 9.0

Test suite present — 18 test file(s) found

  • Test runner config found: conftest.py
  • Test runner config found: pyproject.toml
  • 18 test file(s) detected (e.g. conftest.py)
◈ Medium Documentation 5.0

Some documentation present

  • Detailed PyPI description (14359 chars)
○ Low Contributing Guide 2.0

No contributing guide or governance files found

  • No CONTRIBUTING, CODE_OF_CONDUCT, or governance files found
◈ Medium Type Annotations 5.0

Partial type annotation coverage

  • 445 type-annotated function signatures detected in source
○ Low Multiple Contributors 1.0

Unable to verify contributor count: no GitHub repository found

  • No GitHub repository linked — contributor count unavailable

🔬 Heuristic Checks

Outbound Network Calls score 9.0

Found 6 network call pattern(s)

  • ).encode() req = urllib.request.Request( f"{self.notify_url.rstrip('/')}/result"
  • try: with urllib.request.urlopen(req, timeout=10) as resp: return f"N
  • erse" ) req = urllib.request.Request( endpoint, data=json.dumps(b
  • try: with urllib.request.urlopen(req) as resp: data = json.loads(resp
  • encode() with patch("urllib.request.urlopen", return_value=mock_resp) as mock_open:
  • ue=False) with patch("urllib.request.urlopen", return_value=mock_resp): result = ex.e
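
Two of the six matches above are `patch("urllib.request.urlopen", ...)` lines, which look like mock patches in tests rather than live network calls. The report does not say how the scanner matches patterns. The sketch below assumes a plain text match and shows how an AST walk could separate real call sites from string mentions. The `find_urlopen_calls` helper and the `SAMPLE` source are hypothetical, written only for illustration.

```python
import ast
from typing import List


def find_urlopen_calls(source: str) -> List[int]:
    """Return line numbers of real urllib.request.urlopen(...) call nodes."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Rebuild the dotted name of the callee, e.g. "urllib.request.urlopen".
        parts = []
        func = node.func
        while isinstance(func, ast.Attribute):
            parts.append(func.attr)
            func = func.value
        if not isinstance(func, ast.Name):
            continue  # callee is not a plain dotted name
        parts.append(func.id)
        if ".".join(reversed(parts)) == "urllib.request.urlopen":
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample: one live call and one mock patch in a test.
SAMPLE = '''
import urllib.request
from unittest.mock import patch

def send(req):
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

def test_send(mock_resp):
    with patch("urllib.request.urlopen", return_value=mock_resp):
        pass
'''

text_matches = SAMPLE.count("urllib.request.urlopen")  # naive text match
real_calls = find_urlopen_calls(SAMPLE)                # AST-based match
print(text_matches, real_calls)
```

Under this assumption the text match counts both lines, while the AST walk reports only the live call in `send`.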
Code Obfuscation

No obfuscation patterns detected

Shell / Subprocess Execution score 10.0

Found 6 shell execution pattern(s)

  • remote host via rsync.""" subprocess.run( [ "rsync", "-az",
  • ck to local via rsync.""" subprocess.run( [ "rsync", "-az",
  • [str]) -> str: return subprocess.run( cmd, cwd=workdir, capture_output=True, text=Tru
  • try: result = subprocess.run( ["git", "remote", "-v"], ca
  • e: return subprocess.run( ["git", "checkout", "-b", branch],
  • _env or {})} result = subprocess.run( cmd, capture_output=True,
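
Every match above passes a list or a `cmd` variable to `subprocess.run`, and none of the visible fragments shows `shell=True` (the snippets are truncated, so this cannot be confirmed). The minimal sketch below shows why that distinction matters: with a list argument and the default `shell=False`, shell metacharacters in an argument are passed through as literal text. The `run_argv` wrapper is hypothetical and is not taken from the package.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_argv(cmd: Sequence[str], workdir: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run a command given as an argv list; no shell parses the arguments."""
    return subprocess.run(list(cmd), cwd=workdir, capture_output=True, text=True)


# The second argument contains "; echo injected". A shell would treat that as
# a second command; with list-form argv it reaches the child as one string.
result = run_argv(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]
)
literal_output = result.stdout.strip()
print(literal_output)
```

List-form arguments prevent shell injection, but they do not limit which program runs. A `cmd` value supplied by the caller can still name any executable, which is the risk the Shell note describes.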
Credential Harvesting score 2.5

Found 1 credential access pattern(s)

  • tial chain (env vars, ``~/.aws/credentials``, IAM instance role, etc.). Requires ``pip install age
Typosquatting

No typosquatting candidates detected

Registered Email Domain

No author email provided

Suspicious Page Links

All external links appear legitimate

Git Repository History

No GitHub repository linked

  • No GitHub repository link found
Maintainer History score 6.0

3 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)
  • Package has no PyPI classifiers (low effort / metadata quality)
Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agenttester
Create a mini-application named 'CodeBench' using the Python package 'agenttester'. CodeBench is designed to help developers evaluate different code generation models by running the same coding challenge against multiple AI agents in parallel and comparing their outputs. Here’s a detailed breakdown of what the application should do:

1. **Setup**: Begin by setting up a virtual environment and installing necessary packages including 'agenttester'. Ensure all dependencies are managed within a requirements.txt file.
2. **User Interface**: Design a simple command-line interface (CLI) that allows users to input a coding challenge (e.g., writing a function to sort an array).
3. **Agent Configuration**: Allow users to specify which AI agents they want to test. Provide a default set of popular coding assistants if no specific agents are chosen.
4. **Execution**: Use 'agenttester' to run the specified coding challenge against each selected agent in parallel. Capture the time taken for each agent to respond.
5. **Output Comparison**: Once all responses are received, display a side-by-side comparison of the code snippets generated by each agent. Include a brief analysis of the efficiency, readability, and any unique features of the generated code.
6. **Performance Metrics**: Implement basic performance metrics such as execution time, code length, and estimated complexity (using a simple heuristic, such as counting branch points).
7. **Feedback Loop**: Optionally, allow users to provide feedback on the generated code snippets directly through the CLI. This feedback could be stored locally for future reference or improvement suggestions.
8. **Documentation**: Write comprehensive documentation detailing how to install, use, and extend CodeBench. Include examples of how to integrate new AI agents into the benchmarking process.

By following these steps, you will create a powerful tool for evaluating and comparing different AI coding assistants, making it easier for developers to choose the best one for their projects.
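
Steps 4 and 6 of the prompt can be sketched with the standard library alone. This report does not show the agenttester API, so the sketch does not call it. `agent_builtin` and `agent_bubble` are hypothetical stand-ins that return canned code, and `estimate_complexity` is one possible "simple algorithm": it counts branching nodes in the syntax tree.

```python
import ast
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def estimate_complexity(code: str) -> int:
    """Rough complexity estimate: 1 plus the number of branching nodes."""
    branches = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(n, branches) for n in ast.walk(ast.parse(code)))


def timed_call(agent: Callable[[str], str], prompt: str) -> dict:
    """Call one agent and record timing, line count, and complexity."""
    start = time.perf_counter()
    code = agent(prompt)
    elapsed = time.perf_counter() - start
    return {
        "code": code,
        "seconds": elapsed,
        "lines": len(code.splitlines()),
        "complexity": estimate_complexity(code),
    }


def run_all(agents: Dict[str, Callable[[str], str]], prompt: str) -> Dict[str, dict]:
    """Run every agent on the same prompt in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(timed_call, fn, prompt) for name, fn in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical stand-in agents returning canned answers.
def agent_builtin(prompt: str) -> str:
    return "def sort_items(xs):\n    return sorted(xs)\n"


def agent_bubble(prompt: str) -> str:
    return (
        "def sort_items(xs):\n"
        "    xs = list(xs)\n"
        "    for i in range(len(xs)):\n"
        "        for j in range(len(xs) - 1 - i):\n"
        "            if xs[j] > xs[j + 1]:\n"
        "                xs[j], xs[j + 1] = xs[j + 1], xs[j]\n"
        "    return xs\n"
    )


results = run_all(
    {"builtin": agent_builtin, "bubble": agent_bubble},
    "Write a function that sorts a list.",
)
for name, r in results.items():
    print(f"{name}: {r['lines']} lines, complexity {r['complexity']}, {r['seconds']:.4f}s")
```

Threads suit this sketch because real agent calls would mostly wait on I/O. In an actual CodeBench build, the stand-ins would be replaced by whatever interface agenttester exposes.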