Package Metadata

Author: —
Email: "JFrog Ltd." <[email protected]>
PyPI: agent-belt
Python: >=3.11
Versions: 1 release
First release: 17 May 2026, 13:01 UTC
Analysed: 06 Jun 2026, 05:50 UTC
Source files: 65 .py files scanned

Project Links

Changelog
Documentation
Homepage
Issues
Source

Classifiers

Development Status :: 4 - Beta
Intended Audience :: Developers
License :: OSI Approved :: Apache Software License
Programming Language :: Python :: 3
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
Programming Language :: Python :: 3.14
Topic :: Scientific/Engineering :: Artificial Intelligence
Topic :: Software Development :: Testing

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package is flagged for credential-harvesting risk and for metadata patterns that could indicate a supply-chain attack. No direct evidence of malicious content was found, but the combination of factors warrants caution.

High credential risk
Suspicious metadata indicators

Per-check LLM notes

Obfuscation: No obfuscation patterns detected in the package.
Credentials: The observed pattern suggests potential unauthorized access to system files, indicative of credential harvesting.
Metadata: The package looks potentially suspicious because it is new, has no maintainer history, and shows minimal git activity, which could indicate a trial run ahead of malicious use.

🔬 Heuristic Checks

⚠ Outbound Network Calls score 3.0

Found 2 network call pattern(s)

(base: str) -> None: with httpx.Client(base_url=base, timeout=5.0) as client: health = clie
try: r = httpx.get(f"http://127.0.0.1:{port}/api/health", timeout=1.0)

✓ Code Obfuscation

No obfuscation patterns detected

⚠ Shell / Subprocess Execution score 10.0

Found 5 shell execution pattern(s)

ort = _free_port() proc = subprocess.Popen( [ sys.executable, "-m",
_env} try: return subprocess.run( [git, *args], cwd=str(cwd) if cwd i
) process. proc = subprocess.Popen( # noqa: S603 - cmd is a fixed argv list owned by the adapt
try: result = subprocess.run([bin_path, "--version"], capture_output=True, text=True, tim
try: result = subprocess.run( [bin_path, "--version"], ca
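The `--version` probes matched above pass a fixed argv list to `subprocess.run` rather than a shell string, so the pattern count alone does not imply command injection. The following is a minimal sketch of that style of probe, assuming nothing about the package's actual code; `probe_version` and its parameters are hypothetical names used only for illustration.

```python
from __future__ import annotations

import subprocess
import sys


def probe_version(bin_path: str, timeout: float = 5.0) -> str | None:
    """Return the first line printed by `<bin_path> --version`, or None on failure.

    Hypothetical helper: the argv list is fixed and no shell is involved,
    so `bin_path` is never interpreted as shell syntax.
    """
    try:
        result = subprocess.run(
            [bin_path, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission error, or a hung process.
        return None
    if result.returncode != 0:
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(probe_version(sys.executable))
```

A reviewer triaging this check would mainly want to confirm that no call site uses `shell=True` or builds the argv from untrusted input.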

⚠ Credential Harvesting score 2.5

Found 1 credential access pattern(s)

ce-in-depth: ``--bundled ../../etc/passwd`` would otherwise # escape the bundled root. ``resolve(
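The matched text appears to be a source comment describing a path-traversal guard, not code that reads `/etc/passwd`. A minimal sketch of such a guard is shown below, on the assumption that a user-supplied relative path must stay under a fixed root directory; `resolve_inside` and its arguments are hypothetical names and not taken from the package.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve `user_path` under `root`, rejecting anything that escapes it.

    Hypothetical helper: both paths are resolved first so that `..`
    segments and symlinks cannot be used to step outside the root.
    """
    resolved_root = root.resolve()
    candidate = (resolved_root / user_path).resolve()
    if not candidate.is_relative_to(resolved_root):
        raise ValueError(f"path escapes root: {user_path!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_inside(Path(tmp), "data/example.txt"))
        try:
            resolve_inside(Path(tmp), "../../etc/passwd")
        except ValueError as exc:
            print("rejected:", exc)
```

A string like `../../etc/passwd` in a comment or test for a guard of this kind is a common source of false positives in pattern-based credential checks.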

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: jfrog.com

✓ Suspicious Page Links

All external links appear legitimate

⚠ Git Repository History score 5.0

Found 2 Git history flag(s)

Very few commits: 2 total
Single contributor with only 2 commit(s) — possibly throwaway account

⚠ Maintainer History score 6.0

3 maintainer concern(s) found

Only one version has ever been released — brand new package
Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agent-belt

Create a mini-application named 'AgentBench' that leverages the 'agent-belt' Python package to evaluate and compare different command-line interface (CLI) agents based on their performance in solving complex tasks. This application will serve as a benchmarking tool for developers and researchers interested in assessing the capabilities of various CLI agents across multiple scenarios.

### Project Goals:
- Develop a series of multi-turn scenarios that simulate real-world problems.
- Use 'agent-belt' to execute these scenarios with different CLI agents.
- Implement both rule-based and language model (LLM) scoring mechanisms to evaluate agent responses.
- Provide a comparative analysis of the agents based on the scores obtained.
- Ensure that all evaluations are reproducible and consistent.

### Key Features:
1. **Scenario Creation**: Allow users to define custom scenarios with multiple turns, where each turn represents a step in problem-solving.
2. **Agent Integration**: Support integration with various CLI agents through a standardized interface provided by 'agent-belt'.
3. **Scoring Mechanisms**:
   - Rule-Based Scoring: Define rules for what constitutes a correct or optimal response.
   - LLM Scoring: Utilize an LLM to assess the quality of agent responses, providing more nuanced evaluation.
4. **Reproducibility**: Ensure that all evaluations can be reproduced by saving scenario configurations and execution logs.
5. **Comparative Analysis**: Display side-by-side comparisons of different agents' performances across scenarios.
6. **User Interface**: Develop a simple yet effective web UI using Flask or a similar framework to allow users to input scenarios, select agents, and view results.
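As a rough illustration of the rule-based scoring feature, the sketch below scores a multi-turn scenario with simple substring rules. It deliberately does not call 'agent-belt', whose API is not shown in this report; `Turn`, `score_turn`, and `score_scenario` are hypothetical names, and the rules are an assumption about what a "correct" response might look like.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One step of a scenario with substring rules (hypothetical structure)."""

    prompt: str
    must_contain: tuple[str, ...] = ()
    must_not_contain: tuple[str, ...] = ()


def score_turn(turn: Turn, response: str) -> float:
    """Return the fraction of rules satisfied; 1.0 if the turn has no rules."""
    text = response.lower()
    checks = [needle.lower() in text for needle in turn.must_contain]
    checks += [needle.lower() not in text for needle in turn.must_not_contain]
    return sum(checks) / len(checks) if checks else 1.0


def score_scenario(turns: list[Turn], responses: list[str]) -> float:
    """Average the per-turn scores; expects one response per turn."""
    if len(turns) != len(responses):
        raise ValueError("expected exactly one response per turn")
    if not turns:
        return 0.0
    return sum(score_turn(t, r) for t, r in zip(turns, responses)) / len(turns)


if __name__ == "__main__":
    scenario = [
        Turn("List the files in the working directory.", must_contain=("ls",)),
        Turn("Remove the temp file.", must_contain=("rm",), must_not_contain=("sudo",)),
    ]
    answers = ["Run ls -la", "sudo rm -rf /tmp/x"]
    print(score_scenario(scenario, answers))
```

Because the rules are deterministic, rerunning the same scenario against saved responses gives the same score, which supports the reproducibility goal; an LLM-based scorer would need its prompt and model settings pinned to approach the same property.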

### Steps to Build the Application:
1. **Setup Environment**: Install necessary packages including 'agent-belt', Flask, and any other dependencies.
2. **Define Scenarios**: Create a few example scenarios covering diverse use cases such as data processing, system administration, and natural language understanding.
3. **Integrate Agents**: Connect your application with at least two different CLI agents.
4. **Implement Scoring Systems**: Develop both rule-based and LLM-based scoring systems using 'agent-belt' functionalities.
5. **Develop Web Interface**: Use Flask to create a user-friendly web interface allowing users to interact with the application.
6. **Test and Refine**: Conduct thorough testing of your application, ensuring it meets all specified requirements and functions smoothly.
7. **Documentation**: Write comprehensive documentation explaining how to use 'AgentBench', how to add new scenarios and agents, and how to interpret the results.

By following these steps and utilizing the 'agent-belt' package effectively, you'll create a valuable tool for evaluating and comparing CLI agents in a structured and reproducible manner.