agentic-swarm-bench

v4.0.9 · suspicious · score 6.0 · Medium Risk

Open-source benchmark for LLM inference on agentic scenarios

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package exhibits several red flags including high shell and obfuscation risks, indicating potential for misuse or malicious intent. While there's no concrete evidence of malicious activity, the combination of these factors raises concern.

  • High shell risk due to execution of external commands
  • Significant obfuscation risk with misuse of eval and compile functions

Per-check LLM notes
  • Network: Network calls could be legitimate if the package is designed to interact with external services, but long timeouts may indicate unusual behavior.
  • Shell: Execution of external commands via subprocess.run suggests potential for arbitrary code execution, which is highly risky and not typically expected in benign packages.
  • Obfuscation: The code shows signs of obfuscation with potential misuse of eval and compile functions, which can be risky.
  • Credentials: No direct evidence of credential harvesting is present, but the code could potentially be modified to include such functionality.
  • Metadata: Suspicious non-HTTPS links suggest potential risks, but no clear signs of typosquatting or malicious intent from maintainer history.

🔬 Heuristic Checks

Outbound Network Calls score 9.0

Found 6 network call pattern(s)

  • h", None) async with httpx.AsyncClient(timeout=httpx.Timeout(300.0)) as client: resp =
  • aming: async with httpx.AsyncClient(timeout=httpx.Timeout(300.0)) as client: res
  • None async with httpx.AsyncClient(timeout=httpx.Timeout(300.0)) as client: asy
  • streaming: async with httpx.AsyncClient(timeout=httpx.Timeout(300.0)) as client: respons
  • .encode() async with httpx.AsyncClient(timeout=httpx.Timeout(300.0)) as client: async w
  • ) try: async with httpx.AsyncClient(timeout=httpx.Timeout(10.0)) as client: resp = a

Code Obfuscation score 6.0

Found 3 obfuscation pattern(s)

  • ile") @click.pass_context def eval(ctx, endpoint, model, api_key, api_key_header, tasks, valida
  • x errors.""" try: compile(code, "<eval>", "exec") return True, "OK" except SyntaxError as e:
  • n=True, ) run = await __import__( "agentic_swarm_bench.scenarios.player", fromlist=["replay_scenario"] ).replay_scenario(config, scenario_path) assert isinstanc
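For context on the second pattern above: calling compile() in "exec" mode parses and byte-compiles a string but does not run it, so it is commonly used as a syntax check. The following is a minimal sketch of that idea, reconstructed from the truncated match. The function name check_syntax and the wording of the error message are assumptions, not the package's actual code.

```python
def check_syntax(code: str) -> tuple[bool, str]:
    """Return (True, "OK") if `code` parses as Python, else (False, reason)."""
    try:
        # "exec" mode compiles a whole module body; nothing is executed here.
        compile(code, "<eval>", "exec")
        return True, "OK"
    except SyntaxError as e:
        # The message format below is illustrative (assumed).
        return False, f"SyntaxError: {e.msg} (line {e.lineno})"


print(check_syntax("x = 1"))
print(check_syntax("def f(:"))
# Compiling does not run the code, so this line never raises SystemExit.
print(check_syntax("raise SystemExit(1)"))
```

A check of this shape only validates syntax; it says nothing about what the code would do if it were later executed.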

Shell / Subprocess Execution score 2.0

Found 1 shell execution pattern(s)

  • try: result = subprocess.run( [sys.executable, f.name], c
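The matched fragment resembles a common pattern in coding benchmarks: write generated code to a temporary file and run it with the current interpreter. Below is a hedged sketch of that pattern, not the package's implementation. The name run_snippet, the timeout value, and every argument after the command list are assumptions, since the scanner's match is truncated.

```python
import os
import subprocess
import sys
import tempfile


def run_snippet(code: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Run `code` in a child Python process; return (returncode, stdout, stderr)."""
    # delete=False so the child process can open the file on every platform.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    try:
        result = subprocess.run(
            [sys.executable, f.name],  # argument list, no shell involved
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout, result.stderr
    except subprocess.TimeoutExpired:
        return -1, "", "timed out"
    finally:
        os.unlink(f.name)  # always clean up the temporary file


rc, out, err = run_snippet("print(1 + 1)")
print(rc, out.strip())
```

Passing a list instead of a shell string avoids shell injection, but the child process still runs with the caller's privileges. Untrusted code would need a sandbox or container on top of this.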

Credential Harvesting

No credential harvesting patterns detected

Typosquatting

No typosquatting candidates detected

Registered Email Domain

Email domain looks legitimate: swarmone.ai

Suspicious Page Links score 10.0

Found 5 suspicious link(s) on the package page

  • Non-HTTPS external link: http://your-server:8000
  • Non-HTTPS external link: http://your-gpu-server:8000
  • Non-HTTPS external link: http://new-server:8000
  • Non-HTTPS external link: http://host.docker.internal:8000
  • Non-HTTPS external link: http://my-gpu-server:8000
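The hosts flagged above may be documentation placeholders rather than real destinations: four have no dot in the hostname, and host.docker.internal is the name Docker Desktop provides for reaching the host machine. The sketch below shows how a non-HTTPS check could be paired with a placeholder test when triaging such findings. Both function names and the placeholder rule (no dot, or a ".internal" suffix) are assumptions made for illustration, not the scanner's logic.

```python
from urllib.parse import urlparse

FLAGGED = [
    "http://your-server:8000",
    "http://your-gpu-server:8000",
    "http://new-server:8000",
    "http://host.docker.internal:8000",
    "http://my-gpu-server:8000",
]


def is_non_https(url: str) -> bool:
    """True when the URL uses plain HTTP."""
    return urlparse(url).scheme.lower() == "http"


def looks_like_placeholder(url: str) -> bool:
    """Assumed rule: single-label hosts and *.internal names are not public sites."""
    host = (urlparse(url).hostname or "").lower()
    return "." not in host or host.endswith(".internal")


for url in FLAGGED:
    print(url, is_non_https(url), looks_like_placeholder(url))
```

Under this assumed rule, all five links are both non-HTTPS and placeholder-like, which would argue for reading the 10.0 score with caution.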

Git Repository History

Repository swarmone/agentic-swarm-bench appears legitimate

Maintainer History score 4.0

2 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)

Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agentic-swarm-bench
Create a mini-application named 'SwarmBenchDemo' that showcases the capabilities of the 'agentic-swarm-bench' package. This application will serve as a benchmarking tool for evaluating the performance of Large Language Models (LLMs) in handling complex, agentic tasks. Agentic tasks refer to scenarios where the model needs to reason, plan, and execute actions based on a given set of instructions or goals.

Step 1: Setup Environment
- Install the required Python packages including 'agentic-swarm-bench'.
- Ensure you have access to at least one LLM API (e.g., Anthropic Claude, Anthropic Claude-instant, OpenAI GPT-4).

Step 2: Define Benchmark Scenarios
- Create a series of agentic tasks for the LLMs to solve. These tasks could range from simple planning exercises to more complex problem-solving challenges involving multiple steps and reasoning.

Step 3: Implement the Benchmarking Mechanism
- Use 'agentic-swarm-bench' to interface with the LLMs and run each task through them.
- Collect metrics such as response time, accuracy, and complexity of the solutions provided by the models.
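As a rough illustration of Step 3, the sketch below times a model call and records whether the answer contains an expected string. It does not use the agentic-swarm-bench API, which this page does not document. The names run_task and fake_model are hypothetical, and fake_model is a stand-in for a real LLM client.

```python
import time
from typing import Callable


def run_task(call_model: Callable[[str], str], prompt: str, expected: str) -> dict:
    """Call the model once and record latency plus a simple correctness flag."""
    start = time.perf_counter()
    answer = call_model(prompt)
    elapsed = time.perf_counter() - start
    return {
        "prompt": prompt,
        "latency_s": elapsed,
        "correct": expected in answer,  # naive substring match (assumed metric)
    }


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual client."""
    return "4" if "2 + 2" in prompt else "unknown"


tasks = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
results = [run_task(fake_model, p, e) for p, e in tasks]
for r in results:
    print(r["prompt"], r["correct"], f"{r['latency_s']:.6f}s")
```

Substring matching is a deliberately crude accuracy measure; multi-step agentic tasks would need a task-specific validator.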

Step 4: Analyze Results
- Develop a feature within the app to visualize and analyze the collected data. This could include graphs showing comparative performance across different models and tasks.
- Include a summary report generation feature that compiles insights from the analysis into a readable format.
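One possible shape for the Step 4 summary report is sketched below: group per-task records by model and report accuracy and mean latency. The record fields (model, correct, latency_s), the function name summarize, and the sample values are all made up for illustration and are not real benchmark results.

```python
from collections import defaultdict
from statistics import mean


def summarize(records: list[dict]) -> str:
    """Build a plain-text report with one line per model."""
    by_model = defaultdict(list)
    for r in records:
        by_model[r["model"]].append(r)
    lines = []
    for model in sorted(by_model):
        rows = by_model[model]
        accuracy = mean(1.0 if r["correct"] else 0.0 for r in rows)
        latency = mean(r["latency_s"] for r in rows)
        lines.append(
            f"{model}: n={len(rows)} accuracy={accuracy:.0%} mean_latency={latency:.2f}s"
        )
    return "\n".join(lines)


# Placeholder records, not measured data.
sample = [
    {"model": "model-a", "correct": True, "latency_s": 1.0},
    {"model": "model-a", "correct": False, "latency_s": 3.0},
    {"model": "model-b", "correct": True, "latency_s": 0.5},
]
report = summarize(sample)
print(report)
```

The same per-model aggregates could feed a plotting library for the comparative graphs mentioned in Step 4.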

Suggested Features:
- User Interface: A simple GUI for easy interaction with the benchmarking tool.
- Customizable Tasks: Allow users to add their own agentic tasks for benchmarking.
- Real-time Feedback: Provide immediate feedback on the model's responses during benchmarking.
- Detailed Logs: Maintain logs of all benchmarking sessions for future reference.

The goal of 'SwarmBenchDemo' is not only to evaluate the performance of LLMs but also to provide developers and researchers with a robust framework for testing and improving these models in agentic scenarios.