AI Analysis
Final verdict: SUSPICIOUS
The package receives a moderate risk score because it has no maintainer history and lacks critical metadata, which makes its origin and intent hard to verify.
- Lack of maintainer history
- Missing critical metadata
Per-check LLM notes
- Obfuscation: No obfuscation patterns detected, indicating low risk.
- Credentials: No credential harvesting patterns detected, indicating low risk.
- Metadata: The package shows several red flags, including no maintainer history and missing critical metadata, which point to either a low-effort release or possible malicious intent.
Heuristic Checks
Outbound Network Calls
No suspicious network call patterns found
Code Obfuscation
No obfuscation patterns detected
Shell / Subprocess Execution
score 6.0
Found 3 shell execution pattern(s)
- ... None: try: return subprocess.check_output( ["git", "rev-parse", "HEAD"], stder ...
- ... g="utf-8") proc = subprocess.run( ["python", "-m", "pytest", "-q", str(work / ...
- ... g="utf-8") proc = subprocess.run( ["python", str(path)], cwd= ...
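The report does not show how the shell-execution check arrives at its count of 3. As a rough illustration only, a pattern-based heuristic of this kind could be sketched as below. The pattern list and the function name `count_shell_calls` are assumptions made for this example, not the scanner's actual rules, and the sample text is a simplified stand-in for the truncated snippets above.

```python
import re

# Hypothetical patterns; the scanner's real rule set is not shown in the report.
SHELL_PATTERNS = [
    re.compile(r"\bsubprocess\.(?:run|call|check_call|check_output|Popen)\s*\("),
    re.compile(r"\bos\.(?:system|popen)\s*\("),
]


def count_shell_calls(source: str) -> int:
    """Count textual matches of shell/subprocess invocation patterns."""
    return sum(1 for pattern in SHELL_PATTERNS for _ in pattern.finditer(source))


# Simplified stand-in for the three flagged call sites.
sample = '''
subprocess.check_output(["git", "rev-parse", "HEAD"])
subprocess.run(["python", "-m", "pytest", "-q"])
subprocess.run(["python", "script.py"])
'''

print(count_shell_calls(sample))  # prints 3
```

A purely textual match like this flags every subprocess call, including routine ones such as invoking `git` or `pytest`, so a hit signals code worth reading rather than proof of malicious behavior.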
Credential Harvesting
No credential harvesting patterns detected
Typosquatting
No typosquatting candidates detected
Registered Email Domain
No author email provided
Suspicious Page Links
All external links appear legitimate
Git Repository History
No GitHub repository linked
Maintainer History
score 8.0
4 maintainer concern(s) found
- Only one version has ever been released (brand new package)
- Author name is missing or very short
- Author "" appears to have only 1 package on PyPI (new or inactive account)
- Package has no PyPI classifiers (low effort / metadata quality)
Known CVE Vulnerabilities
No known vulnerabilities found in OSV database.
AI App Starter Prompt
Use this prompt to build a project with agent-builder-evals
Create a command-line tool named 'AgentBench' that leverages the 'agent-builder-evals' package to benchmark and evaluate various AI agents from different providers. This tool should allow users to easily compare the performance of these agents on specific tasks, such as natural language processing, decision-making, and problem-solving. The application should include the following key features:

1. **Agent Registration**: Users should be able to register new AI agents by specifying their provider (e.g., Anthropic, Google, Microsoft), API endpoint, and any necessary authentication details.
2. **Task Configuration**: Define a set of tasks that each agent will perform. These tasks could range from simple Q&A sessions to more complex scenarios like ethical dilemmas or strategic games.
3. **Evaluation Metrics**: Implement a variety of metrics to assess the performance of the agents, such as response time, accuracy, coherence, and creativity. Each metric should be customizable based on user needs.
4. **Benchmarking Suite**: Utilize the 'agent-builder-evals' package to run the defined tasks against all registered agents, collecting data on their performance according to the specified metrics.
5. **Reporting and Visualization**: After running the benchmarks, generate comprehensive reports detailing the performance of each agent across the different tasks. Include visualizations like graphs and charts to make the data easier to understand.
6. **User Interface**: While primarily a CLI tool, consider adding basic help commands and a simple text-based menu system to guide users through the process of registering agents, configuring tasks, and viewing results.
7. **Customization Options**: Allow advanced users to customize the evaluation criteria and task definitions further, ensuring the tool remains flexible and adaptable to various use cases.
8. **Security Measures**: Ensure sensitive information, such as API keys, is handled securely, possibly using environment variables or encrypted storage.

The goal of this project is to provide a robust, user-friendly tool for anyone interested in comparing the capabilities of different AI agents, aiding in both academic research and practical applications.
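As one possible reading of the Agent Registration and Security Measures items in the prompt, the sketch below stores only the name of an environment variable for each agent and resolves the key at call time. The names `AgentConfig`, `mask`, and `AGENTBENCH_DEMO_KEY`, the placeholder endpoint, and the demo key value are all hypothetical. The sketch deliberately does not call 'agent-builder-evals', whose API is not described in this report.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical registration record: holds the env var name, never the key."""
    provider: str
    endpoint: str
    api_key_env: str

    def api_key(self) -> str:
        # Resolve the secret only when it is needed.
        key = os.environ.get(self.api_key_env)
        if not key:
            raise RuntimeError(f"Environment variable {self.api_key_env} is not set")
        return key


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo only: a placeholder value, not a real credential.
os.environ.setdefault("AGENTBENCH_DEMO_KEY", "sk-demo-1234")

agent = AgentConfig(
    provider="Anthropic",
    endpoint="https://example.invalid/v1",  # placeholder endpoint
    api_key_env="AGENTBENCH_DEMO_KEY",
)
print(agent.provider, mask(agent.api_key()))
```

Keeping only the variable name in the record means a saved or printed configuration never contains the secret itself.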