Package Metadata

Author: —
Email: Sungwoo Kim <[email protected]>
PyPI: agent-evaluator
Python: >=3.8
Versions: 22 releases
First release: 19 Mar 2026, 15:35 UTC
Analysed: 06 Jun 2026, 06:24 UTC
Source files: 61 .py files scanned

Project Links

Bug Tracker
Documentation
Homepage
Repository

Classifiers

Development Status :: 4 - Beta
Intended Audience :: Developers
Intended Audience :: Science/Research
License :: OSI Approved :: MIT License
Operating System :: OS Independent
Programming Language :: Python :: 3
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package exhibits multiple high-risk behaviors including shell execution and obfuscation, indicating potential malicious intent. While the network and credential risks are moderate, the incomplete metadata further raises suspicion.

High shell risk
Significant obfuscation risk
Incomplete author metadata

Per-check LLM notes

Network: Network calls indicate potential external communication which could be legitimate but should be reviewed to ensure it aligns with the package's intended functionality.
Shell: Shell execution is high risk as it can be used for unauthorized actions. This should be carefully examined to confirm its necessity and legitimacy within the package.
Obfuscation: The obfuscation pattern indicates potential code tampering to evade detection.
Credentials: The regex patterns and environment variable usage suggest possible attempts to harvest credentials or access sensitive files.
Metadata: The author's information is incomplete, suggesting a potentially less reputable source.

🔬 Heuristic Checks

⚠ Outbound Network Calls score 9.0

Found 6 network call pattern(s)

encode("utf-8") req = urllib.request.Request( self.webhook_url, data=data
POST", ) with urllib.request.urlopen(req, timeout=10): pass class WebhookHa
e(self.headers) req = urllib.request.Request( self.url, data=data,
ethod, ) with urllib.request.urlopen(req, timeout=10): pass class EmailHand
try: with urllib.request.urlopen(f"{base_url}/v1/projects", timeout=3) as resp:
}).encode() req = urllib.request.Request( gql_endpoint, data=
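
The fragments above look like standard-library webhook posts. The following is a minimal sketch of that shape, assuming a JSON payload; `build_webhook_request`, `send`, and the example URL are hypothetical and are not taken from the package. Building the request separately from sending it lets a reviewer inspect the destination before any traffic leaves the machine.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook URL."""
    data = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10) -> None:
    """Send the request and discard the response body, as in the flagged excerpts."""
    with urllib.request.urlopen(req, timeout=timeout):
        pass


if __name__ == "__main__":
    # Hypothetical URL; nothing is sent here, the request is only inspected.
    req = build_webhook_request("https://hooks.example.invalid/T000", "evaluation finished")
    print(req.get_method(), req.host)
```

When reviewing the real package, the point to confirm is that each URL comes from user configuration (for example the `SLACK_WEBHOOK` variable seen in the credential excerpts) rather than from a hard-coded address.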

⚠ Code Obfuscation score 10.0

Found 6 obfuscation pattern(s)

onitor @property def eval(self) -> Any: """Internal :class:`EvalDecorator` instance."""
-------------- # Direct call: @eval(task_type="qa") form # -----------------------------------
y) -> Callable: """Create the decorator directly in the ``@eval(task_type=...)`` form. Harness defaul
Usage:: @eval(task_type="qa", score_fn=my_fn) def agent(questi
passwd': 'critical', 'eval(': 'high', 'exec(': 'high', } param_str = s
= QuickEval("results/") @eval(task_type="reasoning", framework="dspy") def my_cot(ques
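
Every excerpt above involves a decorator or property named `eval`, or the literal string 'eval(' in a lookup table, rather than a visible call to the builtin. The sketch below assumes the heuristic is a regex match on `eval(` and `exec(` (the scanner's real rule is not shown in this report) and contrasts it with an AST walk that ignores definitions, string literals, and decorators. The names `naive_hits`, `builtin_call_hits`, and `SAMPLE` are hypothetical.

```python
import ast
import re

# Assumed shape of the heuristic: any "eval(" or "exec(" token in the source text.
NAIVE_PATTERN = re.compile(r"\b(eval|exec)\(")

SAMPLE = '''
def eval(task_type):
    def wrap(fn):
        return fn
    return wrap

@eval(task_type="qa")
def agent(question):
    return question

RISK = {"eval(": "high", "exec(": "high"}
'''


def naive_hits(source: str) -> list:
    """Return every textual match, including definitions and string literals."""
    return [m.group(0) for m in NAIVE_PATTERN.finditer(source)]


def builtin_call_hits(source: str) -> list:
    """Return (name, line) for eval/exec calls that are not used as decorators."""
    tree = ast.parse(source)  # parses only; the source is never executed
    decorator_ids = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            decorator_ids.update(id(dec) for dec in node.decorator_list)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and id(node) not in decorator_ids
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
        ):
            hits.append((node.func.id, node.lineno))
    return hits


if __name__ == "__main__":
    print(len(naive_hits(SAMPLE)), builtin_call_hits(SAMPLE))
```

Skipping decorators is a deliberate trade-off for this illustration: it would also miss a builtin `eval` applied as a decorator, so the six hits still deserve a manual look at the source.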

⚠ Shell / Subprocess Execution score 2.0

Found 1 shell execution pattern(s)

_phoenix_cmd() proc = subprocess.Popen( cmd, env=env, stdout=su
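
The single hit passes a prepared `cmd` and `env` to `subprocess.Popen`. The sketch below shows the lower-risk form of that call, assuming `cmd` is an argument list and no `shell=True` is involved; neither detail is visible in the truncated excerpt, and `launch` and `DEMO_FLAG` are hypothetical names.

```python
import os
import subprocess
import sys


def launch(cmd: list, extra_env: dict = None) -> subprocess.Popen:
    """Start a child process from an argument list, without a shell."""
    env = {**os.environ, **(extra_env or {})}
    return subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )


if __name__ == "__main__":
    # Run the current interpreter as a stand-in for the real command.
    proc = launch(
        [sys.executable, "-c", "import os; print(os.environ['DEMO_FLAG'])"],
        {"DEMO_FLAG": "1"},
    )
    output, _ = proc.communicate(timeout=30)
    print(output.decode().strip())
```

With an argument list, the operating system receives each argument verbatim, so user-supplied values cannot inject extra shell commands. Whether the package builds `cmd` this way is the thing to check in its source.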

⚠ Credential Harvesting score 10.0

Found 6 credential access pattern(s)

er = SlackHandler(webhook_url=os.getenv("SLACK_WEBHOOK")) """ def __init__(self, webhook_url: str,
r"(\.\./)", r"(\.\.\\)", r"(/etc/passwd)", r"(/etc/shadow)", r"(C:\\Windows)", r"(/
.\.\\)", r"(/etc/passwd)", r"(/etc/shadow)", r"(C:\\Windows)", r"(/root/)", r"(/var/w
IGNORECASE), re.compile(r'/etc/passwd', re.IGNORECASE), re.compile(r'\\windows\\system32', re
ELETE FROM': 'high', '/etc/passwd': 'critical', 'eval(': 'high', 'exec(': 'hi
new_val = getpass.getpass(prompt).strip() except (EOFError, getpass.Ge
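
Several of these hits are regexes for paths such as `/etc/passwd` sitting next to severity labels, which reads more like a deny-list for screening agent output than like code that opens those files. That interpretation is an assumption, not something the report confirms. A minimal sketch of such a deny-list check, reusing the patterns quoted above; `find_suspicious_paths` is a hypothetical name.

```python
import re

# Patterns copied from the excerpts above; they match text, they do not read files.
SUSPICIOUS_PATH_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"(\.\./)",
        r"(\.\.\\)",
        r"(/etc/passwd)",
        r"(/etc/shadow)",
        r"(C:\\Windows)",
        r"(/root/)",
    )
]


def find_suspicious_paths(text: str) -> list:
    """Return the first match of each deny-list pattern found in the text."""
    hits = []
    for pattern in SUSPICIOUS_PATH_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    print(find_suspicious_paths("cat ../../etc/passwd"))
    print(find_suspicious_paths("summarise the quarterly report"))
```

A reviewer can tell the two cases apart by checking whether the matched strings are ever passed to `open()` or a similar call; a deny-list only ever compares them against input text.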

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: gmail.com

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository bullpeng72/Agent-Evaluator appears legitimate

⚠ Maintainer History score 4.0

2 maintainer concern(s) found

Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with agent-evaluator

Develop a fully functional mini-application named 'AI-Agent-Inspector' that uses the 'agent-evaluator' package to assess the performance of various AI agents in a simulated environment. The application will serve as a tool for developers and researchers to evaluate AI agents against a set of predefined criteria and confirm that they meet robust standards of functionality, security, and efficiency.

The application should include the following key features:
1. **Agent Registration**: Users can register new AI agents within the application. Each agent can belong to one or more categories (e.g., chatbots, recommendation systems, autonomous vehicles).
2. **Evaluation Setup**: Users can configure evaluation scenarios based on the seven evaluation gates provided by 'agent-evaluator': goal achievement, behavioral integrity, reliability, performance, security, multi-agent coordination, and observability.
3. **Dynamic Metrics Selection**: Allow users to select specific metrics from the 58 available metrics (25 native + 33 Harness Config) for each evaluation scenario.
4. **Scenario Execution**: Execute the configured scenarios and collect data on how well each agent performs according to the selected metrics.
5. **Reporting**: Provide comprehensive reports that summarize the results of each evaluation, highlighting strengths and weaknesses of the agents.
6. **Visualization**: Implement visual dashboards to display the evaluation results in a user-friendly manner, enabling quick insights into agent performance.
7. **Security and Compliance Checks**: Ensure that the application includes checks to prevent unauthorized access and manipulation of evaluation data.

Integrate the 'agent-evaluator' package into your application to handle the evaluation of AI agents against the specified metrics and criteria. Use its framework to set up evaluation scenarios, execute them, and generate detailed reports. Additionally, use the package's features for multi-agent environments and make sure robust security measures are in place.