🤖 AI Analysis

Final verdict: SUSPICIOUS

The package scores high on shell execution and credential handling, two behaviors that could be misused. However, the scan found no definitive signs of malicious intent.

High shell risk indicating potential for running arbitrary commands
High credential risk suggesting possible unauthorized credential access

Per-check LLM notes

Network: The network calls are likely for checking internet availability and making HTTP requests, which may be part of the package's functionality.
Shell: The shell execution patterns suggest subprocesses are used to run external commands, possibly for hardware detection or other system-specific tasks. This could still be risky if the behavior is not properly documented.
Obfuscation: The obfuscation pattern detected seems to be a result of code formatting issues rather than intentional malicious obfuscation.
Credentials: The code reads a GITHUB_TOKEN from environment variables and uses getpass for input. This could indicate an attempt to capture credentials unless it is clearly documented and used for a legitimate purpose such as API authentication.
Metadata: The package shows signs of low maintenance and low-effort metadata, but no clear indicators of malicious intent.

🔬 Heuristic Checks

⚠ Outbound Network Calls score 9.0

Found 6 network call pattern(s)

= False try: s = socket.create_connection(("1.1.1.1", 80)) s.close() internet_availabl
dy as text. """ req = urllib.request.Request(url, headers={"User-Agent": "EuroEval-CoreModels/1"}
Eval-CoreModels/1"}) with urllib.request.urlopen(req, timeout=timeout) as resp: return resp.r
try: req = urllib.request.Request(url, method="HEAD") with urllib.request.
thod="HEAD") with urllib.request.urlopen(req, timeout=10) as resp: length = i
not None else None req = urllib.request.Request( url, data=data, method=meth
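
For context, the first fragment above resembles a common connectivity probe: open a TCP socket to a well-known address and treat failure as "offline". The sketch below is an illustration of that pattern, not the package's actual code. The function name `internet_available`, the timeout value, and the injectable `connect` parameter are assumptions added so the example can be exercised without real network access.

```python
import socket
from typing import Callable


def internet_available(
    host: str = "1.1.1.1",
    port: int = 80,
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,  # hypothetical hook for testing
) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
    conn.close()
    return True
```

A probe like this sends no payload, which is consistent with the AI note that the call is likely an availability check rather than data exfiltration.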

⚠ Code Obfuscation score 2.0

Found 1 obfuscation pattern(s)

re[redundant-cast] model.eval() model.to(benchmark_config.device) # ty: ignore[invali

⚠ Shell / Subprocess Execution score 6.0

Found 3 shell execution pattern(s)

pty() try: proc = subprocess.Popen( # noqa: S603 cmd, stdin=slave_fd,
one try: result = subprocess.run( # noqa: S603, S607 [ "nvidia-s
n) master_fd, slave_fd = pty.openpty() try: proc = subprocess.Popen( # noqa: S603
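
The `# noqa: S603` markers in these fragments suppress a linter warning about subprocess calls, which is usually acceptable when the command is a fixed argument list and no shell is involved. The following sketch shows that lower-risk pattern. The helper name `run_tool` and its defaults are hypothetical, and because the scanned fragments are truncated, the package's real commands and arguments are not reproduced here.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_tool(cmd: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Run a fixed argument list without a shell; return stdout or None."""
    try:
        result = subprocess.run(
            list(cmd),            # list form: no shell interpolation of arguments
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary is missing, not executable, or took too long.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Uses the current interpreter as a stand-in for an external tool.
    print(run_tool([sys.executable, "-c", "print('ok')"]))
```

When reviewing the flagged call sites, the main things to check are whether any argument is built from untrusted input and whether `shell=True` appears anywhere.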

⚠ Credential Harvesting score 5.0

Found 2 credential access pattern(s)

variable. """ token = os.environ.get("GITHUB_TOKEN") if not token: logger.error("GITHUB_TOKEN env
ret: reader = lambda: getpass.getpass(f"{prompt_text}: ") # noqa: E731 else: reader =
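
Both fragments match patterns that are also typical of legitimate API authentication: read a token from the environment, or prompt for a secret without echoing it. The sketch below combines the two for illustration only. Whether the package actually links them this way is not visible in the scan, and the name `resolve_token` and its parameters are assumptions.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_token(
    env: Optional[Mapping[str, str]] = None,
    reader: Optional[Callable[[], str]] = None,
    var_name: str = "GITHUB_TOKEN",
) -> str:
    """Return the token from the environment, else ask the user for it."""
    env = os.environ if env is None else env
    token = env.get(var_name)
    if token:
        return token
    if reader is None:
        # getpass hides the typed value from the terminal.
        reader = lambda: getpass.getpass(f"{var_name}: ")  # noqa: E731
    return reader()
```

The security-relevant question for a reviewer is where the returned value goes next: a token sent only to the API host it belongs to is expected behavior, while one written to disk or sent elsewhere is not.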

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: alexandra.dk

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository EuroEval/EuroEval appears legitimate

⚠ Maintainer History score 6.0

3 maintainer concern(s) found

Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)
Package has no PyPI classifiers (low effort / metadata quality)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with ScandEval

Create a web-based language evaluation tool using Python's ScandEval package. This tool will serve as a platform for users to test and compare various language models on their performance across different European languages, focusing particularly on Nordic and other less commonly evaluated European languages. The application should allow users to upload their own custom datasets or select from predefined ones, specify which language models they wish to evaluate, and choose from a variety of tasks such as sentiment analysis, named entity recognition, and machine translation. Additionally, the tool should provide visualizations of the evaluation results, allowing for easy comparison between different models and tasks. The application should be built using Flask for the backend and React for the frontend, ensuring a responsive and user-friendly interface. Utilize ScandEval's benchmarking capabilities to automate the evaluation process and integrate its scoring metrics into the application's output. Ensure that the application is scalable and can handle multiple concurrent evaluations.