AI Analysis
Final verdict: SUSPICIOUS
The package exhibits several concerning behaviors, notably shell execution and credential handling, both of which could be misused. However, there are no definitive signs of malicious intent.
- High shell risk indicating potential for running arbitrary commands
- High credential risk suggesting possible unauthorized credential access
Per-check LLM notes
- Network: The network calls are likely for checking internet availability and making HTTP requests, which may be part of the package's functionality.
- Shell: The shell execution patterns suggest that subprocesses are used to run external commands, possibly for hardware detection or other system-specific tasks. This is risky if the behavior is not properly documented.
- Obfuscation: The obfuscation pattern detected seems to be a result of code formatting issues rather than intentional malicious obfuscation.
- Credentials: The code reads a GITHUB_TOKEN from environment variables and uses getpass for input. This could indicate an attempt to capture credentials unless it is clearly documented and used for legitimate purposes such as API authentication.
- Metadata: The package shows signs of low maintenance and effort, but no clear indicators of malicious intent.
Heuristic Checks
Outbound Network Calls
score 9.0
Found 6 network call pattern(s)
- ... = False try: s = socket.create_connection(("1.1.1.1", 80)) s.close() internet_availabl ...
- ... dy as text. """ req = urllib.request.Request(url, headers={"User-Agent": "EuroEval-CoreModels/1"} ...
- ... Eval-CoreModels/1"}) with urllib.request.urlopen(req, timeout=timeout) as resp: return resp.r ...
- ... try: req = urllib.request.Request(url, method="HEAD") with urllib.request. ...
- ... thod="HEAD") with urllib.request.urlopen(req, timeout=10) as resp: length = i ...
- ... not None else None req = urllib.request.Request( url, data=data, method=meth ...
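The first excerpt above looks like a connectivity probe: open a TCP connection to 1.1.1.1 on port 80 and treat success as "online". A minimal sketch of that pattern follows. The function name `internet_available`, the `connect` parameter, and the timeout value are assumptions made for illustration; this is not the package's actual code.

```python
import socket
from typing import Callable


def internet_available(
    host: str = "1.1.1.1",
    port: int = 80,
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,  # injectable for testing (assumption)
) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
    conn.close()
    return True
```

A probe like this sends no payload, which is consistent with the note that the network calls are likely availability checks rather than data exfiltration.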
Code Obfuscation
score 2.0
Found 1 obfuscation pattern(s)
re[redundant-cast] model.eval() model.to(benchmark_config.device) # ty: ignore[invali
Shell / Subprocess Execution
score 6.0
Found 3 shell execution pattern(s)
pty() try: proc = subprocess.Popen( # noqa: S603 cmd, stdin=slave_fd,one try: result = subprocess.run( # noqa: S603, S607 [ "nvidia-sn) master_fd, slave_fd = pty.openpty() try: proc = subprocess.Popen( # noqa: S603
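Whether a subprocess call is risky depends largely on how the command is built. The sketch below shows the lower-risk form: a fixed argument list, no shell, a timeout, and failures handled explicitly. The helper name `run_tool` is hypothetical, and the excerpt above is truncated partway through the command, so the actual command and flags used by the package are unknown.

```python
import subprocess
from typing import Optional, Sequence


def run_tool(cmd: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Run a fixed argument list without a shell; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            list(cmd),            # argument list, so no shell interpolation happens
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,           # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        # OSError: binary missing or not executable.
        # SubprocessError: non-zero exit status or timeout.
        return None
    return result.stdout
```

A hardware-detection call under this pattern might look like `run_tool(["nvidia-smi", "-L"])`, which returns `None` on machines without the tool. That specific command is an assumed example, not one confirmed by the scan output.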
Credential Harvesting
score 5.0
Found 2 credential access pattern(s)
- ... variable. """ token = os.environ.get("GITHUB_TOKEN") if not token: logger.error("GITHUB_TOKEN env ...
- ... ret: reader = lambda: getpass.getpass(f"{prompt_text}: ") # noqa: E731 else: reader = ...
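Reading a token from the environment and prompting with `getpass` are both common in tools that authenticate against an API. The sketch below shows what a benign version of the pattern can look like: the token is only returned to the caller, never logged or transmitted. The function name `get_github_token` and its `env` and `prompt` parameters are assumptions for illustration, and combining the two excerpts into one function is also an assumption; the package may use them independently.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def get_github_token(
    env: Mapping[str, str] = os.environ,
    prompt: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return a token from GITHUB_TOKEN, or from an explicit prompt callback."""
    token = env.get("GITHUB_TOKEN")
    if token:
        return token
    if prompt is None:
        # No interactive fallback requested: report absence instead of prompting.
        return None
    entered = prompt("GitHub token: ")
    return entered or None


# Interactive use would pass getpass.getpass so the input is not echoed:
# token = get_github_token(prompt=getpass.getpass)
```

When reviewing the flagged code, the question to answer is where the token goes after it is read: a header on a request to the GitHub API is expected, while writing it to disk or sending it to another host is not.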
Typosquatting
No typosquatting candidates detected
Registered Email Domain
Email domain looks legitimate: alexandra.dk
Suspicious Page Links
All external links appear legitimate
Git Repository History
Repository EuroEval/EuroEval appears legitimate
Maintainer History
score 6.0
3 maintainer concern(s) found
- Author name is missing or very short
- Author "" appears to have only 1 package on PyPI (new or inactive account)
- Package has no PyPI classifiers (low effort / metadata quality)
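The three concerns above are simple metadata checks. A rough sketch of how such checks could be expressed follows. The function name `maintainer_concerns`, the metadata keys, and the minimum author-name length are all assumptions; the scanner's real thresholds and scoring are not given in this report.

```python
from typing import Any, List, Mapping


def maintainer_concerns(
    meta: Mapping[str, Any],
    author_package_count: int,
    min_author_len: int = 3,  # assumed threshold for "very short"
) -> List[str]:
    """Return human-readable concerns derived from package metadata."""
    concerns: List[str] = []
    author = (meta.get("author") or "").strip()
    if len(author) < min_author_len:
        concerns.append("Author name is missing or very short")
    if author_package_count <= 1:
        concerns.append(f'Author "{author}" appears to have only 1 package on PyPI')
    if not meta.get("classifiers"):
        concerns.append("Package has no PyPI classifiers")
    return concerns
```

Checks of this kind flag metadata quality only. A package with an empty author field and no classifiers can still be legitimate, which matches the report's finding that the repository and email domain look legitimate.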
Known CVE Vulnerabilities
No known vulnerabilities found in OSV database.
AI App Starter Prompt
Use this prompt to build a project with ScandEval
Create a web-based language evaluation tool using Python's ScandEval package. This tool will serve as a platform for users to test and compare various language models on their performance across different European languages, focusing particularly on Nordic and other less commonly evaluated European languages. The application should allow users to upload their own custom datasets or select from predefined ones, specify which language models they wish to evaluate, and choose from a variety of tasks such as sentiment analysis, named entity recognition, and machine translation. Additionally, the tool should provide visualizations of the evaluation results, allowing for easy comparison between different models and tasks. The application should be built using Flask for the backend and React for the frontend, ensuring a responsive and user-friendly interface. Utilize ScandEval's benchmarking capabilities to automate the evaluation process and integrate its scoring metrics into the application's output. Ensure that the application is scalable and can handle multiple concurrent evaluations.