aferiq-eval

Version 0.7.0, verdict: suspicious
Risk score: 6.0 (Medium Risk)

Quality observability for RAG and agents — Brazilian-Portuguese vertical with claim-by-claim hallucination diagnosis, trajectory quality, tool-use correctness, and goal completion metrics.

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package was flagged for two patterns: HTTP POST requests whose authentication headers appear incomplete, and calls to eval with restricted builtins, which the obfuscation check treats as a possible way to hide code execution. There is no direct evidence of malicious activity, but the combination of these factors warrants caution.

  • incomplete authentication for network requests
  • use of restricted builtins potentially for code obfuscation

Per-check LLM notes

  • Network: The package makes HTTP POST requests with incomplete authentication headers, which could indicate an attempt to communicate with external services without proper authorization.
  • Shell: No shell execution patterns were detected.
  • Obfuscation: The code calls eval with restricted builtins, which can still be used for code execution and may indicate obfuscation or malicious intent.
  • Credentials: No clear patterns of credential harvesting are detected.
  • Metadata: The author name is missing and the repository could not be found, which raises some suspicion.

🔬 Heuristic Checks

Outbound Network Calls score 6.0

Found 4 network call pattern(s)

  • e: self._client = httpx.Client(timeout=self.timeout) return self._client def c
  • e: self._client = httpx.AsyncClient(timeout=self.timeout) return self._client async
  • return response = httpx.post( ingest_url, headers={"Autho
  • } try: resp = httpx.post( ingest_url, headers={
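
A heuristic of this kind can be approximated with a line-by-line regular-expression scan. The sketch below is an assumption about how such a check might work, not the scanner's actual implementation; the pattern list and the function name `find_network_calls` are hypothetical.

```python
import re

# Hypothetical patterns: the report does not disclose the scanner's real rules.
NETWORK_PATTERNS = [
    re.compile(r"\bhttpx\.(?:Async)?Client\("),
    re.compile(r"\bhttpx\.(?:post|get|put|delete|request)\("),
]


def find_network_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in NETWORK_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


# Sample input modeled on the flagged snippets; the last line is a non-match.
sample = '''
self._client = httpx.Client(timeout=self.timeout)
self._client = httpx.AsyncClient(timeout=self.timeout)
resp = httpx.post(ingest_url, headers=headers)
total = 1 + 1
'''

hits = find_network_calls(sample)
for lineno, line in hits:
    print(f"line {lineno}: {line}")
```

A match only shows that the package performs HTTP calls. Because the captured snippets are cut off in the middle of the headers argument, the match alone does not establish what the request actually sends.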

Code Obfuscation score 6.0

Found 3 obfuscation pattern(s)

  • tméticas." return str(eval(expression, {"__builtins__": {}}, {})) # noqa: S307 exc
  • " try: return str(eval(expression, {"__builtins__": {}}, {})) # noqa: S307 exc
  • # auto-captured Programmatic eval (no cloud): >>> from rageval import evaluate >>> resu
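
The first two matches appear to evaluate a caller-supplied expression with an emptied `__builtins__` mapping. Emptying `__builtins__` is not a sandbox: an expression such as `().__class__.__base__.__subclasses__()` can still reach arbitrary classes through attribute access. If the intent is plain arithmetic (an assumption based on the truncated snippet), one alternative is to walk the parsed syntax tree and allow only numeric literals and basic operators. The function name `safe_arithmetic` is hypothetical and is not part of the package.

```python
import ast
import operator

# Allow-listed operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_arithmetic(expression: str):
    """Evaluate +, -, *, / over int and float literals without calling eval."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only plain int/float literals; bool, str, and others are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, attribute access, calls, and subscripts all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_arithmetic("2 + 3 * 4"))
```

Exponentiation is left out on purpose, since an input like `9 ** 9 ** 9` could consume large amounts of CPU and memory.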

Shell / Subprocess Execution

No shell execution patterns detected

Credential Harvesting

No credential harvesting patterns detected

Typosquatting

No typosquatting candidates detected

Registered Email Domain

Email domain looks legitimate: gmail.com

Suspicious Page Links

All external links appear legitimate

Git Repository History score 3.0

Repository not found (deleted or private)

  • Repository not found (deleted or private)

Maintainer History score 4.0

2 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)

Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with aferiq-eval
Create a Python-based mini-application that leverages the 'aferiq-eval' package to assess the quality of responses generated by a Retrieval-Augmented Generation (RAG) model when interacting with a user through a simple chat interface. This application will serve as a diagnostic tool for developers working on conversational AI systems, particularly those focusing on Brazilian Portuguese language support.

### Application Overview:
- **Name**: RAG-QualityChecker
- **Objective**: Evaluate the accuracy, relevance, and overall quality of responses generated by a RAG model, specifically tailored for Brazilian Portuguese.
- **Features**:
  - **Claim-by-Claim Hallucination Diagnosis**: Automatically identify instances where the RAG model generates information not supported by the input context or external knowledge sources.
  - **Trajectory Quality Analysis**: Monitor the coherence and consistency of the conversation over multiple turns, ensuring that the response trajectory aligns with the user's query and context.
  - **Tool-Use Correctness**: Assess whether the RAG model correctly utilizes available tools or references to provide accurate responses.
  - **Goal Completion Metrics**: Measure how well the RAG model meets the user's intent or goal with each interaction.

### Development Steps:
1. **Setup Environment**: Install Python and necessary libraries including 'aferiq-eval'.
2. **Integrate RAG Model**: Use a pre-trained RAG model that supports Brazilian Portuguese or integrate one compatible with 'aferiq-eval'.
3. **Develop Chat Interface**: Create a simple text-based or graphical user interface (GUI) where users can input their queries and interact with the RAG model.
4. **Implement Evaluation Logic**: Utilize 'aferiq-eval' to analyze each response generated by the RAG model, applying its core functionalities to diagnose potential issues like hallucinations, tool misuse, etc.
5. **Display Evaluation Results**: Present the evaluation results to the user in real-time, highlighting areas where the RAG model performed well and areas needing improvement.
6. **User Feedback Loop**: Allow users to provide feedback on the accuracy and relevance of the evaluations provided by 'aferiq-eval', which could help refine future assessments.
7. **Documentation and Testing**: Write comprehensive documentation detailing how to use the application and conduct thorough testing to ensure reliability.

### Expected Outcome:
By the end of this project, you'll have a functional tool that not only serves as a conversational interface but also provides insightful diagnostics about the quality of responses generated by a RAG model in Brazilian Portuguese. This tool can be invaluable for developers looking to improve the performance of their conversational AI systems.
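
### Example Sketch:
Steps 4 and 5 could be prototyped with a pluggable evaluator, as in the minimal sketch below. The API of 'aferiq-eval' is not documented on this page, so the sketch does not call it: `Turn`, `Evaluator`, `overlap_evaluator`, and `evaluate_turns` are all hypothetical names, and the word-overlap score is only a placeholder for a real metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    question: str
    context: str
    answer: str


# Assumed evaluator shape: one Turn in, a dict of metric name to score out.
Evaluator = Callable[[Turn], dict[str, float]]


def _words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    cleaned = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in cleaned if w]


def overlap_evaluator(turn: Turn) -> dict[str, float]:
    """Placeholder metric: share of answer words that also occur in the context."""
    answer_words = _words(turn.answer)
    context_words = set(_words(turn.context))
    if not answer_words:
        return {"context_overlap": 0.0}
    supported = sum(1 for w in answer_words if w in context_words)
    return {"context_overlap": supported / len(answer_words)}


def evaluate_turns(turns: list[Turn], evaluator: Evaluator) -> list[dict[str, float]]:
    """Run the evaluator over each conversation turn, in order."""
    return [evaluator(turn) for turn in turns]


context = "Brasília é a capital do Brasil."
turns = [
    Turn("Qual é a capital do Brasil?", context, "A capital é Brasília."),
    Turn("Qual é a capital do Brasil?", context, "A capital é Salvador."),
]

results = evaluate_turns(turns, overlap_evaluator)
for turn, scores in zip(turns, results):
    print(turn.answer, scores)
```

Keeping the evaluator behind a plain callable means the placeholder can later be replaced by a wrapper around whichever evaluation library is chosen, without changing the chat loop or the result display.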