Package Metadata

Author: —
Email: Théo CHARLET <[email protected]>
PyPI: RDTextract
Python: >=3.9
Versions: 3 releases
First release: 19 Apr 2026, 17:24 UTC
Analysed: 05 Jun 2026, 19:46 UTC
Source files: 6 .py files scanned

Classifiers

Development Status :: 4 - Beta
Intended Audience :: Developers
Intended Audience :: Science/Research
License :: OSI Approved :: MIT License
Operating System :: OS Independent
Programming Language :: Python :: 3
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package scores low on network, shell, obfuscation, and credential risk, but the metadata check raises concerns: the author details are incomplete and the account appears to be new or inactive. This combination warrants caution but does not conclusively indicate a supply-chain attack.

Low network, shell, obfuscation, and credential risks.
Metadata risk due to incomplete author details and potentially inactive account.

Per-check LLM notes

Network: No network calls detected, which is normal unless the package requires external services.
Shell: No shell execution patterns detected, indicating no immediate risk from command execution.
Obfuscation: No obfuscation patterns detected, indicating low risk of malicious activity.
Credentials: No credential harvesting patterns detected, suggesting no immediate threat to secrets or credentials.
Metadata: The author's details are incomplete and the account seems new or inactive, raising some suspicion but not conclusive evidence of malice.

🔬 Heuristic Checks

✓ Outbound Network Calls

No suspicious network call patterns found

✓ Code Obfuscation

No obfuscation patterns detected

✓ Shell / Subprocess Execution

No shell execution patterns detected

✓ Credential Harvesting

No credential harvesting patterns detected

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: rdtvlokip.fr

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository RDTvlokip/RDTextract appears legitimate

⚠ Maintainer History score 4.0

2 maintainer concern(s) found

Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with RDTextract

Create a web-based utility that extracts and converts content from HTML files into clean Markdown format, optimized specifically for use in large language model (LLM) training datasets. The application should leverage the 'RDTextract' package to ensure the extracted content is free from noise artifacts and includes quality scores for each piece of extracted text. Additionally, the utility should be able to detect and exclude low-value content such as advertisements, navigation bars, and other non-essential elements.

Steps to build the utility:
1. Set up a Flask backend server that allows users to upload HTML files.
2. Integrate the 'RDTextract' package within your Flask app to process the uploaded HTML files.
3. Implement a feature that displays the extracted Markdown content on a separate page, alongside quality scores for each segment.
4. Add functionality to allow users to download the cleaned Markdown file directly from the application.
5. Include a feature that highlights or marks sections of the HTML that have been identified as low-value content.
6. Ensure the application has a user-friendly interface with clear instructions on how to use it.

Suggested Features:
- User authentication to track individual usage statistics.
- A history section where users can view their previously processed files.
- An option to manually review and adjust the quality scores of specific segments.
- Integration with popular cloud storage services for easy file retrieval.

How 'RDTextract' is Utilized:
- Use 'RDTextract' to process the uploaded HTML files, extracting only the valuable content and generating quality scores for each piece of text.
- Apply the low-value detection capabilities of 'RDTextract' to identify and exclude non-essential parts of the HTML.
- Display the extracted content in a structured Markdown format, ensuring the final output is ready for LLM training purposes.
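
The extraction flow described above (drop low-value elements, emit Markdown, attach a quality score to each segment) can be illustrated with a minimal sketch. This report does not show the RDTextract API, so the code below does not call it: `StandInExtractor`, `html_to_markdown`, the `LOW_VALUE_TAGS` set, and the word-count score are all hypothetical stand-ins built on the standard library's `html.parser`, meant only to show the shape of the data a Flask view might receive from the real package.

```python
from html.parser import HTMLParser

# Assumption: tags treated as low-value in this sketch (not RDTextract's list).
LOW_VALUE_TAGS = {"nav", "aside", "footer", "header", "script", "style"}
# Assumption: only these block tags are converted to Markdown here.
BLOCK_TAGS = {"h1", "h2", "h3", "p"}


class StandInExtractor(HTMLParser):
    """Hypothetical stand-in extractor; this is NOT the RDTextract API."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a low-value element
        self.heading = None   # heading level of the open block, if any
        self.buffer = []      # text pieces of the current block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in LOW_VALUE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()
            self.heading = int(tag[1]) if tag[0] == "h" else None

    def handle_endtag(self, tag):
        if tag in LOW_VALUE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Text inside low-value elements is discarded.
        if self.skip_depth == 0:
            self.buffer.append(data)

    def _flush(self):
        # Collapse whitespace and store the block if any text remains.
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        if text:
            prefix = "#" * self.heading + " " if self.heading else ""
            self.blocks.append(prefix + text)
        self.heading = None


def html_to_markdown(html):
    """Return a list of (markdown_block, score) pairs.

    The score is an arbitrary illustrative heuristic: word count
    divided by 20, capped at 1.0.
    """
    parser = StandInExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return [
        (block, round(min(1.0, len(block.split()) / 20), 2))
        for block in parser.blocks
    ]


if __name__ == "__main__":
    page = "<nav>Home</nav><h1>Title</h1><p>Body text.</p>"
    for block, score in html_to_markdown(page):
        print(score, block)
```

In an actual build, the body of `html_to_markdown` would be replaced by calls into RDTextract once its documented interface has been checked; the list of (segment, score) pairs is simply a convenient shape for rendering results and offering a Markdown download.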