Resiliparse

v1.0.3 · safe · 4.0 · Medium Risk

A collection of robust and fast processing tools for parsing and analyzing (not only) web archive data.

🤖 AI Analysis

Final verdict: SAFE

The package appears legitimate and shows no signs of malicious activity. The only concerns come from its metadata: the maintainer has a single package on PyPI and the package declares no classifiers, which the scan reads as a sign of limited maintainer experience or investment.

  • Low network, shell, obfuscation, and credential risks.
  • Maintainer's limited history and lack of PyPI classifiers.

Per-check LLM notes

  • Network: The observed network calls appear to be legitimate requests for data from known and reputable sources, likely for functionality purposes.
  • Shell: No shell execution patterns were detected.
  • Obfuscation: No obfuscation patterns detected, indicating low risk.
  • Credentials: No credential harvesting patterns detected, indicating low risk.
  • Metadata: The maintainer has only one package and lacks PyPI classifiers, suggesting low effort or inexperience.

🔬 Heuristic Checks

Outbound Network Calls (score 4.5)

Found 3 network call pattern(s)

  • "" encodings = json.load(urllib.request.urlopen('https://encoding.spec.whatwg.org/encodings.json'))
  • try: with urllib.request.urlopen(f'https://dumps.wikimedia.org/{l}wiki/{dumpdate}/dum
  • as outf: with urllib.request.urlopen(f'https://dumps.wikimedia.org{url}') as dumpf:

Code Obfuscation

No obfuscation patterns detected

Shell / Subprocess Execution

No shell execution patterns detected

Credential Harvesting

No credential harvesting patterns detected

Typosquatting

No typosquatting candidates detected

Registered Email Domain

No author email provided

Suspicious Page Links

All external links appear legitimate

Git Repository History

Repository chatnoir-eu/chatnoir-resiliparse appears legitimate

Maintainer History (score 4.0)

2 maintainer concern(s) found

  • Author "Janek Bevendorff" appears to have only 1 package on PyPI (new or inactive account)
  • Package has no PyPI classifiers (low effort / metadata quality)

Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with Resiliparse
Create a mini-application called 'WebAnalyzer' that leverages the Resiliparse Python package to analyze and extract useful information from web archives. This application should allow users to input a URL or upload a WARC file, then perform various analyses on the content, such as identifying the most common words, extracting all URLs within the webpage(s), and detecting the language of the text. Additionally, implement features to visualize the extracted data using basic charts (e.g., word frequency bar chart). Here are the steps to build the application:

1. **Setup**: Install Resiliparse and any other necessary packages like matplotlib for visualization.
2. **Input Handling**: Implement functionality to accept URLs or WARC files as input.
3. **Parsing & Analysis**:
   - Use Resiliparse to parse the web content.
   - Extract common words and their frequencies.
   - Identify all URLs present within the web pages.
   - Determine the primary language of the text.
4. **Visualization**: Create simple visualizations for the extracted data, such as a bar chart showing word frequency.
5. **Output**: Display the analysis results and visualizations in a user-friendly format.
6. **Testing**: Ensure the application works correctly by testing it with different types of inputs.
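
The word-frequency and URL-extraction parts of step 3 can be sketched as below. This is a minimal illustration that uses only the Python standard library: the class and function names are hypothetical, and the Resiliparse calls for parsing, text extraction, and language detection are deliberately left out, since the page does not describe that API. In a real 'WebAnalyzer' those steps would supply the HTML and plain text that these helpers consume.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str = ""):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str = "") -> list[str]:
    """Return all link targets in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def word_frequencies(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most common words (letters only, case-insensitive)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(top_n)
```

The list returned by `word_frequencies` has the shape a bar chart needs for step 4: one label and one count per bar.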