Package Metadata

Author: —
Email: Scrapy developers <[email protected]>
PyPI: Scrapy
Python: >=3.10
Versions: 112 releases
First release: 12 Dec 2009, 23:18 UTC
Analysed: 05 Jun 2026, 20:42 UTC
Source files: 61 .py files scanned

Project Links

Documentation, Homepage, Release notes, Source, Tracker

Classifiers

Development Status :: 5 - Production/Stable
Environment :: Console
Framework :: Scrapy
Intended Audience :: Developers
Operating System :: OS Independent
Programming Language :: Python
Programming Language :: Python :: 3
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package shows significant risk indicators for shell execution and code obfuscation, which could enable malicious activity. However, there is no clear evidence of credential harvesting or supply-chain attacks.

High risk from use of os.system and subprocess.Popen
Significant risk from use of eval for dynamic code execution

Per-check LLM notes

Network: No direct network calls are detected that would suggest unexpected behavior.
Shell: The use of os.system and subprocess.Popen for shell execution could pose risks if not properly sanitized or controlled, potentially allowing for arbitrary command execution.
Obfuscation: The use of eval with dynamic code execution poses a significant risk for code injection and obfuscation.
Credentials: No clear evidence of credential harvesting is present in the provided snippet.
Metadata: The author's information is lacking and they seem to be new or inactive, which raises some concerns.

🔬 Heuristic Checks

⚠ Outbound Network Calls score 3.0

Found 2 network call patterns

None: old_reqs = self.requests.get(lvl, []) self.requests[lvl] = old_reqs + new_reqs
: requests = self.requests.get(lvl, []) elif self.requests: requests =

⚠ Code Obfuscation score 2.0

Found 1 obfuscation pattern

self.code: print(eval(self.code, globals(), self.vars)) # noqa: S307 else
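To illustrate why this check treats eval as a risk, here is a minimal sketch contrasting it with ast.literal_eval from the standard library, which accepts only Python literals. The helper name parse_literal is hypothetical and is not part of Scrapy. The flagged line may evaluate code the user typed on purpose, so this is not a suggested patch.

```python
import ast


def parse_literal(text):
    """Parse a string that should contain only a Python literal.

    Hypothetical helper for illustration. ast.literal_eval accepts
    literals such as numbers, strings, lists, and dicts, and rejects
    function calls and attribute access.
    """
    return ast.literal_eval(text)


# Plain data parses normally.
parsed = parse_literal("{'depth': 2, 'urls': ['a', 'b']}")

# An expression that calls a function is rejected instead of executed.
try:
    parse_literal("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

This only helps when the input is meant to be data. A tool that deliberately evaluates user-supplied expressions still needs eval or an equivalent.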

⚠ Shell / Subprocess Execution score 6.0

Found 3 shell execution patterns

.py") self.exitcode = os.system(f'{editor} "{sfile}"') # noqa: S605 from __future__ import
self.exitcode = os.system(f'scrapy edit "{name}"') # noqa: S605 def _generate_te
hserver"] self.proc = subprocess.Popen( # noqa: S603 pargs, stdout=subprocess.PIPE, en
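The concern behind this check is that a command built by string interpolation is parsed by a shell, so quotes or semicolons inside a file name can change the command. The sketch below shows the general alternative of passing an argument list to subprocess.run, which does not invoke a shell. The helper run_argv and the hostile file name are hypothetical, and the child process is the current Python interpreter rather than an editor.

```python
import subprocess
import sys


def run_argv(argv):
    """Run a command given as a list of arguments, without a shell.

    Hypothetical helper for illustration; returns (exit code, stdout).
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout


# A name containing shell metacharacters. With os.system and an f-string,
# a POSIX shell would treat the part after the first quote as a new command.
hostile_name = 'demo"; echo injected; ".py'

# Passed as one list element, the name reaches the child process unchanged.
code, out = run_argv(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name]
)
```

Here the child simply echoes its first argument, which shows that the whole string arrived as a single argument and nothing in it was run as a command.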

✓ Credential Harvesting

No credential harvesting patterns detected

⚠ Typosquatting score 3.0

Possible typosquat of: scipy

"Scrapy" is 2 edits from "scipy"
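The edit count above can be reproduced with a standard Levenshtein distance, sketched here. It is an assumption that the checker uses this metric and compares names case-insensitively; the function name edit_distance is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Lowercase first (assumed): "scrapy" -> "scapy" (delete r) -> "scipy" (a to i).
distance = edit_distance("Scrapy".lower(), "scipy")
```

A small distance only says the names look alike. It does not tell you which package is the original, and both names here belong to long-established projects.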

✓ Registered Email Domain

Email domain looks legitimate: pablohoffman.com

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository scrapy/scrapy appears legitimate

⚠ Maintainer History score 4.0

2 maintainer concerns found

Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)

⚠ Known CVE Vulnerabilities score 3.0

Found 1 vulnerability in the OSV database.

CVE-2017-14158: No summary provided

💡 AI App Starter Prompt

Use this prompt to build a project with Scrapy

Create a web scraping tool using the Scrapy framework in Python to gather information on popular programming languages from GitHub. This tool should be able to scrape data such as the number of stars, forks, and contributors for each language. Additionally, it should provide an option to filter results based on specific criteria, such as the top 10 most starred repositories for a given language. Here’s a breakdown of the project requirements:

1. **Setup**: Begin by setting up a new Scrapy project. Ensure you have Scrapy installed in your virtual environment.
2. **Target Website**: Choose GitHub as the target website. Focus on scraping data from the 'Trending' page, specifically targeting the 'Programming Languages' section.
3. **Data Extraction**: Define the items that need to be scraped, including but not limited to the language name, number of stars, forks, and contributors count. Use Scrapy selectors to extract these details efficiently.
4. **Filtering Options**: Implement filtering options that allow users to specify a programming language and/or a range of stars to narrow down the search results.
5. **Output Format**: Store the extracted data in a structured format like JSON or CSV. Optionally, implement a feature to visualize the data using a simple chart or graph.
6. **Error Handling & Logging**: Incorporate error handling mechanisms to manage HTTP errors and connection timeouts. Also, set up logging to track the scraping process.
7. **Scraping Frequency**: Design the application to allow periodic updates to the scraped data, ensuring the information remains current.
8. **Testing**: Write tests to ensure the scraper functions correctly under different scenarios, including edge cases where the target webpage structure changes slightly.
9. **Documentation**: Provide comprehensive documentation on how to run the scraper, including setup instructions and usage examples.

By following these steps, you will develop a robust web scraping tool that leverages Scrapy’s powerful capabilities to extract valuable information from GitHub’s trending programming languages page.
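Steps 4 and 5 of the prompt (filtering and structured output) can be sketched in plain Python, independent of how the items are scraped. The record fields, the sample data, and the function name filter_repos are all hypothetical. A real project would take these records from the spider's items and could use Scrapy's feed exports for the output instead.

```python
import json


def filter_repos(repos, language=None, min_stars=0, max_stars=None, top_n=None):
    """Filter repository records by language and star range, most starred first.

    Hypothetical helper; each record is a dict with "language" and "stars".
    """
    selected = []
    for repo in repos:
        if language is not None and repo["language"].lower() != language.lower():
            continue
        if repo["stars"] < min_stars:
            continue
        if max_stars is not None and repo["stars"] > max_stars:
            continue
        selected.append(repo)
    selected.sort(key=lambda repo: repo["stars"], reverse=True)
    return selected if top_n is None else selected[:top_n]


# Made-up sample records standing in for scraped items.
sample = [
    {"name": "example/alpha", "language": "Python", "stars": 1200, "forks": 90},
    {"name": "example/beta", "language": "Go", "stars": 300, "forks": 12},
    {"name": "example/gamma", "language": "Python", "stars": 4500, "forks": 410},
]

# Top 10 Python repositories with at least 1000 stars, serialised as JSON.
top_python = filter_repos(sample, language="python", min_stars=1000, top_n=10)
as_json = json.dumps(top_python, indent=2)
```

Keeping the filter separate from the spider makes it straightforward to unit test (step 8) without network access.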