arxiv-dl

v1.3.1 suspicious
5.0
Medium Risk

Command-line Papers Downloader. Citation extraction and PDF naming automation.

🤖 AI Analysis

Final verdict: SUSPICIOUS

The package shows moderate risk because it executes external commands, which can lead to vulnerabilities such as command injection when user input reaches the command line. The maintainer's incomplete profile adds further uncertainty.

  • High shell risk due to execution of external commands.
  • Incomplete maintainer profile raising questions about trustworthiness.
Per-check LLM notes
  • Network: The network patterns detected seem to be related to downloading and checking the status of web resources, which is not inherently suspicious but could indicate additional functionality beyond simple ArXiv downloads.
  • Shell: Executing external commands via the shell can introduce significant risks including command injection and unauthorized access, especially if user input is involved.
  • Obfuscation: No obfuscation patterns detected, indicating low risk.
  • Credentials: No credential harvesting patterns detected, indicating low risk.
  • Metadata: The maintainer has an incomplete profile and appears to be new or inactive, which raises some concerns but does not conclusively indicate malicious intent.

📦 Package Quality Overall: Medium (6.2/10)

✦ High Test Suite 9.0

Test suite present: 7 test file(s) found

  • 7 test file(s) detected (e.g. test_arxiv_id_validator.py)
◈ Medium Documentation 5.0

Some documentation present

  • Detailed PyPI description (5966 chars)
○ Low Contributing Guide 2.0

No contributing guide or governance files found

  • No CONTRIBUTING, CODE_OF_CONDUCT, or governance files found
◈ Medium Type Annotations 5.0

Partial type annotation coverage

  • 42 type-annotated function signatures detected in source
✦ High Multiple Contributors 10.0

Active multi-contributor project

  • 7 unique contributor(s) across 100 commits in MarkHershey/arxiv-dl
  • Active community: 5 or more distinct contributors

🔬 Heuristic Checks

⚠ Outbound Network Calls score 9.0

Found 6 network call pattern(s)

  • wnload fails """ with requests.get(url, stream=True) as response: response.raise_for_st
  • rogress tracking with requests.get(url, stream=True) as response: response.raise_fo
  • " try: response = requests.get("https://www.google.com", timeout=3) return response
  • metadata...") response = requests.get(paper_data.abs_url) if response.status_code != 200:
  • per_id}" # pwc_response = requests.get(pwc_url) # if pwc_response.status_code == 200: #
  • er_id}" bibtex_response = requests.get(bibtex_url) if bibtex_response.status_code == 200:
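
Several of the matched calls above pass no `timeout` argument, and `requests` does not apply one by default, so a stalled server can block the caller indefinitely. The following is a minimal sketch of a streamed download with an explicit timeout. The `download` function, its injectable `get` parameter, and the default values are hypothetical illustrations and are not taken from arxiv-dl.

```python
from pathlib import Path
from typing import Callable, Optional


def download(
    url: str,
    dest: Path,
    get: Optional[Callable] = None,  # hypothetical hook so a stub can replace requests.get
    timeout: float = 10.0,           # assumed default; tune for the deployment
    chunk_size: int = 8192,
) -> int:
    """Stream `url` to `dest` and return the number of bytes written."""
    if get is None:
        import requests  # third-party; imported lazily so the sketch can be exercised offline

        get = requests.get

    written = 0
    # stream=True avoids loading the whole PDF into memory;
    # timeout bounds the connect and per-read wait.
    with get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

Accepting the `get` callable as a parameter is a design choice made here only so the function can be tested without network access.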
✓ Code Obfuscation

No obfuscation patterns detected

⚠ Shell / Subprocess Execution score 10.0

Found 6 shell execution pattern(s)

  • ons...") completed_proc = subprocess.run( shlex.split(aria2_command), stdout=subproce
  • ef test_simple(self): subprocess.run( f"paper {self.test_target}", shell=
  • ith_inline_env(self): subprocess.run( f"ARXIV_DOWNLOAD_FOLDER={self.test_dir} paper {
  • est_with_flags(self): subprocess.run( f"paper -v -d {self.test_dir} {self.test_target
  • lf.test_target}", shell=True, check=True, ) shutil.rmtree(DE
  • lf.test_target}", shell=True, check=True, ) def test_with_flags
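
The matches above fall into two groups: one call passes an argument list built with `shlex.split`, while the test-suite calls pass a formatted string with `shell=True`. The sketch below illustrates why the distinction matters. The `untrusted` value and both helper functions are hypothetical; only the `paper` command name comes from the snippets above, and it is never executed here.

```python
import shlex
import subprocess
import sys

# Hypothetical untrusted value; the "; echo INJECTED" suffix stands in for an injected command.
untrusted = "2301.00001; echo INJECTED"


def run_as_list(arg: str) -> str:
    """Pass arguments as a list: no shell parses the value, so ';' stays literal."""
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def quoted_command(arg: str) -> str:
    """Build a command string with the argument quoted, for cases where a string is unavoidable."""
    return f"paper {shlex.quote(arg)}"


if __name__ == "__main__":
    print(run_as_list(untrusted))     # the child process receives the whole value as one argument
    print(quoted_command(untrusted))  # the value is wrapped in single quotes
```

With `shell=True` and an unquoted f-string, the same value would be split at the `;` and the second part run as a separate command. The flagged test code appears to interpolate fixture values rather than user input, which is a lower-risk use of the same pattern.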
✓ Credential Harvesting

No credential harvesting patterns detected

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: markhh.com

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository MarkHershey/arxiv-dl appears legitimate

⚠ Maintainer History score 4.0

2 maintainer concern(s) found

  • Author name is missing or very short
  • Author "" appears to have only 1 package on PyPI (new or inactive account)
✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with arxiv-dl
Create a Python-based command-line tool named 'ArxivPapercraft' that leverages the 'arxiv-dl' package to streamline academic research paper management. This tool should enable users to easily download papers from arXiv.org, extract citation information, and automatically name downloaded PDF files according to specific user preferences. Here’s a detailed breakdown of the requirements and steps to develop this mini-application:

1. **User Input Handling**: Develop a command-line interface where users can input search queries to find papers on arXiv.org. Users should be able to specify keywords, categories, date ranges, etc.
2. **Paper Search and Download**: Utilize the 'arxiv-dl' package to search for papers based on user inputs and download them directly. Ensure the tool handles multiple results gracefully, allowing users to select which papers they want to download.
3. **Citation Extraction**: Implement functionality to automatically extract citation details (title, authors, publication date, DOI, etc.) from the downloaded papers. This feature should work seamlessly with various citation styles (APA, MLA, Chicago).
4. **PDF Naming Automation**: Allow users to customize how their downloaded PDFs are named. Options could include naming by title, author names, DOI, or any combination thereof. The tool should also handle special characters and ensure filenames are valid across different operating systems.
5. **Output Management**: Provide options for output directory customization, where users can specify where the downloaded papers and extracted citations should be saved.
6. **Error Handling and Logging**: Implement robust error handling to manage issues like network failures, invalid user inputs, and unsupported file formats. Additionally, maintain a log file for troubleshooting and auditing purposes.
7. **Optional Features**: Consider adding optional features such as automatic email notifications upon successful downloads, integration with cloud storage services for paper backups, and support for additional citation formats.
8. **Documentation and Testing**: Write comprehensive documentation for both end-users and developers, including installation instructions, usage examples, and API documentation if applicable. Conduct thorough testing to ensure reliability and usability.
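
As an illustration of requirement 4, the following is a minimal sketch of cross-platform filename sanitization. The function name `safe_pdf_name`, the `"<id> <title>.pdf"` layout, and the 120-character limit are assumptions made for this example and do not describe how arxiv-dl names files.

```python
import re
import unicodedata

# Characters rejected by Windows filesystems, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_pdf_name(title: str, paper_id: str, max_len: int = 120) -> str:
    """Return a '<paper_id> <title>.pdf' filename with unsafe characters removed."""
    text = unicodedata.normalize("NFKC", title)  # fold compatibility characters
    text = _INVALID.sub(" ", text)               # replace unsafe characters with spaces
    text = re.sub(r"\s+", " ", text).strip(" .") # collapse whitespace, trim dots and spaces
    stem = f"{paper_id} {text}".strip()
    # Truncate the stem, then trim again so it never ends with a dot or space.
    return stem[:max_len].rstrip(" .") + ".pdf"


if __name__ == "__main__":
    print(safe_pdf_name('Attention: Is All/You "Need"?', "1706.03762"))
    # prints: 1706.03762 Attention Is All You Need.pdf
```

Trailing dots and spaces are trimmed because Windows silently drops them, which can make two distinct names collide.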
