Package Metadata

Author: —
Email: deepset GmbH <[email protected]>
PyPI: amazon-textract-haystack
Python: >=3.10
Versions: 1 release
First release: 22 May 2026, 11:06 UTC
Analysed: 07 Jun 2026, 01:14 UTC
Source files: 7 .py files scanned

Classifiers

Development Status :: 4 - Beta
License :: OSI Approved :: Apache Software License
Programming Language :: Python
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
Programming Language :: Python :: 3.14
Programming Language :: Python :: Implementation :: CPython

🤖 AI Analysis

Final verdict: SAFE

The package shows low risk: no network calls, shell execution, or obfuscation patterns were detected. The credential and metadata checks raised minor flags (environment-variable reads in test code, and a single release with no author name), but neither strongly suggests malicious intent.

No network or shell risks detected.
Low obfuscation risk.

Per-check LLM notes

Network: No network calls detected, which is normal for packages that don't require external services.
Shell: No shell execution patterns detected, indicating no direct system command execution.
Obfuscation: No obfuscation patterns detected.
Credentials: The observed patterns are likely for conditional skipping of tests based on environment variables, not for credential harvesting.
Metadata: The package is new and lacks maintainer history, which raises some concerns but does not definitively indicate malice.

📦 Package Quality Overall: Medium (6.2/10)

✦ High Test Suite 9.0

Test suite present — 3 test file(s) found

Test runner config found: pyproject.toml
3 test file(s) detected (e.g. __init__.py)

◈ Medium Documentation 7.0

Some documentation present

Documentation URL: "Documentation" -> https://github.com/deepset-ai/haystack-core-integrations/tre
Detailed PyPI description (4145 chars)

○ Low Contributing Guide 4.0

No contributing guide or governance files found

Development Status classifier >= Beta

○ Low Type Annotations 1.0

No type annotations detected

No type annotations, py.typed marker, or stub files detected

✦ High Multiple Contributors 10.0

Active multi-contributor project

16 unique contributor(s) across 100 commits in deepset-ai/haystack-core-integrations
Active community — 5 or more distinct contributors
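
The overall score of 6.2 is consistent with an unweighted mean of the five category scores listed above. The report does not state its aggregation formula, so the sketch below is an assumption that happens to reproduce the number; the names `category_scores` and `overall_score` are illustrative.

```python
from statistics import mean

# Category scores as reported in this section.
category_scores = {
    "Test Suite": 9.0,
    "Documentation": 7.0,
    "Contributing Guide": 4.0,
    "Type Annotations": 1.0,
    "Multiple Contributors": 10.0,
}


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean rounded to one decimal.

    Assumed aggregation: the report does not confirm this formula.
    """
    return round(mean(scores.values()), 1)


print(overall_score(category_scores))  # prints 6.2
```

If the scoring tool applies weights, the match here would be coincidental, so treat the formula as a plausible reading and not a documented one.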

🔬 Heuristic Checks

✓ Outbound Network Calls

No suspicious network call patterns found

✓ Code Obfuscation

No obfuscation patterns detected

✓ Shell / Subprocess Execution

No shell execution patterns detected

⚠ Credential Harvesting score 5.0

Found 2 credential access pattern(s)

@pytest.mark.skipif(not os.environ.get("AWS_ACCESS_KEY_ID"), reason=SKIP_REASON_NO_CREDENTIALS) @pyt
) @pytest.mark.skipif(not os.environ.get("AWS_DEFAULT_REGION"), reason=SKIP_REASON_NO_REGION) def test
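
Both matches read an environment variable only to decide whether a test should be skipped. A minimal sketch of how a scanner might separate that pattern from other environment reads is shown below. The regular expressions, the 40-character look-back window, and the function name `classify_env_access` are hypothetical and are not taken from the tool that produced this report.

```python
import re

# Hypothetical patterns, for illustration only.
ENV_READ = re.compile(r"os\.environ(?:\.get\(|\[)\s*[\"']([A-Z0-9_]+)[\"']")
SKIP_GUARD = re.compile(r"pytest\.mark\.skipif\(\s*not\s+os\.environ")


def classify_env_access(snippet: str) -> list[tuple[str, str]]:
    """Return (variable, label) pairs for each environment read in snippet."""
    results = []
    for match in ENV_READ.finditer(snippet):
        # Inspect the text just before the read for a skipif guard.
        window = snippet[max(0, match.start() - 40):match.end()]
        label = "test-skip guard" if SKIP_GUARD.search(window) else "needs review"
        results.append((match.group(1), label))
    return results


sample = (
    '@pytest.mark.skipif(not os.environ.get("AWS_ACCESS_KEY_ID"), '
    "reason=SKIP_REASON_NO_CREDENTIALS)"
)
print(classify_env_access(sample))
# prints [('AWS_ACCESS_KEY_ID', 'test-skip guard')]
```

A read that is not wrapped in a skip guard, such as `os.environ["SECRET_TOKEN"]` in library code, would come back labelled "needs review" and still deserve a manual look.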

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: deepset.ai

✓ Suspicious Page Links

All external links appear legitimate

✓ Git Repository History

Repository deepset-ai/haystack-core-integrations appears legitimate

⚠ Maintainer History score 6.0

3 maintainer concern(s) found

Only one version has ever been released — brand new package
Author name is missing or very short
Author "" appears to have only 1 package on PyPI (new or inactive account)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with amazon-textract-haystack

Create a Python-based mini-application called 'DocAnalyzer' that leverages the 'amazon-textract-haystack' package to analyze scanned documents. This application will allow users to upload PDF files containing scanned text and then perform various operations on the extracted data such as searching for specific keywords, extracting tables, and summarizing the content. Here’s a detailed breakdown of the project requirements:

1. **User Interface**: Develop a simple command-line interface (CLI) where users can interact with the application. The CLI should provide options like uploading a document, searching for text, extracting tables, and generating summaries.
2. **Document Upload**: Implement functionality to accept PDF uploads from local storage or via a URL. Ensure that the application supports both single-page and multi-page PDFs.
3. **Text Extraction**: Utilize 'amazon-textract-haystack' to extract text from the uploaded documents. The package should handle the conversion of scanned text into searchable text using AWS Textract services.
4. **Keyword Search**: Allow users to search for specific keywords within the extracted text. Provide an option to display the sentences or paragraphs containing these keywords.
5. **Table Extraction**: Implement a feature to identify and extract tables from the document. Users should be able to view the extracted table data in a structured format (e.g., CSV).
6. **Content Summary**: Generate a summary of the document's content. Use natural language processing techniques to create concise summaries that capture the essence of the document.
7. **Error Handling**: Include robust error handling mechanisms to manage issues such as unsupported file formats, connection errors, and timeouts.
8. **Configuration Management**: Enable users to configure settings such as API keys for AWS services and preferred output formats for extracted data.
9. **Testing and Documentation**: Write unit tests to ensure the reliability of each feature. Provide comprehensive documentation detailing how to install and use the application, along with examples.

The 'amazon-textract-haystack' package plays a crucial role in this project by providing the necessary tools to integrate AWS Textract functionalities into the application. It simplifies the process of text extraction from scanned documents, making it easier to implement advanced features like keyword search and table extraction.
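
Requirements 4 and 5 of the prompt operate on text and table rows that have already been extracted, so they can be sketched without calling AWS. The following is a minimal standard-library sketch. It assumes the extracted text arrives as a plain string and each table as a list of rows; the helper names `find_keyword_sentences` and `table_to_csv` are hypothetical and are not part of amazon-textract-haystack.

```python
import csv
import io
import re


def find_keyword_sentences(text: str, keyword: str) -> list[str]:
    """Return sentences containing keyword (case-insensitive, naive split)."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    needle = keyword.lower()
    return [s for s in sentences if needle in s.lower()]


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialise table rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


text = "Invoice total is 120 EUR. Payment is due in 30 days. Thank you!"
print(find_keyword_sentences(text, "payment"))
# prints ['Payment is due in 30 days.']
print(table_to_csv([["Item", "Price"], ["Widget", "120"]]))
```

The sentence splitter is deliberately simple and will mis-split abbreviations such as "e.g."; a real application would likely swap in a proper tokenizer.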
