Package Metadata

Author: Apache Software Foundation
Email: [email protected]
PyPI: apache-beam
Python: >=3.10
Versions: 105 releases
First release: 14 Mar 2017, 22:00 UTC
Analysed: 07 Jun 2026, 07:44 UTC
Source files: 66 .py files scanned

Classifiers

Intended Audience :: End Users/Desktop
License :: OSI Approved :: Apache Software License
Operating System :: POSIX :: Linux
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
Programming Language :: Python :: 3.14
Topic :: Software Development :: Libraries
Topic :: Software Development :: Libraries :: Python Modules

🤖 AI Analysis

Final verdict: SAFE

The package is assessed as safe with a low risk score. Some patterns, such as shell execution and pickle/base64 encoding, could in principle be exploited, but they do not strongly indicate malicious intent. The absence of credential-harvesting patterns and metadata anomalies further supports this conclusion.

moderate shell execution risk
potential for obfuscation

Per-check LLM notes

Network: The network calls appear to download resources, which is common for packages that need additional files or dependencies.
Shell: The shell execution patterns are subprocess calls. These can be legitimate for running scripts or commands during setup or operation, but they are a risk if the command arguments are not properly controlled.
Obfuscation: The flagged snippets use pickling and base64 encoding. Both could be used to hide malicious payloads, but they also serve legitimate serialization and encoding needs.
Credentials: No clear patterns indicating credential harvesting were found in the provided code snippets.
Metadata: The package shows no signs of being malicious or involved in a supply-chain attack.

📦 Package Quality Overall: Low (4.4/10)

✦ High Test Suite 9.0

Test suite present: 29 test file(s) detected (e.g. coders_property_based_test.py)

◈ Medium Documentation 5.0

Some documentation present

Detailed PyPI description (9787 chars)

○ Low Contributing Guide 2.0

No CONTRIBUTING, CODE_OF_CONDUCT, or governance files found

◈ Medium Type Annotations 5.0

Partial type annotation coverage

54 type-annotated function signatures detected in source

○ Low Multiple Contributors 1.0

Unable to verify contributor count: no GitHub repository linked

🔬 Heuristic Checks

⚠ Outbound Network Calls score 1.5

Found 1 network call pattern(s)

Downloading', url) with urllib.request.urlopen(url) as fin: with open(zip + '.tmp', 'wb') a

⚠ Code Obfuscation score 10.0

Found 6 obfuscation pattern(s)

ded): return pickle.loads(base64.b64decode(encoded)) def is_deterministic(self): # type: () -> b
self._run_test(lambda df: df.eval('foo = a + b - c'), df) self._run_test(lambda df: df.que
un_inplace_test(lambda df: df.eval('foo = a + b - c'), df) # Verify that attempting to acc
tedError, lambda: deferred_df.eval('foo = a + @b - c')) self.assertRaises( NotImple
coder.decode(b'\x7f\xdf;dZ\x1c\xac\x08\x00\x00\x00\x01\x0f\x00'), windowed_value.create(0, MIN_TIMESTAMP.micros, (
_fields'): try: pickle.loads(pickle.dumps(t)) except pickle.PicklingError:
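
The first and last snippets above combine pickle with base64. As a minimal sketch of what that pattern does, and why a heuristic flags it, the following round-trips an object through both encodings. The function names `encode` and `decode` and the sample `payload` are illustrative assumptions, not code taken from the package.

```python
import base64
import pickle


def encode(obj):
    """Pickle an object and wrap the bytes in base64 so they travel as ASCII."""
    return base64.b64encode(pickle.dumps(obj))


def decode(encoded):
    """Reverse of encode(): the same call shape as the flagged snippet."""
    # pickle.loads can execute arbitrary code while unpickling, so this is
    # only safe on input that comes from a trusted source.
    return pickle.loads(base64.b64decode(encoded))


# Hypothetical sample data, used only to show the round trip.
payload = {"user": "alice", "count": 3}
token = encode(payload)
restored = decode(token)
```

The encoded token is opaque to a text scanner, which is why the pattern scores as obfuscation even when it is ordinary serialization.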

⚠ Shell / Subprocess Execution score 10.0

Found 6 shell execution pattern(s)

s)) except: os.system('head -n 100 ' + dest + '*') raise def _run_tru
_subprocess(): result = subprocess.run([sys.executable, '-c', script],
: %s' % command_list) p = subprocess.Popen( command_list, stdin=subprocess.PIPE,
_project_path()] result = subprocess.call(args) if result != 0: raise DistutilsError("pyrefl
ted.') return out = subprocess.run( [sys.executable, os.path.join(cwd, 'gen_protos.py')
generated.') return subprocess.run([ sys.executable, os.path.join(sdk_dir, 'gen
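
Most of the snippets above pass an argument list to `subprocess`, while the first builds a shell string by concatenation. The sketch below shows why the list form is generally the safer of the two: each list element reaches the child process as a literal argument and is never parsed by a shell. The `hostile` value is a made-up example, not something found in the package.

```python
import subprocess
import sys

# A made-up argument containing shell metacharacters.
hostile = "data.txt; echo injected"

# List form: no shell is involved, so the metacharacters stay literal.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()
```

By contrast, a call shaped like `os.system('head -n 100 ' + dest + '*')` hands the whole string to a shell, so its safety depends on where `dest` comes from.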

✓ Credential Harvesting

No credential harvesting patterns detected

✓ Typosquatting

No typosquatting candidates detected

✓ Registered Email Domain

Email domain looks legitimate: beam.apache.org

⚠ Suspicious Page Links score 10.0

Found 10 suspicious link(s) on the package page

Non-HTTPS external link: http://beam.apache.org/
Non-HTTPS external link: http://flink.apache.org/
Non-HTTPS external link: http://spark.apache.org/
Non-HTTPS external link: http://cloud.google.com/dataflow/
Non-HTTPS external link: http://research.google.com/archive/mapreduce.html
Non-HTTPS external link: http://research.google.com/pubs/pub35650.html
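
A check of this kind can be approximated by inspecting each URL's scheme. The following is a rough sketch using only the standard library; the helper names `non_https_links` and `upgrade_to_https` are hypothetical, and whether each host actually serves the same page over HTTPS is an assumption that would need to be verified separately.

```python
from urllib.parse import urlsplit


def non_https_links(urls):
    """Return the URLs whose scheme is plain http."""
    return [u for u in urls if urlsplit(u).scheme.lower() == "http"]


def upgrade_to_https(url):
    """Rewrite an http URL to https; leave any other scheme untouched."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return url
    return parts._replace(scheme="https").geturl()


links = [
    "http://beam.apache.org/",
    "http://flink.apache.org/",
    "https://example.org/already-secure",  # illustrative, not from the report
]
flagged = non_https_links(links)
suggested = [upgrade_to_https(u) for u in flagged]
```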

✓ Git Repository History

No GitHub repository link found

⚠ Maintainer History score 2.0

1 maintainer concern(s) found

Author "Apache Software Foundation" appears to have only 1 package on PyPI (new or inactive account)

✓ Known CVE Vulnerabilities

No known vulnerabilities found in OSV database.

💡 AI App Starter Prompt

Use this prompt to build a project with apache-beam

Create a data processing pipeline application using the Apache Beam SDK for Python. The application reads a large dataset of user activity logs from a CSV file, calculates the total number of actions per user, and writes the results to a new CSV file. It also filters out users who have fewer than a specified number of actions. Use Apache Beam's capabilities to handle big data and parallel processing efficiently. Include the following steps and features in your project:

1. Set up a Python environment with the required dependencies, including the 'apache-beam' package.
2. Design a pipeline that reads a CSV file containing user activity logs. Each log entry should contain at least a user ID and a timestamp.
3. Use Apache Beam's transformations to group the logs by user ID and count the number of actions for each user.
4. Implement a filtering mechanism to exclude users whose action counts fall below a configurable threshold.
5. Write the processed data to a new CSV file, which includes only the filtered results.
6. Optimize the pipeline to efficiently handle large datasets by leveraging Apache Beam's distributed processing capabilities.
7. Include error handling and logging mechanisms to ensure robustness and ease of debugging.
8. Provide a simple command-line interface for users to specify input and output file paths, as well as the action threshold.
9. Document the code thoroughly, explaining the use of Apache Beam components and the overall architecture of the pipeline.
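
The steps above could be sketched roughly as follows. This is one possible starting point, not a reference implementation: the flag names (`--input`, `--output`, `--threshold`), the helper functions, and the assumption that the user ID is the first CSV column after a single header row are all illustrative choices. Beam is imported inside `run()` so the helper functions can be tried without it installed.

```python
import argparse
import csv
import logging


def parse_user_id(line):
    """Return the first CSV column (assumed to be the user ID), or None."""
    try:
        row = next(csv.reader([line]))
    except (csv.Error, StopIteration):
        logging.warning("Skipping unparseable line: %r", line)
        return None
    if not row or not row[0].strip():
        logging.warning("Skipping line without a user ID: %r", line)
        return None
    return row[0].strip()


def meets_threshold(user_count, threshold):
    """Keep (user_id, count) pairs whose count is at least the threshold."""
    return user_count[1] >= threshold


def format_row(user_count):
    """Render a (user_id, count) pair as one CSV line."""
    return "%s,%d" % user_count


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--threshold", type=int, default=1)
    args, beam_args = parser.parse_known_args(argv)

    # Imported here so the helpers above work without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(beam_args)) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(args.input, skip_header_lines=1)
            | "ParseUserId" >> beam.Map(parse_user_id)
            | "DropBadRows" >> beam.Filter(lambda uid: uid is not None)
            | "CountPerUser" >> beam.combiners.Count.PerElement()
            | "ApplyThreshold" >> beam.Filter(meets_threshold, args.threshold)
            | "Format" >> beam.Map(format_row)
            | "Write" >> beam.io.WriteToText(
                args.output,
                file_name_suffix=".csv",
                header="user_id,action_count",
            )
        )
```

To use it as a script, call `run()` from an `if __name__ == "__main__":` guard. Any command-line flags the parser does not recognise are passed through to `PipelineOptions`, so runner selection stays outside the pipeline code.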
