README#
pyyaml-include#
Include other YAML files into your YAML documents
pyyaml-include extends PyYAML with a simple, powerful include mechanism. It supports local files, HTTP, S3, SFTP, and more through fsspec.
Quick Start#
pip install "pyyaml-include"
# main.py
import yaml
import yaml_include
# Register the include tag
yaml.add_constructor("!inc", yaml_include.Constructor())
# Load YAML with includes
with open("config.yml") as f:
    config = yaml.full_load(f)
# config.yml
database: !inc config/database.yml
features: !inc config/features/*.yml
That’s it! Your YAML files are now merged together.
Why pyyaml-include?#
| Feature | Description |
|---|---|
| Simple | Just add the !inc tag |
| Flexible | Include local files, HTTP, S3, SFTP, and more |
| Powerful | Wildcard patterns, nested includes, custom loaders |
| Production-ready | Comprehensive tests, type hints, documentation |
Installation#
Basic Installation#
pip install "pyyaml-include"
With Remote File Support#
# HTTP/HTTPS files (extras are quoted so shells such as zsh do not expand the brackets)
pip install "pyyaml-include" "fsspec[http]"
# S3 files
pip install "pyyaml-include" "fsspec[s3]"
# SFTP files
pip install "pyyaml-include" "fsspec[sftp]"
# Multiple sources
pip install "pyyaml-include" "fsspec[http,s3,sftp]"
See fsspec documentation for all supported filesystems.
Basic Usage#
Include Single File#
Directory structure:
config/
├── main.yml
└── database.yml
config/database.yml:
host: localhost
port: 5432
name: mydb
config/main.yml:
database: !inc database.yml
app:
  name: MyApp
Result:
{
'database': {'host': 'localhost', 'port': 5432, 'name': 'mydb'},
'app': {'name': 'MyApp'}
}
Include Multiple Files#
Directory structure:
config/
├── main.yml
└── features/
    ├── auth.yml
    ├── payment.yml
    └── notification.yml
config/main.yml:
features: !inc features/*.yml
Result:
features:
- # contents of auth.yml
- # contents of payment.yml
- # contents of notification.yml
Nested Includes#
config/main.yml:
base: !inc base.yml
environment:
  production: !inc env/production.yml
  development: !inc env/development.yml
config/env/production.yml:
database: !inc database/production.yml
Nested includes work automatically - no additional configuration needed.
Advanced Usage#
Remote Files (HTTP, S3, SFTP)#
HTTP Example:
import yaml
import fsspec
import yaml_include
# Create HTTP filesystem
http_fs = fsspec.filesystem(
    "http",
    client_kwargs={"base_url": "https://example.com"}
)
# Register with HTTP filesystem
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=http_fs, base_dir="/config"),
    yaml.Loader
)
# Your YAML
logging: !inc logging.yml
database: !inc database/production.yml
S3 Example:
s3_fs = fsspec.filesystem("s3", key="YOUR_KEY", secret="YOUR_SECRET")
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=s3_fs, base_dir="my-bucket/config"),
    yaml.Loader
)
# Load from S3
config: !inc app-settings.yml
Custom File Formats (JSON, TOML)#
You can include non-YAML files using a custom loader:
import json
import tomllib as toml
import yaml
import yaml_include
def custom_loader(urlpath, file, Loader):
    """Load JSON, TOML, or YAML files."""
    if urlpath.endswith(".json"):
        return json.load(file)
    if urlpath.endswith(".toml"):
        return toml.load(file)
    # Default to YAML
    return yaml.load(file, Loader)
# Create constructor with custom loader
ctor = yaml_include.Constructor(custom_loader=custom_loader)
yaml.add_constructor("!inc", ctor, yaml.Loader)
# Now you can include JSON and TOML files
package_json: !inc package.json
config_toml: !inc pyproject.toml
config_yaml: !inc settings.yml
Serialization Support#
To preserve include statements when dumping YAML:
import yaml
import yaml_include
# Create constructor without auto-loading
ctor = yaml_include.Constructor(autoload=False)
yaml.add_constructor("!inc", ctor)
# Add representer for serialization
rpr = yaml_include.Representer("inc")
yaml.add_representer(yaml_include.Data, rpr)
# Load without resolving includes
data = yaml.load(yaml_string, yaml.Loader)
# data contains yaml_include.Data objects, not loaded content
# Serialize (preserves !inc tags)
yaml_str = yaml.dump(data)
# Load and resolve includes
ctor.autoload = True
loaded = yaml.load(yaml_str, yaml.Loader)
Wildcard Patterns#
Supported wildcards (shell-style):
| Pattern | Matches |
|---|---|
| `*` | Any characters |
| `?` | Single character |
| `[abc]` | One of a, b, or c |
| `**` | Recursive directory search |
# All YAML files in directory
files: !inc config/*.yml
# All YAML files recursively
all_files: !inc config/**/*.yml
# Files matching pattern
specific: !inc logs/app-*.yml
⚠️ Warning: Using `**` in large directories or on remote filesystems can be slow. All matched files are loaded into memory.
Limitations#
Merge Keys and Anchors#
⚠️ Merge keys (<<) and anchors (&/*) don’t work with include tags.
This is a fundamental limitation of PyYAML’s architecture - merge key and anchor validation happens before custom tags are processed.
Instead of this (won’t work):
<<: !inc config/base.yml
Use one of these alternatives:
Include separately, then merge in Python:
base: !inc config/base.yml
override: !inc config/override.yml
config = yaml.load(yaml_string, yaml.Loader)
merged = {**config['base'], **config['override']}
Use wildcard includes for multiple files:
configs: !inc config.d/*.yml  # Returns a list
config = yaml.load(yaml_string, yaml.Loader)
# Merge all configs
merged = {}
for cfg in config['configs']:
    merged.update(cfg)
See issues #45 and #53 for more details.
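Merging with `{**a, **b}` or `dict.update` is shallow: a nested mapping in the override replaces the whole nested mapping in the base. If the included files contain nested sections, a recursive merge may be closer to what a YAML merge key user expects. The sketch below is one possible approach, not part of pyyaml-include; `deep_merge` is a hypothetical helper and the dictionaries are stand-ins for loaded includes.

```python
def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-ins for what the two included files might load as (illustrative data).
base = {"database": {"host": "localhost", "port": 5432}, "debug": False}
override = {"database": {"host": "db.internal"}, "debug": True}

shallow = {**base, **override}     # nested "database" is replaced wholesale
deep = deep_merge(base, override)  # nested "database" keeps the base port

print(shallow)
print(deep)
```

For a list of included mappings, the same helper can be applied in order, for example with `functools.reduce(deep_merge, configs, {})`.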
Reference#
Constructor Options#
yaml_include.Constructor(
    fs=None,            # fsspec filesystem (default: local filesystem)
    base_dir=None,      # Base directory for relative paths
    autoload=True,      # Auto-load included files (False returns Data objects)
    custom_loader=None  # Custom loader function for non-YAML files
)
Example:
# Local files with base directory
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(base_dir="/path/to/config")
)
# HTTP remote files
http_fs = fsspec.filesystem("http", client_kwargs={"base_url": "https://example.com"})
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=http_fs, base_dir="/config")
)
# Without auto-loading (for serialization)
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(autoload=False)
)
YAML Tag Parameters#
The !inc tag supports multiple parameter formats:
Simple string (most common):
files: !inc config/*.yml
Sequence (positional parameters):
# With encoding
files: !inc ["config/*.yml", {encoding: utf-8}]
# With maxdepth for recursive search
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}]
# Both glob and open parameters
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf-16}]
Mapping (named parameters):
files: !inc {urlpath: config/*.yml, encoding: utf-8}
Parameter Passing Details#
How parameters are passed depends on the URL pattern:
| URL Pattern | Has Wildcard | Has Scheme | Behavior |
|---|---|---|---|
| | No | No | |
| | Yes | No | |
| | No | Yes | |
| | Yes | Yes | |
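The two yes/no columns can be computed from the include string itself. The sketch below is an illustration only: `classify` and both regular expressions are assumptions made for this example, and the library's own detection rules may differ.

```python
import re

# Shell-style wildcard characters: *, ?, and the start of a [...] set.
_WILDCARD = re.compile(r"[*?\[]")
# A URL scheme such as "https://" or "s3://" at the start of the string.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def classify(urlpath):
    """Return (has_wildcard, has_scheme) for an include path or URL."""
    return bool(_WILDCARD.search(urlpath)), bool(_SCHEME.match(urlpath))


examples = [
    "config/database.yml",
    "config/*.yml",
    "https://example.com/config.yml",
    "s3://my-bucket/config/*.yml",
]
results = {path: classify(path) for path in examples}
for path, flags in results.items():
    print(path, flags)
```

One caveat of this simple check: a `?` in an HTTP query string would also count as a wildcard character.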
Advanced: Separate glob and open parameters
When using wildcards without a scheme, you can specify separate parameters for glob and open:
# Mapping form
files: !inc {urlpath: "config/**/*.yml", glob: {maxdepth: 2}, open: {encoding: utf-16}}
# Sequence form
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf-16}]
Path Resolution#
Relative paths are resolved relative to base_dir:
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(base_dir="/app/config")
)
# Loads /app/config/database.yml
db: !inc database.yml
# Loads /app/config/sub/settings.yml
settings: !inc sub/settings.yml
# Absolute path - base_dir is ignored
absolute: !inc /other/path/file.yml
Without base_dir: Relative paths use the current working directory (for local filesystem) or may fail (for remote filesystems).
Full URLs (with scheme) ignore base_dir:
# base_dir is ignored for full URLs
remote: !inc https://example.com/config.yml
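The resolution rules above can be summarized as a small function. This is a rough sketch of the described behavior, not the library's implementation: `resolve` is a hypothetical helper, it treats any string containing `://` as a full URL, and it uses POSIX-style joining.

```python
import posixpath


def resolve(base_dir, urlpath):
    """Mimic the documented rules for combining base_dir with an include path."""
    if "://" in urlpath:
        return urlpath  # full URL: base_dir is ignored
    if base_dir is None or posixpath.isabs(urlpath):
        return urlpath  # no base_dir, or absolute path: used as given
    return posixpath.join(base_dir, urlpath)


relative = resolve("/app/config", "database.yml")
nested = resolve("/app/config", "sub/settings.yml")
absolute = resolve("/app/config", "/other/path/file.yml")
remote = resolve("/app/config", "https://example.com/config.yml")

print(relative)
print(nested)
print(absolute)
print(remote)
```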
Migration Guide#
Upgrading from v1.x to v2.x#
⚠️ Breaking Change: Version 2.0 is NOT compatible with 1.0
Key changes:
fsspec integration: All file operations now use fsspec
Parameter passing: New parameter syntax for advanced use cases
Wildcard behavior: Improved wildcard support with glob patterns
Basic usage remains the same:
# This still works
yaml.add_constructor("!inc", yaml_include.Constructor())
Advanced usage requires updates:
# v1.x
yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/path'))
# v2.x - same, but with fsspec backend
yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/path'))
# fs defaults to fsspec.filesystem("file")
See full documentation for detailed migration notes.
Links#
Documentation: pyyaml-include.readthedocs.io
GitHub: github.com/tanbro/pyyaml-include
License#
GPL-3.0-or-later
How to Build the Documentation#
The documentation is built using Sphinx.
We need to install the package itself in editable mode, Sphinx, and some of its extensions used in the documentation:
Generate API documentation.
If it’s the first time building the documentation, or if the source tree has changed, you may need to clean the docs/apidoc directory and regenerate the API documentation:
sphinx-apidoc -H "" -feo docs/apidoc src
Build HTML documentation:
Using the Make tool (on Unix-like systems):
make -C docs html
On Windows:
docs\make html
The built static website is located at docs/_build/html. You can serve it with a simple HTTP server:
python -m http.server -d docs/_build/html
Then open http://localhost:8000/ in a web browser.
Tip
Try another port if 8000 is already in use.
For example, to serve on port 8080:
python -m http.server -d docs/_build/html 8080
See also
Python stdlib’s http.server