README#

pyyaml-include#

Include other YAML files into your YAML documents

pyyaml-include extends PyYAML with a simple, powerful include mechanism. It supports local files, HTTP, S3, SFTP, and more through fsspec.


Quick Start#

pip install "pyyaml-include"

# main.py
import yaml
import yaml_include

# Register the include tag
yaml.add_constructor("!inc", yaml_include.Constructor())

# Load YAML with includes
with open("config.yml") as f:
    config = yaml.full_load(f)

# config.yml
database: !inc config/database.yml
features: !inc config/features/*.yml

That’s it! Each !inc tag is replaced by the content of the files it points to.



Why pyyaml-include?#

| Feature          | Description                                         |
|------------------|-----------------------------------------------------|
| Simple           | Just add the !inc tag to your YAML                  |
| Flexible         | Include local files, HTTP, S3, SFTP, and more       |
| Powerful         | Wildcard patterns, nested includes, custom loaders  |
| Production-ready | Comprehensive tests, type hints, documentation      |


Installation#

Basic Installation#

pip install "pyyaml-include"

With Remote File Support#

# HTTP/HTTPS files (extras are quoted so shells such as zsh do not expand the brackets)
pip install "pyyaml-include" "fsspec[http]"

# S3 files
pip install "pyyaml-include" "fsspec[s3]"

# SFTP files
pip install "pyyaml-include" "fsspec[sftp]"

# Multiple sources
pip install "pyyaml-include" "fsspec[http,s3,sftp]"

See fsspec documentation for all supported filesystems.


Basic Usage#

Include Single File#

Directory structure:

config/
├── main.yml
└── database.yml

config/database.yml:

host: localhost
port: 5432
name: mydb

config/main.yml:

database: !inc database.yml
app:
  name: MyApp

Result:

{
    'database': {'host': 'localhost', 'port': 5432, 'name': 'mydb'},
    'app': {'name': 'MyApp'}
}

Include Multiple Files#

Directory structure:

config/
├── main.yml
└── features/
    ├── auth.yml
    ├── payment.yml
    └── notification.yml

config/main.yml:

features: !inc features/*.yml

Result:

features:
  - # contents of auth.yml
  - # contents of payment.yml
  - # contents of notification.yml

Nested Includes#

config/main.yml:

base: !inc base.yml
environment:
  production: !inc env/production.yml
  development: !inc env/development.yml

config/env/production.yml:

database: !inc database/production.yml

Nested includes work automatically - no additional configuration needed.


Advanced Usage#

Remote Files (HTTP, S3, SFTP)#

HTTP Example:

import yaml
import fsspec
import yaml_include

# Create HTTP filesystem
http_fs = fsspec.filesystem(
    "http",
    client_kwargs={"base_url": "https://example.com"}
)

# Register with HTTP filesystem
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=http_fs, base_dir="/config"),
    yaml.Loader
)

# Your YAML
logging: !inc logging.yml
database: !inc database/production.yml

S3 Example:

s3_fs = fsspec.filesystem("s3", key="YOUR_KEY", secret="YOUR_SECRET")
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=s3_fs, base_dir="my-bucket/config"),
    yaml.Loader
)

# Load from S3
config: !inc app-settings.yml

Custom File Formats (JSON, TOML)#

You can include non-YAML files using a custom loader:

import json
import tomllib as toml  # standard library in Python 3.11+
import yaml
import yaml_include

def custom_loader(urlpath, file, Loader):
    """Load JSON, TOML, or YAML files."""
    if urlpath.endswith(".json"):
        return json.load(file)
    if urlpath.endswith(".toml"):
        return toml.load(file)
    # Default to YAML
    return yaml.load(file, Loader)

# Create constructor with custom loader
ctor = yaml_include.Constructor(custom_loader=custom_loader)
yaml.add_constructor("!inc", ctor, yaml.Loader)

# Now you can include JSON and TOML files
package_json: !inc package.json
config_toml: !inc pyproject.toml
config_yaml: !inc settings.yml

Serialization Support#

To preserve include statements when dumping YAML:

import yaml
import yaml_include

# Create constructor without auto-loading
ctor = yaml_include.Constructor(autoload=False)
yaml.add_constructor("!inc", ctor)

# Add representer for serialization
rpr = yaml_include.Representer("inc")
yaml.add_representer(yaml_include.Data, rpr)

# Load without resolving includes
data = yaml.load(yaml_string, yaml.Loader)
# data contains yaml_include.Data objects, not loaded content

# Serialize (preserves !inc tags)
yaml_str = yaml.dump(data)

# Load and resolve includes
ctor.autoload = True
loaded = yaml.load(yaml_str, yaml.Loader)

Wildcard Patterns#

Supported wildcards (shell-style):

| Pattern | Matches                    |
|---------|----------------------------|
| *       | Any characters             |
| ?       | Single character           |
| [abc]   | One of a, b, or c          |
| **      | Recursive directory search |

# All YAML files in directory
files: !inc config/*.yml

# All YAML files recursively
all_files: !inc config/**/*.yml

# Files matching pattern
specific: !inc logs/app-*.yml

⚠️ Warning: Using ** in large directories or remote filesystems can be slow. All matched files are loaded into memory.
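
To preview what a pattern would match before putting it into an !inc tag, the patterns can be tried against a scratch directory. The sketch below uses the standard library's `pathlib` globbing as a stand-in; it is an assumption that this matches the behavior of the fsspec filesystem in use, which may differ in details such as result ordering. The directory layout and the `match` helper are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: two YAML files under config/, one nested deeper,
    # plus a non-YAML file that no pattern below should match.
    for rel in ("config/app.yml", "config/db.yml",
                "config/env/prod.yml", "config/notes.txt"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("key: value\n")

    def match(pattern):
        """Return matching paths relative to root, sorted for stable output."""
        return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

    flat = match("config/*.yml")          # one directory level only
    recursive = match("config/**/*.yml")  # config/ and everything below it
    single_char = match("config/d?.yml")  # "?" stands for exactly one character
    char_set = match("config/[ad]*.yml")  # names starting with "a" or "d"
```

Here `flat` holds only the two files directly under config/, while `recursive` also picks up config/env/prod.yml.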


Limitations#

Merge Keys and Anchors#

⚠️ Merge keys (<<) and anchors (& / *) don’t work with include tags.

This is a fundamental limitation of PyYAML’s architecture - merge key and anchor validation happens before custom tags are processed.

Instead of this (won’t work):

<<: !inc config/base.yml

Use one of these alternatives:

  1. Include separately, then merge in Python:

    # YAML
    base: !inc config/base.yml
    override: !inc config/override.yml

    # Python
    config = yaml.load(yaml_string, yaml.Loader)
    merged = {**config['base'], **config['override']}
    
  2. Use wildcard includes for multiple files:

    # YAML
    configs: !inc config.d/*.yml  # Returns a list

    # Python
    config = yaml.load(yaml_string, yaml.Loader)
    # Merge all configs; later files override earlier ones
    merged = {}
    for cfg in config['configs']:
        merged.update(cfg)
    

See issues #45 and #53 for more details.
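
Both alternatives above merge only the top level: a nested mapping in the override replaces the whole nested mapping in the base. If nested settings should be combined instead, a recursive merge is one option. The following is a minimal sketch with a hypothetical `deep_merge` helper and made-up sample data; it is not something pyyaml-include provides.

```python
from functools import reduce


def deep_merge(base, override):
    """Return a new dict; nested mappings are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply take the override's value.
            merged[key] = value
    return merged


# Sample data standing in for config['base'] and config['override'].
base = {"database": {"host": "localhost", "port": 5432}, "debug": False}
override = {"database": {"host": "db.internal"}, "debug": True}
merged = deep_merge(base, override)

# The same helper folds a list such as config['configs'] from left to right.
merged_all = reduce(deep_merge, [base, override, {"debug": False}], {})
```

With a shallow merge, `merged["database"]` would lose the `port` key; here it keeps `port` from the base and takes `host` from the override.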


Reference#

Constructor Options#

yaml_include.Constructor(
    fs=None,           # fsspec filesystem (default: local filesystem)
    base_dir=None,     # Base directory for relative paths
    autoload=True,     # Auto-load included files (False returns Data objects)
    custom_loader=None # Custom loader function for non-YAML files
)

Example:

# Local files with base directory
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(base_dir="/path/to/config")
)

# HTTP remote files
http_fs = fsspec.filesystem("http", client_kwargs={"base_url": "https://example.com"})
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(fs=http_fs, base_dir="/config")
)

# Without auto-loading (for serialization)
yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(autoload=False)
)

YAML Tag Parameters#

The !inc tag supports multiple parameter formats:

Simple string (most common):

files: !inc config/*.yml

Sequence (positional parameters):

# With encoding
files: !inc ["config/*.yml", {encoding: utf-8}]

# With maxdepth for recursive search
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}]

# Both glob and open parameters
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf-16}]

Mapping (named parameters):

files: !inc {urlpath: config/*.yml, encoding: utf-8}

Parameter Passing Details#

How parameters are passed depends on the URL pattern:

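| URL Pattern         | Has Wildcard | Has Scheme | Behavior                                 |
|---------------------|--------------|------------|------------------------------------------|
| file.yml            | No           | No         | fs.open(path)                            |
| *.yml               | Yes          | No         | fs.glob(), then fs.open() for each match |
| http://.../file.yml | No           | Yes        | fsspec.open() (ignores fs)               |
| http://.../*.yml    | Yes          | Yes        | fsspec.open_files() (ignores fs)         |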
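
The four cases above can be read as a small decision function. The sketch below is only an illustration of the table: `classify` is a hypothetical helper, and the checks it uses (a `://` substring for the scheme, the characters `*`, `?`, `[` for wildcards) are assumptions that may not match how the library detects these cases. Note that under this simple check a URL with a query string would count as a wildcard because of its `?`.

```python
def classify(urlpath):
    """Return (has_wildcard, has_scheme, behavior) for an include path."""
    has_wildcard = any(ch in urlpath for ch in "*?[")
    has_scheme = "://" in urlpath
    if has_scheme:
        # Full URLs bypass the constructor's fs argument.
        behavior = "fsspec.open_files()" if has_wildcard else "fsspec.open()"
    else:
        # Scheme-less paths go through the configured filesystem.
        behavior = "fs.glob() + fs.open()" if has_wildcard else "fs.open()"
    return has_wildcard, has_scheme, behavior


# example.com is a placeholder host used only for this illustration.
rows = {
    path: classify(path)
    for path in (
        "file.yml",
        "*.yml",
        "https://example.com/file.yml",
        "https://example.com/*.yml",
    )
}
```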

Advanced: Separate glob and open parameters

When using wildcards without a scheme, you can specify separate parameters for glob and open:

# Mapping form
files: !inc {urlpath: "config/**/*.yml", glob: {maxdepth: 2}, open: {encoding: utf-16}}

# Sequence form
files: !inc ["config/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf-16}]

Path Resolution#

Relative paths are resolved relative to base_dir:

yaml.add_constructor(
    "!inc",
    yaml_include.Constructor(base_dir="/app/config")
)

# Loads /app/config/database.yml
db: !inc database.yml

# Loads /app/config/sub/settings.yml
settings: !inc sub/settings.yml

# Absolute path - base_dir is ignored
absolute: !inc /other/path/file.yml

Without base_dir: Relative paths use the current working directory (for local filesystem) or may fail (for remote filesystems).

Full URLs (with scheme) ignore base_dir:

# base_dir is ignored for full URLs
remote: !inc https://example.com/config.yml
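
The three rules in this section (relative paths are joined to base_dir, absolute paths ignore it, full URLs ignore it) can be modeled in a few lines. This is a sketch of the documented behavior using `posixpath.join`, with a hypothetical `resolve` helper; it is an assumption that the library's own resolution works the same way in every edge case.

```python
import posixpath


def resolve(urlpath, base_dir=None):
    """Model how an include path is combined with base_dir."""
    if "://" in urlpath:
        # Full URL with a scheme: base_dir is ignored.
        return urlpath
    if base_dir is None:
        # No base_dir: the path is used as given.
        return urlpath
    # posixpath.join drops base_dir when urlpath is absolute.
    return posixpath.join(base_dir, urlpath)


relative = resolve("database.yml", "/app/config")
nested = resolve("sub/settings.yml", "/app/config")
absolute = resolve("/other/path/file.yml", "/app/config")
remote = resolve("https://example.com/config.yml", "/app/config")
```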

Migration Guide#

Upgrading from v1.x to v2.x#

⚠️ Breaking Change: Version 2.x is NOT backward compatible with 1.x

Key changes:

  1. fsspec integration: All file operations now use fsspec

  2. Parameter passing: New parameter syntax for advanced use cases

  3. Wildcard behavior: Improved wildcard support with glob patterns

Basic usage remains the same:

# This still works
yaml.add_constructor("!inc", yaml_include.Constructor())

Advanced usage requires updates:

# v1.x
yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/path'))

# v2.x - same, but with fsspec backend
yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/path'))
# fs defaults to fsspec.filesystem("file")

See full documentation for detailed migration notes.



License#

GPL-3.0-or-later


How to Build the Documentation#

  1. The documentation is built using Sphinx.

    Install the package itself in editable mode, along with Sphinx and the extensions the documentation uses:

    • Using pip:

      pip install -e . --group docs
      
    • Using uv:

      uv sync --group docs
      
  2. Generate API documentation.

    If this is the first time you are building the documentation, or if the source tree has changed, you may need to clean the docs/apidoc directory and regenerate the API documentation:

    sphinx-apidoc -H "" -feo docs/apidoc src
    
  3. Build HTML documentation:

    • Using the Make tool (on Unix-like systems):

      make -C docs html
      
    • On Windows:

      docs\make html
      

The built static website is located at docs/_build/html. You can serve it with a simple HTTP server:

python -m http.server -d docs/_build/html

Then open http://localhost:8000/ in a web browser.

Tip

Try another port if 8000 is already in use. For example, to serve on port 8080:

python -m http.server -d docs/_build/html 8080

See also

Python stdlib’s http.server