OpenTelemetry instrumentation

This document describes how to configure and instrument an application for OpenTelemetry (OTel). It contains a brief overview of basic OTel concepts, but reading the OpenTelemetry documentation first is recommended.

Convention

The current implementation follows OpenTelemetry Semantic Conventions 1.30.0.

Trace

A trace represents a single transaction. It has a unique ID and all spans are related to it. Each trace has a name that is defined automatically or can be changed as described in Integration.

Span

A span represents a unit of work or operation, similar to executing a single method. As the building blocks of traces, spans contain the following information:

  • Name
  • Parent span ID (empty for root spans)
  • Start and End Timestamps
  • Span Context
  • Attributes
  • Span Events
  • Span Links
  • Span Status

Hook

A hook is a function that executes a closure before and after method execution, providing a way to instrument code without modifying it directly. When you execute a method, a pre closure is executed first and opens a span. After the method is executed, a post closure is executed and receives the return value and any exception that was thrown. Then, missing attributes are added and the span is closed.

All hooks that are autogenerated or provided by a library, such as Propel or Redis hooks provided by Spryker, are registered automatically.

If you want to register a hook for a class, make sure the registration is executed before the method you want to instrument is called.

Hook example
<?php
\OpenTelemetry\Instrumentation\hook(
                class: MyClass::class, //Class name that should be instrumented.
                function: 'methodName', //Method of this class. It can be even a private method.
                pre: static function (
                $instance, //Instance of the MyClass
                array $params, //Incoming method parameters.
                string $class, //Class name as a string
                string $function, //Method name
                ?string $filename, //Actual file name that is executed
                ?int $lineno //Number of the line where this method is triggered
                ) {
                    //Context is used to keep all spans connected. In this case this is a parent span.
                    $context = \OpenTelemetry\Context\Context::getCurrent();

                    $span = \Spryker\Shared\OpenTelemetry\Instrumentation\CachedInstrumentation::getCachedInstrumentation()
                        ->tracer()
                        ->spanBuilder('ModuleName-MyClass::methodName')//Span names don't have to be unique, but a unique name makes the span easier to find.
                        ->setParent($context)//Here the parent context is attached to the new span.
                        ->setAttribute(\OpenTelemetry\SemConv\TraceAttributes::CODE_FUNCTION, $function)//You can attach almost anything as an attribute as long as the value is scalar. A null value means that the attribute is omitted.
                        ->setAttribute(\OpenTelemetry\SemConv\TraceAttributes::CODE_NAMESPACE, $class)
                        ->setAttribute(\OpenTelemetry\SemConv\TraceAttributes::CODE_FILEPATH, $filename)
                        ->setAttribute(\OpenTelemetry\SemConv\TraceAttributes::CODE_LINENO, $lineno)
                        ->startSpan();

                    //Here span is attached to the global context
                    \OpenTelemetry\Context\Context::storage()->attach($span->storeInContext($context));
                },

                post: static function (
                $instance,
                array $params,
                $returnValue, //Result of the method execution.
                ?\Throwable $exception //Exception if one was thrown during the execution
                ) {
                    $scope = \OpenTelemetry\Context\Context::storage()->scope();

                    if (null === $scope) {
                        return;
                    }

                    //Here you can just check the $exception value. But in some cases you might want to take it from other places. For example, if the method that threw the exception was not instrumented, the parent span should still have it.
                    $error = error_get_last();

                    if (is_array($error) && in_array($error['type'], [E_ERROR, E_CORE_ERROR, E_COMPILE_ERROR, E_PARSE], true)) {
                        $exception = new \Exception(
                            'Error: ' . $error['message'] . ' in ' . $error['file'] . ' on line ' . $error['line']
                        );
                    }

                    $scope->detach();
                    $span = \Spryker\Service\Opentelemetry\Instrumentation\Span\Span::fromContext($scope->context());

                    if ($exception !== null) {
                        $span->recordException($exception);//Exception will be attached as an event into the span.
                        $span->setAttribute('error_message', $exception->getMessage());
                        $span->setAttribute('error_code', $exception->getCode());
                    }

                    //The status code adds visibility. An error status marks the span as failed for easier navigation.
                    $span->setStatus($exception !== null ? \OpenTelemetry\API\Trace\StatusCode::STATUS_ERROR : \OpenTelemetry\API\Trace\StatusCode::STATUS_OK);

                    //The span ends and is sent to a span processor to be validated and prepared for exporting.
                    $span->end();
                }
            );

Collector

The collector collects traces and sends them to a monitoring platform. Traces are sent to the collector after the response is sent, so exporting doesn’t affect response time. The collector operates separately from the application and should be set up by a Cloud engineer; for a local setup, you can add one yourself.

Integration

Run the latest version of the script from the Installer repo.

If you want to integrate manually, the following sections describe all the steps the script performs.

Install required packages

OTel provides instrumentation via packages that can be installed to register hooks automatically. If you want to instrument additional parts of an application, such as Symfony code, you can install respective packages from Registry or other sources.

Install third-party packages at your own risk.

The spryker/opentelemetry package covers the essential parts of the integration:

  • The entry point for instrumentation
  • Plugin to wire in your monitoring service
  • A console command to generate hooks for project’s code, which creates spans automatically
  • Instrumentation of Propel, Redis, ElasticSearch, RabbitMQ, and Guzzle calls

Optional: Install the Monitoring module

The Monitoring module enables you to add custom attributes and events, change trace names during the request execution, and add exceptions to the root span for visibility.

You can get the Monitoring module from Packagist.

Install the module and wire the Monitoring plugin.

<?php

namespace Pyz\Service\Monitoring;

use Spryker\Service\Monitoring\MonitoringDependencyProvider as SprykerMonitoringDependencyProvider;
use Spryker\Service\Opentelemetry\Plugin\OpentelemetryMonitoringExtensionPlugin;

class MonitoringDependencyProvider extends SprykerMonitoringDependencyProvider
{
    /**
     * @return array<\Spryker\Service\MonitoringExtension\Dependency\Plugin\MonitoringExtensionPluginInterface>
     */
    protected function getMonitoringExtensions(): array
    {
        return [
            new OpentelemetryMonitoringExtensionPlugin(),
        ];
    }
}

You can call methods from the Monitoring service, and they are translated to OTel actions. Some methods act as placeholders because they’re not implemented in OTel, for example, \Spryker\Service\Opentelemetry\Plugin\OpentelemetryMonitoringExtensionPlugin::markStartTransaction().

Wire a console command

Spryker is a large application, so manually creating all hooks is impractical. The OpentelemetryGeneratorConsole command automates hook generation for classes you want to cover with spans.

<?php

namespace Pyz\Zed\Console;

...
use Spryker\Zed\Opentelemetry\Communication\Plugin\Console\OpentelemetryGeneratorConsole;
...

class ConsoleDependencyProvider extends SprykerConsoleDependencyProvider
{
    protected function getConsoleCommands(Container $container): array
    {
        $commands = [
            ...
            new OpentelemetryGeneratorConsole(),
            ...
        ];

        return $commands;
    }
}

Wire this console command into your install script to run on every deployment during the base container image build. Place it last in the build section, ensuring it runs after all code modifications and generation.

sections:
    build:
        generate-open-telemetry:
            command: 'vendor/bin/console open-telemetry:generate'

Hooks generation configuration

You can control instrumentation by configuring specific methods.

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getExcludedDirs() defines directories to exclude from instrumentation. For example, you don’t need spans from infrastructure code in traces. Several directories are excluded by default; review them in the module’s vendor directory if you need to include any.

class OpentelemetryConfig extends AbstractBundleConfig
{
    // Spans for the Monitoring module, the OTel module, and tests in existing modules are not relevant for monitoring, so these directories are excluded. The actual implementation contains more directories.
    public function getExcludedDirs(): array
    {
        return [
        ...
            'Monitoring',
            'OpenTelemetry',
            'tests',
        ...
        ];
    }
}

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getExcludedSpans() defines spans by name to exclude from instrumentation. This may be useful when spans you want to exclude and those you want to keep are in the same directory.

class OpentelemetryConfig extends AbstractBundleConfig
{
    // In this example, a span named 'User-UserFacade::isSystemUser' is not generated: it's not relevant for our traces, but it's called many times during a request.
    public function getExcludedSpans(): array
    {
        return [
            ...
            'User-UserFacade::isSystemUser',
            ...
        ];
    }

}

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getPathPatterns() defines the path patterns where the console command should search for classes to instrument with hooks.

By default, all Spryker directories and the Pyz namespace are covered at the project level. To prevent unnecessary spans, avoid instrumenting autogenerated code such as Transfers.

class OpentelemetryConfig extends AbstractBundleConfig
{
    public function getPathPatterns(): array
    {
        return [
            '#^vendor/spryker/[^/]+/.*/.*/(Zed|Shared)/.*/(?!Persistence|Presentation)[^/]+/.*#',
            '#^vendor/spryker/[^/]+/Glue.*#',
            '#^vendor/spryker(?:/spryker)?-shop/[^/]+/.*#',
            '#^vendor/spryker-eco/[^/]+/.*#',
            '#^src/Pyz/.*#',
        ];
    }

}
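Before running the generator, it can help to check which files a set of patterns would cover. The following is a minimal sketch, assuming the patterns are applied with preg_match() to paths relative to the project root. The isPathCovered() helper and the PATH_PATTERNS constant are hypothetical and not part of the module.

```php
<?php

declare(strict_types=1);

// Patterns copied from the getPathPatterns() example above.
const PATH_PATTERNS = [
    '#^vendor/spryker/[^/]+/.*/.*/(Zed|Shared)/.*/(?!Persistence|Presentation)[^/]+/.*#',
    '#^vendor/spryker/[^/]+/Glue.*#',
    '#^vendor/spryker(?:/spryker)?-shop/[^/]+/.*#',
    '#^vendor/spryker-eco/[^/]+/.*#',
    '#^src/Pyz/.*#',
];

/**
 * Hypothetical helper: returns true if at least one pattern matches the path.
 *
 * @param array<string> $patterns
 */
function isPathCovered(string $relativePath, array $patterns = PATH_PATTERNS): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $relativePath) === 1) {
            return true;
        }
    }

    return false;
}
```

With these patterns, a path such as src/Pyz/Zed/Customer/Business/CustomerFacade.php is covered, while src/Generated/Shared/Transfer/CustomerTransfer.php is not, which keeps generated Transfers out of the instrumentation.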

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getOutputDir() specifies the directory where generated hooks are stored. By default, they’re placed in src/Generated/OpenTelemetry/Hooks/. The classmap.php file, which is used to autoload hook files, is also added to this directory.

class OpentelemetryConfig extends AbstractBundleConfig
{
    public function getOutputDir(): string
    {
        return APPLICATION_SOURCE_DIR . '/Generated/OpenTelemetry/Hooks/';
    }

}

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::areOnlyPublicMethodsInstrumented() defines which methods are instrumented. By default, hooks are generated only for public methods in regular classes and for all methods in Controller classes.

class OpentelemetryConfig extends AbstractBundleConfig
{
    public function areOnlyPublicMethodsInstrumented(): bool
    {
        return true;
    }

}

\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getCriticalClassNamePatterns() identifies spans as critical to prioritize them during sampling. By default, Controllers and Facades are included. This configuration doesn’t use regex but matches based on a substring within the class name.

class OpentelemetryConfig extends AbstractBundleConfig
{
    public function getCriticalClassNamePatterns(): array
    {
        return [
            'Facade',
            'Controller',
        ];
    }

}
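Because matching is based on substrings, a pattern can match more classes than expected. The following sketch illustrates the idea, assuming a case-sensitive substring check on the class name. The isCriticalClass() helper is hypothetical and only mirrors the behavior described above; it requires PHP 8.0 or later for str_contains().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper illustrating substring matching; not the module's code.
 *
 * @param array<string> $patterns Values as returned by getCriticalClassNamePatterns().
 */
function isCriticalClass(string $className, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // Plain substring check, no regex involved.
        if (str_contains($className, $pattern)) {
            return true;
        }
    }

    return false;
}
```

Under this assumption, the 'Facade' pattern matches UserFacade but also any class whose name merely contains that substring, such as UserFacadeInterface, so choose patterns that are specific enough.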

Enable PHP extensions

Hook processing requires specific PHP extensions. There’s a preconfigured PHP image, so you only need to enable the extensions in your deploy file:

namespace: spryker-otel
tag: 'dev'

environment: docker.dev
image:
    tag: spryker/php:8.3-alpine3.20-otel
    php:
        enabled-extensions:
            - opentelemetry
            - grpc
            - protobuf

Don’t use the newrelic or blackfire extensions simultaneously with the opentelemetry extension; this causes conflicts and broken traces. For more details, see Conflicting extensions.

Sampling

Spryker executes a large number of methods per request, many of them repeatedly. Because OTel uses PHP functions to open and close spans, excessive span creation can introduce unnecessary load on your application. To mitigate this, there are mechanisms to reduce the number of spans sent with traces.

Sampling occurs three times during execution:

  1. Trace sampling: Determines if the trace should be a root span only, without additional details
  2. Opening span sampling: Decides whether to open a span before execution
  3. Closing span sampling: Filters out extremely fast and successful spans upon closing because they likely hold little value

Trace sampling

A detailed trace for every request or command execution is usually unnecessary. At minimum, a span should capture that the request occurred and whether it contained an error.

To do this, the request is checked during initialization. For HTTP requests, if the method is not GET, the trace is always detailed. If the method is GET, a random number between 0 and 1.0 is generated and compared against a configured probability. If the number is less than the configured probability, the trace includes spans. Otherwise, only a root span is recorded.

The same logic applies to console commands, but with a separate configuration value for finer control.
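The decision described above can be sketched as follows. This is an illustration of the logic only, not the module's implementation: the isTraceDetailed() function is hypothetical, and the random number can be passed in explicitly to keep the example deterministic.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the trace sampling decision for HTTP requests.
 *
 * @param float      $probability Configured probability, from 0 to 1.0.
 * @param float|null $random      Random number from 0 to 1.0; generated if omitted.
 */
function isTraceDetailed(string $httpMethod, float $probability, ?float $random = null): bool
{
    // Non-GET requests are always traced in detail.
    if (strtoupper($httpMethod) !== 'GET') {
        return true;
    }

    $random ??= mt_rand() / mt_getrandmax();

    // GET requests are detailed only if the random number is below the probability.
    return $random < $probability;
}
```

With a probability of 0.1, roughly one in ten GET requests produces a detailed trace; all others are recorded as a root span only.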

Opening span sampling

Before a span starts, a check determines whether it should be sampled, using an algorithm similar to the one used for trace sampling. The differences between the algorithms are as follows:

  • A different configuration value is used for span sampling
  • A random number is generated for each span
  • Different span types have different sampling probabilities based on their criticality

If a span is not sampled, an empty span is created instead. Empty spans act as placeholders to maintain the trace structure and always appear in the trace.
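A minimal sketch of this per-span decision follows. The isSpanSampledOnOpen() function, the DEFAULT_SPAN_PROBABILITIES constant, and the criticality keys are hypothetical names chosen for illustration; the probabilities are the defaults listed in the Sampling configuration section.

```php
<?php

declare(strict_types=1);

// Default probabilities per criticality level, taken from the sampling configuration.
const DEFAULT_SPAN_PROBABILITIES = [
    'critical' => 0.5,     // OTEL_TRACES_CRITICAL_SAMPLER_ARG
    'regular' => 0.3,      // OTEL_TRACE_PROBABILITY
    'non_critical' => 0.1, // OTEL_TRACES_NON_CRITICAL_SAMPLER_ARG
];

/**
 * Hypothetical sketch: decides whether a span is sampled when it opens.
 * A span that is not sampled would be replaced by an empty placeholder span.
 *
 * @param float              $random        Random number from 0 to 1.0, generated per span.
 * @param array<string,float> $probabilities
 */
function isSpanSampledOnOpen(
    string $criticality,
    float $random,
    array $probabilities = DEFAULT_SPAN_PROBABILITIES
): bool {
    if (!isset($probabilities[$criticality])) {
        throw new InvalidArgumentException(sprintf('Unknown criticality "%s".', $criticality));
    }

    return $random < $probabilities[$criticality];
}
```

For the same random number, a critical span can be sampled while a regular one is not, which is how criticality raises a span's chance of appearing in the trace.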

Closing span sampling

Fast spans without errors can be discarded from the trace. When a sampled span closes, its execution time and status are checked. If the span is successful and completes faster than a configured threshold, it’s omitted and doesn’t appear in the trace. The threshold is configured in OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD for regular and non-critical spans, or in OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD for critical spans.
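The closing check can be sketched as follows. The isSpanKeptOnClose() function is hypothetical; the default thresholds are the values documented for OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD (20 ms) and OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD (10 ms).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: decides whether a sampled span stays in the trace when it closes.
 */
function isSpanKeptOnClose(
    float $durationMs,
    bool $hasError,
    bool $isCritical,
    float $thresholdMs = 20.0,
    float $criticalThresholdMs = 10.0
): bool {
    // Spans with errors are kept regardless of how fast they were.
    if ($hasError) {
        return true;
    }

    $threshold = $isCritical ? $criticalThresholdMs : $thresholdMs;

    // Successful spans faster than the threshold are filtered out.
    return $durationMs >= $threshold;
}
```

For example, a successful 15 ms span is kept if it is critical but dropped otherwise, while a 5 ms span is kept only if it ended with an error.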

Span criticality

Some spans are more relevant to users than others. To manage span sampling effectively, spans are categorized into three levels of criticality: non-critical, regular, and critical.

Each category uses different probability settings and execution time limits, which can be configured separately. For closing span sampling, regular and non-critical spans are treated as the same type.

Critical spans

Spans that execute operations that communicate with other services or change the application’s state should be marked as critical to have a higher chance of appearing in the trace.

The following span types are critical by default:

  • RabbitMQ spans
  • ElasticSearch spans
  • Redis spans
  • Guzzle spans (ignored by the sampling mechanism because they’re required for Distributed Tracing)
  • Propel INSERT/DELETE/UPDATE calls
  • Hooks for classes configured in \Spryker\Zed\Opentelemetry\OpentelemetryConfig::getCriticalClassNamePatterns()

Non-critical spans

Only spans for Propel SELECT calls are marked as no_critical because every request generates a lot of them, which can easily flood a trace with useless information.

Regular spans

All other spans are considered regular.

Sampling configuration

You can adjust sampling values by changing environment variables. Increasing these values will generate more detailed traces but may also slow down your application because more spans will be sampled and sent to the collector.

| Variable name | Description | Default value | Allowed range |
| --- | --- | --- | --- |
| OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD | Used in closing span sampling to define a threshold in milliseconds. Spans with an execution time below this value are filtered out. | 20 | 0…100000 |
| OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD | Same as the previous one, but used only for critical spans. | 10 | 0…100000 |
| OTEL_TRACES_SAMPLER_ARG | Defines the probability of a web GET request trace being detailed. | 0.1 | 0…1.0 |
| OTEL_CLI_TRACE_PROBABILITY | Defines the probability of a console command trace being detailed. | 0.5 | 0…1.0 |
| OTEL_TRACES_CRITICAL_SAMPLER_ARG | Defines the probability of a critical span being sampled. | 0.5 | 0…1.0 |
| OTEL_TRACES_NON_CRITICAL_SAMPLER_ARG | Defines the probability of a non-critical span being sampled. | 0.1 | 0…1.0 |
| OTEL_TRACE_PROBABILITY | Defines the probability of a regular span being sampled. | 0.3 | 0…1.0 |

Additional configuration

| Variable name | Description | Default value | Allowed range |
| --- | --- | --- | --- |
| OTEL_SERVICE_NAMESPACE | Defines a service namespace used in the resource definition. | spryker | Any string value |
| OTEL_SERVICE_NAME_MAPPING | A JSON object mapping application URLs to service names; used if no service name is provided via MonitoringService::setApplicationName(). | {} | A valid JSON object where keys represent service names and values define URL patterns. |
| OTEL_DEFAULT_SERVICE_NAME | If no service name is provided and OTEL_SERVICE_NAME_MAPPING is not defined, this default name is used. | Default Service | Any valid string |
| OTEL_BSP_SCHEDULE_DELAY | Defines the delay in milliseconds before sending a batch of spans to the exporter. A higher value results in larger batches. | 1000 | 0…100000000 |
| OTEL_BSP_MAX_QUEUE_SIZE | Defines the maximum number of spans that can be queued for processing in a single request. | 2048 | At least the number of spans you want to see |
| OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Defines the batch size for spans. Once this limit is reached, the batch is sent to the exporter. | 512 | More than 0 and less than OTEL_BSP_MAX_QUEUE_SIZE |
| OTEL_SDK_DISABLED | If set to true, no traces are generated or sent to the backend. The default value is true; change only after the collector is up and running. | true | A boolean or a string representation of a boolean, such as true or false |
| OTEL_PHP_DISABLED_INSTRUMENTATIONS | Disables specific parts of additional instrumentation. For example, to exclude all Redis spans, set the value to spryker_otel_redis. Multiple instrumentation parts can be disabled by providing a comma-separated list. | | spryker_otel_redis, spryker_otel_elastica, spryker_otel_propel, spryker_otel_rabbitmq, spryker_otel_guzzle, all, or a combination such as spryker_otel_rabbitmq,spryker_otel_propel |

Custom attributes

To add custom data to your traces, such as the logged-in user ID, the current store name, or any other request-specific information, add a custom attribute that appears in the root span for better visibility. Some attributes, such as the current store name or locale, are shipped by default.

We recommend adding custom attributes through MonitoringService::addCustomParameter(). Alternatively, you can add them directly through \Spryker\Service\Opentelemetry\OpentelemetryService::setCustomParameter(), which doesn’t require the Monitoring module.

All attributes added via these services are included in the root span at the end of the request execution, so you can call them even after the response has been sent.

Custom events

To mark a point of interest in your backend logic within a trace, you can trigger a custom event during execution.

You can add custom events via \Spryker\Service\Monitoring\MonitoringService::addCustomEvent(). Alternatively, add them directly through \Spryker\Service\Opentelemetry\OpentelemetryService::addEvent(), which doesn’t require the Monitoring module.

Custom events are attached to the root span.

Error handling

The OTel integration catches all the exceptions thrown during a request or command execution and attaches them as events to the root span. These events will also appear in the span of the method that threw the exception, but only if a hook for that method exists.

To ensure error tracking, we recommend using \Spryker\Service\Monitoring\MonitoringService::setError() or \Spryker\Service\Opentelemetry\OpentelemetryService::setError() in your application’s error handler.

The default error handler calls \Spryker\Service\Monitoring\MonitoringService::setError() to add error events to the root span and update its status to error. If you’re using the Monitoring module with the default error handler, no additional configuration is needed. With a custom setup, adjust your error handler and verify this logic during the integration.

Service name

Service names let you filter traces by source. For example, you might want to analyze only Yves traces or Glue requests while excluding CLI commands from your Scheduler.

You can define a service name using \Spryker\Service\Monitoring\MonitoringService::setApplicationName() or \Spryker\Service\Opentelemetry\OpentelemetryService::setResourceName().

All MonitoringService methods trigger a service name change. For example, calling \Spryker\Service\Monitoring\MonitoringService::setError() updates the service name.

Default naming convention for service names: APPLICATION-REGION_OR_STORE(APPLICATION_ENV)

| Placeholder | Description |
| --- | --- |
| APPLICATION | Application name, such as ZED, YVES, or GLUE. |
| REGION_OR_STORE | Current store or region name, depending on whether Dynamic Store mode is enabled. |
| APPLICATION_ENV | Environment name from the deploy file. |

You can change any of these values using MonitoringService, or override the service name entirely using OpentelemetryService.
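As an illustration of the default convention, the three parts can be combined like this. The buildServiceName() function is hypothetical and only shows the resulting format; the example values are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: builds a service name in the default format
 * APPLICATION-REGION_OR_STORE(environment).
 */
function buildServiceName(string $application, string $regionOrStore, string $environment): string
{
    return sprintf('%s-%s(%s)', $application, $regionOrStore, $environment);
}

// Example values are assumptions: application ZED, store DE, environment docker.dev.
echo buildServiceName('ZED', 'DE', 'docker.dev'), PHP_EOL;
```

Filtering by such a name in your monitoring platform lets you separate, for example, Zed traffic for one store from Yves or Glue traffic in the same environment.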

If no service name is explicitly set, the integration first checks OTEL_SERVICE_NAME_MAPPING, attempting to determine a service name based on the URL or CLI binary filename. If no match is found, it falls back to OTEL_DEFAULT_SERVICE_NAME from the environment configuration.
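The fallback order can be sketched as follows. This is an assumption-heavy illustration: the resolveServiceName() function is hypothetical, and the mapping values are treated as plain substrings of the URL or binary name, whereas the actual integration may interpret the patterns differently. It requires PHP 8.0 or later for str_contains().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the service name fallback order:
 * explicit name, then OTEL_SERVICE_NAME_MAPPING, then OTEL_DEFAULT_SERVICE_NAME.
 *
 * @param string $mappingJson JSON object: keys are service names, values are patterns.
 */
function resolveServiceName(
    ?string $explicitName,
    string $urlOrBinary,
    string $mappingJson,
    string $defaultName
): string {
    if ($explicitName !== null && $explicitName !== '') {
        return $explicitName;
    }

    $mapping = json_decode($mappingJson, true);

    if (is_array($mapping)) {
        foreach ($mapping as $serviceName => $pattern) {
            // Assumption: a pattern matches if it occurs anywhere in the URL or binary name.
            if (is_string($pattern) && $pattern !== '' && str_contains($urlOrBinary, $pattern)) {
                return (string)$serviceName;
            }
        }
    }

    return $defaultName;
}
```

With a mapping of {"YVES":"yves","ZED":"backoffice"}, a request to yves.de.spryker.local would resolve to YVES under these assumptions, and an unmatched host would resolve to the default name.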

Trace name

The trace name (or root span name) shows which request or command was executed.

Default behavior:

  • For web requests: The default trace name includes the HTTP method and route name
  • For command executions: The command name is used

Spryker automatically adjusts the trace name for web requests to reflect the route name. However, if you prefer a different naming convention or don’t use the Monitoring module, you can define it manually using \Spryker\Service\Opentelemetry\OpentelemetryService::setRootSpanName() or \Spryker\Service\Monitoring\MonitoringService::setTransactionName().

If no name was provided and no route was resolved, a fallback name is used. This name can be configured by adding the following to the regular configuration file:


$config[\Spryker\Shared\Opentelemetry\OpentelemetryConstants::FALLBACK_HTTP_ROOT_SPAN_NAME] = 'your_fallback_name';

//Can be used to show HTTP method in the root span name. Default is true.
$config[\Spryker\Shared\Opentelemetry\OpentelemetryConstants::SHOW_HTTP_METHOD_IN_ROOT_SPAN_NAME] = false;

Recommendations

Tracing is resource-intensive and can slow down your application. Follow these recommendations to minimize performance impact:

  • Minimize the number of generated spans per request:

    • OTel docs recommend keeping span count below 1000 per trace
    • Configure irrelevant spans to be skipped
    • Errors are processed even if a method is not instrumented because error events are attached to the root span
  • Use sampling to reduce trace volume:

    • Full traces for every request are unnecessary in most cases
    • Refer to the Sampling configuration section to fine-tune trace collection
  • Skip unnecessary traces:

    • You can control the probability of generating detailed traces using the OTEL_TRACES_SAMPLER_ARG environment variable for web GET requests and OTEL_CLI_TRACE_PROBABILITY for console commands.
    • Even if a detailed trace is skipped, a root span will still be created.
    • Requests that modify the application state (POST, DELETE, PUT, PATCH) are always considered critical and will be fully processed.