Database performance guidelines
Database operations are often the primary performance bottleneck in e-commerce applications. Poor database design, missing indexes, and inefficient queries can severely impact application performance, especially under high load. These guidelines help you identify and resolve common database performance issues in Spryker projects.
Database indexes
Proper indexing is fundamental to database performance. Indexes dramatically speed up data retrieval but come with trade-offs in write performance and storage space.
Understanding index usage
When a query runs without appropriate indexes, the database performs a full table scan—reading every row to find matches. For tables with millions of records, scan time grows with table size and quickly becomes prohibitive.
Signs you need an index:
- Slow queries on frequently accessed tables
- Full table scans in query execution plans
- High database CPU usage during peak traffic
- Queries filtering or joining on unindexed columns
Creating effective indexes
Index columns used in WHERE clauses:
// This query needs an index on spy_sales_order.order_reference
$order = SpySalesOrderQuery::create()
->filterByOrderReference($orderReference)
->findOne();
Add index in your Propel schema:
<table name="spy_sales_order">
<column name="order_reference" type="VARCHAR" size="255"/>
<index name="spy_sales_order-order_reference">
<index-column name="order_reference"/>
</index>
</table>
Index foreign keys:
Foreign keys used in JOINs must be indexed. Propel automatically creates indexes for foreign keys defined in schema, but custom foreign key columns need manual indexing:
<table name="spy_sales_order_item">
<column name="fk_sales_order" type="INTEGER" required="true"/>
<foreign-key foreignTable="spy_sales_order" name="spy_sales_order_item-fk_sales_order">
<reference local="fk_sales_order" foreign="id_sales_order"/>
</foreign-key>
<!-- Index automatically created by Propel for foreign key -->
</table>
Composite indexes for multiple conditions:
When queries filter on multiple columns, create composite indexes:
// Query filtering by store and status
$orders = SpySalesOrderQuery::create()
->filterByStore('DE')
->filterByFkOmsOrderStatus($statusId)
->find();
<index name="spy_sales_order-store-status">
<index-column name="store"/>
<index-column name="fk_oms_order_status"/>
</index>
In composite indexes, column order is critical. Place the most selective column (highest cardinality) first. For queries that filter on only the first column, the index can still be used.
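For example, with the index above, a query that filters only by store can still use the composite index, while a query that filters only by fk_oms_order_status cannot and needs its own index if it runs frequently. A minimal sketch:
// Can use the composite index: filters on the leading column
$orders = SpySalesOrderQuery::create()
->filterByStore('DE')
->find();
// Cannot use the composite index: skips the leading column
$orders = SpySalesOrderQuery::create()
->filterByFkOmsOrderStatus($statusId)
->find();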
Index anti-patterns
Over-indexing:
Too many indexes slow down INSERT, UPDATE, and DELETE operations. Each write operation must update all indexes on that table.
- Don’t index every column “just in case”
- Remove unused indexes identified through database monitoring (see the sketch after this list)
- Avoid duplicate indexes (for example, separate indexes on columns A and B, plus a composite index on A+B)
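As a hedged example of such monitoring on PostgreSQL, the pg_stat_user_indexes statistics view records how often each index has been scanned; indexes with idx_scan = 0 over a long observation window are removal candidates (the query below is a sketch, not a Spryker API):
// List indexes that have never been scanned (PostgreSQL only)
$connection = Propel::getConnection();
$stmt = $connection->prepare(
'SELECT relname AS table_name, indexrelname AS index_name
FROM pg_stat_user_indexes
WHERE idx_scan = 0'
);
$stmt->execute();
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
// Review before dropping: statistics reset on restart, and unique indexes enforce constraints
echo "Unused index: {$row['table_name']}.{$row['index_name']}\n";
}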
Indexing low-cardinality columns:
Indexes on columns with few distinct values (for example, boolean flags, status with 3-4 values) are often inefficient:
<!-- Likely inefficient: only 2 possible values -->
<index name="spy_product-is_active">
<index-column name="is_active"/>
</index>
Exception: Low-cardinality indexes can be useful when combined with high-cardinality columns in composite indexes.
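For instance, an equality filter on a low-cardinality flag combined with a range filter on a selective column can justify a composite index that leads with the flag; a hedged schema sketch (column choice illustrative):
<!-- Supports queries like: active products created after a given date -->
<index name="spy_product-is_active-created_at">
<index-column name="is_active"/>
<index-column name="created_at"/>
</index>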
Monitoring index effectiveness
Use database query analysis tools to verify index usage:
EXPLAIN
SELECT * FROM spy_sales_order
WHERE order_reference = 'DE--123';
PostgreSQL - Look for:
- Index Scan (good) vs Seq Scan (bad for large tables)
- Execution time and rows scanned
MySQL - Check the type column:
- const, eq_ref, ref: Using indexes efficiently
- ALL: Full table scan (needs optimization)
Query optimization
Beyond indexes, query structure significantly impacts performance.
Avoid SQL functions that prevent index usage
Using functions on indexed columns in WHERE clauses prevents index usage:
Inefficient:
// DATE() function prevents index usage
$orders = SpySalesOrderQuery::create()
->where('DATE(created_at) = ?', '2025-01-15')
->find();
Optimized:
// Use range comparison to leverage index
$orders = SpySalesOrderQuery::create()
->filterByCreatedAt(['min' => '2025-01-15 00:00:00', 'max' => '2025-01-15 23:59:59'])
->find();
Other common function anti-patterns:
- LOWER(email) = 'user@example.com' → Use a case-insensitive collation or store a lowercased copy of the value
- SUBSTRING(sku, 1, 3) = 'ABC' → Use LIKE 'ABC%' or a dedicated prefix column (see the sketch below)
- YEAR(order_date) = 2025 → Use date range filtering
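As a hedged Propel sketch of the prefix rewrite (query class and prefix are illustrative):
use Propel\Runtime\ActiveQuery\Criteria;
// LIKE with a fixed prefix can use an index on sku;
// on PostgreSQL, a text_pattern_ops index may be required for this
$products = SpyProductQuery::create()
->filterBySku('ABC%', Criteria::LIKE)
->find();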
Necessary JOINs and relationship loading
The N+1 problem:
Loading related entities in loops creates excessive queries. This is one of the most common performance issues:
// ❌ Bad: N+1 queries
$orders = SpySalesOrderQuery::create()->find();
foreach ($orders as $order) {
// Each iteration = 1 query
$items = $order->getSpySalesOrderItems();
foreach ($items as $item) {
// Each iteration = 1 query per item
$product = $item->getSpyProduct();
}
}
// Total: 1 + N + (N * M) queries
Solution: Use joinWith() to eager load relationships:
// ✅ Good: Single query with JOINs
$orders = SpySalesOrderQuery::create()
->joinWithSpySalesOrderItem()
->useSpySalesOrderItemQuery()
->joinWithSpyProduct()
->endUse()
->find();
foreach ($orders as $order) {
$items = $order->getSpySalesOrderItems(); // Already loaded
foreach ($items as $item) {
$product = $item->getSpyProduct(); // Already loaded
}
}
// Total: 1 query
For more details on batch processing and bulk operations, see Batch processing of Propel entities and Architecture performance guidelines.
GROUP BY and ORDER BY optimization
GROUP BY performance:
Grouping requires sorting data, which is expensive for large datasets:
// Potentially slow on large tables
$orderCounts = SpySalesOrderQuery::create()
->groupByFkCustomer()
->withColumn('COUNT(*)', 'order_count')
->find();
Optimizations:
- Filter before grouping: Reduce dataset size with WHERE clauses
- Index grouped columns: Ensure columns in GROUP BY are indexed
- Pre-aggregate: For frequently needed aggregations, maintain summary tables
- Limit grouped results: Use HAVING to filter groups (see the sketch after this list)
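A hedged sketch combining these optimizations, filtering first and then restricting the groups (date and threshold are illustrative):
// Reduce the dataset with WHERE before grouping, then filter groups with HAVING
$orderCounts = SpySalesOrderQuery::create()
->filterByCreatedAt(['min' => '2025-01-01 00:00:00'])
->groupByFkCustomer()
->withColumn('COUNT(*)', 'order_count')
->having('COUNT(*) > ?', 10, \PDO::PARAM_INT) // binding type is stated explicitly for the aggregate
->find();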
ORDER BY performance:
Sorting large result sets is resource-intensive:
// Expensive without proper index
$orders = SpySalesOrderQuery::create()
->orderByCreatedAt('DESC')
->find();
Best practices:
- Index columns used in ORDER BY
- Combine ORDER BY columns with WHERE clause columns in composite indexes
- Limit result set size before sorting
- For pagination, use cursor-based pagination instead of OFFSET
// ✅ Good: Index-friendly pagination
$orders = SpySalesOrderQuery::create()
->filterByIdSalesOrder(['min' => $lastSeenId + 1]) // 'min' is inclusive, so start after the last seen ID
->orderByIdSalesOrder('ASC')
->limit(20)
->find();
Controlling data volume
Selecting only needed columns
Loading unnecessary columns wastes memory and network bandwidth:
// ❌ Bad: Loads all columns including potentially large TEXT/BLOB fields
$products = SpyProductAbstractQuery::create()->find();
// ✅ Good: Select only needed columns
$products = SpyProductAbstractQuery::create()
->select(['IdProductAbstract', 'Sku', 'Attributes'])
->find();
When to use select():
- Reading thousands of records
- Tables with large TEXT, JSON, or BLOB columns
- Building API responses or exports
- Aggregation queries
Limiting result sets
Always use pagination for user-facing queries:
// ❌ Bad: Loading potentially millions of records
$allOrders = SpySalesOrderQuery::create()->find();
// ✅ Good: Paginate results
$orders = SpySalesOrderQuery::create()
->limit(50)
->offset($page * 50)
->find();
Propel hydration and relationships
Hydration overhead:
Propel converts database rows into PHP objects (hydration). For large result sets, this consumes significant memory:
// High memory usage: Creates PHP objects for all records
$orders = SpySalesOrderQuery::create()->find();
// Lower memory usage: returns associative arrays instead of hydrated objects
$orders = SpySalesOrderQuery::create()
->setFormatter(\Propel\Runtime\ActiveQuery\ModelCriteria::FORMAT_ARRAY)
->find();
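If you need to iterate a large result set as objects, Propel’s on-demand formatter hydrates one row at a time instead of building the whole collection in memory; a minimal sketch:
// Hydrates a single object per iteration; suitable for read-only traversal
$orders = SpySalesOrderQuery::create()
->setFormatter(\Propel\Runtime\ActiveQuery\ModelCriteria::FORMAT_ON_DEMAND)
->find();
foreach ($orders as $order) {
// Process one order at a time
}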
For bulk operations, use PDO directly:
When processing tens of thousands of records, bypass Propel’s ORM layer:
// For data import/export or batch processing
$connection = Propel::getConnection();
$stmt = $connection->prepare('SELECT id_product, sku FROM spy_product WHERE is_active = true');
$stmt->execute();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
// Process row
}
See Data import optimization guidelines for comprehensive bulk processing strategies.
Managing 1-to-Many and Many-to-Many relationships
The aggregation problem:
Loading all related records for many parent records causes memory issues:
// ❌ Bad: Loads ALL items for ALL orders
$orders = SpySalesOrderQuery::create()
->joinWithSpySalesOrderItem()
->find();
// For 1000 orders with 10 items each = 10,000 item objects in memory
Strategies:
- Paginate parent records: Process orders in batches
- Lazy load selectively: Only load relationships when needed
- Count without loading: Use withColumn('COUNT(*)', 'item_count') for counts (see the sketch after the example below)
- Filter relationships: Use useXxxQuery() to add conditions on related entities
// Load only orders with specific item conditions
$orders = SpySalesOrderQuery::create()
->joinWithSpySalesOrderItem()
->useSpySalesOrderItemQuery()
->filterByFkOmsOrderItemStatus($statusId)
->endUse()
->find();
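To count related items without loading them, a minimal sketch of the withColumn() approach from the list above:
// One row per order with a virtual item_count column; no item objects are hydrated
$orders = SpySalesOrderQuery::create()
->joinSpySalesOrderItem() // INNER JOIN: orders without items are omitted
->withColumn('COUNT(*)', 'item_count')
->groupByIdSalesOrder()
->find();
foreach ($orders as $order) {
$itemCount = $order->getVirtualColumn('item_count');
}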
Stored procedures
While stored procedures can improve performance by reducing network round trips, they introduce challenges in Spryker projects:
Considerations:
- Portability: Stored procedures are database-specific and complicate migrations between PostgreSQL and MySQL
- Version control: Harder to manage in git repositories alongside application code
- Testing: Difficult to unit test and mock in development environments
- Debugging: Limited debugging tools compared to PHP code
- Deployment: Require special deployment procedures and permissions
Recommendations:
- Avoid by default: Prefer PHP-based solutions with proper query optimization
- Performance test thoroughly: If using stored procedures, establish baselines:
- Measure execution time under realistic load
- Monitor database CPU and memory usage
- Profile query counts and complexity
- Watch for hidden N+1 problems: Ensure stored procedures don’t execute hundreds/thousands of queries internally
- Document extensively: Include performance benchmarks and maintenance procedures
- Have a migration path: Plan how to refactor if performance degrades
When stored procedures might be justified:
- Complex aggregations requiring multiple passes over large datasets
- Legacy integration requirements
- Proven performance bottlenecks where all other optimizations have been exhausted
Integer autoincrement ID overflow
Propel schemas define autoincrement primary keys as 32-bit INTEGER columns by default in both PostgreSQL and MySQL, which limits their maximum value to approximately 2.1 billion. In high-transaction tables, this limit can be reached, causing application failures.
Understanding the risk
High-volume tables at risk:
- spy_sales_order_item (one record per ordered product)
- spy_oms_product_reservation (order item state tracking)
- spy_event_behavior_entity_change (publish & sync events)
- spy_queue (message queue storage)
- Logging tables if stored in the database
Symptoms of approaching overflow
When IDs approach 2,147,483,647:
- INSERT operations fail with “integer out of range” errors
- Application throws exceptions on order placement
- Queue processing stops
- New entity creation fails
Prevention and remediation
For comprehensive solutions including migration to BIGINT and cleanup strategies, see Database tables take up too much space or have ID overflow.
Key preventive measures:
- Use BIGINT for high-transaction tables: When creating new tables or migrating existing ones
- Implement data retention policies: Automatically purge old records from transient tables
- Monitor table growth rates: Track ID increment velocity to predict overflow timeframes (see the sketch after this list)
- Regular capacity planning: Review table sizes and ID usage quarterly
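As an illustrative monitoring sketch (table and threshold chosen as examples), compare a table’s current maximum ID against the signed 32-bit limit to estimate remaining headroom; migrating the column itself means switching it to type="BIGINT" in the Propel schema:
// Check how much of the 32-bit ID range a high-transaction table has consumed
$connection = Propel::getConnection();
$stmt = $connection->prepare('SELECT MAX(id_sales_order_item) AS max_id FROM spy_sales_order_item');
$stmt->execute();
$maxId = (int)$stmt->fetch(PDO::FETCH_ASSOC)['max_id'];
$usedPercent = $maxId / 2147483647 * 100;
if ($usedPercent > 50) {
// Alert well before the limit is reached and plan the BIGINT migration
}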
Best practices summary
- Index strategically: Create indexes on WHERE, JOIN, and ORDER BY columns, but avoid over-indexing
- Use eager loading: Eliminate N+1 problems with joinWith() for related entities
- Avoid function anti-patterns: Don’t use SQL functions on indexed columns in WHERE clauses
- Limit data selection: Use select() to load only needed columns and paginate results
- Batch write operations: Use ActiveRecordBatchProcessorTrait for bulk inserts/updates
- Optimize GROUP BY/ORDER BY: Index grouped and sorted columns, filter before aggregating
- Monitor and test: Use EXPLAIN to verify query performance
- Avoid stored procedures: Unless thoroughly performance-tested and documented
- Monitor INT overflow: Implement alerts and cleanup for high-transaction tables
- Use appropriate tools: PDO for bulk processing, Propel for application logic