How to benchmark Shopware 6 vs Magento/Shopify on the same stack?
Quick Answer
A fair benchmark means identical infrastructure, identical catalogue size, and identical test scenarios across all three platforms—otherwise you’re just measuring your hosting setup, not the platform. This page walks through exactly how to control those variables, which tools to use, and what metrics actually matter for an eCommerce decision.
Before You Start
- Two identical server environments: same CPU, RAM, PHP version, and Redis/Varnish config for Shopware and Magento. Shopify is SaaS, so you’ll benchmark against its CDN edge directly.
- A representative product catalogue: at minimum 5,000 products with realistic category depth. Benchmarking on 50 products tells you nothing useful.
- k6 or Locust installed: these let you script realistic shopping journeys rather than just hammering a single URL with Apache Bench.
- Caches fully warmed before every test run: cold-cache numbers are meaningless for production comparisons and will make any platform look terrible.
Standardise your infrastructure
Shopware 6 and Magento 2 both run on PHP, so match them closely: same cloud provider region, same VM size (4 vCPU / 8 GB RAM is a sensible mid-market baseline), same PHP 8.2, same MySQL 8.0, and Redis for session and cache. Shopify runs on their own infrastructure—you can’t control it—so benchmark it as-is from the same geographic origin as your other servers.
- Provision identical VM specs on AWS or GCP for Shopware and Magento
- Enable the same HTTP cache layer on both self-hosted installs (Varnish, or each platform’s built-in HTTP/full-page cache) with the same TTL settings
- Document every config difference—OPcache settings, PHP-FPM worker count, MySQL buffer pool size
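One way to keep that documentation honest is to diff the settings mechanically instead of by eye. The sketch below assumes you have copied the relevant values from each server into a plain dictionary by hand; the values shown are hypothetical placeholders, not recommendations.

```python
from typing import Any


def config_diff(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return every setting whose value differs between two hosts.

    A key that is missing on one side is reported with None for that side.
    """
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            diff[key] = (left, right)
    return diff


# Hypothetical values, typed in by hand from each server's config files.
shopware_host = {
    "php_version": "8.2",
    "opcache.memory_consumption": 256,
    "pm.max_children": 40,
    "innodb_buffer_pool_size": "4G",
}
magento_host = {
    "php_version": "8.2",
    "opcache.memory_consumption": 512,
    "pm.max_children": 40,
    "innodb_buffer_pool_size": "4G",
}

# Print only the settings that differ, so they can go into the findings.
for setting, (sw, mg) in config_diff(shopware_host, magento_host).items():
    print(f"{setting}: shopware={sw} magento={mg}")
```

Anything this prints is either a difference to remove before testing or a line item for your write-up.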
Load identical catalogue data into each
Use a data generator to seed the same product count, category tree depth, and attribute count on Shopware and Magento. For Shopify, import the same catalogue via CSV. The point is that slow category pages on Magento with 50,000 products aren’t comparable to a 500-product Shopify store—the database query profile is completely different.
- Use the sw-cli fixture generator or Faker-based scripts to seed Shopware with 5k–50k products
- Match category depth: if your real store goes three levels deep, replicate that across all three platforms
- Seed 500+ customer accounts and 1,000+ past orders to stress the account and history queries
Run EXPLAIN on your five heaviest category queries per platform. This tells you whether slow results come from a data indexing problem you can fix rather than from a platform limitation.
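To get the same catalogue onto every platform, it helps to generate it once from a fixed random seed and import the same file everywhere. The following is a minimal sketch using only the Python standard library; the column names and the `BENCH-` SKU prefix are made up for illustration, and each platform's importer will need its own column mapping.

```python
import csv
import io
import random

FIELDS = ["sku", "name", "price", "category_path"]  # hypothetical column names


def generate_catalogue(count: int, depth: int = 3, branches: int = 4, seed: int = 42):
    """Yield `count` product rows; the same arguments always give the same rows."""
    rng = random.Random(seed)  # private RNG, so the output is reproducible
    for i in range(1, count + 1):
        # Pick one branch per level to build a category path of fixed depth.
        path = "/".join(
            f"cat-{level}-{rng.randrange(branches)}" for level in range(1, depth + 1)
        )
        yield {
            "sku": f"BENCH-{i:06d}",
            "name": f"Benchmark product {i}",
            "price": f"{rng.randint(100, 50000) / 100:.2f}",
            "category_path": path,
        }


def to_csv(rows) -> str:
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


catalogue_csv = to_csv(generate_catalogue(5000))
```

Because the generator is seeded, re-running it later reproduces the same 5,000 rows, which makes a repeat benchmark comparable with the first one.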
Script realistic user journeys
Don’t benchmark homepage TTFB alone. Real eCommerce load comes from browse → search → PDP → add-to-cart → checkout flows. Write your k6 or Locust scripts to mirror this pattern, and run them at the same virtual user concurrency level across all three platforms simultaneously.
- Script at least four journeys: anonymous browse, authenticated browse, checkout, and search
- Set a realistic VU ramp: 50 concurrent → 200 → 500 over 10 minutes—then hold at peak for five minutes
- Keep a 70/20/10 split: 70% anonymous browse, 20% search, 10% checkout
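The ramp and the traffic mix can be written down as data before you translate them into k6 stages or Locust task weights. This sketch is tool-agnostic Python, and it assumes the 10-minute ramp is split evenly (50 to 200 users in the first five minutes, 200 to 500 in the next five); adjust the breakpoints if you read the ramp differently.

```python
import random

# (elapsed_seconds, virtual_users) breakpoints. Assumption: the ramp is split
# evenly into two five-minute legs, followed by a five-minute hold at peak.
STAGES = [(0, 50), (300, 200), (600, 500), (900, 500)]

# Journey weights taken from the 70/20/10 split.
JOURNEY_MIX = {"anonymous_browse": 70, "search": 20, "checkout": 10}


def target_users(elapsed: float):
    """Linearly interpolate the VU target; returns None once the test is over."""
    if elapsed < 0 or elapsed > STAGES[-1][0]:
        return None
    for (t0, u0), (t1, u1) in zip(STAGES, STAGES[1:]):
        if t0 <= elapsed <= t1:
            return round(u0 + (u1 - u0) * (elapsed - t0) / (t1 - t0))
    return None


def pick_journey(rng: random.Random) -> str:
    """Choose the next journey for a virtual user according to JOURNEY_MIX."""
    names = list(JOURNEY_MIX)
    return rng.choices(names, weights=[JOURNEY_MIX[n] for n in names], k=1)[0]
```

Keeping the schedule in one place makes it easy to confirm that every platform was driven with exactly the same load profile.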
Measure the metrics that matter
TTFB (Time to First Byte) is the server-side number you can directly compare across platforms. Core Web Vitals—LCP, CLS, INP—are the user-facing numbers that affect rankings and conversion. Capture both. For Shopify you’ll also want to look at Storefront API response time separately if you’re evaluating headless viability.
- Record p50, p95, and p99 response times for each journey—averages hide the tail latency that kills conversion
- Run Lighthouse or WebPageTest from the same geographic node for LCP and INP across all three storefronts
- Track error rate under load—a platform that returns 2% 5xx errors at 300 VU is not suitable regardless of TTFB
- Log server-side CPU and memory utilisation throughout the test—Prometheus + Grafana works well here
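If your load tool exports raw timings, the percentiles and error rate above are simple to compute yourself. This is a minimal sketch using the nearest-rank percentile definition (other tools may interpolate and report slightly different values); the sample data at the bottom is synthetic, not a measurement.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(results: list[tuple[float, int]]) -> dict[str, float]:
    """Summarise (response_ms, http_status) pairs for one journey."""
    timings = [ms for ms, _ in results]
    errors = sum(1 for _, status in results if status >= 500)
    return {
        "p50": percentile(timings, 50),
        "p95": percentile(timings, 95),
        "p99": percentile(timings, 99),
        "mean": sum(timings) / len(timings),
        "error_rate": errors / len(results),
    }


# Synthetic example: 98 fast responses and 2 slow 503s.
sample = [(100.0, 200)] * 98 + [(2000.0, 503)] * 2
summary = summarise(sample)
```

In this synthetic case the mean is 138 ms and p95 is still 100 ms, while p99 is 2,000 ms and the error rate is 2%: the tail and the failures only show up in the columns the average hides.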
Run tests and account for Shopify’s CDN edge
Shopify serves every storefront through its built-in CDN, which Shopify’s documentation describes as run by Cloudflare. That gives it a structural CDN advantage for static and cacheable pages that self-hosted Shopware and Magento won’t have unless you add Cloudflare or Fastly in front of them. Either add a comparable CDN to your self-hosted instances, or clearly note this difference in your findings. It’s a legitimate operational comparison, not a platform flaw.
- Run each test suite three times and discard the lowest run—single-run results have too much noise
- Test both with and without CDN on Shopware/Magento and document both sets of results
- For Shopify, bypass the CDN using the ?preview_theme_id workaround to get origin response times. This levels the playing field for server comparison
Interpret results with your business context
Raw TTFB differences under 200ms rarely matter for conversion. What you’re really looking for is which platform degrades gracefully under peak load, how each platform handles checkout under stress, and where the performance ceiling is relative to what it costs to raise it. A platform that needs a €2,000/month server to hit your p95 target is a different business decision than one that does it on €400/month.
- Map p95 response times against your target: sub-300ms TTFB on category and PDP pages under realistic concurrent load
- Note which platform required the most infrastructure tuning to hit targets—that tuning cost is a real ongoing operational expense
- Separate frontend performance (Lighthouse scores, theme quality) from backend performance (TTFB, throughput)—they’re fixable independently
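The cost-versus-target comparison can be made explicit once you have a p95 figure and a monthly infrastructure cost for each setup. The sketch below is one possible way to frame it; the `Candidate` type and the numbers are placeholders for illustration, not benchmark results.

```python
from dataclasses import dataclass

TARGET_P95_MS = 300  # the sub-300ms TTFB target


@dataclass
class Candidate:
    name: str
    p95_ms: float        # measured p95 TTFB at peak load
    monthly_cost: float  # infrastructure cost needed to produce that p95


def cheapest_meeting_target(candidates: list[Candidate], target_ms: float = TARGET_P95_MS):
    """Return the lowest-cost candidate whose p95 is under target, or None."""
    passing = [c for c in candidates if c.p95_ms < target_ms]
    return min(passing, key=lambda c: c.monthly_cost, default=None)


# Placeholder figures only; substitute your own measurements and invoices.
candidates = [
    Candidate("setup-a", p95_ms=280, monthly_cost=2000),
    Candidate("setup-b", p95_ms=290, monthly_cost=400),
    Candidate("setup-c", p95_ms=340, monthly_cost=400),
]
picked = cheapest_meeting_target(candidates)
```

With these placeholders, setup-b is selected: it meets the target at a fifth of the cost of setup-a, while setup-c is cheap but misses the target.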
Mistakes Most Developers Make
! Benchmarking only cacheable pages
What happens: Every platform looks fast because Varnish or Fastly is serving static HTML—you’ve benchmarked your cache layer, not the platform.
Fix: Always include authenticated checkout and account pages in your test suite, and verify your cache-bypass headers are working correctly before starting.
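A quick way to verify the bypass is to inspect the response headers of a supposedly uncached page. This sketch assumes your cache or CDN exposes one of the commonly used markers (`X-Cache`, `CF-Cache-Status`, or the standard `Age` header); which of these appear depends entirely on your configuration, so treat the header list as an assumption to adapt.

```python
def served_from_cache(headers: dict[str, str]) -> bool:
    """Guess from response headers whether a cache answered the request."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    # Conventional hit markers; which one appears depends on your cache/CDN config.
    for name in ("x-cache", "cf-cache-status"):
        if "hit" in h.get(name, ""):
            return True
    # A non-zero Age header means a cache held the response for that many seconds.
    age = h.get("age", "0")
    return age.isdigit() and int(age) > 0
```

If this returns True for a checkout or account page, the test is still measuring the cache layer and the bypass needs fixing before the run counts.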
! Using a tiny product catalogue for testing
What happens: All three platforms perform identically at 200 products because there’s nothing to stress the ORM or query planner—results are meaningless for a real migration decision.
Fix: Seed at least 5,000 products with realistic attribute sets. If your target store has 50,000, test at 50,000.
! Conflating frontend and backend performance
What happens: A slow Lighthouse score gets blamed on the platform when the real culprit is a bloated theme or third-party script loading 800kb of JavaScript.
Fix: Measure TTFB (server) and Core Web Vitals (browser) separately, and test with a default/stock theme before adding any custom code.
! Not accounting for Shopify’s shared infrastructure limits
What happens: Shopify performs well under moderate load in tests but has API rate limits and checkout throughput caps that only surface during a real flash sale.
Fix: Review Shopify’s published API limits and checkout concurrency documentation before concluding benchmark results represent production peak capacity.
Key Takeaway
The short version: a fair benchmark requires identical specs, identical data, and realistic multi-step journeys; anything less just measures your infrastructure choices. The single biggest gotcha is comparing Shopify’s CDN-fronted responses against origin responses from a Shopware server with no CDN in front of it and drawing the wrong conclusion. Shopware 6 typically outperforms Magento 2 on server-side response time at equivalent infrastructure cost, particularly on search and category pages with Elasticsearch enabled, but frontend performance is a theme problem, not a platform problem, and should be measured separately. Start by standardising your infrastructure: if the environments aren’t truly equivalent, every later result is invalid.