Filip Ohanka · 7 min

The Developer’s Guide to Building with Apple HealthKit

Backend, Engineering / Jul 10, 2025

Filip Ohanka, Backend Engineer

Building a wellness app powered by Apple HealthKit sounds straightforward — until it isn’t.

At first glance, you’re just syncing step counts, heart rate and sleep data, then turning those into daily charts or scores. Easy enough. But once you hit real-world usage, especially at production scale, things get messy. Fast. For example…

  • HealthKit can deliver yesterday’s data today (or a week later), depending on when a device syncs
  • Users travel across time zones while their watch keeps recording
  • Each metric has different units, semantics and aggregation logic
  • And you’re expected to deliver an instant, accurate health overview — every time

This article breaks down how we built a flexible, reliable HealthKit backend — designed for early-stage products with under 10k users and architected to scale well beyond that.

The Problems We Knew We’d Have to Solve

When you go beyond HealthKit’s demo data and start building for real users, things start to break. We ran into bloated tables, conflicting data sources, metrics that refused to play by the same rules and sync issues that wrecked daily insights.

To build a system that could handle all of that, we mapped out every issue and designed around it.

Here’s the overview — and in the sections that follow, we’ll unpack the logic behind each decision.

Architecting for Chaos: Our HealthKit Data Model

You can’t force HealthKit data into a one-size-fits-all schema. And you definitely can’t fake scale with a flat table and a few CASE statements. We designed our system to handle real-world data — delayed inputs, timezone jumps, metric inconsistencies — without crumbling under pressure.

Step 1: Split the Problem

Instead of stuffing all metrics into a single table, we separated data by metric type — each with its own raw and daily value tables. This gives us:

  • Faster queries with smaller, indexable partitions
  • Cleaner aggregation logic tailored to each metric
  • Easier maintenance and extensibility over time

Each metric has two levels of storage:

  • Raw data — every synced value, stored with start_date, end_date, source_device_id, source_bundle and source_timezone
  • Daily values — pre-aggregated summaries used for timelines, charts and scores
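
As a rough illustration, the two storage levels could be modeled like this in Python. This is a sketch, not our actual schema: the class names, the user_id, metric and value fields, and the sample values are assumptions added for the example. Only start_date, end_date, source_device_id, source_bundle and source_timezone come from the list above.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class RawSample:
    """One synced HealthKit value, stored as received."""
    user_id: str            # assumed field
    metric: str             # assumed metric key, e.g. "step_count"
    value: float            # assumed field
    start_date: datetime    # timezone-aware, stored in UTC
    end_date: datetime      # timezone-aware, stored in UTC
    source_device_id: str
    source_bundle: str
    source_timezone: str    # IANA name, e.g. "Asia/Tokyo"


@dataclass(frozen=True)
class DailyValue:
    """Pre-aggregated summary for one user, one metric, one local day."""
    user_id: str
    metric: str
    metric_date: date
    value: float


# Placeholder values, for illustration only.
sample = RawSample(
    user_id="u1",
    metric="step_count",
    value=512.0,
    start_date=datetime(2025, 7, 9, 21, 30, tzinfo=timezone.utc),
    end_date=datetime(2025, 7, 9, 22, 0, tzinfo=timezone.utc),
    source_device_id="watch-1",
    source_bundle="com.example.tracker",
    source_timezone="Asia/Tokyo",
)
```

Keeping the raw record immutable (frozen) mirrors the idea that raw data is stored once and only the daily values are recomputed.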

Step 2: Get the Timestamps Right

Every data point is stored with a UTC timestamp and a source_timezone. This may seem redundant — until your users start crossing time zones midday and metrics shift depending on where and when they're recorded.

When we aggregate and read values, we align them to the correct local day using this timezone data, so users see data where they expect it.

Step 3: Sequence Everything

Once data hits the backend, it flows through a pipeline: raw → daily → score. That flow needs to be tightly controlled — each step should only run after the previous one finishes successfully, or you risk inconsistent summaries and broken scores.

We keep that flow tightly orchestrated, either within a single Lambda function or using an orchestration layer (like AWS Step Functions) to enforce order.

This model gave us a solid foundation. Sounds simple in theory. Let’s look at how we made it all work in practice — and why these choices held up under real-world pressure.

Metric Logic: One Size Doesn’t Fit All

HealthKit offers a wide range of metrics, but they don’t behave the same. Steps, heart rate, sleep, calories — each comes with its own format, frequency and context. Trying to process them all through a generic pipeline leads to bloated logic, messy exceptions and, ultimately, bad data.

So, we built a system where each metric type is routed through its own processor — each one tailored to compute accurate daily values based on the nature of the metric.

  • Sum Processor — for things like step count, active minutes or calories burned
  • Average Processor — for metrics like heart rate, HRV, respiratory rate
  • Custom Pipelines — for complex metrics like sleep, which require segmentation and scoring logic

Each processor is purpose-built to aggregate, validate and store data correctly — then pass it to the next layer of the system.
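
One way to sketch this routing is a small registry that maps each metric to its processor. The metric keys and function names below are hypothetical, and the plain mean is a simplification: a production average processor might weight samples by duration.

```python
from statistics import fmean
from typing import Callable, Dict, List

# A processor turns one day's raw values into a single daily value.
Processor = Callable[[List[float]], float]


def sum_processor(values: List[float]) -> float:
    """For cumulative metrics such as step count."""
    return float(sum(values))


def average_processor(values: List[float]) -> float:
    """For sampled metrics such as heart rate (unweighted mean)."""
    return fmean(values)


# Hypothetical metric keys; adding a metric means adding one entry here.
PROCESSORS: Dict[str, Processor] = {
    "step_count": sum_processor,
    "active_energy": sum_processor,
    "heart_rate": average_processor,
    "hrv": average_processor,
}


def aggregate_daily(metric: str, values: List[float]) -> float:
    """Route one day's values for a metric through its processor."""
    if not values:
        raise ValueError("cannot aggregate an empty day")
    try:
        processor = PROCESSORS[metric]
    except KeyError:
        raise ValueError(f"no processor registered for {metric!r}") from None
    return processor(values)


steps = aggregate_daily("step_count", [100, 250, 50])
resting_hr = aggregate_daily("heart_rate", [60, 70, 80])
```

A custom pipeline such as sleep would register its own function under the same interface, so the caller never needs metric-specific branches.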

This gave us two big wins:

  • Extensibility — When we add a new metric, we don’t have to refactor the system. We just plug in a new processor with its own rules.
  • Precision — Each metric gets exactly the logic it needs — nothing more, nothing less. That means better data and clearer insights.

By tailoring each metric’s logic, we give product teams the confidence to build features that are actually grounded in reality and not just approximations built on shaky data.

Timezones, Delays & Device Conflicts

Once the data starts flowing, timezone shifts and device overlaps can wreck your daily summaries — unless your backend knows how to handle them.

Timezones Aren’t Optional

When a user takes a flight from LA to Tokyo, their Apple Watch doesn’t miss a beat. It keeps recording in local time, while your backend is likely still anchored in UTC. If you’re not careful, you’ll log a 7 a.m. workout in Tokyo as if it happened yesterday, because 7 a.m. in Tokyo is still 10 p.m. the previous day in UTC.

We fix this by anchoring each data point to:

  • end_date (HealthKit’s endDate): when the event finished
  • source_timezone: the timezone of the device that recorded it

This lets us convert everything to the user’s local day — regardless of where they are or when the data syncs.

Timezone Normalization in Practice
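
A minimal sketch of the conversion in Python, assuming end_date is stored as a timezone-aware UTC datetime and source_timezone as an IANA name. The function name local_day is ours for this example, not part of HealthKit.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def local_day(end_date_utc: datetime, source_timezone: str) -> date:
    """Return the calendar day the sample belongs to on the recording device."""
    if end_date_utc.tzinfo is None:
        raise ValueError("end_date_utc must be timezone-aware")
    return end_date_utc.astimezone(ZoneInfo(source_timezone)).date()


# The same instant falls on different local days depending on the device's zone.
finished = datetime(2025, 7, 9, 22, 0, tzinfo=timezone.utc)
tokyo_day = local_day(finished, "Asia/Tokyo")          # 07:00 on Jul 10 in Tokyo
la_day = local_day(finished, "America/Los_Angeles")    # 15:00 on Jul 9 in LA
```

Bucketing by the UTC date alone would put the Tokyo sample on Jul 9, a day earlier than the user experienced it.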

Late Data Is the Norm, Not the Exception

HealthKit doesn’t guarantee immediate sync. Sleep data, for instance, can arrive 24 hours late. Even step counts may get backfilled as devices catch up.

Our backend listens for these late arrivals and triggers recalculations automatically. If we see new raw data for a past day, we regenerate the daily summary and update any scores tied to it. No user interaction needed. No stale insights.
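
The recalculation can be sketched as two steps: work out which local days a late batch touches, then rebuild only those days from all stored raw data. The tuple layout and function names below are assumptions for illustration, and the example uses a summed metric.

```python
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Set, Tuple
from zoneinfo import ZoneInfo

# (end_date in UTC, source_timezone, value): an assumed, simplified sample shape.
Sample = Tuple[datetime, str, float]


def sample_day(end_utc: datetime, tz_name: str) -> date:
    return end_utc.astimezone(ZoneInfo(tz_name)).date()


def affected_days(new_samples: Iterable[Sample]) -> Set[date]:
    """Local days whose daily summary is stale after a late sync."""
    return {sample_day(end, tz) for end, tz, _ in new_samples}


def recompute_daily_sums(all_samples: Iterable[Sample], days: Set[date]) -> Dict[date, float]:
    """Rebuild daily totals from raw data, but only for the given days."""
    totals = {day: 0.0 for day in days}
    for end, tz, value in all_samples:
        day = sample_day(end, tz)
        if day in totals:
            totals[day] += value
    return totals


LA = "America/Los_Angeles"
stored = [
    (datetime(2025, 7, 8, 18, 0, tzinfo=timezone.utc), LA, 3000.0),  # Jul 8 local
    (datetime(2025, 7, 9, 20, 0, tzinfo=timezone.utc), LA, 500.0),   # Jul 9 local
]
# Arrives late; 02:00 UTC on Jul 9 is still 19:00 on Jul 8 in LA.
late = [(datetime(2025, 7, 9, 2, 0, tzinfo=timezone.utc), LA, 1200.0)]

days = affected_days(late)
totals = recompute_daily_sums(stored + late, days)
```

Any score tied to a day in totals would then be recomputed from the fresh daily value.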

Conflicting Devices

Users don’t just wear one device. They might carry an iPhone, wear an Apple Watch and sync with a third-party app. HealthKit will happily accept all of it — sometimes for the same metric, at the same time.

That’s why we track:

  • source_device_id
  • source_bundle (e.g. native Health app vs. third-party)
  • Trust rules per device type

With that info, we can prioritize the most reliable data, deduplicate cleanly and avoid inflating metrics just because multiple devices chimed in.
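
Below is one simple way to apply trust rules, sketched in Python. The trust ranking, the source_type field and the drop-the-whole-sample policy are all assumptions for the example: when samples from different sources overlap in time, only the one from the most trusted source is kept.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    start_date: datetime
    end_date: datetime
    value: float
    source_type: str  # hypothetical label derived from source_device_id / source_bundle


# Hypothetical trust rules: lower rank wins when sources overlap.
TRUST_RANK: Dict[str, int] = {"watch": 0, "phone": 1, "third_party": 2}


def overlaps(a: Sample, b: Sample) -> bool:
    return a.start_date < b.end_date and b.start_date < a.end_date


def deduplicate(samples: List[Sample]) -> List[Sample]:
    """Keep the most trusted sample wherever different sources overlap."""
    ranked = sorted(
        samples,
        key=lambda s: (TRUST_RANK.get(s.source_type, len(TRUST_RANK)), s.start_date),
    )
    kept: List[Sample] = []
    for candidate in ranked:
        clash = any(
            k.source_type != candidate.source_type and overlaps(k, candidate)
            for k in kept
        )
        if not clash:
            kept.append(candidate)
    return sorted(kept, key=lambda s: s.start_date)


def at(hour: int, minute: int) -> datetime:
    return datetime(2025, 7, 9, hour, minute, tzinfo=timezone.utc)


samples = [
    Sample(at(10, 0), at(10, 30), 400.0, "watch"),
    Sample(at(10, 10), at(10, 40), 380.0, "phone"),   # overlaps the watch sample
    Sample(at(11, 0), at(11, 30), 200.0, "phone"),    # no overlap, kept
]
clean = deduplicate(samples)
total_steps = sum(s.value for s in clean)
```

Dropping an entire overlapping sample is coarse. A finer-grained variant could trim the lower-priority sample to its non-overlapping portion instead.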

Keeping It Consistent: Orchestrating the Flow

Even with clean data and solid timezone handling, HealthKit’s unpredictability isn’t done with you. Syncs happen at odd intervals. Users open the app while data is still being processed. And if your backend updates daily summaries or scores out of sequence, you’ll end up with inconsistencies that are hard to spot and even harder to fix.

The key is controlling the order of operations.

Our processing pipeline follows a strict sequence:

  1. Raw ingestion — Validate, enrich and store the data
  2. Daily aggregation — Compute per-metric values by local day
  3. Score computation — Calculate any user-facing insights (like readiness, recovery or energy)

Each step depends on the previous one being complete. If daily values update before all raw data is in — or if a score is calculated before the day is fully aggregated — users get incomplete or inaccurate results.

To avoid this, we wrap the full flow in one of two ways:

  • Single Lambda function — For simpler cases, all processing happens in one run, in sequence
  • Orchestration layer — For more complex setups, we use AWS Step Functions or a state machine to ensure each step finishes before the next one starts

This orchestration prevents race conditions, avoids double-counting and ensures that every user gets a clean, complete daily summary — no matter when they check the app.
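
The single-function variant can be sketched as a list of steps run strictly in order. The step bodies, the context keys and the step_goal scoring rule are placeholders for this example, not our production logic.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def ingest_raw(ctx: Dict) -> Dict:
    """Validate incoming values (here: drop negatives) and store them."""
    return {**ctx, "raw": [v for v in ctx["incoming"] if v >= 0]}


def aggregate_day(ctx: Dict) -> Dict:
    """Compute the daily value from raw data only."""
    return {**ctx, "daily_steps": sum(ctx["raw"])}


def compute_score(ctx: Dict) -> Dict:
    """Placeholder score: percentage of a step goal, capped at 100."""
    pct = round(100 * ctx["daily_steps"] / ctx["step_goal"])
    return {**ctx, "score": min(100, pct)}


PIPELINE: List[Step] = [ingest_raw, aggregate_day, compute_score]


def run_pipeline(ctx: Dict, steps: List[Step] = PIPELINE) -> Dict:
    """Run each step only after the previous one returned successfully."""
    for step in steps:
        # An exception here aborts the run, so later steps never see partial data.
        ctx = step(ctx)
    return ctx


result = run_pipeline({"incoming": [4000, -5, 3500], "step_goal": 10000})
```

With an orchestration layer, each function would become its own state, but the ordering guarantee is the same: a score is never computed from a day that has not been aggregated.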

📌 Quick Tip: If you're debugging inconsistent scores, start by checking for out-of-order updates in your pipeline. It's almost always the culprit.

Scaling the System: Fast Reads & Future-Proofing

Once your MVP is live, things change. Data volume grows. Users expect real-time feedback. Product wants new insights — yesterday. And that’s when most HealthKit setups start to crack.

We designed for performance that lasts. Not by over-engineering, but by making a few foundational choices that pay off as you grow.

Precompute What You Can

Once you’ve got thousands of users syncing data across metrics, querying everything in real time becomes expensive and slow. That’s why we precompute daily_values per user, per metric, per day. These summaries are:

  • Indexed by user_id and metric_date
  • Queried directly by the frontend
  • Used for charts, streaks, scores and more

The result? Instant-feeling UX and no surprise spikes in backend load.
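
As a sketch of what such a table might look like, here is a per-metric daily table using Python's built-in sqlite3 module. The table name, column types and helper functions are assumptions for illustration. The composite primary key provides the user_id and metric_date index, and the upsert lets a recalculated day overwrite its old value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE step_count_daily_values (
        user_id     TEXT NOT NULL,
        metric_date TEXT NOT NULL,   -- local day, ISO format YYYY-MM-DD
        value       REAL NOT NULL,
        PRIMARY KEY (user_id, metric_date)
    )
    """
)


def upsert_daily_value(user_id: str, metric_date: str, value: float) -> None:
    """Insert a daily value, or overwrite it if the day was recalculated."""
    conn.execute(
        """
        INSERT INTO step_count_daily_values (user_id, metric_date, value)
        VALUES (?, ?, ?)
        ON CONFLICT (user_id, metric_date) DO UPDATE SET value = excluded.value
        """,
        (user_id, metric_date, value),
    )


def daily_series(user_id: str, start: str, end: str) -> list:
    """Read a chart-ready series straight from the precomputed table."""
    rows = conn.execute(
        """
        SELECT metric_date, value FROM step_count_daily_values
        WHERE user_id = ? AND metric_date BETWEEN ? AND ?
        ORDER BY metric_date
        """,
        (user_id, start, end),
    )
    return rows.fetchall()


upsert_daily_value("u1", "2025-07-08", 3000)
upsert_daily_value("u1", "2025-07-09", 500)
upsert_daily_value("u1", "2025-07-08", 4200)   # late data: the day is regenerated
series = daily_series("u1", "2025-07-01", "2025-07-31")
```

The read path never touches raw samples, which is what keeps charts fast as raw volume grows.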

Store the Raw Stuff, Too

Even with fast reads, we never throw away raw HealthKit data. Every entry gets stored with:

  • UTC timestamp
  • source_timezone
  • Source device and app metadata

Why? Because product needs evolve. You might start with daily steps — and six months later, you’re looking to segment behavior by region, time of day or device usage patterns. Having raw, contextualized data means you can explore without retrofitting your pipeline.

It also makes you ML-ready — no extra data gathering required.

Optimize Where It Counts

Some performance wins come standard. Others you roll out as needed.

  • Partition tables by date or metric for parallel reads
  • Index for your most common queries (dashboards, timelines, comparisons)
  • Plug in TimescaleDB if you’re leaning heavily on time-series ops and want compression + advanced aggregations

We didn’t need all of this on day one. But we made space for it, and that made scaling simple instead of painful.

Conclusion: Build Smart, Not Just Fast

HealthKit gives you access to powerful data. But it also comes with inconsistencies, delays and enough edge cases to trip up even a solid MVP. If you’re not thinking about structure, logic and scale from the start, you’ll end up fighting fires later.

What we’ve outlined here isn’t theory — it’s the system we built to handle real production needs, with all the weirdness that comes from syncing health data across timezones, devices and use cases.

A few takeaways if you’re heading down this path:

  • Don’t generalize your metrics — treat each one with care
  • Store source timezones — UTC alone won’t cut it
  • Precompute summaries — frontend speed is a product feature
  • Sequence your logic — race conditions will wreck your confidence
  • Keep raw data rich — even if you don’t need it yet, you will

And most of all: Don’t over-optimize too early — but do architect for growth.

If you’re building a HealthKit-powered product and want to skip the painful parts, we’re happy to dive deeper. Let’s talk.
