👋 In this article, we'll clarify how to count your users correctly in GA4 and BigQuery, by comparing two key approaches:
-COUNT(DISTINCT) for absolute precision.
-HyperLogLog(HLL) for large-scale performance.
The aim here is to help you choose intelligently, according to your use cases, between analytical rigor and budgetary control.
Background: why is user counting strategic?
A good user count is much more than just a line on a dashboard.
It's the foundation of your business KPIs:
- It influences your conversion rates, performance reports and marketing decisions.
- A counting error? This distorts your customer acquisition cost (CAC), your ROAS calculations, and can lead you to invest... on the wrong basis.
Common problems with GA4 alone
- Incomplete or sampled data (especially for large volumes).
- History limited to 14 months.
- Restricted access to certain fields such as basket contents or user IDs.
👉 Exporting to BigQuery becomes indispensable for accessing raw data in its entirety, avoiding sampling and carrying out advanced customized analyses.
Two methods, two philosophies
1. COUNT(DISTINCT)
Counting exact based on a unique identifier (e.g: user_id
).
- Advantages: Total precision, legally required (RGPD).
- Disadvantages: Very costly for large volumes.
Example:
SELECT COUNT(DISTINCT user_pseudo_id)
FROM `project.dataset.events_*`
WHERE device.category != 'bot'
💡 SEO tips : Exclude bots withWHERE device.category != 'bot'
.
2. HyperLogLog (HLL)
A probabilistic algorithm designed to quickly and inexpensively estimate the number of unique elements (e.g.: number of users, emails, sessions, etc.).
- Benefits: 99% accuracy, 60-90%cost reduction , idealfor processing very large volumes of data, perfect for trending and daily analysis.
- Disadvantages: it's not an exact count (the results are very close, but not rigorously accurate) and therefore unusable for reports that require rigorously accurate results, such as legal or financial reports.
Example:
SELECT HLL_COUNT.EXTRACT(HLL_COUNT.INIT(user_pseudo_id, 14))
FROM `project.dataset.events_*`
💡 The second parameter (here 14
) corresponds to the desired precision (between 10 and 24). The higher the number, the more accurate (and expensive) the estimate. Google uses 14 by default.
⚖️ Comparison: Which method for which use?
👉Check data regularly with COUNT to validate HLL estimates.
Practical Applications
Case studies
"GA4 underestimates my SEO traffic!"
➡️ Check in BigQuery with COUNT(DISTINCT)
on sessions or users.
This is often when you realize that GA4 sampling was costing you 10-20% in lost traffic.
"My Ads campaigns don't seem very profitable..."
➡️ HLL allows you to track trends at low cost.
Cross it with a monthly COUNT to identify underestimated levers.
Testimonial
We performed these tests on data exported from Google Analytics to BigQuery, for the period January 01, 2025 to March 12, 2025.
The comparative results of the three methods with the GA4 interface are available in the following file:
Google Sheets - Analysis results
Integrated FAQ
Q: "Why do GA4 and BigQuery give different results?"
→ R: GA4 samples the data, while BigQuery uses the raw data.
Q: "Is HyperLogLog reliable for strategic decisions?"
→ R: Yes to 99.9%, but validate with COUNT for critical ratios.
Compare COUNT(DISTINCT)
for total precision, but expensive and HyperLogLog
for fast, cost-effective estimates. Use COUNT
for critical audits and HLL
for large-scale trends.
☝ Convictions
"User counting is not an exact science... but a science of intelligent compromise."
- ☑️ Don't make accuracy sacred: 99% = enough to act.
- ☑️ GA4 + BigQuery = forced marriage: One without the other is blind or paralyzed.
- ☑️ HLL is a superpower: Learn to harness it, or pay the price

A need, a question?
Write to us at hello@starfox-analytics.com.
Our team will get back to you as soon as possible.