Audience performance doesn’t just depend on creative or bidding. It starts with the quality of the identifiers you upload. Inconsistent formatting, sloppy hashing, and duplicate records silently depress match rates and inflate frequency. This guide explains the core concepts and gives you checklists and templates you can run inside LeadEnforce.
Why Hygiene Matters (Numbers to Know)
-
Formatting errors (country codes, casing, punctuation) can cut match rates by 10–25% on the same file.
-
Teams that enforce pre‑upload deduplication commonly remove 8–18% of rows as repeats, bots, or stale contacts—reducing wasted impressions and skewed frequency.
-
Maintaining a ≤ 2% invalid‑record rate (bounces, malformed fields) correlates with 5–12% lower CPMs on prospecting due to cleaner seeds.
-
Multi‑identifier uploads (email + phone) typically yield platform match rates 8–20 percentage points higher than email‑only lists.
Your audience is only as good as your preprocessing. Hygiene is a performance feature, not paperwork.
Identifier Formatting Standards
Clean formatting ensures platforms can normalize and match your data. Apply the following before hashing.
Emails
-
Trim leading/trailing spaces.
-
Lowercase entire address.
-
Leave dots in the local part (the text before the @) untouched unless the destination platform's formatting policy explicitly asks you to remove them.
-
Remove tags/aliases you’ve added (e.g., +utm), unless you specifically use them to segment.
-
Validate with a simple regex (e.g., ^[^@\s]+@[^@\s]+\.[^@\s]+$) and an optional DNS MX check.
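The email rules above can be sketched in a few lines of Python. This is a minimal illustration, not a LeadEnforce feature: the function name `normalize_email` and the `strip_plus_tag` switch are assumptions made for the example, and the regex is the simple shape check quoted above rather than full address validation.

```python
import re

# Simple shape check quoted in the list above; not full RFC-level validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(raw, strip_plus_tag=False):
    """Return a cleaned email address, or None if it fails the basic check."""
    email = raw.strip().lower()          # trim spaces, lowercase everything
    if not EMAIL_RE.match(email):
        return None
    local, domain = email.rsplit("@", 1)
    if strip_plus_tag:
        # Drop "+tag" aliases only if you do not rely on them for segmentation.
        local = local.split("+", 1)[0]
    if not local:
        return None
    # Dots in the local part are deliberately left alone.
    return f"{local}@{domain}"


print(normalize_email("  Jane.Doe+utm@Example.COM "))
print(normalize_email("  Jane.Doe+utm@Example.COM ", strip_plus_tag=True))
```

A DNS MX lookup is left out because it needs network access; if you add one, run it after this step so you only query domains that already passed the format check.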
Phone Numbers
-
Convert to E.164: +<country code><national number> with no spaces or punctuation. Example: +14155550123.
-
Strip extensions and every non‑digit character except the leading +; if the country code is missing, infer it from billing or geo data.
-
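Drop the national trunk prefix (often a leading 0) once the country code is added, unless the country's numbering rules keep it (Italian landlines, for example, keep theirs).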
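As a rough sketch of the phone rules, the function below converts loosely formatted numbers to E.164 using only the standard library. The name `to_e164`, its parameters, and the `MIN_DIGITS` floor are assumptions for illustration; the caller supplies the country code and trunk prefix inferred from billing or geo data. For production use, a dedicated library such as the `phonenumbers` package knows each country's numbering rules and is the safer choice.

```python
import re

MAX_DIGITS = 15   # E.164 allows at most 15 digits after the "+"
MIN_DIGITS = 8    # arbitrary sanity floor for this sketch, not part of E.164

# Trailing extension such as "x45", "ext. 45", or "#45".
EXTENSION_RE = re.compile(r"(?:ext\.?|x|#)\s*\d+\s*$", re.IGNORECASE)


def to_e164(raw, default_country_code="1", trunk_prefix="1"):
    """Best-effort E.164 conversion; returns None if the result looks implausible."""
    number = EXTENSION_RE.sub("", raw.strip())
    has_plus = number.startswith("+")
    digits = re.sub(r"\D", "", number)   # keep digits only
    if not has_plus:
        # Treat as a national number: drop the trunk prefix, add the country code.
        if trunk_prefix and digits.startswith(trunk_prefix):
            digits = digits[len(trunk_prefix):]
        digits = default_country_code + digits
    if not MIN_DIGITS <= len(digits) <= MAX_DIGITS:
        return None
    return "+" + digits


print(to_e164("(415) 555-0123"))
print(to_e164("020 7946 0958", default_country_code="44", trunk_prefix="0"))
```

The defaults assume a North American list, where the national prefix is 1; pass `trunk_prefix="0"` for countries that dial a leading zero domestically.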
Names (when used)
-
Title‑case or uppercase consistently; normalize diacritics with Unicode NFC.
-
Split first/last; remove honorifics; collapse double spaces.
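A small sketch of the name rules, assuming a single free-text name field as input. The helper `normalize_name` and the `HONORIFICS` set are hypothetical and deliberately short; extend the set for your own data. Note that Python's `str.title()` is a blunt tool (it turns "McDonald" into "Mcdonald"), so switch to `.upper()` if consistent uppercase suits your destination better.

```python
import unicodedata

# Illustrative list only; extend it for the honorifics in your data.
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def normalize_name(raw):
    """Return (first, last): NFC-normalized, honorifics removed, title-cased."""
    text = unicodedata.normalize("NFC", raw)
    parts = text.split()                      # also collapses repeated spaces
    parts = [p for p in parts if p.rstrip(".").lower() not in HONORIFICS]
    if not parts:
        return ("", "")
    first, last = parts[0], " ".join(parts[1:])
    return (first.title(), last.title())


# "e" + combining acute accent becomes the single precomposed character.
print(normalize_name("Dr.  jose\u0301   GARCI\u0301A"))
```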
Postal Addresses (if applicable)
-
Standardize abbreviations (St., Rd., Ave.) and state/province codes.
-
Normalize to a single country format for each file.
Hashing Correctly (for Privacy‑Preserving Uploads)
Most major ad platforms accept SHA‑256 hashes for PII identifiers. Consistency matters more than anything.
General rules:
-
Preprocess first, then hash. Formatting must be complete before hashing.
-
Use SHA‑256 on UTF‑8 strings; output in lowercase hex.
-
No salts unless the destination explicitly supports and requests them; salts break matching.
-
Hash each identifier separately (email, phone), one field per column.
-
Store only what you need; maintain a mapping table internally if you must reconcile later.
Example workflow:
-
Clean → Validate → Normalize → sha256(normalized_value) → Lowercase hex → Upload.
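The hashing step itself is short. The sketch below uses Python's standard `hashlib`; the helper names `sha256_hex` and `hash_record` and the `_sha256` column suffix are assumptions for the example. It expects values that have already been cleaned and normalized.

```python
import hashlib


def sha256_hex(normalized_value):
    """SHA-256 of a UTF-8 string; hexdigest() already returns lowercase hex."""
    return hashlib.sha256(normalized_value.encode("utf-8")).hexdigest()


def hash_record(record, fields=("email", "phone")):
    """Hash each identifier separately, one output column per field, no salt."""
    return {
        f"{field}_sha256": sha256_hex(record[field]) if record.get(field) else ""
        for field in fields
    }


row = {"email": "jane.doe@example.com", "phone": "+14155550123"}
print(hash_record(row))
```

Empty identifiers stay empty rather than being hashed, so a blank cell never turns into a valid-looking hash of the empty string.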
Quality checks:
-
Sample‑recreate 100 hashes from source values and verify deterministic equality.
-
Target hash completeness ≥ 99% of rows for the chosen identifier.
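Both quality checks can be automated. The following sketch assumes rows are dictionaries holding the normalized source value and its stored hash in a column named `<field>_sha256`; that layout and the function names are illustrative, not a prescribed schema.

```python
import hashlib
import random


def sha256_hex(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def verify_hash_sample(rows, field, sample_size=100, seed=0):
    """Re-hash a random sample and return the rows whose stored hash differs."""
    candidates = [r for r in rows if r.get(field)]
    sample = random.Random(seed).sample(candidates, min(sample_size, len(candidates)))
    return [r for r in sample if sha256_hex(r[field]) != r.get(f"{field}_sha256")]


def hash_completeness(rows, field):
    """Share of rows that carry a hash for the chosen identifier."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(f"{field}_sha256"))
    return filled / len(rows)


rows = [{"email": e, "email_sha256": sha256_hex(e)} for e in ("a@x.com", "b@x.com")]
print(verify_hash_sample(rows, "email"))          # empty list means all hashes match
print(hash_completeness(rows, "email") >= 0.99)   # compare against the 99% target
```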
Deduplication: Merging Without Losing Signal
Duplicates inflate audience size, raise frequency, and distort reporting. Dedup across and within sources before export.
Common duplicate patterns
-
Same person with work + personal email.
-
Multiple numbers per contact; vanity vs. mobile; old numbers.
-
CRM contacts cloned across lifecycle stages.
A practical dedup recipe
-
Build a person key from the highest‑priority identifier available (Email → Phone → MAID, the mobile advertising ID) after normalization.
-
For records sharing the same key, merge attributes by recency and completeness.
-
Keep the freshest consent timestamp and the most recent activity date.
-
If two records conflict, prefer values with verified provenance (checkout > webform > event scan).
Performance target: remove roughly 10% of rows as true duplicates on the first pass; sustain a duplicate rate below 3% going forward.
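The recipe above can be sketched as follows. Everything here is an assumption made for illustration: the field names (`email`, `phone`, `maid`, `source`, `consent_ts`, `last_active`), the `SOURCE_RANK` values, and ISO date strings (which sort correctly as plain text). Every record is assumed to carry both dates.

```python
KEY_PRIORITY = ("email", "phone", "maid")
# Higher rank = more trusted provenance (checkout > webform > event scan).
SOURCE_RANK = {"checkout": 3, "webform": 2, "event_scan": 1}


def person_key(record):
    """Use the first available identifier, in priority order, as the key."""
    for field in KEY_PRIORITY:
        if record.get(field):
            return f"{field}:{record[field]}"
    return None


def dedupe(records):
    groups = {}
    for rec in records:
        key = person_key(rec)
        if key is not None:                      # skip rows with no identifier
            groups.setdefault(key, []).append(rec)
    merged = []
    for group in groups.values():
        # Least trusted and oldest first, so better values overwrite later.
        ordered = sorted(
            group,
            key=lambda r: (SOURCE_RANK.get(r.get("source"), 0), r["last_active"]),
        )
        out = {}
        for rec in ordered:
            # Only non-empty values overwrite, which preserves completeness.
            out.update({k: v for k, v in rec.items() if v not in (None, "")})
        out["consent_ts"] = max(r["consent_ts"] for r in group)
        out["last_active"] = max(r["last_active"] for r in group)
        merged.append(out)
    return merged
```

One limitation to keep in mind: a record keyed by email and another keyed only by phone will not merge even if they belong to the same person. Linking across identifier types needs a second pass that this sketch does not attempt.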
Pre‑Upload Validation Checklist (Copy/Paste)
-
All identifiers normalized (email lowercase; phones in E.164; Unicode NFC)
-
Invalid formats < 2% after validation
-
SHA‑256 hashing applied, lowercase hex, no salts
-
Deduped across sources with a stable person key
-
Consent + last‑active dates preserved in the master record
-
Sample match test performed on 100 random rows
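Part of this checklist can be turned into an automated gate before export. The sketch below checks only what is visible in the hashed file: well-formed lowercase hex hashes and repeated person keys. The column names, the `preupload_report` function, and the rule for what counts as an invalid row are assumptions for the example.

```python
import re

HEX64 = re.compile(r"[0-9a-f]{64}")   # SHA-256 as lowercase hex


def _row_is_valid(row, hash_fields):
    """Valid = at least one hash present, and every populated hash is well formed."""
    populated = [row.get(f) for f in hash_fields if row.get(f)]
    return bool(populated) and all(HEX64.fullmatch(v) for v in populated)


def preupload_report(rows, hash_fields=("email_sha256", "phone_sha256"),
                     key_field="person_key", max_invalid_rate=0.02):
    total = len(rows)
    invalid = sum(1 for r in rows if not _row_is_valid(r, hash_fields))
    duplicates = total - len({r.get(key_field) for r in rows})
    invalid_rate = invalid / total if total else 0.0
    return {
        "rows": total,
        "invalid_rate": invalid_rate,
        "duplicate_rows": duplicates,
        "ok": invalid_rate < max_invalid_rate and duplicates == 0,
    }
```

Consent and last-active dates live in the master record rather than the export, so they are better checked upstream of this step.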
Implementation in LeadEnforce
-
Unified audience builder: ingest CSVs from CRM, checkout, events, and web leads; apply one normalization profile across all.
-
Built‑in hygiene rules: define email/phone normalization once and reuse; monitor invalid‑record rate in the upload report.
-
Deterministic hashing: run SHA‑256 consistently on selected fields; export hashed or raw according to your workflow.
-
Smart dedup: create a person key, merge by recency, and keep consent/state intact.
-
Automations: schedule weekly refreshes and rolling windows (e.g., last 90 days) to keep seeds fresh.
Measuring Success
Track these hygiene KPIs after each upload and refresh cycle:
-
Upload‑to‑match rate by identifier: aim for 60–80% or higher on consumer lists; expect lower rates on B2B lists, depending on data quality.
-
Invalid Record Rate: ≤ 2% sustained.
-
Duplicate Rate: < 3% in export files; 10–18% removal on initial consolidation is normal.
-
Net Audience Delta (weekly): Adds – Removals – Decay. Goal: positive.
-
Privacy Compliance: consent timestamps present for ≥ 98% of contactable records.
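To make the KPI arithmetic concrete, here is a small worked sketch. The function name, its parameters, and the choice of denominators (uploaded rows for match, invalid, and duplicate rates; contactable records for consent coverage) are assumptions; adjust them to however your reports define each count. The sample numbers are made up purely to show the calculation.

```python
def hygiene_kpis(uploaded, matched, invalid, duplicates,
                 adds, removals, decay, with_consent, contactable):
    """Compute the hygiene KPIs from raw counts for one upload cycle."""
    return {
        "match_rate": matched / uploaded,
        "invalid_rate": invalid / uploaded,
        "duplicate_rate": duplicates / uploaded,
        # Net Audience Delta = Adds - Removals - Decay
        "net_audience_delta": adds - removals - decay,
        "consent_coverage": with_consent / contactable,
    }


kpis = hygiene_kpis(uploaded=10_000, matched=6_800, invalid=150, duplicates=200,
                    adds=900, removals=300, decay=250,
                    with_consent=9_900, contactable=10_000)
for name, value in kpis.items():
    print(name, value)
```

In this made-up cycle every target is met: the match rate falls inside the 60–80% band, invalid and duplicate rates sit under 2% and 3%, the net delta is positive, and consent coverage exceeds 98%.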
Troubleshooting
-
Low match rate (< 40%): re‑check normalization order; confirm hashing on UTF‑8; include a second identifier type.
-
High bounce/invalids: tighten regex and DNS checks; enforce E.164 for phones; exclude stale sources beyond your recency window.
-
Frequency spikes: dedup across channels; add exclusions for recent converters; rotate creatives.
Takeaway
Audience hygiene is one of the highest‑leverage steps you can control. By standardizing formats, hashing consistently, and deduplicating across sources—then automating the process in LeadEnforce—you protect privacy, improve match rates, and create cleaner seeds for both retargeting and lookalike expansion.