Scaling personalisation in B2C retail: design challenges and real-world fixes

By Vishal Gaurav

Personalisation in business-to-consumer (B2C) retail began gaining recognition as a valuable commodity in the late 1990s and early 2000s. In the early 2010s, major retailers leveraged the advent of accelerated technologies such as artificial intelligence (AI), combined with advancements in Big Data and mobile commerce, for competitive advantage, making hyper-personalisation a differentiating factor in their marketing approach. By the end of the decade, personalisation had evolved for consumers from a novelty to an expected feature.

In the current B2C retail environment, businesses are increasingly challenged to compete for consumers who as a whole have more shopping opportunities and less focused attention (due to social media and other online platforms) than ever before. For a generation that has grown up with e-commerce, personalisation is expected and deeply integrated into the customer journey.

Retailers know that personalisation is a critical driver of customer engagement, loyalty, and revenue. Yet even for organisations that recognise the value proposition, implementing personalisation at scale remains a complex challenge. Although technology has evolved rapidly, the core issues often stem from how customer data is captured, processed, and ultimately used to inform marketing strategies.

Drawing from my experience architecting marketing databases, customer data platforms (CDPs), and customer relationship management (CRM) solutions across diverse retail landscapes, I have identified several key hurdles businesses face in their personalisation journey. In this article, I will discuss the root causes of these problems, delve into their business impact, and propose design strategies that can help overcome them to enable more effective and scalable personalisation efforts.

Siloed and Incomplete Data Creates a Fragmented Reality

In the B2C quest to deliver truly personalised customer experiences, data is the lifeblood of any retail organisation. But one of the most persistent challenges retailers face is that this data is often scattered in silos across multiple systems and touchpoints. Without a unified view, efforts to activate personalisation at scale fall short, resulting in inconsistent messaging and missed engagement opportunities that have a negative impact on customer loyalty.

The following customer data touchpoints illustrate just how fragmented the data landscape can be across typical retail systems:

• Loyalty Programme Sign Ups: captures customer identification, preferences, and transaction history, but often resides in standalone loyalty platforms.
• E-commerce Registrations: collects personal details, login behaviour, and browsing patterns that are typically stored within the e-commerce or CMS stack.
• Mobile App Platforms: enables behaviour tracking, location data, and in-app purchase history; usually managed separately from web platforms.
• In-Store Clienteling Tools: associates or stylists collect personal notes and preferences directly from customers, but this data may live only on tablets or local CRM systems.
• Point-of-Sale (POS) Systems: captures purchase history and sometimes contact details, but often not connected to digital identities.
• Marketing Opt-ins: includes email, SMS, push notification subscriptions, and preferences that are usually managed via marketing automation or consent management tools.
• Browse and Purchase Behaviour: clickstream data, cart activity, product affinity, all typically captured in analytics or personalisation engines.
• Customer Surveys: provides valuable feedback on satisfaction, preferences, or needs; often siloed within research platforms.
• Product Reviews: insightful data on sentiment and product preferences, typically stored separately from CRM and commerce platforms.

Each of these systems holds a piece of the customer puzzle, but without stitching them together, businesses cannot form a cohesive customer view. This systemic flaw often causes retailers to deliver inconsistent messaging across channels, sending customers disjointed or conflicting communications depending on the platform.

This disconnection also leads to redundant or irrelevant offers, diminishing the effectiveness of marketing efforts and potentially frustrating loyal customers. Moreover, the inability to personalise in real-time results in missed opportunities to engage customers at critical moments. Lacking a unified view, it becomes difficult to track customer journeys and behaviours, making it harder to optimise campaigns or measure impact accurately.

Finally, compliance with privacy regulations becomes a challenge, as consent and preference data are scattered across systems, increasing the risk of non-compliance and eroding customer trust.

To move toward scalable personalisation, retailers must invest in strategies and architectures that unify customer identities and behaviours across these touchpoints. These include:

• Data identifiers and reliability of data attributes. A critical first step in building a trustworthy and unified customer identity framework is the systematic identification and evaluation of customer data attributes across the enterprise. This process involves scanning all systems and applications - including third-party vendor solutions - where customer data is captured, updated, or enriched. This also includes behavioral and interaction-based systems like analytics tools, mobile apps, and clienteling platforms.

It is essential to assess each attribute’s data quality against five critical dimensions: accuracy, completeness, reliability, relevance and timeliness. Only attributes that achieve a data quality score of 90% or higher across these dimensions should be considered for integration into the Customer Data Platform (CDP). Including unreliable data not only undermines personalisation efforts, but can also degrade customer trust and regulatory compliance.

This evaluation process is a joint responsibility between data stewards and the technical teams. While stewards bring expertise in governance, policy, and data integrity, technical teams provide the architectural and system-level understanding needed to assess data flow, availability, and transformation logic. Conducting this exercise with diligence and discipline ensures that the foundation of your customer identity strategy is strong - enabling better personalisation, consistent customer experiences, and compliance with data privacy expectations.

• Build real-time data pipelines alongside a scalable data lake. To support responsive and relevant customer experiences, retailers must move beyond traditional batch processing and adopt an event driven data architecture. This shift enables customer interactions to be captured, processed, and made available in near real-time - critical for timely personalisation and engagement.

This solution requires the implementation of technologies such as Apache Kafka, AWS Kinesis, or Snowflake Streams to build robust, real-time data pipelines that capture behavioral events, transactions, and profile updates at attribute level, as they happen. These pipelines can stream data into a centralised data lake.

By combining real-time pipelines with a scalable data lake architecture, organisations can store high-volume raw and processed data cost-effectively, while ensuring agility in how that data is accessed and activated. This hybrid approach supports both immediate personalisation use cases and longer-term analytical insights. The result is a data infrastructure that is not only faster and more responsive, but also more flexible - capable of adapting to evolving business needs and customer expectations in real-time.
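The attribute-level quality gate described earlier in this section can be sketched in a few lines of Python. This is only an illustration: the source names, attribute names, and scores are hypothetical, the scores are assumed to have been measured elsewhere, and "90% or higher across these dimensions" is read here as "every dimension individually meets the bar", which is one possible interpretation.

```python
from dataclasses import dataclass

# The five dimensions named in the article.
DIMENSIONS = ("accuracy", "completeness", "reliability", "relevance", "timeliness")
THRESHOLD = 0.90  # the 90% bar from the article


@dataclass
class AttributeScore:
    source: str      # hypothetical source-system name
    attribute: str   # hypothetical attribute name
    scores: dict     # dimension -> score in [0, 1], assumed to be measured elsewhere


def passes_quality_gate(attr: AttributeScore, threshold: float = THRESHOLD) -> bool:
    """True only if every dimension meets the threshold (assumed interpretation).

    A dimension with no recorded score counts as 0.0, so it fails the gate.
    """
    return all(attr.scores.get(dim, 0.0) >= threshold for dim in DIMENSIONS)


def select_for_cdp(attrs):
    """Keep only the attributes that are eligible for CDP integration."""
    return [a for a in attrs if passes_quality_gate(a)]


# Illustrative input, not real measurements.
candidates = [
    AttributeScore("loyalty", "email", {
        "accuracy": 0.97, "completeness": 0.95, "reliability": 0.96,
        "relevance": 0.99, "timeliness": 0.93,
    }),
    AttributeScore("pos", "phone", {
        "accuracy": 0.94, "completeness": 0.62, "reliability": 0.91,
        "relevance": 0.95, "timeliness": 0.90,
    }),
]

eligible = select_for_cdp(candidates)
```

With this input only the loyalty email survives; the POS phone number is held back because its completeness score is below the bar. A weighted average across dimensions would be an equally valid design, but it lets one strong dimension mask a weak one.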

Data Ingestion from the Data Lake into the CDP

As organisations mature in their data strategy, many already have a centralised data lake or data warehouse (like Snowflake, BigQuery, or Amazon S3) acting as a unified store for raw and transformed customer data. To unlock this data for personalised marketing, analytics, and customer engagement, it must be efficiently ingested into a Customer Data Platform (CDP). These best practices will help ensure a clean, scalable and compliant data lake to CDP ingestion:

• Ingest only what’s needed: focus on high-value, business-relevant attributes and filter based on customer engagement level, recency, or opt-in status. This approach will help keep the CDP lean and relevant while reducing clutter and privacy risks.
• Transform before loading: use the data lake to clean, normalise, and enrich data before ingestion. Standardise the formats (dates, country codes, currency), handle nulls or special characters (address fields, descriptions), and deduplicate the records. The standardisation rules should apply organisation-wide and be enforced consistently across the data lake. Use a unified identifier strategy across platforms, such as hashed emails and phone numbers, or UUIDs.
• Use incremental load or CDC: ingest data incrementally using timestamps or Change Data Capture (CDC) patterns, rather than full loads. Debezium, Snowflake Streams, or BigQuery’s partitioned tables are a few ways to achieve this. In one of our implementations, we applied attribute-level CDC using Snowflake Streams. This let us send only the attributes that changed, instead of the full record, which resulted in a 10% to 50% reduction in compute costs related to ETL/ELT pipelines, and a 10% to 90% reduction in data ingestion volumes (and related costs). In aggregate, this yielded substantial cost savings.
• Preserve source metadata and IDs: the original source and the respective identifier attributes should be preserved. This helps in tracing, identity resolution, and deduplication, as well as implementing an efficient CDC framework.
• Privacy consent and compliance: to ensure that personalisation efforts do not violate privacy laws and customer trust, data should be filtered out for customers who have opted out of data use in marketing. In cases where customer data has been sent to the CDP before the customer exercised their privacy rights, a delete event needs to be sent to the CDP (and downstream activation platforms). This will result in anonymising customer Personally Identifiable Information (PII) attributes, while retaining data that is essential for business.
• Scheduled automations and monitoring for business relevance: ensure data freshness matches use case expectations; transactional data (loyalty rewards, reward/offer redemptions, points accumulation, etc.) should be synced to the CDP more frequently than slower-changing profile data.
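Two of the practices above, attribute-level change capture and consent filtering, can be illustrated with a small plain-Python sketch. It does not use Snowflake Streams or any CDP API. It simply compares a hypothetical "last sent" snapshot with the current one, and the payload shape (`event`, `attributes`) is an assumption made for illustration.

```python
_MISSING = object()  # sentinel so a new attribute whose value is None still counts as a change


def attribute_delta(previous: dict, current: dict) -> dict:
    """Return only the attributes that are new or whose value changed."""
    return {k: v for k, v in current.items() if previous.get(k, _MISSING) != v}


def build_cdp_payloads(previous_snapshot: dict, current_snapshot: dict, opted_out_ids: set) -> list:
    """Build upsert/delete events for the CDP (hypothetical payload shape)."""
    payloads = []
    for customer_id, current in current_snapshot.items():
        if customer_id in opted_out_ids:
            # Opted-out customers are never upserted. If they were sent
            # previously, emit a delete event so the CDP can anonymise PII.
            if customer_id in previous_snapshot:
                payloads.append({"customer_id": customer_id, "event": "delete"})
            continue
        delta = attribute_delta(previous_snapshot.get(customer_id, {}), current)
        if delta:  # unchanged customers produce no payload at all
            payloads.append({"customer_id": customer_id, "event": "upsert", "attributes": delta})
    return payloads


# Illustrative data only.
previous = {
    "c1": {"email_hash": "ab12", "tier": "silver", "points": 120},
    "c2": {"email_hash": "cd34", "tier": "gold", "points": 900},
}
current = {
    "c1": {"email_hash": "ab12", "tier": "gold", "points": 150},
    "c2": {"email_hash": "cd34", "tier": "gold", "points": 900},
    "c3": {"email_hash": "ef56", "tier": "bronze", "points": 0},
}
payloads = build_cdp_payloads(previous, current, opted_out_ids={"c2"})
```

Here `c1` contributes only its changed `tier` and `points`, `c2` produces a delete event because the customer opted out after being sent, and `c3` is sent in full as a new record. A production pipeline would also need to record that the delete was delivered so it is not re-sent on every run.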

In our implementation experience, we have worked with CDP vendors hosted on Snowflake, which brings an added layer of efficiency and flexibility. By using Snowflake Data Share, customer data can be made instantly available to the CDP without incurring additional data transfer costs. This integration dramatically accelerates time-to-value and simplifies the data onboarding process.

Difficulty in Creating Unified Customer Profiles

It is important to remember that a CDP implementation is not just a data integration exercise. The real value comes from a well designed data model and a thoughtfully architected identity resolution strategy. The following elements will ensure that the data ingested into the CDP is clean, relevant, and structured in a way that supports accurate customer stitching, segmentation, and activation.

• Establish a persistent customer identity layer: build a flexible identity model that supports both known and anonymous customers. Implement a Customer Identity and Access Management (CIAM) solution or identity graph that links identifiers across channels into a single profile, including:

o Primary identifiers: Email, Phone, Loyalty id, Login id
o Secondary identifiers: Cookies, Device IDs, IP address, hashed IDs
o Behavioral identifiers: clickstream, session id, location data

• Use a hierarchical resolution strategy: rule-based deterministic matching can be applied first, on records with exact matches on stable identifiers (email, phone, customer id, login id). These matches have the highest confidence and the lowest risk of false merges. Next, probabilistic matching (fuzzy logic) can be applied to records or attributes that are similar but not identical (such as “J. Doe” and “John Doe”). Probabilistic matching should include confidence scores, weights, and thresholds. A hybrid approach is also possible, in which probabilistic matching is applied only when needed, with logs maintained on merges and match confidence scores.

For marketing teams - particularly those focused on one-to-one personalisation - some level of probabilistic matching is almost always necessary. If the objective is to reach individuals across multiple touchpoints in a personalised, timely manner, relying solely on deterministic data can lead to missed opportunities and fragmented customer experiences. It is always important to highlight this balance when discussing CDP strategies. While deterministic data anchors the foundation of identity resolution, probabilistic techniques enhance flexibility and scale, and ultimately enable more effective and personalised marketing outreach.
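As a minimal sketch of the deterministic-then-probabilistic order described above, the following Python uses only the standard library. The field names, weights, and thresholds are hypothetical placeholders, and `difflib.SequenceMatcher` stands in for whatever fuzzy-matching method a real CDP applies.

```python
from difflib import SequenceMatcher

STABLE_KEYS = ("email", "phone", "customer_id", "login_id")
FUZZY_WEIGHTS = {"name": 0.6, "postcode": 0.4}  # hypothetical weights


def normalise(value):
    return value.strip().lower() if value else ""


def deterministic_match(a: dict, b: dict):
    """Return the first stable identifier on which both records agree exactly."""
    for key in STABLE_KEYS:
        va, vb = normalise(a.get(key)), normalise(b.get(key))
        if va and va == vb:
            return key
    return None


def probabilistic_score(a: dict, b: dict) -> float:
    """Weighted string similarity in [0, 1]; a field missing on either side adds nothing."""
    score = 0.0
    for field, weight in FUZZY_WEIGHTS.items():
        va, vb = normalise(a.get(field)), normalise(b.get(field))
        if va and vb:
            score += weight * SequenceMatcher(None, va, vb).ratio()
    return score


def resolve(a: dict, b: dict, merge_threshold: float = 0.85, review_threshold: float = 0.65) -> dict:
    """Deterministic rules first; fall back to a scored fuzzy match."""
    key = deterministic_match(a, b)
    if key:
        return {"decision": "merge", "method": "deterministic", "matched_on": key, "confidence": 1.0}
    score = probabilistic_score(a, b)
    if score >= merge_threshold:
        decision = "merge"
    elif score >= review_threshold:
        decision = "review"   # route to manual review rather than auto-merge
    else:
        decision = "no_match"
    return {"decision": decision, "method": "probabilistic", "confidence": round(score, 3)}


exact = resolve({"email": "J.Doe@example.com", "name": "J. Doe"},
                {"email": "j.doe@example.com ", "name": "John Doe"})
fuzzy = resolve({"name": "J. Doe", "postcode": "30301"},
                {"name": "John Doe", "postcode": "30301"})
different = resolve({"name": "Alice Smith", "postcode": "30301"},
                    {"name": "Robert Jones", "postcode": "98101"})
```

With these placeholder thresholds, the first pair merges on email, the “J. Doe” / “John Doe” pair lands in the review band rather than merging automatically, and the third pair is rejected. Returning the method and confidence with every decision is what makes the merge log described later in this section possible.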

• Use an identity graph to map relationships: an identity graph connects matching identifiers from different systems and ties them to a single user across platforms and sessions. It supports cross-device tracking, allows anonymous and known records to be stitched together, and enables real-time profile updates.
• Prioritise data with a source hierarchy: to create a reliable golden customer record in a CDP, data should be prioritised using a source hierarchy at two levels:

o Record Level – platforms with built-in verification (e.g., loyalty systems tied to credit cards, or e-commerce platforms used for shipping) should be trusted more when building the overall customer profile.
o Attribute Level – when multiple sources provide the same field (like an email), rules based on recency or source trustworthiness should determine which value is used in the unified profile.

This approach ensures more accurate, consistent, and actionable customer data.
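The attribute-level rule can be sketched as follows. The source ranking is hypothetical, and the tie-breaking order (source trust first, recency second) is an assumption; an organisation could just as reasonably put recency first for fast-changing fields.

```python
from datetime import date

# Hypothetical trust ranking: lower number = more trusted source.
SOURCE_RANK = {"loyalty": 0, "ecommerce": 1, "pos": 2, "survey": 3}


def pick_golden_value(candidates):
    """Choose one candidate: most trusted source first, then most recent update."""
    usable = [c for c in candidates if c.get("value")]  # ignore empty values
    if not usable:
        return None
    return min(
        usable,
        key=lambda c: (
            SOURCE_RANK.get(c["source"], len(SOURCE_RANK)),  # unknown sources rank last
            -c["updated_on"].toordinal(),                    # newer wins within a source tier
        ),
    )


def build_golden_record(observations: dict) -> dict:
    """Map each attribute to its winning value, keeping the source for traceability."""
    record = {}
    for attribute, candidates in observations.items():
        winner = pick_golden_value(candidates)
        if winner is not None:
            record[attribute] = {"value": winner["value"], "source": winner["source"]}
    return record


# Illustrative data only.
observations = {
    "email": [
        {"value": "old@example.com", "source": "pos", "updated_on": date(2024, 6, 1)},
        {"value": "current@example.com", "source": "loyalty", "updated_on": date(2024, 1, 15)},
    ],
    "phone": [
        {"value": "", "source": "loyalty", "updated_on": date(2024, 5, 1)},
        {"value": "555-0100", "source": "survey", "updated_on": date(2023, 11, 2)},
    ],
}
golden = build_golden_record(observations)
```

In this example the loyalty email wins even though the POS value is newer, while the phone number falls through to the survey source because the loyalty value is empty.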

• Governing Identity Resolution Policies in CDP: While identity resolution is the engine behind creating unified customer profiles, it is the governance of that process that ensures long-term accuracy, trust, and control. Without governance, the risk of incorrect merges or fragmented customer views can lead to ineffective personalisation, skewed analytics, and compliance risks. Below are the key principles for effectively governing identity resolution policies in your CDP. These points can also be considered when deciding which CDP to implement.

o Set clear rules for merging and splitting profiles: start by defining business and technical rules for when customer records should be merged (consolidated into one profile) or split (separated into distinct identities).

♣ Merge rules may be based on deterministic identifiers like email or phone number, or probabilistic matching with confidence thresholds.
♣ Split logic is equally important, particularly in cases where merged profiles are found to represent different individuals (e.g., two family members sharing an email).

These rules should be documented and maintained as part of your CDP’s identity resolution configuration. In enterprise settings, this often involves collaboration between marketing, data governance, and IT teams.

o Review low-confidence matches: even the most sophisticated CDPs can make incorrect decisions, especially when using probabilistic identity resolution. To mitigate risk, be sure to:

♣ Flag and log low-confidence matches
♣ Provide manual review mechanisms (either via UI or downstream review queues)
♣ Define confidence thresholds below which matches must be approved or validated before merging

This human-in-the-loop model ensures that potentially risky matches are scrutinised and helps refine the resolution engine over time.

o Enable versioning and rollback: mistakes in identity resolution are inevitable. Correcting them requires that every merge or split action be:

♣ Version-controlled: keep historical versions of each customer profile and how it evolved over time.
♣ Reversible: allow profiles to be rolled back to a prior state in the event of an incorrect merge or deletion.

By implementing identity transaction logs and audit trails, you can also trace when and why a customer profile changed—a key requirement for troubleshooting and regulatory compliance.

o Make resolution decisions transparent and auditable: identity resolution should not be a “black box.” For both technical and business users to trust the CDP, they need visibility into:

♣ How profiles are merged
♣ What logic or match scores were used
♣ Which systems provided which data points
♣ Who or what system initiated the merge/split

To support this, CDPs should provide audit logs, resolution explainability tools, and metadata tagging for source systems and merge events.
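The versioning, rollback, and audit principles above can be pictured with a small in-memory sketch. The class and field names are hypothetical and do not correspond to any CDP vendor's API. The rollback shown here simply restores the pre-merge snapshot, so it is only safe if the affected profiles have not changed since the merge.

```python
import copy
from datetime import datetime, timezone


class IdentityLedger:
    """Toy profile store with an append-only audit log (illustrative only)."""

    def __init__(self):
        self.profiles = {}  # profile_id -> attribute dict
        self.log = []       # append-only audit trail

    def add_profile(self, profile_id, attributes):
        self.profiles[profile_id] = dict(attributes)

    def merge(self, survivor_id, merged_id, *, method, confidence, initiated_by):
        if survivor_id == merged_id:
            raise ValueError("cannot merge a profile into itself")
        # Snapshot both profiles before touching them so the merge is reversible.
        before = {pid: copy.deepcopy(self.profiles[pid]) for pid in (survivor_id, merged_id)}
        absorbed = self.profiles.pop(merged_id)
        for key, value in absorbed.items():
            self.profiles[survivor_id].setdefault(key, value)  # survivor's values take priority
        self.log.append({
            "action": "merge", "survivor": survivor_id, "merged": merged_id,
            "method": method, "confidence": confidence, "initiated_by": initiated_by,
            "at": datetime.now(timezone.utc).isoformat(), "before": before,
        })
        return len(self.log) - 1  # index of the log entry, used for rollback

    def rollback(self, entry_index, *, initiated_by):
        entry = self.log[entry_index]
        if entry["action"] != "merge":
            raise ValueError("only merge entries can be rolled back in this sketch")
        self.profiles.update(copy.deepcopy(entry["before"]))  # restore both profiles
        self.log.append({
            "action": "rollback", "of_entry": entry_index, "initiated_by": initiated_by,
            "at": datetime.now(timezone.utc).isoformat(),
        })


ledger = IdentityLedger()
ledger.add_profile("p1", {"email": "a@example.com"})
ledger.add_profile("p2", {"email": "a@example.com", "phone": "555-0100"})
merge_id = ledger.merge("p1", "p2", method="deterministic", confidence=1.0,
                        initiated_by="resolution-job")
after_merge = copy.deepcopy(ledger.profiles)
ledger.rollback(merge_id, initiated_by="data-steward")
```

Every log entry records what logic was used, with what confidence, and who or what initiated the change, which are the visibility points listed above. The rollback itself is logged rather than erasing the original merge entry, so the trail stays complete.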

Inaccurate or Outdated Data, Tracking and Coordinating Personalised Experiences

Personalisation is only as good as the data behind it. For your personalisation efforts to hit the mark, your customer data needs to be accurate, current, and well-governed. Investing in the right data quality processes - and aligning them with your CDP and marketing systems - helps drive meaningful, relevant, and trusted customer experiences.

While it is important to keep the CDP up to date, it is equally important to keep the downstream platforms, especially customer engagement platforms (CEPs), in sync with the latest data. Remember, the CEP is the last step before data becomes visible to customers through different marketing (email, SMS, push notifications) and sales (ecommerce, mobile app, PoS register) channels.

No retailer wants to send obsolete data in their messaging to customers, as doing so could erode customer trust and/or lead to irrelevant (or even offensive) personalised experiences, damaging customer relationships and brand reputation. Common problems resulting from data insufficiencies include:

• Misaligned messaging: customers receiving irrelevant messages that reflect their past behaviours instead of current state. Examples are loyalty customers shown old loyalty status, points, and rewards, or customers getting recommendations for products they recently purchased, or customers who recently moved getting localised offers for the wrong region.
• Wasted personalisation opportunities: showing generic messaging instead of personalized messages that could have made a difference. The result is missed revenue and lower engagement rates.
• Ineffective segmentation: due to stale data, customers are put into the wrong segments and thereby receive unsuitable messages and offers.

For retail businesses, ensuring that customers always receive timely and relevant communication demands a strong focus on synchronising data updates across their Customer Data Platform (CDP) and Customer Engagement Platforms (CEPs). The goal is to identify the optimal cadence at which data should be refreshed—not only in the CDP, but also in the systems that actually deliver the messages.

Aligning the timing of data updates with the timing of marketing activities (such as email sends, SMS drops, and in-app pushes) gives brands the power to drastically improve the accuracy and consistency of what the customer sees.

I advise retailers to move away from traditional batch pipelines. Streaming frameworks such as Apache Kafka, AWS Kinesis, or Snowflake Streams will enable customer actions and updates to flow continuously into the CDP and downstream platforms. Businesses can also leverage Snowflake Data Shares or similar mechanisms to allow their data lake, CDP, and CEPs to access a single source of truth, without incurring additional data movement costs. This approach allows for consistency, while also being operationally efficient.

For example, I was involved in an enterprise CDP implementation in which the team encountered a critical issue: loyalty reward data appeared inconsistently across customer touchpoints. Customers who logged into their online accounts could see newly issued loyalty rewards reflected instantly. However, the same updates failed to appear in their promotional emails or push notifications. This discrepancy caused confusion and a diminished sense of trust in the brand’s communications.

Upon investigation, we discovered that the issue stemmed from asynchronous data flows: the online platform received loyalty updates in near real-time, but the email platform relied on scheduled batch feeds that were not aligned with upstream data refreshes or the timing of campaign sends.

While the problem could have been addressed either by adjusting the feed schedule or aligning campaign timings, the team took a more sustainable approach. They redesigned the system so that loyalty data used in emails and push notifications was directly aligned with the upstream source, independent of the campaign schedule. As a result, any changes made upstream were instantly reflected downstream, ensuring real-time consistency across all channels.

Continual Improvements for Data Quality and Identity Resolution

Customer Data Platforms have emerged as a cornerstone of modern marketing strategies, enabling organisations to unify customer data from disparate sources and drive personalised engagement at scale. However, the success of any CDP depends not just on the technology itself, but on the ongoing processes that ensure data quality and improve identity resolution over time.

One critical area for continual improvement is the accuracy and completeness of customer data. This includes updating records using reliable sources such as the National Change of Address (NCOA) database, which helps maintain current and deliverable mailing addresses. In this context, it is also worth highlighting the value of the postal address as a primary identifier, particularly when email or phone data may be missing or inconsistent. Postal data, when verified and normalized, can serve as a stable and dependable anchor for linking customer records.

Another important but often underestimated aspect is the ongoing tuning of the deduplication process. CDPs rely on a combination of deterministic and probabilistic matching techniques to consolidate customer records, but there’s no one-size-fits-all solution for deduplication. Over-matching (where distinct individuals are merged incorrectly) and under-matching (where records for the same person are treated as separate) both introduce challenges. Striking the right balance between these extremes requires continuous monitoring, testing, and adjustment of matching rules.
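One way to make this trade-off visible is to score a sample of manually labelled record pairs and count both kinds of error at several candidate thresholds. The sketch below assumes such a labelled sample exists. The scores and labels shown are invented for illustration and are not results from any implementation.

```python
def matching_errors(scored_pairs, threshold):
    """Count over-matches and under-matches at a given merge threshold.

    scored_pairs: list of (match_score, is_same_person) tuples, where the
    boolean comes from manual review of a sample (an assumption of this sketch).
    """
    over = sum(1 for score, same in scored_pairs if score >= threshold and not same)
    under = sum(1 for score, same in scored_pairs if score < threshold and same)
    return {"threshold": threshold, "over_matches": over, "under_matches": under}


def sweep(scored_pairs, thresholds):
    """Evaluate several candidate thresholds against the same labelled sample."""
    pairs = list(scored_pairs)
    return [matching_errors(pairs, t) for t in thresholds]


# Invented labelled sample: (score from the matching engine, reviewer's verdict).
labelled = [
    (0.95, True), (0.88, True), (0.82, False), (0.78, True),
    (0.70, False), (0.60, True), (0.40, False),
]
results = sweep(labelled, thresholds=[0.90, 0.80, 0.65])
```

On this toy sample, lowering the threshold trades under-matches for over-matches, which is the balance the matching rules need to strike. Rerunning the sweep whenever a new data source is added or the rules change gives a simple regression check for the resolution logic.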

The key is recognising that identity resolution is not a one-time task, but an evolving discipline. As customer behaviour changes, new data sources are introduced, and marketing goals shift, the matching logic within a CDP must adapt accordingly. Organisations that invest in this kind of iterative improvement not only achieve cleaner data, but also gain a more accurate, real-time view of their customers. This unified, up-to-date view can drive smarter decisions and more effective marketing campaigns.

Ultimately, the power of a CDP lies in its ability to evolve alongside the business. By treating data refinement and identity resolution as ongoing priorities, brands can ensure that their personalisation efforts remain precise, consistent, and impactful.

About Vishal Gaurav

Vishal Gaurav is IT Delivery Manager for CDP, Loyalty, and Personalisation at an historic and major American retailer that operates hundreds of stores across the Southeast. Specialising in marketing technology, hyper-personalisation, data driven customer engagement, and loyalty programmes, Vishal leverages more than 20 years of global market and multi-sector experience in designing and implementing solutions to drive customer centric growth and digital transformation.

Since 2011, his IT innovations have earned five US patents, and his work has been cited by ten of the world’s leading technology companies. As a strategic leader, Vishal works directly with C-suite executives, leads multi-million dollar projects, and supervises cross-functional teams of product owners, engineers, analysts, vendors, strategic partners and alliances to deliver cutting-edge customer-facing experiences.

He is particularly experienced in large-scale SAP CRM and ERP implementations for the retail, energy, utilities, and healthcare sectors, as well as AI, Cloud Computing, RFP process management, and budget forecasting and planning using Agile and Waterfall methodologies. Vishal earned a bachelor’s degree in Electrical Engineering from the Bhagalpur College of Engineering and a master’s in Information Technology from the International Institute of IT (IIIT) in Bangalore, India.

After relocating to the US, he completed advanced education in Machine Learning-Fundamentals and Algorithms at Carnegie Mellon University. His professional certifications include Certified ScrumMaster (CSM), Project Management Professional, and ITIL Foundations. Since 2018 he has volunteered as a judge, mentor, and coach at global competitive robotics events hosted by FIRST, the world’s leading youth serving nonprofit organisation advancing STEM education.

Contact Vishal on LinkedIn or email vishalg.martechpro@gmail.com