Sumit Tewari discusses retail businesses leveraging new technologies and his outlook for innovation
Sumit Tewari builds scalable data flows and technology solutions for the world’s largest retailer, with a focus on data governance, quality frameworks, and ESG/human capital data lakes. Interview by Ellen F. Warren.
Sumit Tewari is Senior Manager Data Engineering for the world’s largest retailer. In this leadership position, he applies more than 20 years of experience in the data domain, enhanced by his focused expertise over the past decade in designing and optimising complex enterprise data lake systems and pipelines, modernising software systems, and spearheading technology migrations to cloud-based platforms.
Sumit has built a stellar career on leading critical and transformative data engineering and infrastructure projects for global Fortune 500 companies, including major financial institutions, retailers, and IBM.
As an international leader and engineering consultant, he has significant experience in establishing data governance and data quality frameworks for large, enterprise data warehousing platforms and data lakes using Agile SDLC. He has notably achieved measurable outcomes by partnering with diverse business teams and product stakeholders to deliver viable technology solutions and value, while strategically meeting organizational objectives.
Currently based in Frisco, Texas (US), Sumit leads large, cross-functional, global software engineering teams in developing scalable, highly available, enterprise-grade data flows to drive process improvement and high performance in Extract, Transform and Load (ETL) architecture design, analytics solutions, End of Life (EOL) and End of Services (EOS) product Service Level Agreements/Objectives for data sets, and resilience and disaster recovery processes, among a broad range of functional imperatives.
He is additionally responsible for preparing the global human resources data storage infrastructure annual budgets, including planning and forecasting project costs and cost controls in project execution; performing proof of concept and rapid prototyping for emerging technologies; and hiring, training, supervising, and mentoring software/data engineers and teams, among other areas.
Sumit earned his Master’s degree in Computer Applications in 2001 from Jawaharlal Nehru National College of Engineering (JNNCE) in Shimoga, India. Prior to his current role, he was Vice President of Software Engineering for the fifth largest global bank.
Now, in his present position, he is proud to be contributing to his employer’s sustainability and human capital management efforts by leading the company’s environmental, social, and governance (ESG) and human capital data lake initiatives, producing positive outcomes in decision-making, operational efficiency, and employee performance.
RTIH spoke with Sumit about how retail businesses can leverage new technologies to optimise data lake frameworks, warehousing, and infrastructure, how he is using these tools to achieve objectives in the ESG and HR domains, and his outlook for innovation applications in the retail sector.
RTIH: Let’s start with your background. You previously had a long and distinguished career with a global financial leader, where you focused on modernising technology platforms with innovative solutions.
You also developed data driven solutions to analyse and interpret Covid-19 statistics that informed public health policies and financial strategies during the pandemic. How did your work in the banking sector prepare you for your current role with the world’s largest retail entity? And how are you applying your subject matter expertise to achieve operational and financial objectives in a retail environment?
ST: My background in banking gave me a solid foundation to tackle the biggest data driven challenges in retail.
At JPMorgan Chase & Co., I worked on modernising data platforms across mortgage banking, risk management, and finance, which gave me hands-on experience with scalable and reusable solutions. These systems, like the universal underwriting platform, made it possible to streamline business processes across departments and drive better efficiencies through data and process automation.
This approach translates directly into retail growth. For example, when I was working on supply chain optimisation for PepsiCo and Cingular Wireless early in my career, as an IBM technology consultant, I faced similar challenges in managing large-scale data with architecture that would be flexible and cost-effective.
In both industries, the benefits of the tech stack became apparent: open source Apache Spark and Apache Airflow for processing Big Data, public cloud platforms such as GCP and AWS for compute and storage, and HashiCorp Terraform for automating infrastructure.
In retail, solutions need to handle larger transactions, optimise inventory, and improve customer experience. Building a reusable system helps companies adapt quickly to changing market demands, ensuring flexibility and efficiency without starting from scratch.
Finally, optimising capacity utilisation in such a complex and flexible system is increasingly important in improving outcomes in the financial and retail sectors.
RTIH: You bring a strong track record of modernising enterprise level data systems to your current role. How have you facilitated these transformations for large organisations, and what guidance do you offer retail businesses, which may be operating on outdated legacy systems?
ST: Updating legacy systems is always a tough call, especially in larger organisations where the existing tech stack is heavily integrated into critical operations. In these situations, the risks are high, but with the right approach, it is possible to make meaningful changes without disrupting day-to-day operations.
The first step is to understand the customer’s needs and core system use cases. This allows us to look at which parts of the legacy system carry the greatest risk, complexity, and potential for return on investment (ROI). With that knowledge, we can prioritise the right areas for modernisation.
I generally recommend starting with a pilot project - a proof of concept to help validate our approach. The goal here is to test functionality, scalability, and architecture in a controlled environment. This helps to minimise the risk before committing at scale.
At this stage, we have identified our cloud environment for compute and storage - such as GCP, Azure, or AWS - the data pipeline design, an Apache Spark-based data flow framework, scheduling options such as Apache Airflow, and a database for the data store, such as Google BigQuery, Redshift, or Snowflake.
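To make this concrete, a minimal sketch of such a proof of concept pipeline might look like the following, assuming a PySpark environment with the Spark BigQuery connector available on the cluster; the bucket, dataset, table, and column names are hypothetical.

```python
# Minimal PoC pipeline sketch: read raw sales files from cloud storage,
# aggregate them, and land the result in a warehouse table.
# All paths, table names, and columns are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("poc-pipeline").getOrCreate()

# Read one day's worth of raw JSON events from a cloud storage bucket
raw = spark.read.json("gs://example-poc-bucket/raw/sales/2024-01-01/")

# Aggregate to daily sales per store
daily = (
    raw.withColumn("sale_date", F.to_date("sale_ts"))
       .groupBy("store_id", "sale_date")
       .agg(F.sum("amount").alias("total_sales"))
)

# Write to the warehouse (requires the Spark BigQuery connector on the cluster)
(daily.write.format("bigquery")
      .option("table", "example_project.poc_dataset.daily_sales")
      .option("temporaryGcsBucket", "example-poc-tmp")
      .mode("overwrite")
      .save())
```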
After system integration and regression testing, and before production cutover, parallel testing in production is essential. This enables us to run the modern solution alongside the legacy system, while ensuring that we do not compromise performance or efficiency. It also helps build stakeholder confidence by demonstrating that the new solution can handle real-world situations.
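As an illustration only, the comparison step of such a parallel run could be as simple as the PySpark sketch below; the table locations, business keys, and the total_sales measure are hypothetical.

```python
# Parallel-run reconciliation sketch: compare the output of the legacy pipeline
# with the output of the modernised pipeline before cutover.
# Paths, keys, and measures are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parallel-reconciliation").getOrCreate()

legacy = spark.read.parquet("gs://example-lake/legacy/daily_sales/")
modern = spark.read.parquet("gs://example-lake/modern/daily_sales/")

# High-level totals: large deltas here block the cutover decision
for name, df in [("legacy", legacy), ("modern", modern)]:
    totals = df.agg(F.count("*").alias("rows"), F.sum("total_sales").alias("sales")).first()
    print(name, totals["rows"], totals["sales"])

# Row-level diff on the business key shows exactly where the two systems disagree
diff = (
    legacy.alias("l")
    .join(modern.alias("m"), ["store_id", "sale_date"], "full_outer")
    .filter(~F.col("l.total_sales").eqNullSafe(F.col("m.total_sales")))
)
print("mismatched rows:", diff.count())
```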
It’s important for retailers and others in similar industries to see modernisation as an investment in the future, not just an immediate expense - starting with core improvements to efficiency, safety, and confidence, and then moving on to specific applications or implementations based on organisational needs. This phased approach allows you to modernise at a steady pace without overwhelming your operations, while continuing to deliver value to the business.
RTIH: Throughout your career, you have specialised in data lakes and warehousing platforms. How are you using new technologies in these areas, and how are these tools enabling you to improve outcomes? Can you give us some examples of process improvements? How is innovation transforming applications for data lakes, and what does this mean for the retail sector?
ST: In today’s rapidly evolving data environment, I firmly believe that technological capabilities are the key to unlocking opportunities that once seemed elusive. The improvements we’ve seen in both storage and computing power over the past decade have been revolutionary.
What used to be a struggle - managing a few hundred gigabytes - has evolved into processing petabytes of data with robust workflows. This shift has redefined how we access data lakes and warehouses. Data governance, data quality, and an optimised, business domain-based enterprise data model are essential components of a good large-scale data lake.
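By way of illustration, basic data quality gates of the kind described here can be expressed in a few lines of PySpark; the table path, key columns, and rules below are hypothetical.

```python
# Data quality gate sketch: enforce simple rules before data is published to consumers.
# The table path and column names are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("gs://example-lake/curated/employee_events/")

# Rule 1: key columns must never be null
null_keys = df.filter(F.col("employee_id").isNull() | F.col("event_date").isNull()).count()

# Rule 2: the natural key must be unique
duplicate_keys = (
    df.groupBy("employee_id", "event_date")
      .count()
      .filter(F.col("count") > 1)
      .count()
)

# Fail the run loudly so bad data never reaches downstream reports
if null_keys or duplicate_keys:
    raise ValueError(f"Data quality failed: {null_keys} null keys, {duplicate_keys} duplicate keys")
```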
In my approach, I advocate modern open source technologies such as Apache Spark, Apache Airflow, and Apache Hudi, in combination with public cloud-based infrastructure. These open source platforms allow developers to customise, test, and modify solutions to meet specific project needs.
This is especially useful in larger organisations where cost is a key consideration. With this tech stack, we can deliver high performance, scalable solutions that allow companies to invest in business expansion without the burden of significant infrastructure costs.
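As a rough sketch of how this open stack fits together, the snippet below upserts a small batch of records into an Apache Hudi table on cloud storage via Spark; the table name, keys, and path are hypothetical, and the Hudi Spark bundle is assumed to be available on the cluster.

```python
# Hudi upsert sketch: write an incremental batch into a Hudi table on cloud storage.
# Table name, keys, and path are hypothetical; requires the Hudi Spark bundle.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert").getOrCreate()

updates = spark.createDataFrame(
    [("e-1001", "US", "2024-06-01T12:00:00", 95000.0)],
    ["employee_id", "country_code", "updated_at", "salary"],
)

hudi_options = {
    "hoodie.table.name": "employee_salaries",
    "hoodie.datasource.write.recordkey.field": "employee_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "country_code",
    "hoodie.datasource.write.operation": "upsert",
}

(updates.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("gs://example-lake/hudi/employee_salaries/"))
```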
A recent example of this in practice is a project that compared salary data from two different systems. Traditionally, this would have taken thousands of hours of manual work by data analysts.
However, by combining advanced cloud-based data lakes with AI driven analytics, we were able to streamline this process - reducing cost and time, while significantly increasing accuracy. Such innovations not only change business processes, but also allow retail and other businesses to adapt quickly to changing business needs.
The takeaway for leaders in large organisations is simple: embrace the flexibility of today’s technology stack and leverage open source cloud capabilities. These technologies enable faster and more cost-effective solutions and enable companies to innovate without being constrained by legacy systems or excessive infrastructure costs.
RTIH: More specifically, tell us about your work in the ESG and Human Capital data lake arenas. First, your company is working diligently to reduce greenhouse gas emissions and boost renewable energy projects, simultaneously creating jobs, supporting local economies, and fostering sustainability practices across its delivery chain. As a data engineer, what is your role in contributing to these efforts?
ST: Environmental, social, and governance (ESG) initiatives are critical in establishing appropriate performance standards in supermarkets and other businesses. As a technology leader and data strategist, I'm focused on using technology to advance sustainable practices, improve diversity ratios, and ensure that ESG activities are clearly defined.
In ESG terms, data lakes are central repositories for structured and unstructured data to manage three core ESG pillars: opportunity, sustainability, and community. These themes are all driven by ethics and integrity. Under opportunity, the ESG programme emphasises fair wages, employee benefits, career mobility, and employee diversity, to improve job satisfaction and retention and advance communities.
Sustainability efforts aim for climate leadership, waste reduction, and the recycling of natural resources, as well as building strong and responsible supply chains. Under community, ESG strives to source safe, healthy products and services, support disaster relief efforts, and connect with local communities. Ethics and integrity ensure high standards of governance, compliance, digital citizenship, and human rights in all these areas.
By creating a modern cloud data lake on platforms such as Google Cloud Storage, we can collect and manage this information efficiently, enabling the monitoring of key performance indicators (KPIs) linked to ESG goals in real time (e.g. reducing emissions and adopting renewable energy). Automated ETL pipelines built using Apache Airflow and Apache Spark frameworks facilitate seamless integration across data sources and systems, simplifying data feeds and processing.
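For illustration, a stripped-down Airflow DAG for this kind of ESG KPI refresh might look like the sketch below (Airflow 2.4+ style); the project, cluster, bucket, and job names are hypothetical.

```python
# Airflow DAG sketch: run a Spark transform over raw ESG source data, then load
# the curated KPI files into the warehouse. All identifiers are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="esg_emissions_kpis",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Spark job computing emissions and renewable-energy KPIs from raw source files
    transform = DataprocSubmitJobOperator(
        task_id="spark_transform_emissions",
        project_id="example-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "example-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/emissions_kpis.py"},
        },
    )

    # Load the curated KPI output into BigQuery for dashboards and reporting
    load = GCSToBigQueryOperator(
        task_id="load_kpis_to_bigquery",
        bucket="example-bucket",
        source_objects=["curated/emissions_kpis/*.parquet"],
        destination_project_dataset_table="example-project.esg.emissions_kpis",
        source_format="PARQUET",
        write_disposition="WRITE_TRUNCATE",
    )

    transform >> load
```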
On the human capital side, our data set supports a comprehensive assessment of employee diversity. We use technologies like Apache Spark and Google BigQuery to efficiently process Big Data about employee demographics, enabling us to track gender and ethnicity demographics over time. This data driven approach provides integrated reporting that helps identify systems and areas that need correction.
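As a small, hedged example, a demographic trend query of this kind can be run against a BigQuery table from Python as shown below; the project, dataset, table, and column names are hypothetical.

```python
# BigQuery trend-tracking sketch: headcount by quarter and gender from a
# workforce snapshot table. Project, dataset, and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

query = """
    SELECT
      snapshot_quarter,
      gender,
      COUNT(DISTINCT employee_id) AS headcount
    FROM `example-project.human_capital.workforce_snapshots`
    GROUP BY snapshot_quarter, gender
    ORDER BY snapshot_quarter, gender
"""

for row in client.query(query).result():
    print(row.snapshot_quarter, row.gender, row.headcount)
```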
One notable contribution is the generation of ESG-related data features designed for 10-K reporting. Leveraging technologies such as Terraform and Kubernetes, we have transformed manual data collection into an automated process that ensures data accuracy and compliance with reporting standards.
The integrated AI model enhances our ability to detect trends and anomalies, and significantly improves the speed and accuracy of report generation. This architecture assures that ESG data is timely, accurate, and scalable, empowering organisations to meet evolving regulatory needs and business objectives. It also allows us to turn ESG projects into insights, by providing valuable data that informs executive-level decision-making throughout the workplace.
RTIH: Besides focusing on environmental concerns, ESG includes provisions for social governance. You have worked closely with the human resources function and worldwide IT teams to build a data lake that spans seven geographical markets and integrates critical employee records.
Your success in overseeing the end-to-end project and acting as a liaison between business stakeholders and tech teams earned you one of the company’s highest awards. Tell us about this project, and how it advanced your employer’s social governance goals. What lessons can other retailers learn from your experience?
ST: In my recent project focused on enhancing social governance through a robust data lake, we collected critical employee records across seven geographic markets, effectively breaking down silos and improving visibility across the employee population. This role was critical to driving our employer’s ESG objectives and ensuring data integrity and accessibility, which is foundational for informed decision-making around our diversity, equity and inclusion policy.
Using technologies like Google BigQuery, Apache Hudi, and Apache Spark for data processing, we created an architecture that supports structured and unstructured data. This allowed us to analyse employee demographics, track trends across employee groups, and gain usable insights for our HR teams.
To keep pace with evolving technology, we have ensured that our architecture is flexible and scalable. Using a modular approach, with tools like GCP Dataproc for managed Spark processing and Kubernetes for container orchestration, we can easily integrate new technologies as they emerge. We have also adopted Agile methodologies to enable iterative development and continuous improvement, enabling our teams to adapt quickly.
Collaboration was important in all these initiatives, so I acted as a liaison between executive and technical teams, facilitating meetings and presentations to ensure goals were aligned. This not only helped develop meaningful social governance KPIs, but also fostered a culture of transparency and accountability.
The key lesson for retailers is to invest in a data-driven culture that prioritises cross-functional collaboration. By establishing a centralised data lake, organisations can enhance their social governance process and drive better business results.
Additionally, ongoing training and upgrading employees’ skills with emerging technologies will better equip teams to navigate the rapidly evolving landscape of data management and analytics.
RTIH: Building collaboration and consensus between engineers and diverse functional teams seems to be a common element across all your projects. As an IT leader, how do you align strategies and objectives across teams to define goals, meet business objectives, and ensure successful outcomes, all while creating value and retaining a strong focus on your customers?
ST: Stakeholder consensus among engineers and functional groups is essential to the success of the project. As an IT leader, I emphasise aligning teams’ strategies and objectives to achieve common goals, while delivering value and maintaining a strong customer focus through a client-based approach.
In this context, a client typically means a long-term partner with whom you develop a deep, strategic relationship, focusing on tailored solutions and shared goals. A customer is generally a consumer, for whom we focus on providing specific products or services. Client management involves a consultative approach, while customers demand efficient service delivery and satisfaction with their immediate needs.
In my experience, it is important to see problems through the eyes of stakeholders to ensure full transparency about a project’s progress. This process not only builds trust, but allows us to tailor our technology priorities to better suit their needs. When stakeholders feel understood and valued, we can develop SMART (specific, measurable, attainable, relevant, time-bound) goals that align with business objectives and customer needs.
To facilitate this process, I recommend regular alignment meetings where teams can share insights, challenges, and successes. This open communication fosters collaboration and helps break down silos between technology and functional teams. By ensuring that all voices are heard, we build a culture of trust and collaborative project outcomes.
I also use Agile techniques to provide a focus on iterative improvement and continuous feedback. This allows us to evolve quickly when needed and refine our strategies in response to stakeholder feedback. By incorporating tools such as JIRA for project tracking and Confluence for documentation, we increase transparency and ensure everyone is aligned with project objectives.
Finally, by positioning participants as clients rather than just customers, we build a relationship that emphasises shared goals. This focus on collaboration enables teams to define clear objectives that align with business goals and drive successful results. In doing so, we improve delivery against our business objectives and create sustainable value for both our clients and stakeholders.
RTIH: Your leadership responsibilities include all aspects of engineering talent hiring and onboarding, technical training, coaching, and team supervision. You’ve said that you enjoy motivating individuals and teams and mentoring young engineers. With advancements in AI, ML, RPA and other innovations accelerating so rapidly, how do you and your teams stay ahead of the curve? And how do you motivate exceptional performance?
ST: With technologies like AI, ML, and RPA evolving rapidly, engineering teams need a strategic and hands-on approach to motivate and guide them. As a technology leader and architect, I prioritise several key actions to ensure we remain ahead of the curve and drive exceptional performance.
First, I embed the principles of servant leadership. This means creating an environment where team members feel empowered and supported in their role. I focus on building strong relationships, which helps me understand their individual strengths and areas for growth.
We incorporate Agile retrospectives into our workflows to keep our knowledge sharp. These meetings allow us to reflect on our strategies, celebrate successes and identify areas for improvement. This continuous feedback encourages a growth mindset and motivates team members to embrace change.
Regular pulse surveys provide insight into team morale and engagement. By analysing this information, we can adjust our approaches to any concerns, ensuring that our environment remains conducive to collaboration and innovation. This data driven approach helps us adapt to the needs of the team.
Technically, we use tools like Jupyter Notebooks to prototype and implement AI and ML algorithms. I encourage hands-on learning through hackathons and project sprints, which allow engineers to explore these technologies in a supportive environment.
During important design meetings, I actively seek input from team members and value their contributions to the architecture. The constructive feedback shared in these conversations motivates them and reinforces their sense of ownership over their work.
In addition, we establish common objectives, track progress against them, and conduct periodic reviews to ensure alignment with our broader business objectives. This clarity fosters accountability and inspires a shared commitment to excellence.
By emphasising clear communication, I create a culture of transparency that builds trust within the team. In challenging situations, I demonstrate decisiveness and responsibility, confidently guiding the team through adversity. Through these strategies, I inspire exceptional performance while preparing my team to adapt to technological advances. The combination of effective leadership, continuous learning, and collaborative practices has created a high performing environment where innovation can flourish.
RTIH: What do you see on the horizon for how technology will influence and impact the retail sector over the next decade? What should retailers be doing now? How can they plan to optimize their systems so they won’t be left behind?
ST: As we look to the future, the retail market is set for massive technology driven change, especially through the lens of artificial intelligence (AI) and advanced data analytics. Over the next decade, AI will become increasingly important in streamlining big data processes that are currently labour intensive and complex.
To stay ahead, technologists and business leaders need to think critically about how AI can be integrated into their operations. Instead of a short-term focus, I recommend adopting a three-to-five-year strategic plan. This long-term vision will enable retailers to leverage AI technology more efficiently, and implement it in alignment with their business objectives.
When it comes to the technology stack, choosing a platform-based solution is important. Using integrated platforms such as Microsoft Azure or Google Cloud Platform can provide common, reusable infrastructure and domain services that simplify operations for teams and applications. This architectural approach increases productivity and allows applications to scale. Retailers also need to embrace modern marketing practices that encourage innovation.
Modern marketing practices focus on data driven approaches, personalised customer experiences, and the use of digital tools such as AI, automation, and analytics to engage audiences across multiple channels.
Customers across remote, global locations can enjoy these benefits, while retailers make informed decisions by analysing insights and further optimise each interaction. A broad tech stack is needed to embrace modern marketing, including customer data platforms (CDPs) such as Segment for integrated customer insight, HubSpot for marketing automation, and Salesforce CRM for customer management.
Tools like Google Analytics and Tableau enable data driven decisions, while social media management platforms like Hootsuite and experimentation tools like Optimizely support personalisation efforts. AI powered solutions like H2O.ai and IBM Watson improve personalisation and predictive marketing. This integrated approach improves efficiency and deepens customer engagement at multiple levels.
Additionally, it is important to phase out legacy systems that hinder agility and responsiveness. Investing in modern solutions - like Salesforce for customer relationship management or Shopify for e-commerce - will not only reduce operational costs in the long run, but also keep teams aligned around the effective use of modern tools. Training employees in these modern solutions will ensure that they are prepared to maximise their capabilities.
I believe that retailers must actively embrace technological advances, especially with a focus on AI and data analytics. By developing comprehensive long-term strategies, choosing integrated platform solutions, fostering a culture of innovation, and retiring outdated systems, organisations can continue to improve their performance and profitability, and remain competitive in an ever evolving environment.