Inside Dots’ SF Tech Week Panel: The Future of Data Labeling and Human Context in AI

Dots hosted a panel at SF Tech Week featuring CloudFactory, HumanSignal, Samaya AI, TriFetch, and Waldium, exploring how humans, context, and alignment will define the future of data labeling.

During SF Tech Week, Dots hosted a panel bringing together five people building the invisible systems that teach machines how to think: Ajai Sharma (CloudFactory), Sheree Zhang (HumanSignal), Ashwin Paranjape (Samaya AI), Varuni Sarwal (TriFetch), and Amrutha Gujjar (Waldium).

The discussion quickly revealed that the hardest problem in AI isn’t just scale, as we might expect; it’s meaning. How do you keep thousands of contributors, hundreds of clients, and dozens of models aligned on what “correct” even means?

Ajai, who oversees more than 7,000 workers labeling data for 700 enterprises, put it simply: “The first thing that breaks at scale is context.” He explained that as labeling tasks multiply across teams, languages, and use cases, small differences in interpretation can compound into major inconsistencies in model performance.

His team now versions annotation guidelines like software, complete with feedback loops, to keep human understanding consistent.
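To make that idea concrete, here is a minimal sketch of what “versioning guidelines like software” could look like in practice. The GuidelineVersion schema and the revise helper below are hypothetical illustrations, not CloudFactory’s actual tooling:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class GuidelineVersion:
    """One immutable revision of an annotation guideline (hypothetical schema)."""
    version: str        # semantic version string, e.g. "2.1.0"
    released: date
    instructions: str   # the guideline text annotators follow
    changelog: str      # what changed and why, in plain language

def revise(current: GuidelineVersion, feedback: str,
           new_instructions: str) -> GuidelineVersion:
    """Fold annotator feedback into a new minor version, keeping history intact."""
    major, minor, _patch = (int(x) for x in current.version.split("."))
    return GuidelineVersion(
        version=f"{major}.{minor + 1}.0",
        released=date.today(),
        instructions=new_instructions,
        changelog=f"Feedback-driven update: {feedback}",
    )

# Example: annotator feedback surfaces an edge case, so the guideline is bumped.
v1 = GuidelineVersion("1.0.0", date(2025, 1, 15),
                      "Label mentions of companies as ORG.",
                      "Initial release.")
v2 = revise(v1, "Annotators disagreed on how to treat subsidiaries.",
            "Label mentions of companies as ORG; tag subsidiaries as ORG too.")
print(v2.version, "-", v2.changelog)
```

The point of the sketch is the feedback loop: each revision records what changed and why, so every annotator is working from the same, auditable definition of “correct.”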

Sheree from HumanSignal (the creators of Label Studio), whose community has labeled over 100 million data points, echoed that sentiment. “It’s not just about labeling for volume,” she said. “It’s about ensuring human expertise and preferences are properly captured, especially as data requirements evolve.” Finding the right domain experts and tailoring interfaces to effectively structure their insights are essential to a strong human-in-the-loop data workflow. After all, to trust your models, you must first trust the data they’re trained on.

Together, they reframed quality not as accuracy but as alignment: a shared understanding between people, systems, and models about what the data actually means.

When Annotation Becomes Craft

Sometimes, even alignment isn’t enough, because truth itself is subjective.

At Samaya AI, Ashwin Paranjape works on financial models where two experts can interpret the same dataset differently. “Accuracy is really important in financial matters, because money means so much to us,” he said. “And it’s really important to know whether humans and models are reasoning the same way.”

That shift, from correctness to interpretive alignment, transforms labeling from a repetitive task into a creative craft. Annotators become collaborators, shaping how machines think rather than verifying what they see. “Once people understand that connection,” Ashwin added, “their motivation changes completely.”

When Data Disappears

In healthcare, the stakes of mismanaged data aren’t theoretical. Varuni Sarwal from TriFetch shared an example that captured just how physical the problem still is.

One hospital she worked with had filled entire floors with paper patient records, decades of information stacked wall to wall. When they finally ran out of space, staff began throwing older files away.

“It’s not just a storage issue,” Varuni said. “It’s knowledge literally disappearing.”

Each hospital believes its data is unique, and they’re all right. The challenge is translating that uniqueness into something standardized before it’s lost. To Varuni, standardization isn’t bureaucracy; it’s preservation. The next phase of regulation, she believes, will treat data lineage like financial audits — not optional, but required proof of integrity.

Labeling the Open Web

That same fragility now extends to the internet itself.

“The open web is the world’s largest unlabeled dataset,” said Amrutha Gujjar of Waldium. “And as AI becomes the way people find information, every company is suddenly in the labeling business.”

Her team builds AI-first visibility systems to ensure brands remain machine-readable and brand-safe across search and generative interfaces. “Every time a model like ChatGPT or Claude updates, something breaks,” she said. “But each break teaches us how models understand meaning.”

Her prediction: SEO rankings will fade. The future will belong to those who optimize for AI visibility, ensuring models understand not just their data, but their identity.

Humans, Still the Common Thread

Across industries, the pattern is clear. Humans aren’t being automated out of the loop; they’re moving up the stack, closer to the layer where meaning is made.

Ajai sees oversight as the only safeguard against semantic drift. Ashwin imagines new kinds of labeled data emerging for evaluating AI agents, not just training them. Varuni calls for regulatory transparency. And Amrutha sees labeling, marketing, and storytelling converging into the same discipline.

The future of data labeling won’t be defined by faster clicks or larger datasets. It will be defined by coherence: the ongoing negotiation between how humans see the world and how machines learn to interpret it.

At Dots, we power payouts for data labeling platforms across 150+ countries and 300+ payment rails (from UPI to M-PESA to PIX) so every contributor, no matter where they live, gets paid quickly and securely.

Reliable payouts don’t just reduce churn; they compound expertise. The longer contributors stay in your loop, the smarter and more stable your training data becomes.

If you’re building the next generation of AI infrastructure, start with the people training it. Book a demo to see how Dots keeps your workforce paid, motivated, and loyal.