OpenAI is reportedly instructing third-party contractors to upload actual work files from current or past jobs — including Word documents, PDFs, spreadsheets, slide decks, code repositories, images and other deliverables — as part of an initiative to train and evaluate its next-generation AI agents, according to reporting by TechCrunch, Wired and other outlets. The company and partner Handshake AI want “real, on-the-job work” rather than summaries, and they’ve provided a tool for workers to try to scrub out proprietary or personally identifiable information before submission. However, intellectual property lawyers and privacy experts warn that this approach could expose confidential data, create legal risk for contractors bound by nondisclosure agreements, and blur lines around consent, compensation and intellectual property protections. OpenAI has declined to publicly comment on the program’s broader implications, even as critics highlight the potential for inadvertent data disclosure and ethical concerns surrounding the collection of high-value professional content for AI training.
Sources:
https://techcrunch.com/2026/01/10/openai-is-reportedly-asking-contractors-to-upload-real-work-from-past-jobs/
https://www.wired.com/story/openai-contractor-upload-real-work-documents-ai-agents/
https://www.eweek.com/news/openai-contractors-upload-work-ai-systems/
Key Takeaways
- OpenAI is asking contractors to submit actual past work files, not just descriptions, to train and benchmark AI systems, improving training-data quality but also increasing legal exposure.
- The responsibility to remove confidential or proprietary information lies with the individual contractors, which critics say creates privacy and IP risks and could conflict with nondisclosure agreements.
- The move reflects a broader industry trend toward using real workplace data to improve AI capabilities but underscores ethical and compliance concerns that have not yet been fully addressed.
In-Depth
In what could be one of the most controversial training-data strategies in the artificial intelligence sector this year, OpenAI has quietly begun asking third-party contractors to provide real deliverables from their past and current jobs so that its next generation of AI agents can be trained and evaluated against a human standard. According to reporting from major tech outlets, including TechCrunch and Wired, these are not simple task descriptions — OpenAI wants actual files that show how professionals execute complex work, whether that’s spreadsheets with detailed calculations, slide decks with narratives, code repositories with annotated commits, or written documents with real arguments and structure.
From a product standpoint, the logic seems simple: if AI systems are to handle real white-collar tasks, they need to see the genuine outputs of human work rather than synthetic or abstract examples. Commercially, this would give OpenAI’s models better grounding in the realities of professional labor — a significant edge as the company pushes deeper into tools that claim to assist with enterprise workflows and professional services.
But here’s where conservative skepticism is warranted. This strategy transfers the burden of compliance onto individual contractors, many of them gig workers or outside contributors, to decide what counts as confidential or proprietary before they upload it. An intellectual property lawyer cited in the reporting said this places “a lot of trust in its contractors to decide what is and isn’t confidential,” which is not something any attorney would recommend without iron-clad safeguards and legal review. And contrary to public relations language about “scrubbing” tools, modern documents carry layers of hidden data (metadata, revision histories and embedded objects) that are hard to remove entirely; doing it reliably takes professional redaction tools and legal oversight, not DIY sanitization, as the sketch below illustrates.
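To make the hidden-data point concrete, here is a minimal Python sketch, using only the standard library, that lists the core-properties metadata embedded in a .docx file. A .docx is a ZIP archive, and fields such as the author and last-modified-by name live in docProps/core.xml, separate from the visible text, so a visual review or a find-and-replace pass never touches them. (The file name report.docx is a hypothetical placeholder, and this inspects only one of several places such data can hide; revision history and embedded objects live in other parts of the archive.)

```python
import zipfile
import xml.etree.ElementTree as ET

def list_hidden_metadata(path: str) -> dict:
    """Return the core-properties fields (creator, lastModifiedBy,
    timestamps, etc.) embedded in a .docx file.

    These fields sit in docProps/core.xml inside the ZIP container,
    apart from the visible document text in word/document.xml, so
    they survive ordinary visual review of the document.
    """
    fields = {}
    with zipfile.ZipFile(path) as docx:
        with docx.open("docProps/core.xml") as core:
            tree = ET.parse(core)
            for element in tree.getroot():
                # Strip the XML namespace, e.g. '{...}creator' -> 'creator'
                tag = element.tag.split("}")[-1]
                fields[tag] = element.text
    return fields

if __name__ == "__main__":
    # 'report.docx' is a placeholder path for illustration.
    for key, value in list_hidden_metadata("report.docx").items():
        print(f"{key}: {value}")
```

Even this shallow inspection typically surfaces names and edit timestamps that a contractor scrubbing the body text would miss, which is the crux of the lawyers’ concern.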
That raises two red-flag issues. First, contractors could inadvertently violate nondisclosure agreements or expose trade secrets from previous employers — potentially subjecting themselves to litigation or penalties. Second, this approach undermines the principle that knowledge workers deserve fair compensation and clear consent when their work is used to build commercial products. If OpenAI hopes to use professional outputs as training fodder, it should secure them through clear legal channels and direct contracts, rather than placing liability on workers who may not fully understand the risks.
Ultimately, this development points to the broader AI industry’s data supply problem: high-quality, domain-specific training data is scarce and expensive, and leading companies are pushing the bounds of how they obtain it. But in doing so, they risk eroding trust, weakening legal norms around intellectual property, and inadvertently turning well-intentioned contributors into unwitting compliance casualties. If we believe in free markets and robust property rights, the process of sourcing data for training frontier AI should be done transparently, ethically, and with full respect for the legal rights of all parties involved.

