AI tools today rely heavily on something called "embeddings," long strings of numbers representing text, images, or other data points. Whether your team is building personalized chatbots, improving search relevance, or analyzing customer sentiment, you're likely using embeddings generated by models from vendors like OpenAI.
In May 2025, researchers published a paper, Harnessing the Universal Geometry of Embeddings, that reported something remarkable: behind the scenes, most embedding models share an almost identical "latent geometry." Think of embedding models as experts who each turn text into numerical vectors. Each expert gives slightly different answers depending on their training or personality, yet all of them capture similar meanings and structures. Just as two skilled experts might choose different wordings to express the same idea, different embedding models produce different numerical vectors for the same text. Underneath, they still rely on a common web of meanings and concepts.
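To make the "different numbers, same geometry" idea concrete, here is a minimal toy sketch. It is not taken from the paper. It assumes two hypothetical models whose two-dimensional vectors differ only by a rotation, which is a far simpler relationship than the one real models have. The texts, the coordinates, and the 70-degree angle are all invented for illustration.

```python
import math

def rotate(vec, angle):
    """Rotate a 2-D vector by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical 2-D "embeddings" from model A (made-up numbers).
model_a = {
    "invoice overdue": (0.9, 0.1),
    "payment is late": (0.8, 0.3),
    "team picnic": (-0.2, 0.95),
}

# Toy model B: the same geometry seen in a rotated coordinate system.
# The 70-degree angle is an arbitrary assumption for this sketch.
model_b = {text: rotate(vec, math.radians(70)) for text, vec in model_a.items()}

def similarity_table(embeddings):
    """Cosine similarity for every unordered pair of texts."""
    texts = sorted(embeddings)
    return {
        (s, t): cosine(embeddings[s], embeddings[t])
        for s in texts
        for t in texts
        if s < t
    }

sims_a = similarity_table(model_a)
sims_b = similarity_table(model_b)

# The raw coordinates differ, but the pairwise similarities match.
for pair in sims_a:
    print(pair, round(sims_a[pair], 4), round(sims_b[pair], 4))
```

In this toy setup, the two payment-related texts stay close to each other and far from the picnic text in both coordinate systems, even though no individual number is shared between the two models.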
The researchers built on this insight with a tool called vec2vec, which can translate embeddings between entirely different models, even ones created independently by separate companies, without ever needing the original sentences or documents that the vectors represent. Imagine converting vectors from Google's embedding model into the space of OpenAI's embedding model without having to reprocess the original documents.
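The general idea of translating between embedding spaces can be sketched with a deliberately simplified stand-in. The example below is not vec2vec. It assumes the two spaces differ only by a 2-D rotation, and it fits that rotation from a few paired vectors, whereas the source describes vec2vec as working without access to the underlying text. All vectors and the function names `fit_rotation` and `rotate` are hypothetical.

```python
import math

def rotate(vec, angle):
    """Rotate a 2-D vector by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def fit_rotation(source, target):
    """Least-squares rotation angle that maps paired 2-D source vectors onto targets."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)

# Assumption for this toy: model B's space is model A's space rotated by 70 degrees.
true_angle = math.radians(70)

# Made-up vectors from "model A" and their counterparts in "model B".
vectors_a = [(0.9, 0.1), (0.8, 0.3), (-0.2, 0.95)]
vectors_b = [rotate(v, true_angle) for v in vectors_a]

# Learn the translation from the paired examples.
learned_angle = fit_rotation(vectors_a, vectors_b)

# Translate a vector that was not used for fitting, with no source text involved.
unseen_a = (0.5, 0.5)
translated = rotate(unseen_a, learned_angle)
expected_b = rotate(unseen_a, true_angle)

print("learned angle (degrees):", round(math.degrees(learned_angle), 2))
print("translated:", tuple(round(x, 4) for x in translated))
print("expected:  ", tuple(round(x, 4) for x in expected_b))
```

The point of the sketch is only that once a mapping between two spaces is known, a vector can be moved from one to the other without touching the document it came from. Real embedding spaces have hundreds or thousands of dimensions and are not related by a simple rotation.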
This breakthrough also brings new risks and serious privacy concerns. The team demonstrated that they could extract sensitive details such as names, dates, and financial specifics from embedding vectors alone. They showcased this vulnerability by recovering personal information directly from the embeddings of Enron employee emails.
This also challenges the widespread assumption that embeddings inherently obscure or anonymize information. In reality, embedding vectors can leak confidential details if an attacker employs advanced techniques like vec2vec.
For executives, this means it's critical to revisit how your organization treats embeddings. They should no longer be considered safe to share publicly or store unencrypted. Compliance and risk management teams will now need to classify embeddings similarly to protected customer data, ensuring appropriate governance, access controls, and encryption.