
5 data types not to feed AI models


We take a look at how Generative AI (GenAI) can be a subtle vector for data leakage and how Mondas approaches the risks inherent in “black box” technologies. If someone on your team innocently pastes data into a public Large Language Model (LLM), they may be sending that data to a third-party server, and there is a real risk it will be absorbed into the model’s training set.

To maintain data hygiene, we need to reconsider what constitutes a breach in the age of AI. Here are five types of data that we have identified should never cross the threshold into a public AI prompt.

1. Personally Identifiable Information (PII)

It’s a bit of a misconception that PII refers only to National Insurance numbers or passport scans. In the context of AI, PII leakage is often more contextual.

Drafting an email to a client using AI can save time, but if the prompt includes the client’s full name, email address, and details of a specific grievance or financial transaction, it can inadvertently expose sensitive data. Even if the AI platform claims to be “private,” you are transmitting PII to an external processor without the data subject’s explicit consent, which is a potential violation of GDPR and UK data privacy law.

The Rule: If the information can identify a living individual, keep it out of the prompt.
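
As a belt-and-braces measure, teams can scrub obvious identifiers before a draft ever reaches a prompt box. The sketch below is a minimal, illustrative Python example assuming a couple of simple regex patterns; it is not a substitute for a proper data-loss-prevention tool and will not catch contextual identifiers such as names.

import re

# Minimal, illustrative patterns only: real PII detection needs a proper
# data-loss-prevention tool, and PII is often more contextual than any regex.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "UK_NI_NUMBER": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b", re.IGNORECASE),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a labelled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}_REDACTED>", text)
    return text

draft = "Please email jane.doe@example.com about her complaint, NI AB123456C."
print(redact(draft))
# -> Please email <EMAIL_REDACTED> about her complaint, NI <UK_NI_NUMBER_REDACTED>.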

2. Source Code and API Keys

For developers, GenAI is an incredible debugging partner, and the temptation to paste a block of malfunctioning code and ask “fix this” is strong. The risk arises when that snippet contains hardcoded credentials, API keys, or the proprietary logic that gives your organisation its competitive edge.

There have been documented instances of engineers accidentally leaking private keys to public chatbots. Once an LLM has ingested that data, there is a non-zero risk of it being regurgitated in response to a prompt from a bad actor probing for vulnerabilities in your specific infrastructure.

The Rule: Sanitise all code. Replace keys with placeholders (e.g., <API_KEY_HERE>) and ensure the logic revealed does not expose core intellectual property.
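
A lightweight pre-paste check can catch the most obvious credentials before a snippet leaves your machine. The Python sketch below is illustrative only, using an AWS-style access key format and a generic hardcoded-assignment pattern as assumed examples; a dedicated secret scanner in your workflow remains the better control.

import re

# A lightweight pre-paste check, not a replacement for a dedicated secret
# scanner. Patterns are illustrative: an AWS-style access key ID and a
# generic hardcoded "api_key = ..." / "secret = ..." assignment.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*=\s*[\"'][^\"']+[\"']"),
]

def sanitise(snippet: str) -> str:
    """Swap anything that looks like a credential for a placeholder."""
    for pattern in SECRET_PATTERNS:
        snippet = pattern.sub("<API_KEY_HERE>", snippet)
    return snippet

code = 'client = connect(api_key="sk-live-9f2c0a7e", region="eu-west-2")'
print(sanitise(code))
# -> client = connect(<API_KEY_HERE>, region="eu-west-2")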

3. Internal Financial and Strategic Data

Which member of the finance team hasn’t asked AI to summarise a quarterly report or draft some notes or a presentation for a meeting?

Non-public financial results, merger and acquisition rumours, restructuring plans, and sales figures are the lifeblood of your organisation. Feeding this data to a public AI model treats a third-party vendor as a confidant. Should a breach occur at the AI provider, or should the data be subpoenaed, your internal strategies are no longer internal.

The Rule: Treat the AI prompt box as a public billboard. If you wouldn’t put the data on your website, do not put it in a public LLM.

4. Regulated and Health Data (PHI)

For organisations operating in healthcare or other highly regulated sectors, the stakes are even higher. Protected Health Information (PHI) and sensitive legal case data require strict chain-of-custody protocols.

General-purpose AI tools are rarely HIPAA or GDPR compliant by default. They typically lack the encryption at rest and audit trails required for handling medical records or legal discovery material. Using them here is not just poor security practice; it is often illegal.

The Rule: Use only specialised, walled-garden AI tools that are contractually obligated to adhere to your specific regulatory frameworks.

5. Credentials and Security Architectures

This is the “keys to the castle” scenario. Beyond API keys in code, security teams need to be wary of uploading network diagrams, firewall configurations, or incident response reports to generate summaries.

Describing your security posture to an AI to get advice on improvements could effectively map your digital perimeter for an external entity. You are highlighting your known weaknesses to a system that stores the conversation history.

The Rule: Keep your security architecture documentation offline or within a self-hosted, secure environment.

Governance is Key

At Mondas, we’re not suggesting a return to the stone age. AI is a vital component of the modern security stack, and the solution is governance, not prohibition (which can lead to Shadow AI, covered here in a prior article).

Organisations must move towards “Enterprise” versions of AI tools, where data privacy is contractually guaranteed (zero data retention), or implement local, open-source models that run entirely within the company’s own secure infrastructure.
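
For teams going the self-hosted route, the pattern is straightforward: prompts are sent to an inference endpoint inside your own network rather than a public SaaS API. The Python sketch below assumes a hypothetical internal endpoint and response schema; substitute whatever your chosen local model server actually exposes.

import requests

# Hedged sketch: keep the prompt inside your own perimeter by calling a
# self-hosted model. The URL and JSON schema below are hypothetical
# placeholders, not a specific product's API.
LOCAL_LLM_URL = "http://llm.internal.example:8080/v1/generate"

def ask_local_model(prompt: str) -> str:
    response = requests.post(
        LOCAL_LLM_URL,
        json={"prompt": prompt, "max_tokens": 256},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("text", "")

# The quarterly summary never leaves company infrastructure:
# print(ask_local_model("Summarise these quarterly figures: ..."))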

At Mondas, we leverage the latest technology to protect your assets, but we never forget that the human element, deciding what you choose to share and how, remains the first line of defence.

Certify Your AI Security with ISO 42001

Internal policies are the first step, but for the ultimate safeguard Mondas guides organisations through ISO 42001 compliance. The standard defines an Artificial Intelligence Management System (AIMS), a certifiable structure for managing the risks associated with GenAI. Partner with Mondas to achieve ISO 42001, secure your operations, and turn AI into a competitive advantage. Talk to our AI compliance experts today.

This article was collated by Lance Nevill, our Cyber Security Director at Mondas. Lance works with organisations on their compliance journeys, with a focus on ISO 42001 in this context. Learn more about Lance on 🔗LinkedIn here.