I’ve been doing a bit of research into generative AI and security. My last article, https://medium.com/@mgbecken/for-those-about-to-bedrock-751fbc804012, focused on security risks in a simple bot. This time around I wanted to experiment with Bedrock agents and investigate some of the issues that are specific to agentic AI. Some distinct concerns include:
Complexity and orchestration of tasks performed
Potential access of enterprise data sources
Ability to perform real time tasks with real time data
Tooling integration
Memory: both long-term and short-term
Once again, OWASP has some excellent material to read: https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/ directly addresses agent scenarios.
Another great article, Agentic AI Threat Modeling Framework: MAESTRO from the Cloud Security Alliance (CSA), breaks threat modeling and potential mitigations into a layered framework.
A wonderful GitHub repo with incredible information is github.com/precize/OWASP-Agentic-AI
As I build a simple single-agent application, I wanted to think through and comment on some of the security issues. At this point I have broken the possible issues into three rough stages: “Prebuild”, “During Build”, and “Postbuild”. Many of these topics are not specific to agentic AI.
PREBUILD
Marketplace:
Choosing our foundation model: possible threats here are a noncompliant model and untrustworthy model information. I talked about this in my last article, so I won’t go into detail here.
This time around I am using Amazon Nova Micro as a foundation model for cost and performance reasons. I am building my own agents in Bedrock, but in a different situation I would need to think about secure agent selection.
Threat Modeling:
The articles above have some great places to start. There are quite a few factors to consider depending on your architecture.
Data:
Is my data AI-ready? Is it labeled with access restrictions and other metadata?
Operations planning:
— Plan for keeping our CI/CD workflow secure to prevent IaC manipulation. You could also use code signing and integrity verification (see the sketch after this list).
— Plan for control: what boundaries and constraints can we set up between components, users, roles, and agents?
— Plan for updates and vulnerability management. An AI BOM would be helpful here to detail all the components you will be using, from data and algorithms to dependencies and metadata. We would also want a system for supply chain monitoring.
— Plan your operations monitoring and incident response strategy. How will we detect problems and attacks, and how will we respond to them? What type of communications will we have in place, and will we have a feedback loop to self-correct? Will we need system redundancies in case of failure or resource exhaustion?
— Plan to train people. We might need to train our model, but we also need to train the humans in the loop. What should the end user expect? How will support deal with issues? What is the business goal?
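As a small illustration of the code signing and integrity verification idea above, here is a minimal sketch of checking a deployment artifact’s hash in a pipeline step. The file name and expected digest are placeholders; in practice the trusted digest would come from your build system or a signing service.

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare a deployment artifact's SHA-256 digest to a trusted value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Fail the pipeline if the artifact does not match the trusted digest.
# Both values below are placeholders for illustration.
if not verify_artifact("agent_tool.zip", "expected-digest-from-build-system"):
    raise SystemExit("Artifact integrity check failed")
```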
DURING BUILD
Data: what kind of data am I using, and who is allowed access? What type of authentication am I using? Am I making sure data is encrypted with modern methods? I am specifically thinking of the DeepSeek iOS app here, which was reported to use 3DES with hard-coded encryption keys.
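For contrast with that 3DES example, here is a minimal sketch of modern authenticated encryption (AES-256-GCM) using Python’s cryptography package. The payload and associated data are made up; a real key would come from KMS or a secrets manager, never hard-coded.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, fetch from KMS/Secrets Manager
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # 96-bit nonce; never reuse one with the same key

# Associated data binds the ciphertext to its context without encrypting it.
ciphertext = aesgcm.encrypt(nonce, b"alert payload", b"table:concat_dependabotalert")
plaintext = aesgcm.decrypt(nonce, ciphertext, b"table:concat_dependabotalert")
assert plaintext == b"alert payload"
```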
Tools: how can I make sure my tools remain safe and isolated? What are our isolation requirements? We would want to make sure we have authentication, boundaries, and monitoring in place. I would also want to keep the tools up to date so we aren’t sitting on any old vulnerabilities. We could scan the Lambda-based tooling and report on outdated components in AWS to help with that.
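As a rough sketch of that scan, assuming boto3 and a deny-list of runtimes you consider outdated (maintain your own list from AWS’s runtime deprecation schedule):

```python
import boto3

# Assumption: these runtimes are on your internal deny-list.
OUTDATED = {"python3.7", "python3.8", "nodejs14.x", "nodejs16.x"}

client = boto3.client("lambda")
paginator = client.get_paginator("list_functions")
for page in paginator.paginate():
    for fn in page["Functions"]:
        runtime = fn.get("Runtime", "")  # container-image functions have no runtime
        if runtime in OUTDATED:
            print(f"{fn['FunctionName']}: outdated runtime {runtime}")
```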
It will be interesting to see how our risks evolve with MCP usage. That is next on the bucket list.
Agents: we will want to think about potential hijacking and keeping multi-agent communications secure. What monitoring can we build to catch any problems? Agent-to-agent communication could have mutual authentication and continuous reauthentication for long-running processes.
Prompts: am I being careful to have sufficient input validation? We would want to prevent code injection and system compromise. Another threat vector might be computationally expensive inputs that create resource exhaustion. An interesting threat to read about is temporal manipulation in prompts: https://github.com/precize/OWASP-Agentic-AI/blob/main/agent-temporal-manipulation-timebased-attack-13.md
I was picturing the look on a coworker’s face if we triggered something resource intensive at peak usage hours.
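A rough validation sketch along those lines, applied before text ever reaches the agent. The length cap and deny-list patterns are assumptions you would tune for your own application:

```python
import re

MAX_INPUT_CHARS = 2000  # blunt guard against resource-exhausting inputs
DENY_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.IGNORECASE),
    re.compile(r"<script", re.IGNORECASE),
]

def validate_prompt(text: str) -> str:
    """Reject oversized or obviously malicious input before invoking the agent."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input too long")
    for pattern in DENY_PATTERNS:
        if pattern.search(text):
            raise ValueError("Input rejected by policy")
    return text
```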
Memory: we would want to prevent both memory poisoning and possible data exfiltration. Memory content validation, session isolation, encryption, and authentication would be important. We would also want to monitor memory logs for anomalies in memory storage. If we are working with sensitive data, we would want to be especially careful and make sure we are not unnecessarily retaining it. Bedrock encrypts session information with an AWS-provided key, but we could use a customer-managed key if necessary.
Logs: are they encrypted, with sensitive data masked? Are they aggregated and stored for as long as regulations require?
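A toy masking filter for the log pipeline might look like the following. The regexes are illustrative (the token pattern only approximates GitHub token shapes), and a real deployment would lean on a proper redaction library or CloudWatch Logs data protection policies.

```python
import re

# Illustrative patterns only: emails plus an approximation of GitHub token prefixes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
TOKEN = re.compile(r"gh[pousr]_[A-Za-z0-9]{20,}")

def mask(line: str) -> str:
    """Redact obvious secrets and emails before a log line is shipped."""
    return TOKEN.sub("[REDACTED_TOKEN]", EMAIL.sub("[REDACTED_EMAIL]", line))

print(mask("alert opened by dev@example.com with token ghp_abcdefghijklmnopqrstu123"))
```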
Decisions: is there decision traceability and transparency? Are the decisions being analyzed? Is there human intervention for high-risk decisions?
Output: are there validation and constraints in place to detect and prevent hallucinations and other bad output?
Resource exhaustion: we could have process control policies that trigger throttling or auto-suspension if one of our parameters goes over a defined limit.
Redundancy: is this in place for high-value workflows?
Alerts and feedback loop: is there a way to alert on big problems and potentially fix them as we run?
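Tying the resource-exhaustion and alerting points together, here is a sketch of a CloudWatch alarm on a tool Lambda’s invocation volume that an operator (or automation wired to the alarm) could use to trigger throttling or suspension. The function name and threshold are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="concatAlertNumber-invocation-spike",
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Dimensions=[{"Name": "FunctionName", "Value": "concatAlertNumber"}],
    Statistic="Sum",
    Period=300,                 # 5-minute window
    EvaluationPeriods=1,
    Threshold=500,              # placeholder limit; tune to your baseline
    ComparisonOperator="GreaterThanThreshold",
    # In a real setup you would add AlarmActions (e.g., an SNS topic) here.
)
```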
Orchestration: if we are using multiple agents or tools, are we able to monitor coordination and integration? I am picturing an evil orchestrator here. “Bwahaha, let me add more double bass.”
POSTBUILD
Logs and Traces: now it’s time to look at all the logs and traces we hopefully created in the operational step. Do we have everything we need? Are they decently easy to analyze? Can we find a digestible way to share them with non-technical managers?
Analysis: is this providing the functionality and value we need? What are the costs associated with this workflow?
Vulnerabilities and observed attacks: what issues do we need to fix, and what is the priority? How are we going to communicate these to the users who need to perform the work?
AI BOM: do we need to update this or update components in the build?
Foundation model: do we need to switch out the model? Is it doing what we want? Could we use a cheaper model?
Continuous improvements: can we optimize anything? On this build, I can definitely see that more prompt engineering needs to be done.
Now it is on to my actual build procedure….
1.) DynamoDB creation for two tables: I will use these two tables to look at the Dependabot alerts and developer team info. I had a previously created DynamoDB table populated with GitHub Dependabot alerts. I used the GitHub CLI to get the alerts and associated data; if you want details on the process, here is a very quick article I wrote: https://medium.com/@mgbecken/operation-dynamodb-wrangle-some-github-data-into-dynamodb-9947a1e99077. Security notes here: access goes through IAM roles, and DynamoDB has encryption at rest baked in. Don’t put sensitive names or plaintext data in your primary key and global secondary indexes, since those names will show up in your table definition.
The table with alerts is called ‘concat_dependabotalert’. I also created some fake Team data to put into a DynamoDB, ‘concatTeamInfo’.
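For anyone recreating the tables from code rather than the console, here is an illustrative sketch. The key attribute name is an assumption (the real schema is in the DynamoDB article above), and it shows the point about key names being visible in the table definition.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# The key attribute name here is illustrative; remember it will be visible
# in the table definition, so keep key names non-sensitive.
dynamodb.create_table(
    TableName="concat_dependabotalert",
    AttributeDefinitions=[{"AttributeName": "alert_number", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "alert_number", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity is plenty for a small demo
)
```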
2.) Create two Lambda functions to look up information in the DynamoDB tables: ‘concatAlertNumber’ and ‘concatTeamInfo’. The code for these is in my GitHub at https://github.com/mgbec/0425AgentSec. You will also want to bump up the timeout for both of them; the default timeout is 3 seconds, and you will probably want more time than that.
You also need to add DynamoDB permissions to the execution role for each Lambda. Under Configuration > Permissions, click the role name. You will be redirected to IAM, where you can click the “Add permissions” button and give the role the ability to query DynamoDB.
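If you prefer doing that from code, here is a sketch that scopes the role to read operations on a single table rather than a broad managed policy. The role name, region, and account ID are placeholders.

```python
import json

import boto3

iam = boto3.client("iam")
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:Query", "dynamodb:GetItem"],
        # Placeholder ARN: restrict the grant to the one table this tool reads.
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/concat_dependabotalert",
    }],
}
iam.put_role_policy(
    RoleName="concatAlertNumber-role",        # the Lambda's execution role
    PolicyName="dynamodb-read-concat-alerts",
    PolicyDocument=json.dumps(policy),
)
```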
3.) Create an OpenAPI schema: the tools my agent runs here both use an OpenAPI schema to tell them how to work. You can write this in YAML or JSON and then put it in an S3 bucket. Security note: a potential threat vector is a malicious change to this file. We could scan the upload for evil doings and make sure our bucket is locked down. My OpenAPI YAML files are ConcatDependabotAlertStatus.yaml and concatTeamInfo.yaml, also available in my GitHub repo at https://github.com/mgbec/0425AgentSec
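To give a feel for the shape of such a schema, here is a stripped-down YAML illustration. The path, operation, and parameter names are made up, not copied from my files; see the repo for the real ones.

```yaml
# Illustrative shape only; see ConcatDependabotAlertStatus.yaml for the real schema.
openapi: 3.0.0
info:
  title: Dependabot Alert Lookup
  version: 1.0.0
paths:
  /alerts/{alertNumber}:
    get:
      summary: Get the status of a Dependabot alert
      description: Look up a single Dependabot alert by its number
      operationId: getAlertStatus
      parameters:
        - name: alertNumber
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Alert status details
```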
4.) Now we get to build our agent. I am using Nova Micro, which is cheap, fast, and working fine for the task at hand. More details about the Nova models:
Amazon Nova models, specifically Nova Micro, Nova Lite, and Nova Pro, have varying costs per 1,000 tokens:
- Nova Micro (cheapest): $0.000035 input / $0.00014 output
- Nova Lite: $0.00006 input / $0.00024 output
- Nova Pro (most expensive): $0.0008 input / $0.0032 output
https://aws.amazon.com/bedrock/pricing/
One of the most important parts of building the agent is creating the ‘Instructions for the Agent’. I tried to be as specific as possible and went back and adjusted as I tested.
There are additional settings that you can customize as needed. I left these as default, but one setting that could affect security is the ‘idle session timeout’: your user’s submitted data remains in short-term memory for the length of the session. The agents have additional memory configuration options, depending on the foundation model.
Another setting is ‘KMS Key selection’. You could opt to use your own KMS key if that is a requirement.
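Both of those settings are also exposed in the CreateAgent API. Here is a sketch using boto3, where the agent name, instruction text, role ARN, and KMS key ARN are all placeholders.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")
bedrock_agent.create_agent(
    agentName="dependabot-helper",                      # placeholder name
    foundationModel="amazon.nova-micro-v1:0",
    instruction="Answer questions about Dependabot alerts and the teams "
                "that own the affected repositories.",  # placeholder instruction
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
    idleSessionTTLInSeconds=600,   # how long user input stays in short-term session memory
    # Optional customer-managed key instead of the AWS-provided default:
    customerEncryptionKeyArn="arn:aws:kms:us-east-1:123456789012:key/placeholder",
)
```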
5.) Creating your action groups: (despite the name, these are not the same thing as Azure action groups.)
This is where our previous work creating our Lambda functions and OpenAPI schemas pays off. We will create an action group for each DynamoDB lookup we want to perform.
We can select the OpenAPI schema and Lambda function that correspond to each tool.
Save your agent and take note of the agent ARN; we will use it in the next step.
6.) For both of your Lambda functions, you will want to give the agent permission to invoke them. Under Configuration > Permissions, you can add a resource-based policy and enter the ARN of your agent in the conditions.
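The same grant can be made from code. This sketch lets the Bedrock service invoke the tool Lambda, but only on behalf of this specific agent; the agent ARN is a placeholder.

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.add_permission(
    FunctionName="concatAlertNumber",
    StatementId="allow-bedrock-agent",
    Action="lambda:InvokeFunction",
    Principal="bedrock.amazonaws.com",
    # SourceArn pins the grant to one agent rather than all of Bedrock.
    SourceArn="arn:aws:bedrock:us-east-1:123456789012:agent/AGENTID123",
)
```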
7.) Now we can test our Bedrock agent. In the right-side panel, you can try your input.
The traces will help you troubleshoot errors or unexpected output, and also let you look at the agent’s “thinking process”.
You might decide to return the information in another format, or spot functionality you wish existed and note it for future development.
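If you would rather test from code than the console panel, here is a sketch that invokes a draft agent with tracing enabled and prints both the answer chunks and the trace events. The agent ID is a placeholder; TSTALIASID is the alias the console uses for draft testing.

```python
import uuid

import boto3

runtime = boto3.client("bedrock-agent-runtime")
response = runtime.invoke_agent(
    agentId="AGENTID123",        # placeholder agent ID
    agentAliasId="TSTALIASID",   # the console's draft test alias
    sessionId=str(uuid.uuid4()),
    inputText="Which team owns the repo for alert 42?",
    enableTrace=True,
)
for event in response["completion"]:       # the response is a streamed event set
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode())
    elif "trace" in event:
        print(event["trace"]["trace"])     # the agent's step-by-step reasoning
```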
8.) Our next step could be to keep tweaking everything and adding more tooling. We might want to add a knowledge base or test MCP. GitHub does have an MCP server: https://github.com/github/github-mcp-server. At the time I wrote this article, we would be able to add CodeQL or secret scanning alerts. Other MCP servers I would like to check out for this particular project can create Excel workbooks and look up CVEs.
9.) Finally, if we are ready to deploy, we can create and publish a user interface in the method of our choice. This is another project on my list, to be continued. Thanks for reading!