Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint - Without Fixed Cost!
Publish Date: Jan 2
Why bedrock-access-gateway-function-url
This article is for GenAI builders who care about all of these:
- I want to use the OpenAI SDK / a compatible API
- No fixed cost, pay-as-you-go pricing
- Serverless LLM, no self-hosting
- Multiple models in one codebase
- A lightweight solution without bloatware
If you want to stick to AWS and keep things simple without the burden of maintaining extra configuration, continue reading.
Why not use XXX as the solution instead?

| Solution | Pros & Cons |
| --- | --- |
| LiteLLM (SDK) | (-) A long list of unnecessary dependencies potentially bloating your Python environment/application, e.g. gunicorn, fastapi, google-cloud-kms, etc.<br>(-) Python only |
| LiteLLM (Proxy Server) | (-) >US$16/month<br>(-) Extra Load Balancer needed + Fargate/Lambda pricing |
| aisuite | (+) No bloatware issue from extra Python dependencies<br>(+) No extra infra cost<br>(-) Python only |
| This Solution | (+) Only minimal pay-as-you-go Lambda execution costs |
A Typical GenAI Builder's Struggle
You are a builder specializing in AWS, maybe with a lot of AWS Credits like me.
You want to build GenAI applications, only to find that most starters/examples are based on OpenAI's official Python/Node.js SDKs, e.g.:
```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(completion.choices[0].message)
# > Hello! What can I help you?
```
If you have read through the Amazon Bedrock docs, you will have realized that the data schema of the Bedrock Runtime Converse API for chat completions is very different from OpenAI's. If your GenAI application needs to support model/provider switching, this becomes a real burden: you might need to write a very different implementation for each provider.
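For contrast, here is a minimal sketch of the same "Hello!" exchange through the Bedrock Runtime Converse API (the model ID is just an illustrative example; use any Bedrock model you have access to in your region):

```python
import boto3

# The Converse API splits the system prompt into its own parameter,
# wraps message content in a list of blocks, and moves generation
# settings into "inferenceConfig" - none of which matches OpenAI's schema.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",  # example model ID only
    system=[{"text": "You are a helpful assistant."}],
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    inferenceConfig={"maxTokens": 512},
)
print(response["output"]["message"]["content"][0]["text"])
```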
There are also other provider-specific implementations (Vertex AI, the Gemini API, LangChain, etc.), and it takes effort to rewrite your code to cater to more models. If you are working on multiple projects, you might end up maintaining the same set of glue code in each of them.
A New Hope - But It Comes with a Fixed Cost
To fix this issue, AWS has provided the great project aws-samples/bedrock-access-gateway. It allows you to deploy an Application Load Balancer + Lambda/Fargate pair so that you can use OpenAI's official SDKs against an OpenAI-API-compatible REST endpoint, configured via the environment variables OPENAI_API_BASE and OPENAI_API_KEY.
It achieves the OpenAI-SDK, serverless, and multi-model goals from the first section. You can work on projects utilizing the OpenAI SDKs with ease.
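In other words, the earlier OpenAI snippet keeps working as-is; only the endpoint and key change. A sketch with placeholder values (note that openai>=1.0 reads OPENAI_BASE_URL rather than the older OPENAI_API_BASE):

```python
import os

# Point the OpenAI SDK at the gateway instead of api.openai.com.
# The host and key below are placeholders for your own deployment.
os.environ["OPENAI_API_BASE"] = "https://<your-gateway-host>/api/v1"  # legacy SDKs
os.environ["OPENAI_BASE_URL"] = "https://<your-gateway-host>/api/v1"  # openai>=1.0
os.environ["OPENAI_API_KEY"] = "<your-gateway-api-key>"
```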
The Fixed Cost Strikes Back
Yes, it's absolutely great, but it's also costly if you are building your GenAI project with your own money or a limited budget:
Application Load Balancer runs 24/7 once deployed, and it comes with a fixed cost per hour:

- $0.0225 per Application Load Balancer-hour; i.e.
- ~$16.2/month ($0.0225 × 720 hours) FIXED cost regardless of usage
- In addition, there is also the variable cost: number of LCUs used × $0.008 per LCU-hour

Fargate (the alternative deployment option) also runs 24/7, so it comes with an additional fixed cost on top of the ALB:

- $0.04048 per vCPU-hour
- $0.004445 per GB-hour
- ~$35.5/month ((1 × $0.04048 + 2 × $0.004445) × 720 hours) FIXED cost under the default 1 vCPU + 2 GB RAM setup
It's a cost nightmare, especially for those who don't need 24/7 uptime and usage for the OpenAI-compatible API endpoint.
Also, if a fixed cost is unavoidable anyway, why not just start a cloud VM and put everything inside it instead?
Why Bedrock in the First Place?
Something feels wrong to me. The first reason I picked Amazon Bedrock was its serverless nature and pay-as-you-go pricing: why bother paying a gigantic fixed monthly cost to host your own open-source LLM on a VM with an expensive GPU when you can just pick the serverless option?
The second reason for picking Bedrock is the ease of switching models.
With Bedrock, not only can you use proprietary models like Amazon Nova, but you also get immediate compatibility with open-source models like LLaMA 3.3 (while Vertex AI still offers LLaMA 3.2 at most) or Mistral, just by changing the model field in your code, without extra "endpoint deployments". This is something other major cloud AI providers can't offer at the moment.
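For example, switching providers is just a different model ID string, reusing the boto3 client from the earlier sketch (the IDs below are illustrative; check the Bedrock console for the ones available in your region):

```python
# One codebase, many providers: only the model ID changes per call.
for model_id in (
    "us.amazon.nova-pro-v1:0",              # Amazon proprietary
    "us.meta.llama3-3-70b-instruct-v1:0",   # Meta LLaMA 3.3
    "mistral.mistral-large-2407-v1:0",      # Mistral
):
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    )
    print(model_id, "->", response["output"]["message"]["content"][0]["text"])
```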
For example, with Azure AI, every non-OpenAI model needs to be deployed as a separate inference endpoint.
Again, I want to stick with Bedrock and my OpenAI SDKs, but I am not willing to pay a fixed recurring cost for a GenAI application that might not generate 24/7 traffic.
When in Doubt, Read the Docs First
The maintainers of bedrock-access-gateway suggested, namely for performance improvements, that:

> Also, you can use Lambda Web Adapter + Function URL (see example) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.
This sample app provided by AWS is based on a Lambda function with a Docker runtime, and as its name suggests, it is a sample app rather than a general-purpose tool: Serverless Bedtime Storyteller. With this example in place, I could build a serverless, "fixed-cost-less" version of the bedrock access gateway.
Building the bedrock-access-gateway-function-url project
I made a few tweaks to the original bedrock-access-gateway project. One of them is necessary because the Lambda Web Adapter resets some Python path settings, which would otherwise cause your Layer-packaged dependencies to be un-importable.
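As an illustration only (the actual fix lives in the project's template, and the path below is an assumption), the effect of such a tweak is equivalent to restoring the Layer path before any other imports run:

```python
import sys

# Lambda Layers mount Python packages under /opt/python; if the Web
# Adapter drops that entry from the import path, put it back first.
LAYER_SITE_PACKAGES = "/opt/python"  # assumed Layer mount point
if LAYER_SITE_PACKAGES not in sys.path:
    sys.path.insert(0, LAYER_SITE_PACKAGES)
```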
Lastly, the crux of my project is the prepare_source.sh file: it fetches the latest Python source of bedrock-access-gateway with git, so that the latest efforts from the aws-samples contributors are included. The script clones the latest main branch of the project and copies over the Python FastAPI implementation of the access gateway.
It also performs an optional dependency reduction if you do not need to call the embeddings endpoint, so that large PyPI dependencies like numpy or tiktoken can be avoided.
Deployment
Straightforward. I personally recommend using AWS CloudShell, as you can even run it from the AWS Console on your phone, and you save some time by skipping a Docker build:
```bash
sudo yum update -y
sudo yum install -y python3.12 python3.12-pip

git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url
./prepare_source.sh
sam build
sam deploy --guided
```
Within a minute or so, grab the value of FunctionUrl from the stack outputs, and recall the ApiKey value you supplied earlier during sam deploy:
```
Outputs
---------------------------------------------------------------------------
Key         Function
Description FastAPI Lambda Function ARN
Value       arn:aws:lambda:us-east-1:123456789012:function:sam-app-BedrockAccessGatewayFunction-yLLzetPaKSq5

Key         FunctionUrl
Description Function URL for FastAPI function
Value       https://lukeskywalker.lambda-url.us-east-1.on.aws/

Successfully created/updated stack - sam-app in us-east-1
```
Now, test your own dedicated, pay-as-you-go, serverless OpenAI-compatible API endpoint in your GenAI application!
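For instance, a quick smoke test might look like this (the /api/v1 base path follows the upstream project's convention; substitute your own FunctionUrl and ApiKey values):

```python
from openai import OpenAI

# Placeholder values from the sample output above - use your own.
client = OpenAI(
    base_url="https://lukeskywalker.lambda-url.us-east-1.on.aws/api/v1",
    api_key="<your-ApiKey-from-sam-deploy>",
)
completion = client.chat.completions.create(
    model="us.amazon.nova-lite-v1:0",  # any Bedrock model ID you have enabled
    messages=[{"role": "user", "content": "Hello from my Function URL!"}],
)
print(completion.choices[0].message.content)
```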
Alternatively, I have built a minimal UI based on the deep-chat project so that you can test it without access to any local shell environment: https://chat.gab.hk/.
No worries about security: it's an open-source static website with no backend and no tracking scripts. Just bring your own endpoint and key.
Return of Cost Effectiveness
With the new, truly serverless option, here are the only costs incurred:

- Amazon Bedrock costs: pay-as-you-go, according to token usage
- Lambda invocation costs: per GB-second of compute + per request
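As a rough illustration under an assumed workload (not a measurement): 1,000 requests per month averaging 10 seconds each on a 1 GB function is 10,000 GB-seconds, or about $0.17 of compute plus $0.0002 of request charges at us-east-1 rates, and much of that falls within Lambda's always-free tier of 1M requests and 400,000 GB-seconds per month.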
So here is the final repository containing the entire setup:
> OpenAI-Compatible RESTful APIs for Amazon Bedrock, modified from the original "bedrock-access-gateway" project to avoid using an ALB, so that one can deploy and use it under a pay-as-you-go model WITH NO FIXED COSTS.