Mount S3 Objects to Kubernetes Pods
Ant(on) Weiss (@antweiss)

Publish Date: Jan 31 '22

This post describes how to mount an S3 bucket to all the nodes in an EKS cluster and make it available to pods as a hostPath volume. Yes, we're aware of the security implications of hostPath volumes, but in this case it's less of an issue - because the actual access is granted to the S3 bucket (not the host filesystem) and access permissions are provided per serviceAccount.

Goofys

We're using goofys as the mounting utility. It's a "high-performance, POSIX-ish Amazon S3 file system written in Go" based on FUSE (file system in user space) technology.
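For context, here's roughly what a manual goofys mount looks like on a single machine (bucket name, region and mount point are placeholders; this is just an illustration, not part of the setup below):

mkdir -p /mnt/s3data
goofys --region us-east-1 my-kubernetes-bucket /mnt/s3data
# unmount when done:
fusermount -u /mnt/s3data   # or: umount /mnt/s3data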

Daemonset

In order to provide the mount transparently we need to run a DaemonSet, so that the mount is created on all nodes in the cluster.

The Dockerfile and the Helm Chart

We've built our own goofys Docker image based on Alpine Linux and a Helm chart that installs the DaemonSet.

The image is available on Docker Hub: https://hub.docker.com/r/otomato/goofys

The Dockerfile and the Helm chart can be found here: https://github.com/otomato-gh/s3-mounter

S3 Access per ServiceAccount

The Helm chart currently assumes that S3 access is granted via an IAM role attached to a Kubernetes serviceAccount (IAM Roles for Service Accounts, a.k.a. IRSA). We may add support for API access keys in the future if needed.
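In practice this is the standard IRSA mechanism: the chart annotates its serviceAccount with the role ARN you pass in. Once the chart is installed (step 4 below) you can inspect the binding - assuming the chart's default namespace otomount and serviceAccount name s3-mounter:

kubectl -n otomount get serviceaccount s3-mounter -o yaml | grep eks.amazonaws.com/role-arn
# expected annotation (placeholder values):
#   eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/eks-otomounter-role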

HowTo:

Here's how to set it all up:

1. OIDC Provider for EKS

Make sure you have an IAM OIDC identity provider for your cluster. You can check for one - and create it if missing - with the following commands (you'll need eksctl installed). First, retrieve your cluster's OIDC issuer URL:

aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text

Example output:

https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

List the IAM OIDC providers in your account. Replace EXAMPLED539D4633E53DE1B716D3041E with the value returned from the previous command.

aws iam list-open-id-connect-providers | grep EXAMPLED539D4633E53DE1B716D3041E

Example output

"Arn": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"

If output is returned from the previous command, then you already have a provider for your cluster. If no output is returned, then you must create an IAM OIDC provider with the following command. Replace cluster_name with your own value.

eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve
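If you're scripting this, the check-and-create sequence above can be collapsed into a couple of lines (cluster_name is a placeholder, same as above):

OIDC_ID=$(aws eks describe-cluster --name cluster_name \
  --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
aws iam list-open-id-connect-providers | grep -q $OIDC_ID || \
  eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve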

2. Create a Managed Policy for Bucket Access

Create a JSON file named policy.json with the appropriate policy definition. For example, the following snippet creates a policy document that allows full access to a bucket named my-kubernetes-bucket and to the objects in it:

read -r -d '' MY_POLICY <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::my-kubernetes-bucket",
                "arn:aws:s3:::my-kubernetes-bucket/*"
            ]
        }
    ]
}
EOF
echo "${MY_POLICY}" > policy.json

Create the managed policy by running:

aws iam create-policy --policy-name kubernetes-s3-access --policy-document file://policy.json

Example output:

{
    "Policy": {
        "PolicyName": "kubernetes-s3-access",
        "PolicyId": "ANPAS3DOMWSIX73USJOHK",
        "Arn": "arn:aws:iam::04968064045764:policy/kubernetes-s3-access",
        ...

Note the policy ARN for the next step.
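If you'd rather not copy-paste the ARN around, you can also look it up into a shell variable (the JMESPath query simply filters by the policy name we used above):

IAM_POLICY_ARN=$(aws iam list-policies --scope Local \
  --query "Policies[?PolicyName=='kubernetes-s3-access'].Arn" --output text)
echo $IAM_POLICY_ARN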

3. Create a Role for S3 Access

Set your AWS account ID to an environment variable with the following command:

ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

Set your OIDC identity provider to an environment variable with the following command, replacing cluster-name with the name of your cluster:

OIDC_PROVIDER=$(aws eks describe-cluster --name cluster-name --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")

Copy the following code block to your computer and replace my-namespace and my-service-account with the namespace and name of the serviceAccount that the mounter DaemonSet will use:

read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:my-namespace:my-service-account"
        }
      }
    }
  ]
}
EOF
echo "${TRUST_RELATIONSHIP}" > trust.json

Run the modified code block from the previous step to create a file named trust.json.

Run the following AWS CLI command to create the role:

aws iam create-role --role-name eks-otomounter-role --assume-role-policy-document file://trust.json --description "Mount s3 bucket to EKS"

Run the following command to attach the IAM policy you created in the previous section to the role. Replace IAM_POLICY_ARN with the policy ARN you noted earlier:

aws iam attach-role-policy --role-name eks-otomounter-role --policy-arn=IAM_POLICY_ARN
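Optionally, verify the attachment and capture the role's full ARN - you'll need it for the Helm install in the next step:

aws iam list-attached-role-policies --role-name eks-otomounter-role
IAM_ROLE_ARN=$(aws iam get-role --role-name eks-otomounter-role \
  --query "Role.Arn" --output text)
echo $IAM_ROLE_ARN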

4. Finally - Install the S3 Mounter!

  • Add the helm repo to your repo list:

helm repo add otomount https://otomato-gh.github.io/s3-mounter

  • Inspect its arguments in values.yaml:

helm show values otomount/s3-otomount

The values you want to set are at the end:

bucketName: my-bucket
iamRoleARN: my-role
mountPath: /var/s3
hostPath: /mnt/s3data

  • Install the chart, providing your own values:

helm upgrade --install s3-mounter otomount/s3-otomount \
  --namespace otomount --set bucketName=<your-bucket-name> \
  --set iamRoleARN=<your-role-arn> --create-namespace

This will use the default hostPath for the mount - i.e. /mnt/s3data.
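To verify the mounter is up, check the DaemonSet pods and the FUSE mount inside one of them (the namespace and container mount path follow the chart defaults above; the pod name is a placeholder):

kubectl get daemonset,pods -n otomount
# pick one of the mounter pods and check that the bucket is mounted
kubectl exec -n otomount <s3-mounter-pod-name> -- df -h /var/s3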

5. Use the mounted S3 bucket in your Deployments.

Here's an example pod definition that provides its container the access to the mounted bucket:

apiVersion: v1
kind: Pod
metadata:
  name: sleeper
spec:
  containers:
  - command:
    - sleep
    - infinity
    image: ubuntu
    name: ubuntu
    volumeMounts:
    - mountPath: /mydata:shared
      name: s3data
  volumes:
  - hostPath:
      path: /mnt/s3data
    name: s3data

Note the :shared - it's a mount propagation modifier in the mountPath field that allows this volume to be shared by multiple pods/containers on the same node.
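If the :shared suffix doesn't behave as expected in your environment (see the comments below for one such report), Kubernetes also has an explicit mountPropagation field on the volumeMount that requests host-to-container propagation. A minimal sketch, assuming the default host path /mnt/s3data (the pod name is just for illustration):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sleeper-mp    # hypothetical name, for illustration only
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: s3data
      mountPath: /mydata
      mountPropagation: HostToContainer   # explicit propagation field instead of the ":shared" suffix
  volumes:
  - name: s3data
    hostPath:
      path: /mnt/s3data
EOF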

And that's it! You can now access your bucket. If you've created the pod from our example, you can exec into it to verify:

kubectl exec sleeper -- ls /mydata

Note: running this on your cluster will cost you a few additional $ for S3 API calls that goofys performs to maintain the mount. So remember to monitor your cloud costs. But you should do that anyway, right?

Happy delivering!

Comments (16)

  • Eldad Assis (Jan 31, 2022)

    Cool setup. Have you tested its speed?

    • Ant(on) Weiss (Jan 31, 2022)

      No, speed wasn't a consideration here. The main motivation was providing an easy and transparent way to upload files and make them accessible to pods. S3 gives users an easy and secure UI for that. Goofys is supposedly quite performant compared to other FUSE implementations (e.g. s3fs), but we haven't benchmarked this ourselves.

      • Eldad Assis (Jan 31, 2022)

        Thx! Would love to know numbers if you ever do try it :-)

  • behrooz hasanbeygi (Jan 31, 2022)

    With a high number of files this will fail you, due to the nature of the S3 API - for small files the HTTP response overhead will be bigger than the files themselves.

    I think mounting S3 is a bad idea. If you have enough development resources, it's better to write a client that connects directly to S3 and caches the list of S3 objects ... for better performance.
    But it's a fun thing to do. Also, CephFS with a RADOS gateway will give you better performance in Kubernetes.

    • Ant(on) Weiss (Feb 2, 2022)

      good to know. not an issue in our case - we have a small number of large files there. And I agree it's not such a great idea in general - both performance wise and because of the hidden complexity. But it solved our specific itch and may help others solve it.

  • Lidor Ettinger (Feb 8, 2022)

    Amazing approach!
    Thx for sharing in detail.

  • Randy Gupta (Jun 22, 2022)

    Nice approach. However, you might want to have a look at JuiceFS:

    github.com/juicedata/juicefs

    It has quite good performance due to its combination with Redis, and it is made with Kubernetes in mind.

  • Oleksii Smiichuk (Feb 22, 2023)

    Hi, can I set multiple bucketNames?
    I need to interact with a few S3 buckets for different tasks.

    • Ant(on) Weiss (Feb 24, 2023)

      hi @oleksiihead
      no support for this right now.
      to add this one would need to do something like:

      1. modify the Dockerfile to replace the container startup command with an entrypoint script that mounts the buckets in a loop.
      2. modify the Helm chart to receive a dictionary of bucket names and mount points and pass these into the DaemonSet.

      If you get to do this - please submit a PR.
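      For example, a rough, untested sketch of such an entrypoint (BUCKETS and AWS_REGION are hypothetical variables the chart would have to pass in, e.g. BUCKETS="bucket-a:/var/s3fs/a bucket-b:/var/s3fs/b"):

      #!/bin/sh
      # mount every "bucket:mountpoint" pair listed in $BUCKETS
      for pair in $BUCKETS; do
        bucket=${pair%%:*}
        mountpoint=${pair#*:}
        mkdir -p "$mountpoint"
        goofys --region "$AWS_REGION" -f "$bucket" "$mountpoint" &
      done
      # keep the container alive as long as the goofys processes run
      wait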

  • dirai09 (May 25, 2023)

    Hi,
    I don't think I am able to mount the volumes on the hostPath. Am I missing something here?

  • dirai09 (May 26, 2023)

    I have tried this and the other similar option mentioned in this blog: blog.meain.io/2020/mounting-s3-buc.... In neither case was mounting to the hostPath successful on a cluster managed by AWS EKS.

    • Ant(on) Weiss (May 28, 2023)

      Hi @dirai09, this was originally tested on AWS EKS. I haven't tested it since, but in theory it should still work. What is the error you're getting when trying to mount the hostPath?
      Also - can you share your config in a gist?

  • Big Bunny (Jul 25, 2023)

    I tried to use the sharing method to complete the entire demo, but unfortunately this didn't work, because the goofys mount directory (/var/s3fs) in the daemonset is not the same as the directory I want to share with the host (/var/s3fs:shared):

    /otomato # df -h
    Filesystem                Size      Used Available Use% Mounted on
    /dev/nvme0n1p1           50.0G      4.7G     45.3G   9% /var/s3fs:shared
    poc-s3goofys-source 1.0P         0      1.0P   0% /var/s3fs
    

    Is there any configuration I missed?

    Daemonset.yaml

    
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        app: s3-mounter
      name: s3-mounter
      namespace: otomount
    spec:
      selector:
        matchLabels:
          app: s3-mounter
      template:
        metadata:
          labels:
            app: s3-mounter
        spec:
          serviceAccountName: s3-mounter
          containers:
          - name: mounter 
            image: otomato/goofys
            securityContext:
              privileged: true
            command: ["/bin/sh"]
            args: ["-c", "mkdir -p /var/s3fs && ./goofys --region xxxxx -f poc-s3goofys-source /var/s3fs"]
            volumeMounts:
              - name: devfuse
                mountPath: /dev/fuse
              - name: mntdatas3fs
                mountPath: /var/s3fs:shared
          volumes:
            - name: devfuse
              hostPath:
                path: /dev/fuse
            - name: mntdatas3fs
              hostPath:
                path: /mnt/s3data
    
  • Jane (Mar 30, 2024)

    I ran into an issue where goofys doesn't reload the content of a small txt file. It updates the timestamp, though. Do you know what could be wrong?
    I have goofys running inside a container.
