Enterprise Authentication for Self-Hosted Data Infrastructure

Open-source products like Dagster and Airbyte come with comprehensive installation guides and automated setup scripts, making it relatively easy to get started and see your first pipelines or replication jobs running in your environment. However, things become more challenging when you need to productionize the installation, secure it properly, and integrate it with your existing infrastructure while making it available to internal users.

One critical step in this process is managing user access to the web UI of these products. Both applications feature polished interfaces, but only their cloud versions include fully featured user authentication.

There are several approaches to adding user authentication to any application, ranging from building custom authentication solutions to implementing basic auth through your ingress controller.

Since both products run in Kubernetes, basic authentication can be added at the ingress controller, and most controllers support it out of the box. While simple, this approach doesn't scale with organizational needs: every credential has to be created, rotated, and revoked by hand, and there is no notion of groups or single sign-on. Fortunately, several open-source projects address this challenge.
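For comparison, this is roughly what the basic-auth approach looks like with the community ingress-nginx controller. It is a minimal sketch: the hostname `dagster.example.com`, the Secret name `dagster-basic-auth`, and the backend Service `dagster-dagster-webserver` are placeholders, and the Secret is expected to hold an htpasswd-formatted file under the key `auth`. Every user is one line in that file, which is the manual bookkeeping that becomes painful as teams grow.

```yaml
# Basic auth at the ingress using ingress-nginx annotations.
# Hostname, Secret and Service names are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dagster
  namespace: dagster
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    # Secret containing an htpasswd file stored under the key "auth"
    nginx.ingress.kubernetes.io/auth-secret: dagster-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  rules:
    - host: dagster.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dagster-dagster-webserver   # placeholder Service name
                port:
                  number: 80
```

The Secret is typically created from an htpasswd file, for example with `htpasswd -c auth alice` followed by `kubectl create secret generic dagster-basic-auth --from-file=auth -n dagster`, and every new user or password change means repeating that step.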


Our recommended solution: Authentik

Our choice is Authentik (https://goauthentik.io/), an open-source Identity Provider (IdP) and Single Sign-On (SSO) platform. If you're familiar with enterprise products like Keycloak or ForgeRock, think of Authentik as a lean alternative to these heavyweight solutions: comparably feature-rich, but more modern, easier to manage, and more intuitive, while remaining secure and enterprise-ready.


How it all works together

Since all three products are designed to run in Kubernetes, integrating them is straightforward. The ingress controller does most of the heavy lifting: it intercepts every request to Dagster or Airbyte and asks Authentik whether the request is authenticated. If the request carries a valid session, it is passed through to the application; if not, the user is redirected to the login page of the corresponding identity provider.


Integration flexibility

Authentik can function as a standalone identity provider in which you create users and groups, enable self-registration, and manage user accounts directly within the platform. However, the real value comes from integrating it with your existing company identity provider. Authentik supports numerous providers out of the box, including Azure AD and Google OAuth.

In addition, you can connect any other OAuth provider your organization already uses.


Architecture Overview

The authentication architecture consists of the following key components:

External Identity Provider

  • Azure Active Directory serves as the primary identity provider

  • Contains two user groups:

    • dagster-users - Users authorized to access Dagster

    • airbyte-users - Users authorized to access Airbyte
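The X-authentik-groups header used later in the flow lists the user's Authentik groups, so the two Azure AD groups have to end up as Authentik groups with the same names, either synchronized from the source or declared explicitly. As a sketch, an Authentik blueprint can declare them; the blueprint name is hypothetical, and how memberships are populated from Azure AD depends on your source configuration, which is not shown here.

```yaml
# authentik blueprint (sketch): declare the two access groups so that
# policies can reference them. Assumes memberships are filled in from
# Azure AD by the source setup, which is not part of this file.
version: 1
metadata:
  name: data-platform-groups   # hypothetical blueprint name
entries:
  - model: authentik_core.group
    state: present
    identifiers:
      name: dagster-users
    attrs:
      name: dagster-users
  - model: authentik_core.group
    state: present
    identifiers:
      name: airbyte-users
    attrs:
      name: airbyte-users
```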

Kubernetes Cluster Components

  • Authentik Deployment - Open-source IdP running within the cluster

  • Ingress Controller - Entry point for all external traffic

  • Dagster Application - Data orchestration platform

  • Airbyte Application - Data integration platform
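The Authentik deployment itself is usually installed with the official Helm chart (repository https://charts.goauthentik.io). A minimal values file, adapted from the chart's quick-start, might look like the sketch below. The secrets and hostname are placeholders, and key names can change between chart versions, so check them against the chart you install. Installing it as release `authentik` in namespace `authentik` should produce a Service named `authentik-server`, which is the address the Istio configuration in this post points at.

```yaml
# values.yaml for the authentik Helm chart (sketch, placeholder secrets)
authentik:
  secret_key: "change-me-to-a-long-random-string"
  postgresql:
    password: "change-me"
server:
  ingress:
    enabled: true
    ingressClassName: istio      # assumption: adjust to your ingress controller
    hosts:
      - authentik.example.com    # placeholder hostname for the login UI
postgresql:
  enabled: true                  # bundled PostgreSQL, fine for a first setup
  auth:
    password: "change-me"        # must match authentik.postgresql.password
redis:
  enabled: true                  # only needed by authentik versions that still use Redis
```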



Authentication Flow

Step 1: Initial User Request

  1. The user attempts to access either the Dagster or the Airbyte web UI

  2. Request hits the Ingress Controller (entry point to the cluster)

Step 2: Authentication Check

  1. The Ingress Controller sends an authorization check for the request to Authentik, including the user's session cookie

  2. Authentik checks if the user has valid authentication credentials

Step 3: Authentication Decision

If the user is NOT authenticated:

  1. Authentik presents its login page to the user

  2. The user signs in with their corporate credentials (Azure AD)

  3. Upon successful authentication, Azure AD redirects the user back to Authentik with an authentication token

  4. Authentik receives the user's attributes from Azure AD, including group memberships (dagster-users or airbyte-users)

If the user IS authenticated:

  1. Authentik validates the existing session

Step 4: Authorization and Access

  1. Authentik confirms the user's authentication and returns a success response to the Ingress Controller

  2. Authentik includes the X-authentik-groups header containing the user's group memberships (e.g., dagster-users, airbyte-users)

  3. Ingress Controller examines the X-authentik-groups header to verify the user has appropriate permissions for the requested application

  4. If group membership is valid, Ingress Controller forwards the original request to the appropriate application (Dagster or Airbyte)

  5. User gains access to the requested application with their authorized permissions
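The group rule can also be enforced inside Authentik itself, so that users outside the group are rejected during the authorization check and the ingress never forwards their requests. As a sketch, an Authentik blueprint can bind the dagster-users group to an application; it assumes an Authentik application with the slug `dagster` (a hypothetical name) already exists and is backed by a proxy provider.

```yaml
# authentik blueprint (sketch): only members of dagster-users may use
# the application with slug "dagster" (hypothetical slug).
version: 1
metadata:
  name: dagster-access   # hypothetical blueprint name
entries:
  - model: authentik_policies.policybinding
    state: present
    identifiers:
      order: 0
      # application this binding applies to
      target: !Find [authentik_core.application, [slug, dagster]]
      # group whose members are allowed through
      group: !Find [authentik_core.group, [name, dagster-users]]
```

An equivalent binding with airbyte-users and the Airbyte application covers the second tool. Once an application has a group binding, Authentik only admits members of that group.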


Configuration Examples

Our cluster uses Istio as both the service mesh and the ingress gateway. Authentik has excellent support for Envoy's external authorization, which makes it straightforward to integrate not just with Istio but with any other Envoy-based ingress solution.

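# Config on Istio side: register authentik as an external
# authorization (ext_authz) provider in the mesh config
values:
  meshConfig:
    accessLogFile: /dev/stdout
    extensionProviders:
      - name: authentik   # referenced by AuthorizationPolicy spec.provider.name
        envoyExtAuthzHttp:
          # authentik server Service; its embedded outpost answers on /outpost.goauthentik.io
          service: authentik-server.authentik.svc.cluster.local
          port: 80
          # authentik's Envoy-specific auth endpoint
          pathPrefix: /outpost.goauthentik.io/auth/envoy
          # send the session cookie so authentik can recognize logged-in users
          includeRequestHeadersInCheck:
            - cookie
          # on success, pass session cookies and the X-authentik-* identity
          # headers (username, groups, email, ...) on to the application
          headersToUpstreamOnAllow:
            - set-cookie
            - x-authentik-*
          # on success, return any refreshed session cookie to the browser
          headersToDownstreamOnAllow:
            - set-cookie
          # tell authentik whether the original request used http or https
          includeAdditionalHeadersInCheck:
            x-forwarded-proto: '%REQ(:SCHEME)%'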

The second part is an AuthorizationPolicy that tells Istio to check requests to the Dagster webserver with the authentik provider:

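# AuthorizationPolicy on the Dagster workload,
# instructing Istio to authenticate requests with authentik
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-authentik
  namespace: dagster
spec:
  # CUSTOM delegates the allow/deny decision to an external provider
  action: CUSTOM
  provider:
    name: authentik   # must match the extensionProviders name in meshConfig
  selector:
    matchLabels:
      component: dagster-webserver   # only the Dagster webserver pods
  rules:
    - to:
        - operation:
            paths: ["/*"]   # every path served by the webserver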
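Instead of one policy per workload, the same CUSTOM policy can be attached to the Istio ingress gateway and scoped by hostname, which protects Dagster and Airbyte with a single resource. The sketch assumes the default gateway labels (`istio: ingressgateway`) in the `istio-system` namespace and uses placeholder hostnames.

```yaml
# Sketch: one CUSTOM policy on the ingress gateway covering both UIs.
# Hostnames are placeholders; the selector assumes the default gateway labels.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-authentik-gateway
  namespace: istio-system
spec:
  action: CUSTOM
  provider:
    name: authentik   # same extension provider as in meshConfig
  selector:
    matchLabels:
      istio: ingressgateway
  rules:
    - to:
        - operation:
            hosts:
              - dagster.example.com
              - airbyte.example.com
```

The per-workload variant keeps each application's policy next to the application; the gateway variant centralizes protection in one place.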


The same pattern is straightforward to set up with NGINX or Traefik, since Authentik also provides forward-auth integrations for both.
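With the community ingress-nginx controller, the equivalent is a set of external-auth annotations on each application's Ingress. The sketch follows the pattern of Authentik's nginx integration; the hostname, the Airbyte Service name, and the outpost address (here the `authentik-server` Service in the `authentik` namespace) are assumptions. A second Ingress on the same host must also route the `/outpost.goauthentik.io` path to Authentik so the login redirect can complete; it is not shown. Authentik's documentation additionally sets `X-Forwarded-Host` through an `auth-snippet` annotation, which requires snippet annotations to be enabled on the controller, so check that page before relying on this sketch.

```yaml
# Sketch: authentik forward auth with ingress-nginx for Airbyte.
# Host, Service names and the outpost URL are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte
  namespace: airbyte
  annotations:
    # ask authentik whether the request is authenticated
    nginx.ingress.kubernetes.io/auth-url: http://authentik-server.authentik.svc.cluster.local/outpost.goauthentik.io/auth/nginx
    # where to send users who are not logged in
    nginx.ingress.kubernetes.io/auth-signin: https://airbyte.example.com/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri
    # headers copied from authentik's response onto the upstream request
    nginx.ingress.kubernetes.io/auth-response-headers: Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email
spec:
  ingressClassName: nginx
  rules:
    - host: airbyte.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: airbyte-airbyte-webapp-svc   # placeholder Service name
                port:
                  number: 80
```

Traefik follows the same idea through a forwardAuth middleware pointing at Authentik's `/outpost.goauthentik.io/auth/traefik` endpoint.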


Conclusion

Securing your self-hosted data stack doesn't have to break the bank or overwhelm your team. Authentik provides enterprise-grade authentication that integrates seamlessly with your existing setup, giving you the security controls you need without the complexity.

Your data teams get secure access to the tools they need, while your organization maintains centralized control over who can access what. It's a win-win solution that scales with your business.

Most open-source data tools come with beautiful interfaces but no authentication. The real challenge isn't getting your first pipeline running; it's making it enterprise-ready with proper user management, group-based access control, and integration with your existing identity provider. That's where the right authentication strategy transforms a simple setup into a production-grade data platform.

Abhivan Chekuri

© MetaOps 2024
