

How to Implement the SCALE Framework on GCP: A Step-by-Step Guide

Most GCP platforms I audit weren't designed — they accumulated. A VM here, a Cloud Run service there, IAM permissions granted ad hoc until nobody knows who can access what. The SCALE Framework exists to prevent this. But knowing the five pillars isn't the same as implementing them.

This guide walks you through the foundational setup I use in every new GCP engagement. Follow these steps and you'll have a security-first, IaC-driven, audit-ready platform foundation in place before you write your first application service.

Prerequisites

Before starting:

  • GCP Organization created with billing linked
  • Organization-level admin access, such as the Organization Administrator, Folder Admin, and Organization Policy Administrator roles (you'll delegate down from here)
  • Terraform 1.5+ installed locally
  • gcloud CLI authenticated with your admin account
  • A Git repository for your infrastructure code

If you're working in an existing GCP environment, these steps still apply — you'll just need to import existing resources into Terraform state.

Step 1: Establish Your Folder and Project Hierarchy

Why this matters (Security by Design): Flat project structures make IAM a nightmare. Folders let you apply security policies at the right level — org-wide guardrails, environment-specific permissions, project-level exceptions.

Create the folder structure:

gcloud resource-manager folders create \
  --display-name="Production" \
  --organization=YOUR_ORG_ID

gcloud resource-manager folders create \
  --display-name="Non-Production" \
  --organization=YOUR_ORG_ID

gcloud resource-manager folders create \
  --display-name="Shared-Services" \
  --organization=YOUR_ORG_ID

What goes wrong: Teams skip this step because they "only have one project right now." Six months later, they have twelve projects with no consistent structure, and retrofitting folders means rewriting IAM policies across everything.
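To make the inheritance argument concrete, here is a toy Python model of how allow-policy bindings flow down the hierarchy. It is not a GCP API. A project's effective access is the union of its own bindings and those of every ancestor folder and the organization. The node labels, roles, and group addresses are hypothetical, and the model ignores IAM deny policies and conditions.

```python
from dataclasses import dataclass, field
from typing import Optional

Binding = tuple[str, str]  # (role, member)


@dataclass
class Node:
    """Toy stand-in for an organization, folder, or project in the resource hierarchy."""
    name: str
    parent: Optional["Node"] = None
    bindings: set[Binding] = field(default_factory=set)


def effective_bindings(node: Node) -> set[Binding]:
    """Allow-policy bindings are inherited additively: union them from the node up to the root."""
    result: set[Binding] = set()
    current: Optional[Node] = node
    while current is not None:
        result |= current.bindings
        current = current.parent
    return result


# Hypothetical hierarchy mirroring the three folders created above.
org = Node("organization", bindings={("roles/viewer", "group:security@example.com")})
production = Node("Production", parent=org,
                  bindings={("roles/run.admin", "group:prod-oncall@example.com")})
non_production = Node("Non-Production", parent=org,
                      bindings={("roles/editor", "group:developers@example.com")})
prod_app = Node("myapp-prod", parent=production)
dev_app = Node("myapp-dev", parent=non_production)

for project in (prod_app, dev_app):
    print(project.name, sorted(effective_bindings(project)))
```

A grant made once on the Production folder reaches every project later created under it. The developer group's editor grant on Non-Production never shows up in prod.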

Step 2: Set Up Your Terraform Foundation

Why this matters (Automation & IaC): Every environment provisioned manually is a future incident waiting to happen. Terraform isn't optional — it's the only way to guarantee Dev, Staging, and Production stay consistent.

Create your base Terraform structure:

infrastructure/
├── modules/
│   ├── project/
│   ├── networking/
│   └── iam/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
└── shared/
    └── org-policies/

Your environment configurations should reference shared modules:

# environments/prod/main.tf
module "project" {
  source = "../../modules/project"

  project_name    = "myapp-prod"
  folder_id       = "folders/123456789"
  billing_account = var.billing_account

  labels = {
    environment = "production"
    cost-center = "platform"
  }
}

module "networking" {
  source = "../../modules/networking"

  project_id = module.project.project_id
  region     = "northamerica-northeast1"

  # Prod gets dedicated /20, non-prod shares /22 ranges
  subnet_cidr = "10.0.0.0/20"
}

What goes wrong: Teams put environment-specific values directly in modules instead of parameterizing them. Three months later, someone copies the prod config for staging and forgets to change the subnet range. Now you have overlapping CIDRs and broken VPC peering.
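A cheap guardrail against this copy-paste mistake is to check every environment's planned subnet ranges for overlaps before `terraform apply`, for example as a CI step. Below is a minimal sketch using Python's standard `ipaddress` module. The environment names and ranges are hypothetical; in practice you would feed it the `subnet_cidr` values from each environment's configuration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(subnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of environment names whose CIDR ranges overlap."""
    networks = {env: ipaddress.ip_network(cidr) for env, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical plan: staging was copied from prod and its range never changed.
planned = {
    "prod": "10.0.0.0/20",      # dedicated /20
    "staging": "10.0.0.0/20",   # forgot to change this
    "dev": "10.0.16.0/22",      # non-prod /22
}

print(find_overlaps(planned))  # [('prod', 'staging')]
```

Failing the pipeline on a non-empty result catches the overlap while it is still a one-line fix, before any VPC peering depends on it.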

Step 3: Implement Organization Policies

Why this matters (Security by Design): IAM tells you who can do what. Org policies tell you what's allowed at all — regardless of who has permissions. This is your security baseline.

Apply foundational constraints:

# shared/org-policies/main.tf
resource "google_organization_policy" "restrict_vm_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

resource "google_organization_policy" "require_shielded_vm" {
  org_id     = var.org_id
  constraint = "compute.requireShieldedVm"

  boolean_policy {
    enforced = true
  }
}

resource "google_organization_policy" "restrict_public_buckets" {
  org_id     = var.org_id
  constraint = "storage.publicAccessPrevention"

  boolean_policy {
    enforced = true
  }
}

What goes wrong: I've seen teams apply org policies at the folder level instead of the organization, thinking it gives them flexibility. Then someone creates a project outside the folder structure and it has none of the security controls. Apply at the org level, create exceptions explicitly where needed.
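The failure mode above follows from how boolean constraints such as `compute.requireShieldedVm` resolve. The nearest explicit setting up the hierarchy wins, and a resource with no setting anywhere above it falls back to the constraint's default. Here is a toy Python model of that rule (not the Org Policy API). It assumes the default is "not enforced" and ignores list-constraint merging, conditions, and dry-run. The hierarchy and project names are hypothetical.

```python
from typing import Optional

# Hypothetical hierarchy: each resource maps to its parent (None for the organization).
PARENT: dict[str, Optional[str]] = {
    "organizations/ORG": None,
    "folders/production": "organizations/ORG",
    "folders/non-production": "organizations/ORG",
    "projects/myapp-prod": "folders/production",
    "projects/legacy-import": "folders/non-production",
    "projects/stray-project": "organizations/ORG",  # created outside the folder structure
}


def is_enforced(resource: str, settings: dict[str, bool], default: bool = False) -> bool:
    """Toy rule for a boolean constraint: the nearest explicit setting up the hierarchy wins."""
    current: Optional[str] = resource
    while current is not None:
        if current in settings:
            return settings[current]
        current = PARENT[current]
    return default


# Option A: enforce on the environment folders only.
folder_level = {"folders/production": True, "folders/non-production": True}
# Option B: enforce at the organization, with one explicit, reviewable exception.
org_level = {"organizations/ORG": True, "projects/legacy-import": False}

for project in ("projects/myapp-prod", "projects/legacy-import", "projects/stray-project"):
    print(project, "folder-level:", is_enforced(project, folder_level),
          "org-level:", is_enforced(project, org_level))
```

With folder-level placement, the stray project comes out unenforced. With org-level placement, it is covered automatically, and the only gap is the exception someone deliberately wrote down.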

Step 4: Configure Workload Identity for CI/CD

Why this matters (Lifecycle Operations): Long-lived service account keys are one of the most common sources of credential leaks in GCP. Workload Identity Federation lets your CI/CD pipeline authenticate with short-lived tokens instead, so there is no long-lived credential to leak.

Set up Workload Identity for GitHub Actions:

resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions Pool"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"

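  attribute_mapping = {
    "google.subject"             = "assertion.sub"
    "attribute.repository"       = "assertion.repository"
    "attribute.repository_owner" = "assertion.repository_owner"
  }

  # Only accept tokens from repositories owned by your GitHub organization.
  # var.github_org is a placeholder variable you define yourself (e.g. "my-org").
  attribute_condition = "assertion.repository_owner == '${var.github_org}'"

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }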
}

What goes wrong: Teams configure Workload Identity but leave the old service account key in their CI/CD as a fallback. The key eventually leaks, and the Workload Identity setup provided zero protection because the compromised key still worked.
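The pool and provider don't grant anything by themselves. The pipeline still needs `roles/iam.workloadIdentityUser` on a deploy service account for a principal set scoped to one repository. It also helps to restrict the provider to a single GitHub owner. Below is a small Python helper that builds both strings consistently across environments. The project number, repository, and owner are hypothetical. The member follows the documented `principalSet://iam.googleapis.com/...` identifier format, which uses the project number, not the project ID.

```python
import re


def repository_principal_set(project_number: str, pool_id: str, repository: str) -> str:
    """IAM member matching every token in the pool whose attribute.repository equals `repository`."""
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/attribute.repository/{repository}"
    )


def owner_condition(owner: str) -> str:
    """CEL attribute condition restricting the provider to one GitHub owner."""
    # Reject anything but GitHub-style owner names so the quoted CEL literal can't be broken out of.
    if not re.fullmatch(r"[A-Za-z0-9-]+", owner):
        raise ValueError(f"unexpected GitHub owner name: {owner!r}")
    return f"assertion.repository_owner == '{owner}'"


# Hypothetical values: substitute your pool project's number and your repository.
print(repository_principal_set("123456789012", "github-actions", "my-org/my-repo"))
print(owner_condition("my-org"))
```

You would bind the first string with `roles/iam.workloadIdentityUser` on the deploy service account, for example through a `google_service_account_iam_member` resource. Use the second as the provider's `attribute_condition`, so tokens from other owners' repositories are rejected before any IAM check runs.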

Step 5: Establish Cost Visibility

Why this matters (Elastic Scalability): You can't optimize what you can't see. Most cost problems I encounter aren't about expensive services — they're about spend that nobody's tracking.

Enable BigQuery billing export:

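# Confirm your admin account can read the billing account
gcloud billing accounts describe YOUR_BILLING_ACCOUNT_ID

# Create the dataset that will receive the export (in your default project)
bq mk --dataset \
  --location=northamerica-northeast1 \
  billing_export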

Then configure the export in the Console under Billing → Billing Export → BigQuery Export.
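Once data lands, the first question is usually "whose spend is this?" Against the real table you'd answer it with a BigQuery `GROUP BY`. The Python sketch below just makes the logic explicit: it sums net cost (cost plus negative credits) per value of the `cost-center` project label from Step 2, and puts unlabeled spend in its own bucket. It assumes rows shaped like the standard usage cost export (`cost`, `credits[].amount`, `project.labels[]`); check those field names against your own export table. The rows themselves are hypothetical.

```python
from collections import defaultdict


def cost_by_project_label(rows: list[dict], key: str = "cost-center") -> dict[str, float]:
    """Sum net cost (cost plus credits, which are negative) per value of one project label."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        labels = {item["key"]: item["value"] for item in row["project"].get("labels", [])}
        credits = sum(credit["amount"] for credit in row.get("credits", []))
        # Spend from projects missing the label is exactly the spend nobody is tracking.
        totals[labels.get(key, "(unlabeled)")] += row["cost"] + credits
    return dict(totals)


# Hypothetical rows, trimmed to the export fields this function reads.
rows = [
    {"project": {"id": "myapp-prod", "labels": [{"key": "cost-center", "value": "platform"}]},
     "cost": 120.0, "credits": []},
    {"project": {"id": "myapp-prod", "labels": [{"key": "cost-center", "value": "platform"}]},
     "cost": 80.0, "credits": [{"name": "Sustained usage discount", "amount": -10.0}]},
    {"project": {"id": "scratch-vm", "labels": []}, "cost": 45.5, "credits": []},
]

print(cost_by_project_label(rows))  # {'platform': 190.0, '(unlabeled)': 45.5}
```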

Set up budget alerts:

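# Budget filters expect project numbers, so look the number up from the ID.
data "google_project" "this" {
  project_id = var.project_id
}

resource "google_billing_budget" "monthly" {
  billing_account = var.billing_account
  display_name    = "Monthly Platform Budget"

  budget_filter {
    projects = ["projects/${data.google_project.this.number}"]
  }

  amount {
    specified_amount {
      currency_code = "CAD" # must match the billing account's currency
      units         = "5000"
    }
  }

  threshold_rules {
    threshold_percent = 0.5
  }

  threshold_rules {
    threshold_percent = 0.75
  }

  threshold_rules {
    threshold_percent = 0.9
  }
}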

What goes wrong: Teams set a single 90% threshold and ignore it because they've already exceeded budget by the time it fires. Set multiple thresholds — 50%, 75%, 90% — so you see trends before they become problems.
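The lower thresholds are useful because of when they fire. Here is a quick worked example in Python for the CAD 5,000 budget above, with the 50%, 75%, and 90% thresholds recommended here. The day-of-month figures are hypothetical, and the projection naively assumes a steady daily spend rate.

```python
def alert_points(budget_units: int, percents: list[int]) -> list[tuple[int, int]]:
    """Pair each threshold percentage with the spend (whole currency units) that triggers it."""
    return [(pct, budget_units * pct // 100) for pct in sorted(percents)]


def projected_month_end(spend_so_far: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear projection: assumes spend continues at the month-to-date daily average."""
    return spend_so_far / day_of_month * days_in_month


print(alert_points(5000, [90, 50, 75]))  # [(50, 2500), (75, 3750), (90, 4500)]

# Hypothetical: the 50% alert fires on day 10 of a 30-day month.
print(projected_month_end(2500, 10, 30))  # 7500.0, i.e. 150% of budget
```

If the 50% alert arrives on day 10, the month is on pace for CAD 7,500 and you still have 20 days to react. A lone 90% alert at that rate would only fire around day 18. Cloud Billing budgets can also alert on forecasted rather than actual spend, by setting `spend_basis = "FORECASTED_SPEND"` on a threshold rule.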

Common Mistakes Across All Steps

  1. Skipping the foundation to ship faster. Every shortcut here costs 10x to fix later.
  2. Treating Terraform as a one-time setup tool. It's not — it's how you operate. Changes go through PRs and CI/CD, always.
  3. Applying IAM at the project level when it should be inherited from folders. You end up with hundreds of bindings instead of dozens.
  4. Not testing in non-prod first. Org policies especially can break things in unexpected ways. Always validate in a dev project before applying org-wide.
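When retrofitting mistake 3, one quick way to find bindings that belong on a folder is to look for (role, member) pairs repeated on every project beneath it. Below is a minimal Python sketch over hypothetical, already-collected bindings, for example gathered from `gcloud projects get-iam-policy` output. It ignores IAM conditions and assumes every project in the input sits under the same folder.

```python
Binding = tuple[str, str]  # (role, member)


def hoist_candidates(project_bindings: dict[str, set[Binding]]) -> set[Binding]:
    """Bindings granted on every project in the group: candidates to move up to the folder."""
    if not project_bindings:
        return set()
    return set.intersection(*project_bindings.values())


def binding_count_after_hoist(project_bindings: dict[str, set[Binding]]) -> tuple[int, int]:
    """(bindings today, bindings if common ones are granted once on the folder instead)."""
    common = hoist_candidates(project_bindings)
    before = sum(len(b) for b in project_bindings.values())
    after = len(common) + sum(len(b - common) for b in project_bindings.values())
    return before, after


# Hypothetical project-level bindings under the Production folder.
prod_projects = {
    "myapp-prod": {("roles/viewer", "group:sre@example.com"),
                   ("roles/logging.viewer", "group:support@example.com"),
                   ("roles/run.developer", "group:myapp-team@example.com")},
    "billing-prod": {("roles/viewer", "group:sre@example.com"),
                     ("roles/logging.viewer", "group:support@example.com")},
    "data-prod": {("roles/viewer", "group:sre@example.com"),
                  ("roles/logging.viewer", "group:support@example.com"),
                  ("roles/bigquery.dataViewer", "group:analysts@example.com")},
}

print(sorted(hoist_candidates(prod_projects)))
print(binding_count_after_hoist(prod_projects))  # (8, 4)
```

Hoisting is a judgment call, not an automatic fix. A folder-level grant also applies to every future project under that folder, so only move bindings that should genuinely apply environment-wide.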

This foundation puts the first three SCALE pillars in place: Security, Cloud-Native Architecture, and Automation. From here, you build your CI/CD pipelines (Lifecycle Operations) and scaling policies (Elastic Scalability) on a stable base.

If you're inheriting an existing GCP environment, the same principles apply — but the sequencing changes. You need an assessment first to understand what you're working with.

Work with a GCP specialist — book a free discovery call


Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc


https://buoyantcloudtech.com
