How to Implement the SCALE Framework on GCP: A Step-by-Step Guide
Most GCP platforms I audit weren't designed — they accumulated. A VM here, a Cloud Run service there, IAM permissions granted ad hoc until nobody knows who can access what. The SCALE Framework exists to prevent this. But knowing the five pillars isn't the same as implementing them.
This guide walks you through the foundational setup I use in every new GCP engagement. Follow these steps and you'll have a security-first, IaC-driven, audit-ready platform foundation in place before you write your first application service.
Prerequisites
Before starting:
- GCP Organization created with billing linked
- Owner access to the Organization (you'll delegate down from here)
- Terraform 1.5+ installed locally
- gcloud CLI authenticated with your admin account
- A Git repository for your infrastructure code
If you're working in an existing GCP environment, these steps still apply — you'll just need to import existing resources into Terraform state.
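For the existing-environment case, Terraform 1.5 and later can adopt resources declaratively with `import` blocks. The following is a minimal sketch, not a full migration: the project ID, display name, and folder ID are hypothetical placeholders, and the resource arguments need to match what actually exists.

```hcl
# Hypothetical example: adopt an existing project into Terraform state.
# All IDs and names below are placeholders, not values from this guide.
import {
  to = google_project.legacy_app
  id = "legacy-app-123456" # existing project ID
}

resource "google_project" "legacy_app" {
  name       = "Legacy App"
  project_id = "legacy-app-123456"
  folder_id  = "123456789" # numeric ID of the folder the project lives in
}
```

Running `terraform plan` shows the pending import alongside any differences between the configuration and the real resource, so you can adjust the arguments before applying.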
Step 1: Establish Your Folder and Project Hierarchy
Why this matters (Security by Design): Flat project structures make IAM a nightmare. Folders let you apply security policies at the right level — org-wide guardrails, environment-specific permissions, project-level exceptions.
Create the folder structure:
gcloud resource-manager folders create \
  --display-name="Production" \
  --organization=YOUR_ORG_ID

gcloud resource-manager folders create \
  --display-name="Non-Production" \
  --organization=YOUR_ORG_ID

gcloud resource-manager folders create \
  --display-name="Shared-Services" \
  --organization=YOUR_ORG_ID
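If you would rather keep the hierarchy in code from day one, the same three folders can be sketched in Terraform. This assumes an `org_id` input variable holding your numeric organization ID; the resource and output names are illustrative.

```hcl
variable "org_id" {
  description = "Numeric organization ID (assumed input)"
  type        = string
}

# One folder per entry, mirroring the gcloud commands above.
resource "google_folder" "env" {
  for_each     = toset(["Production", "Non-Production", "Shared-Services"])
  display_name = each.value
  parent       = "organizations/${var.org_id}"
}

# Folder resource names in the form "folders/<id>", keyed by display name.
output "folder_names" {
  value = { for name, folder in google_folder.env : name => folder.name }
}
```

Use one approach or the other. Folders created with `gcloud` would need to be imported before Terraform can manage them.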
What goes wrong: Teams skip this step because they "only have one project right now." Six months later, they have twelve projects with no consistent structure, and retrofitting folders means rewriting IAM policies across everything.
Step 2: Set Up Your Terraform Foundation
Why this matters (Automation & IaC): Every environment provisioned manually is a future incident waiting to happen. Terraform isn't optional — it's the only way to guarantee Dev, Staging, and Production stay consistent.
Create your base Terraform structure:
infrastructure/
├── modules/
│   ├── project/
│   ├── networking/
│   └── iam/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
└── shared/
    └── org-policies/
Your environment configurations should reference shared modules:
# environments/prod/main.tf
module "project" {
  source          = "../../modules/project"
  project_name    = "myapp-prod"
  folder_id       = "folders/123456789"
  billing_account = var.billing_account

  labels = {
    environment = "production"
    cost-center = "platform"
  }
}

module "networking" {
  source     = "../../modules/networking"
  project_id = module.project.project_id
  region     = "northamerica-northeast1"

  # Prod gets dedicated /20, non-prod shares /22 ranges
  subnet_cidr = "10.0.0.0/20"
}
What goes wrong: Teams put environment-specific values directly in modules instead of parameterizing them. Three months later, someone copies the prod config for staging and forgets to change the subnet range. Now you have overlapping CIDRs and broken VPC peering.
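To make the parameterization concrete, here is one way the networking module's inputs could look. The file layout, the resource names (`main-vpc`, `primary-...`), and the validation rule are assumptions for illustration; only `project_id`, `region`, and `subnet_cidr` come from the example above.

```hcl
# modules/networking/variables.tf (hypothetical contents)
variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "subnet_cidr" {
  type = string

  # Rejects malformed CIDR strings at plan time.
  validation {
    condition     = can(cidrhost(var.subnet_cidr, 0))
    error_message = "subnet_cidr must be a valid CIDR block, for example 10.0.0.0/20."
  }
}

# modules/networking/main.tf (hypothetical contents)
resource "google_compute_network" "vpc" {
  project                 = var.project_id
  name                    = "main-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "primary" {
  project       = var.project_id
  name          = "primary-${var.region}"
  region        = var.region
  network       = google_compute_network.vpc.id
  ip_cidr_range = var.subnet_cidr
}
```

The validation only checks that the value is a well-formed CIDR. It does not detect overlap between environments, so the ranges still need to be allocated deliberately.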
Step 3: Implement Organization Policies
Why this matters (Security by Design): IAM tells you who can do what. Org policies tell you what's allowed at all — regardless of who has permissions. This is your security baseline.
Apply foundational constraints:
# shared/org-policies/main.tf
resource "google_organization_policy" "restrict_vm_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

resource "google_organization_policy" "require_shielded_vm" {
  org_id     = var.org_id
  constraint = "compute.requireShieldedVm"

  boolean_policy {
    enforced = true
  }
}

resource "google_organization_policy" "restrict_public_buckets" {
  org_id     = var.org_id
  constraint = "storage.publicAccessPrevention"

  boolean_policy {
    enforced = true
  }
}
What goes wrong: I've seen teams apply org policies at the folder level instead of the organization, thinking it gives them flexibility. Then someone creates a project outside the folder structure and it has none of the security controls. Apply at the org level, create exceptions explicitly where needed.
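An explicit exception might look like the sketch below, which overrides the external IP constraint for a single project and allows one named instance. The variable `exception_project_id`, the zone, and the instance name `bastion` are hypothetical.

```hcl
variable "exception_project_id" {
  description = "Project that needs an explicit exception (assumed input)"
  type        = string
}

# Project-level override: only the listed instance may have an external IP.
resource "google_project_organization_policy" "bastion_external_ip" {
  project    = var.exception_project_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    allow {
      values = [
        "projects/${var.exception_project_id}/zones/northamerica-northeast1-a/instances/bastion",
      ]
    }
  }
}
```

Keeping exceptions as separate, named resources makes them visible in code review instead of hiding them in a looser org-wide policy.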
Step 4: Configure Workload Identity for CI/CD
Why this matters (Lifecycle Operations): Long-lived service account keys are one of the most common sources of credential leaks in GCP. Workload Identity Federation lets your CI/CD pipeline authenticate without any long-lived credentials.
Set up Workload Identity for GitHub Actions:
resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions Pool"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"

  # Map GitHub OIDC token claims to Google attributes
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Only accept tokens from repositories owned by your GitHub organization.
  # var.github_org is an assumed variable; define it to match your setup.
  attribute_condition = "assertion.repository_owner == '${var.github_org}'"

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
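The pool and provider alone grant nothing until a service account trusts them. A minimal sketch of that binding follows, intended to live in the same module as the pool above. The service account ID `github-deployer` and the repository `my-org/my-repo` are hypothetical placeholders.

```hcl
# Service account the pipeline will impersonate (name is illustrative).
resource "google_service_account" "deployer" {
  project      = var.project_id
  account_id   = "github-deployer"
  display_name = "GitHub Actions deployer"
}

# Allow workflows from one specific repository to impersonate the account.
# The principalSet matches on the attribute.repository mapping defined above.
resource "google_service_account_iam_member" "github_wif" {
  service_account_id = google_service_account.deployer.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/my-org/my-repo"
}
```

Scoping the member to a single repository keeps other repositories from impersonating the deployer account, even if their tokens are accepted by the pool.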
What goes wrong: Teams configure Workload Identity but leave the old service account key in their CI/CD as a fallback. The key eventually leaks, and the Workload Identity setup provided zero protection because the compromised key still worked.
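One way to keep new fallback keys from appearing is an org policy that blocks key creation. This is a sketch that assumes the same `var.org_id` used for the other org policies.

```hcl
# Blocks creation of new user-managed service account keys org-wide.
resource "google_organization_policy" "disable_sa_key_creation" {
  org_id     = var.org_id
  constraint = "iam.disableServiceAccountKeyCreation"

  boolean_policy {
    enforced = true
  }
}
```

This does not remove keys that already exist. Those still have to be found and deleted once the pipeline is confirmed to work with Workload Identity Federation.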
Step 5: Establish Cost Visibility
Why this matters (Elastic Scalability): You can't optimize what you can't see. Most cost problems I encounter aren't about expensive services — they're about spend that nobody's tracking.
Enable BigQuery billing export:
gcloud billing accounts describe YOUR_BILLING_ACCOUNT_ID

bq mk --dataset \
  --location=northamerica-northeast1 \
  billing_export
Then configure the export in the Console under Billing → Billing Export → BigQuery Export.
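As an alternative to `bq mk`, the dataset itself can be managed in Terraform so it is tracked with the rest of the platform. This sketch assumes `var.project_id` points at the project that should host billing data; enabling the export is still done in the Console as described above.

```hcl
# Target dataset for the billing export (use instead of `bq mk`, not in addition).
resource "google_bigquery_dataset" "billing_export" {
  project     = var.project_id
  dataset_id  = "billing_export"
  location    = "northamerica-northeast1"
  description = "Cloud Billing export target"

  # Guard against dropping billing history on an accidental destroy.
  delete_contents_on_destroy = false
}
```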
Set up budget alerts:
resource "google_billing_budget" "monthly" {
  billing_account = var.billing_account
  display_name    = "Monthly Platform Budget"

  budget_filter {
    projects = ["projects/${var.project_id}"]
  }

  amount {
    specified_amount {
      currency_code = "CAD"
      units         = "5000"
    }
  }

  threshold_rules {
    threshold_percent = 0.5
  }

  threshold_rules {
    threshold_percent = 0.75
  }

  threshold_rules {
    threshold_percent = 0.9
  }
}
What goes wrong: Teams set a single 90% threshold and ignore it because they've already exceeded budget by the time it fires. Set multiple thresholds — 50%, 75%, 90% — so you see trends before they become problems.
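Thresholds on actual spend only fire after the money is gone. A forecast-based rule can warn earlier. The sketch below is a separate, illustrative budget that alerts when forecasted spend is expected to reach 100% of the same amount; the resource name and display name are assumptions.

```hcl
resource "google_billing_budget" "monthly_forecast" {
  billing_account = var.billing_account
  display_name    = "Monthly Platform Budget (forecast)"

  budget_filter {
    projects = ["projects/${var.project_id}"]
  }

  amount {
    specified_amount {
      currency_code = "CAD"
      units         = "5000"
    }
  }

  # Fires when projected end-of-period spend reaches the budget amount.
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"
  }
}
```

The forecast rule could also be added as another `threshold_rules` block on the existing budget instead of a second resource.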
Common Mistakes Across All Steps
- Skipping the foundation to ship faster. Every shortcut here costs 10x to fix later.
- Treating Terraform as a one-time setup tool. It's not — it's how you operate. Changes go through PRs and CI/CD, always.
- Applying IAM at the project level when it should be inherited from folders. You end up with hundreds of bindings instead of dozens.
- Not testing in non-prod first. Org policies especially can break things in unexpected ways. Always validate in a dev project before applying org-wide.
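For the IAM inheritance point, a folder-level grant can be sketched as follows. The variable name, the group address, and the choice of `roles/viewer` are hypothetical; the idea is that one binding on the folder replaces a binding on every project beneath it.

```hcl
variable "nonprod_folder_id" {
  description = "Non-Production folder in the form folders/<id> (assumed input)"
  type        = string
}

# Every project under the folder inherits this binding.
resource "google_folder_iam_member" "nonprod_developers" {
  folder = var.nonprod_folder_id
  role   = "roles/viewer"
  member = "group:developers@example.com"
}
```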
This foundation puts the first three SCALE pillars in place: Security, Cloud-Native Architecture, and Automation. From here, you build your CI/CD pipelines (Lifecycle Operations) and scaling policies (Elastic Scalability) on a stable base.
If you're inheriting an existing GCP environment, the same principles apply — but the sequencing changes. You need an assessment first to understand what you're working with.
Amit Malhotra, Principal GCP Architect, Buoyant Cloud Inc
Work with a GCP specialist — book a free discovery call → https://buoyantcloudtech.com