Audit Your Stack: A DevOps Playbook to Detect Underused SaaS with Logs & Billing
A practical DevOps playbook (with scripts and queries) to find underused SaaS using logs, billing and API probes — and safely kill or consolidate them.
Your SaaS bills are climbing while feature flags gather dust — here's how to prove what to kill.
Tool sprawl and hidden SaaS costs are a predictable drag: recurring invoices, duplicated features, integration maintenance and mental overhead for teams. If you’re a DevOps or platform lead asked to cut costs without breaking workflows, you need more than opinions: you need repeatable, scriptable audits that surface actual usage, per-feature ROI and technical overlap so you can justify consolidation or sunsetting with data.
What this playbook delivers (fast)
- Step-by-step procedures to build a scriptable SaaS inventory from logs, billing exports and repo scans.
- Queries and scripts (BigQuery / Athena / ELK / Python / bash) you can run today to quantify usage and cost per action.
- API probe patterns to validate live dependencies and surface shadow integrations.
- A practical scoring matrix to rank consolidation candidates and an operational checklist for safe decommissioning.
Context: Why 2026 makes this urgent
By 2026, SaaS proliferation has accelerated: more micro-SaaS vendors, AI-native feature add-ons and consumption-based pricing mean bills can spike unpredictably. FinOps practices matured across many orgs in 2024–2025, and major cloud providers expanded granular billing exports and real-time usage APIs. That makes this the year to move from manual guesswork to automated, evidence-based SaaS pruning.
Overview: The audit workflow (inverted pyramid — do this order)
- Inventory: Collect a canonical list of all SaaS subscriptions, service accounts, API keys and integrations.
- Telemetry: Gather usage logs, billing exports and API usage telemetry into a queryable store.
- Probes: Actively test endpoints, webhooks and tokens to confirm live dependencies.
- Normalize & Analyze: Map cost to activity and features; detect overlap.
- Score & Prioritize: Rank candidates to kill, consolidate, or keep.
- Decommission Plan: Migration steps, data retention, and rollback triggers.
1) Inventory: Automated discovery (80% of the battle)
Start with a canonical SaaS inventory. Manual spreadsheets are fragile; automate discovery from three sources:
- Billing exports (credit card statements, vendor invoices, cloud marketplace).
- Source repositories and IaC (search for SDKs, providers, and env vars).
- Secrets stores and CI/CD config (service tokens live here).
Repo & config scan (fast wins)
Use ripgrep or ag to find SDK usage and env var names. Run from your monorepo root and aggregate results.
# Find common provider SDKs and API keys with ripgrep (rg)
rg --hidden --no-ignore-vcs "(AWS|GCP|AZURE|SLACK|SENDGRID|STRIPE|SENTRY|DATADOG|ROLLBAR|NEWRELIC|AUTH0|OKTA)" -S -n
# Scan for environment variables that look like API keys
rg --hidden --no-ignore-vcs "(API_KEY|_TOKEN|_SECRET|CLIENT_ID|CLIENT_SECRET|SERVICE_ACCOUNT)" -S -n
Export results to CSV and correlate with billing records. This uncovers hidden integrations and developer experiments.
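To turn raw rg output into that CSV, a small tally script is enough. This is a minimal sketch: the vendor keyword list is an assumption that should mirror whatever patterns you grep for, and the `path:line:match` input shape is rg's default `-n` output.

```python
import csv
from collections import Counter

# Vendor keywords to tally; an assumption — keep this in sync with your rg pattern.
VENDORS = ["AWS", "GCP", "AZURE", "SLACK", "SENDGRID", "STRIPE",
           "SENTRY", "DATADOG", "AUTH0", "OKTA"]

def tally_rg_output(lines):
    """Count hits per vendor keyword from rg's `path:line:match` output lines."""
    counts = Counter()
    for line in lines:
        upper = line.upper()
        for vendor in VENDORS:
            if vendor in upper:
                counts[vendor] += 1
    return counts

def write_vendor_csv(counts, path):
    """Write vendor hit counts to a CSV for correlation with billing records."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["vendor", "hits"])
        for vendor, hits in counts.most_common():
            writer.writerow([vendor, hits])
```

Feed it the captured rg output (e.g. lines read from a file) and join the resulting CSV against your billing export by vendor name.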
Secrets & CI systems
Query your secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) for key names. Many vendors use standard prefixes — this helps find service accounts used only in pipelines.
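Once you have the key names out of the secrets manager, classifying them by prefix is a one-pager. The prefix-to-vendor map below is illustrative (your org's naming conventions will differ), not a canonical list:

```python
# Assumed prefix conventions — replace with the prefixes your org actually uses.
PREFIX_TO_VENDOR = {
    "DD_": "Datadog",
    "SENTRY_": "Sentry",
    "STRIPE_": "Stripe",
    "SLACK_": "Slack",
    "SG_": "SendGrid",
}

def classify_secret(name: str) -> str:
    """Map a secret/key name to a vendor by prefix; 'unknown' needs manual triage."""
    upper = name.upper()
    for prefix, vendor in PREFIX_TO_VENDOR.items():
        if upper.startswith(prefix):
            return vendor
    return "unknown"
```

Run it over the full key listing and the "unknown" bucket becomes your manual-review queue.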
2) Telemetry: Ingest billing & usage data into a single store
Combine three streams into one analytics layer: billing exports, SaaS provider usage APIs, and your own request logs. Use an existing data warehouse (BigQuery, Snowflake) or cloud storage + query engine (S3 + Athena).
Common ingestion sources and how to get them
- Cloud billing: AWS Cost and Usage Report (CUR) to S3 → Athena; GCP Billing export to BigQuery; Azure Cost Management exports to Storage → Synapse/Azure Data Explorer.
- SaaS invoices: Pull vendor invoices via accounting exports (CSV) or via vendor billing APIs where available.
- Usage APIs: Many vendors (Datadog, Snyk, Sentry, etc.) expose endpoints for API key usage, seat counts and metered features.
- Application logs: Centralize to ELK/Opensearch or Splunk and forward to data warehouse.
Sample SQL: find top SaaS cost centers (BigQuery / Athena style)
SELECT
service AS vendor,
SUM(cost) AS total_cost,
COUNT(DISTINCT invoice_id) AS invoices
FROM
billing_exports.saas_costs
WHERE
usage_start BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
GROUP BY
service
ORDER BY
total_cost DESC
LIMIT 50;
Map cost to teams and tags
If you tag invoices or use cost centers in cloud providers, join billing data with your org mapping table. If not, infer team ownership by searching for account emails or project IDs in invoice metadata.
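The inference step can be sketched as a lookup chain: email local-part first, then project-ID prefix, then fall through to "unassigned". The field names (`account_email`, `project_id`) and the mapping tables are assumptions; adapt them to your invoice export schema.

```python
# Hypothetical ownership maps — replace with your org's real data.
TEAM_BY_EMAIL_LOCAL = {"platform-bot": "platform", "data-eng": "data"}
TEAM_BY_PROJECT_PREFIX = {"ml-": "data", "infra-": "platform"}

def infer_team(invoice: dict) -> str:
    """Infer owner_team from invoice metadata when no cost-center tags exist."""
    email = invoice.get("account_email", "")
    local = email.split("@", 1)[0]
    if local in TEAM_BY_EMAIL_LOCAL:
        return TEAM_BY_EMAIL_LOCAL[local]
    project = invoice.get("project_id", "")
    for prefix, team in TEAM_BY_PROJECT_PREFIX.items():
        if project.startswith(prefix):
            return team
    return "unassigned"
```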
3) Logs: Usage telemetry queries that reveal real users
Logs answer different questions than billing. Billing shows spend; logs show who actually used a feature and when. Look at API call counts, feature flags, webhook deliveries and metric emission.
ELK / OpenSearch query examples
# Find unique users hitting a vendor integration endpoint in the last 30 days
POST /app-logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"term": {"path": "/thirdparty/slack"}},
        {"range": {"@timestamp": {"gte": "now-30d"}}}
      ]
    }
  },
  "aggs": {
    "users": {"cardinality": {"field": "user.id"}}
  }
}
SQL on events (data warehouse)
SELECT
  integration_name,
  COUNT(*) AS calls,
  COUNT(DISTINCT user_id) AS active_users,
  -- PERCENTILE_CONT is analytic-only in BigQuery and can't be mixed with GROUP BY;
  -- APPROX_QUANTILES gives a grouped p95 instead.
  APPROX_QUANTILES(call_duration, 100)[OFFSET(95)] AS p95_ms
FROM
  telemetry.api_calls
WHERE
  integration_name IS NOT NULL
  AND event_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP()
GROUP BY
  integration_name
ORDER BY
  active_users DESC;
Key metrics to compute
- Active users: distinct users who used a product feature in the last 30/90/365 days.
- Cost per active user: vendor monthly cost / active users.
- Calls per day: signal for automated integrations.
- Peak vs median: to spot over-provisioning or spikes from cron jobs.
- Error surface: high error rates on vendor API calls indicate brittle integrations.
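As a rough sketch, the metrics above can be computed from raw call records. The record fields (`user_id`, `day`, `error`) are illustrative assumptions; map them to whatever your telemetry actually emits.

```python
from statistics import median

def usage_metrics(calls, monthly_cost):
    """Compute active users, cost per active user, calls/day and error rate
    from a list of call records like {'user_id', 'day', 'error': bool}."""
    active_users = len({c["user_id"] for c in calls})
    per_day = {}
    for c in calls:
        per_day[c["day"]] = per_day.get(c["day"], 0) + 1
    daily = list(per_day.values()) or [0]
    errors = sum(1 for c in calls if c.get("error"))
    return {
        "active_users": active_users,
        "cost_per_active_user": monthly_cost / active_users if active_users else None,
        "calls_per_day_median": median(daily),
        "calls_per_day_peak": max(daily),  # peak vs median flags cron spikes
        "error_rate": errors / len(calls) if calls else 0.0,
    }
```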
4) API probes: Confirm live dependencies with safe probes
Logs show historical usage; probes validate current, possibly undocumented, dependencies. Build ephemeral probes with a low blast radius and rate limits.
Probe patterns
- Token introspection: Use vendor APIs to list active API keys and last-used timestamps.
- Webhook delivery audits: Check vendor webhook management APIs and verify destination success rates.
- Endpoint reachability probes: Regularly call integration endpoints (with test payloads) to confirm they're being hit and measure latencies.
Python example: list active API keys from a hypothetical vendor
import requests

API_BASE = 'https://api.vendor.example/v1'
ADMIN_KEY = 'REDACTED_ADMIN_KEY'

resp = requests.get(
    f"{API_BASE}/admin/api_keys",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    timeout=10,  # never probe without a timeout
)
resp.raise_for_status()
for key in resp.json().get('keys', []):
    print(key['id'], key['last_used_at'], key['owner'])
Use pagination and exponential backoff. Many vendors return last_used timestamps — that's gold for identifying stale keys.
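A reusable pagination-with-backoff wrapper keeps probes polite. This is a sketch against a hypothetical page shape (`{'keys': [...], 'next_cursor': ...}`), not a real vendor API; inject your own `fetch_page` that does the HTTP call.

```python
import time

def fetch_all_keys(fetch_page, max_retries=5, base_delay=1.0):
    """Drain a cursor-paginated endpoint with exponential backoff on failures.
    `fetch_page(cursor)` returns a page dict or raises on transient errors."""
    keys, cursor = [], None
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(cursor)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        keys.extend(page["keys"])
        cursor = page.get("next_cursor")
        if not cursor:
            return keys
```

Injecting `fetch_page` also makes the probe trivially testable against canned pages before you point it at a live vendor.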
5) Integration discovery: scan for inbound & outbound hooks
Webhooks are the usual culprit for hidden dependencies. Check both sides:
- Vendor dashboard — list webhook subscriptions and targets.
- Your system logs — search for 2xx responses to vendor IP ranges or user-agents.
Example: find webhook receivers in Nginx logs
# count requests whose user-agent contains "vendor-webhook", grouped by request path
# (field positions vary by log format — adjust $7 to match yours)
zgrep "vendor-webhook" /var/log/nginx/*access*.gz | awk '{print $7}' | sort | uniq -c | sort -rn
6) Normalize & analyze: Join cost, logs and inventory
Now join the datasets into a single table keyed by vendor and team. Weight metrics to compute a usage score and cost-impact. Example schema columns:
- vendor, service_id, owner_team
- monthly_cost, invoices_last_12m
- active_users_30d, calls_30d, errors_30d
- last_api_key_use, webhook_success_rate
Scoring algorithm (practical)
Compute two normalized scores in [0, 1] — UsageScore and CostScore — plus a RiskFactor. Then compute CandidateScore = CostScore * (1 - UsageScore) * RiskFactor.
# Pseudocode
UsageScore = normalize(active_users_30d / team_size)        # 0–1
CostScore = normalize(monthly_cost / total_platform_cost)   # 0–1
RiskFactor = 1 if last_api_key_use > 180 days ago else 0.5  # stale keys are safer to cut
CandidateScore = CostScore * (1 - UsageScore) * RiskFactor
# High CandidateScore -> prioritize for consolidation or sunsetting
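A concrete implementation of this scoring is short. Note the assumptions: scores are clamped to [0, 1], and stale keys (unused for more than 180 days) are treated as safer to cut, so they raise the candidate score. Row field names are illustrative.

```python
def normalize(value, max_value):
    """Clamp a ratio into [0, 1]; all scores live on this scale."""
    if max_value <= 0:
        return 0.0
    return min(value / max_value, 1.0)

def candidate_score(row, team_size, total_platform_cost):
    """High score = expensive, lightly used, and safe to remove."""
    usage = normalize(row["active_users_30d"], team_size)
    cost = normalize(row["monthly_cost"], total_platform_cost)
    # Keys unused > 180 days carry less removal risk, so full weight.
    risk = 1.0 if row["days_since_last_key_use"] > 180 else 0.5
    return cost * (1 - usage) * risk
```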
7) Detect feature overlap and consolidation paths
Feature overlap can be the hardest to quantify. Build a feature matrix with boolean flags and usage counts. Example features: notifications, error-tracking, APM, SSO, secrets management, email delivery.
Practical method
- List features for each vendor (manual + vendor docs).
- Map events or logs to features (e.g., errors → error-tracking).
- For each feature, compute active_users and calls across vendors.
- If a single vendor covers 90%+ of feature events vs others covering <10%, consolidation is feasible.
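Computing per-feature coverage shares is the mechanical part; a minimal sketch, assuming you have already mapped events to features per vendor:

```python
def feature_coverage(event_counts):
    """Given {vendor: event_count} for one feature, return each vendor's share.
    If one vendor's share approaches 0.9+, consolidation is feasible."""
    total = sum(event_counts.values())
    if total == 0:
        return {vendor: 0.0 for vendor in event_counts}
    return {vendor: n / total for vendor, n in event_counts.items()}
```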
Useful thresholds (opinionated but practical)
- Decommission candidate: monthly_cost > $500 and active_users_30d < 5% of expected users.
- Consolidation candidate: feature overlap > 60% and combined operational cost > 1.5x a single vendor option.
- Keep but review: high cost but high criticality (SSO, secrets, billing).
8) Decommissioning checklist (operational playbook)
- Stakeholder communication: announce impact, timeline, owners.
- Map affected teams, SLA changes, and rollback owners.
- Data export: export historical data and retention policy (CSV, JSON, or vendor export API).
- Migration runbook: cutover steps, integration points to update, migration scripts.
- Monitoring: create pre/post health checks (uptime, error rates, user complaints) and set rollback triggers.
- Contracts: review termination clauses, notice periods and data deletion timelines.
- Post-mortem: capture lessons and update your procurement guardrails.
Real-world example (short case study)
In late 2025, a mid-market SaaS company ran this audit and discovered two underused tools: a separate error-tracking product and a lightweight SaaS that handled internal notifications. The audit found:
- Monthly spend: $3.2k combined
- Error-tracking: 85% of events were already captured in the APM provider (which had a lower cost per event).
- Notifications: only 12 active users (internal engineers) and webhooks had a 95% delivery rate to a general Slack channel — trivial to migrate.
They consolidated error-tracking into the APM and replaced the notification SaaS with internally-managed webhooks. Annual savings: ~$30k and a 30% reduction in alerts maintenance time. The migration took 6 weeks from discovery to full decommission, with zero customer impact because the audit produced exact usage evidence and a tested rollback plan.
Automation recipes & CI checks (make audits routine)
Treat the audit as a pipeline job. Example pipeline tasks:
- Nightly job to update SaaS inventory (billing + repo + secrets scan).
- Weekly BigQuery/Athena queries to recompute candidate scores.
- Alert if a vendor's cost increases > 30% month-over-month or API key last_used > 365 days without recent calls.
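The alert condition above reduces to a predicate you can run in the weekly job. The thresholds (30% month-over-month, 365 days) come straight from the list; the argument names are assumptions about your joined table.

```python
def should_alert(prev_cost, curr_cost, days_since_key_use, calls_30d):
    """Flag a vendor when cost rises >30% month-over-month, or an API key
    has been unused for >365 days with no recent calls."""
    cost_spike = prev_cost > 0 and (curr_cost - prev_cost) / prev_cost > 0.30
    stale_key = days_since_key_use > 365 and calls_30d == 0
    return cost_spike or stale_key
```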
Sample GitHub Actions step (pseudo)
name: saas-audit
on:
  schedule:
    - cron: '0 3 * * 1'  # weekly, Mondays 03:00 UTC
jobs:
  run-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run inventory scan
        run: bash scripts/inventory_scan.sh
      - name: Upload to BigQuery
        run: python scripts/upload_to_bq.py
Common pitfalls & how to avoid them
- Relying on billing alone — bills lack granular feature mapping. Always join with logs and API telemetry.
- False positives from stale keys — verify last-used and actual calls before cutting access.
- Underestimating internal workflows — interview power users before killing tools used by a small but critical team.
- Ignoring contractual obligations — some contracts have minimum terms or data retention rules.
"Decommissioning without data is risky. Decommissioning with telemetry is accountable and reversible."
Advanced strategies for large orgs (scale & governance)
If you manage hundreds of vendors, add these layers:
- Feature catalogue: a centralized CMDB-style catalogue mapping vendors to capabilities and SLAs.
- Automated dependency graphs: instrument ingestion of CI/CD pipelines, Cloud IAM, and runtime logs to build a service dependency graph (e.g., Neo4j).
- Procurement policy enforcement: CI gate that rejects new SaaS unless a business case, owner and tag are provided.
- Cost guardrails: Programmatic alerts for per-vendor monthly spend limits using cloud provider budgets and vendor webhooks.
Future predictions (2026 and beyond)
Expect these trends in 2026–2027:
- Richer vendor usage APIs: Vendors will provide more granular, machine-readable usage telemetry as customers demand transparent unit pricing.
- Real-time FinOps: Real-time billing streams and predictive budgeting will make sudden spikes easier to catch.
- Policy-as-code for SaaS: Governance frameworks will extend to SaaS procurement, enabling automated approvals and spend limits.
Actionable takeaways — run this in 30/60/90 days
- 30 days: Run the repo + secrets scan and ingest last 90 days of billing into a warehouse. Produce a top-20 vendors by spend report.
- 60 days: Join billing with telemetry, compute CandidateScore, and run API probes for top 10 candidates. Schedule stakeholder reviews.
- 90 days: Execute 1–2 low-risk decommissions, measure savings and update procurement policy to prevent recurrence.
Call to action
If you want a reproducible starter repository with the scans, BigQuery templates and Python probe scripts used in this playbook, grab the frees.cloud DevOps SaaS Audit repo and run the included GitHub Action. Or reach out to your platform team and propose a 90-day audit sprint — start small, automate often, and make each decommission a data-driven win.