Starter Template: Automated Tool-Usage Detector (Python + SQL + Dashboard)

2026-02-25

Fork a GitHub-ready Python+SQL template that pulls billing and API usage, normalizes metrics, and surfaces underused SaaS with alerts.

Stop paying for what nobody uses: a forkable starter to detect wasted SaaS spend

If you manage infrastructure or developer tooling in 2026, your inbox is full of invoices and your finance team is tired of surprises. The core problem is rarely an individual overpriced license — it’s the hidden layers of cost and friction across dozens of SaaS products. This article gives you a GitHub-ready template (Python + SQL + dashboard + alerts) that pulls billing and API usage, normalizes metrics, and surfaces underused platforms with alerting rules — ready to fork for any organization.

Why this matters in 2026:

  • Usage-based pricing and AI tokens: Late 2024–2025 accelerated a shift from flat per-seat licensing to usage- or token-based pricing for AI services and analytics. That makes small, sporadic usage patterns harder to predict but easier to burn cash on when left unchecked.
  • Tool sprawl and FinOps focus: In 2025 more engineering orgs adopted FinOps practices for cloud; now, in 2026, the next frontier is SaaS spend observability and operationalization of deprovisioning workflows.
  • APIs are better but fragmented: Many vendors added richer billing and reporting endpoints in late 2025, but each exports different metrics, units, and identity models — forcing normalization.
  • Policy-first automation: Teams are shifting to automated governance: detect underused tools, open a ticket, and cut or renegotiate subscriptions — with human review gates.

What you’ll get from this template

  • A lightweight, forkable repo layout and Docker-ready services
  • Python connectors for billing & usage APIs (examples: Stripe, AWS cost, GitHub, Datadog, and a generic SaaS connector)
  • Normalized SQL schema and example PostgreSQL queries to detect underuse
  • An orchestration example (Prefect flow + GitHub Actions schedule)
  • A Grafana/Metabase dashboard spec and concrete alerting rules (Slack + PagerDuty)
  • Operational playbook: rate limits, secrets, tests, on-call workflows

High-level architecture (fork and run)

Design goals: minimal infra (managed Postgres), cron-style orchestration, and a dashboard + alert layer. Components:

  • Ingest workers — Python scripts that fetch billing/usage from vendor APIs, normalize, and write to Postgres.
  • Normalization DB — PostgreSQL with a small star schema for costs, usage events, identities, and metadata.
  • Orchestrator — Prefect flows triggered by GitHub Actions or scheduler to run ingest, backfill, and checks.
  • Dashboard — Grafana or Metabase connected to Postgres for visualizing cost/usage and underuse detectors.
  • Alerting — Grafana alerts + webhook forwarding to Slack/PagerDuty and an automated ticket opener in Jira/GitHub Issues.

Repository layout:

usage-detector/
├─ README.md
├─ docker-compose.yml
├─ infra/
│  └─ postgres/ (init scripts)
├─ connectors/
│  ├─ stripe_connector.py
│  ├─ aws_cost_connector.py
│  ├─ generic_saas_connector.py
│  └─ github_connector.py
├─ normalization/
│  └─ sql_schema.sql
├─ orchestrator/
│  ├─ flows.py (Prefect)
│  └─ ci.yml (GitHub Actions)
├─ dashboards/
│  └─ grafana/ (JSON model)
├─ alerts/
│  └─ alert_rules.yml
└─ docs/
   └─ runbook.md

Normalization schema (core tables)

Normalize vendor outputs into a consistent set of columns. Keep the table structure small to make queries fast.

-- normalization/sql_schema.sql

CREATE TABLE vendors (
  vendor_id SERIAL PRIMARY KEY,
  vendor_key TEXT UNIQUE, -- e.g. "stripe", "aws", "datadog"
  display_name TEXT,
  metadata JSONB
);

CREATE TABLE resources (
  resource_id SERIAL PRIMARY KEY,
  vendor_id INT REFERENCES vendors(vendor_id),
  resource_key TEXT,      -- vendor-provided id or SKU
  name TEXT,
  type TEXT,              -- "service", "seat", "api-key"
  metadata JSONB
);

CREATE TABLE cost_events (
  event_id BIGSERIAL PRIMARY KEY,
  vendor_id INT REFERENCES vendors(vendor_id),
  resource_id INT REFERENCES resources(resource_id),
  usage_date DATE,        -- the date the cost applies to
  cost_amount NUMERIC(12,4),
  currency TEXT,
  raw_payload JSONB,
  ingested_at TIMESTAMP DEFAULT now()
);

CREATE TABLE usage_metrics (
  metric_id BIGSERIAL PRIMARY KEY,
  vendor_id INT REFERENCES vendors(vendor_id),
  resource_id INT REFERENCES resources(resource_id),
  metric_name TEXT,       -- e.g. "api_calls", "active_users", "seats_used"
  metric_value NUMERIC,
  metric_units TEXT,
  metric_date DATE,
  raw_payload JSONB,
  ingested_at TIMESTAMP DEFAULT now()
);

CREATE INDEX ON usage_metrics(metric_name, metric_date);
CREATE INDEX ON cost_events(usage_date);

Python connector pattern (copy & adapt)

Each connector follows the same pattern: paginate API, map vendor fields to normalized schema, upsert resource metadata, and insert cost/usage rows.

# connectors/generic_saas_connector.py
import os

import psycopg2
import requests
from psycopg2.extras import Json  # adapts Python dicts to JSONB columns

DB_DSN = os.getenv('DB_DSN')
API_KEY = os.getenv('SaaS_API_KEY')
BASE_URL = os.getenv('SaaS_BASE_URL')

def get_vendor_id(conn, vendor_key, display_name):
    with conn.cursor() as cur:
        cur.execute("SELECT vendor_id FROM vendors WHERE vendor_key=%s", (vendor_key,))
        row = cur.fetchone()
        if row:
            return row[0]
        cur.execute(
            "INSERT INTO vendors (vendor_key, display_name) VALUES (%s,%s) RETURNING vendor_id",
            (vendor_key, display_name)
        )
        return cur.fetchone()[0]

def upsert_resource(conn, vendor_id, resource_key, name, rtype, metadata):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT resource_id FROM resources WHERE vendor_id=%s AND resource_key=%s",
            (vendor_id, resource_key)
        )
        row = cur.fetchone()
        if row:
            return row[0]
        cur.execute(
            "INSERT INTO resources (vendor_id,resource_key,name,type,metadata) VALUES (%s,%s,%s,%s,%s) RETURNING resource_id",
            (vendor_id, resource_key, name, rtype, Json(metadata))
        )
        return cur.fetchone()[0]

def fetch_and_write():
    conn = psycopg2.connect(DB_DSN)
    vendor_id = get_vendor_id(conn, 'generic_saas', 'Generic SaaS')
    conn.commit()  # persist the vendor row even if no usage pages follow

    # Example: paginated endpoint returning usage items
    page = 1
    while True:
        r = requests.get(
            f"{BASE_URL}/usage?page={page}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30
        )
        r.raise_for_status()
        data = r.json()
        if not data.get('items'):
            break
        with conn.cursor() as cur:
            for item in data['items']:
                resource_id = upsert_resource(
                    conn, vendor_id, item['id'], item.get('name'),
                    item.get('type', 'service'), item
                )
                # Insert usage metric; Json() wraps the dict for the JSONB raw_payload column
                cur.execute(
                    "INSERT INTO usage_metrics (vendor_id,resource_id,metric_name,metric_value,metric_units,metric_date,raw_payload) VALUES (%s,%s,%s,%s,%s,%s,%s)",
                    (vendor_id, resource_id, 'api_calls', item.get('calls', 0), 'count', item.get('date'), Json(item))
                )
                # Record a cost event when the item carries a cost
                if item.get('cost') is not None:
                    cur.execute(
                        "INSERT INTO cost_events (vendor_id,resource_id,usage_date,cost_amount,currency,raw_payload) VALUES (%s,%s,%s,%s,%s,%s)",
                        (vendor_id, resource_id, item.get('date'), item.get('cost'), item.get('currency', 'USD'), Json(item))
                    )
        conn.commit()  # commit per page so re-runs resume cleanly
        page += 1
    conn.close()

if __name__ == '__main__':
    fetch_and_write()

Connector tips and 2026 considerations

  • Respect rate limits: Use exponential backoff and persistent cursors. Many vendors added stricter quotas in 2025 to protect AI models.
  • Timezones & billing windows: Normalize dates to UTC and store vendor-reported windows in raw_payload for reconciliation.
  • Identity mapping: Map vendor user ids or team ids to your SSO IDs where possible to attribute seats to teams.
  • Idempotency: Use event ids and upserts to make connectors safe to re-run.
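The backoff advice above can be sketched as a small retry wrapper. The exception name, delay constants, and jitter range are illustrative assumptions, not vendor requirements:

```python
import random
import time


class RateLimited(Exception):
    """Raised by a connector when a vendor returns HTTP 429 or a 5xx."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the orchestrator
            # double the delay each attempt, add jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Wrap each page fetch, e.g. `call_with_backoff(lambda: fetch_page(page))`, so the orchestrator's own retries only ever see persistent failures.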

Underuse detection logic — a practical algorithm

Goal: identify resources that cost > X/month but have low activity. Basic scoring formula:

underuse_score = (normalized_monthly_cost / max(1, normalized_monthly_active_units))

Where:

  • normalized_monthly_cost — vendor cost normalized to USD for the 30-day window
  • normalized_monthly_active_units — a combined metric of active users, API calls (scaled), or hours used
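The same formula as a plain Python helper. The 10,000-calls-per-active-unit scaling mirrors the example SQL in this section and is a tunable assumption, not a fixed constant:

```python
def underuse_score(monthly_cost_usd, active_users=0.0, api_calls=0.0,
                   calls_per_unit=10_000.0):
    """Cost per normalized active unit over a 30-day window.

    Combines seat-style activity with API volume (scaled down) and guards
    against division by zero with max(1, units).
    """
    active_units = active_users + api_calls / calls_per_unit
    return monthly_cost_usd / max(1.0, active_units)
```

For example, a $5,000/month service with 2 active users and 30,000 API calls normalizes to 5 active units, i.e. $1,000 per unit.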

Example SQL: detect platforms with high cost per active unit

-- Example: compute 30-day cost and active_units per resource
WITH cost_30 AS (
  SELECT resource_id, SUM(cost_amount) AS cost_30
  FROM cost_events
  WHERE usage_date >= current_date - INTERVAL '30 days'
  GROUP BY resource_id
),
active_30 AS (
  SELECT resource_id,
         COALESCE(SUM(metric_value) FILTER (WHERE metric_name IN ('active_users','seats_used')), 0) +
         COALESCE(SUM(metric_value) FILTER (WHERE metric_name = 'api_calls'), 0) / 10000.0 AS active_units
  FROM usage_metrics
  WHERE metric_date >= current_date - INTERVAL '30 days'
  GROUP BY resource_id
)
SELECT r.resource_id, r.name, v.display_name,
       COALESCE(c.cost_30, 0) AS cost_30,
       COALESCE(a.active_units, 0) AS active_units,
       COALESCE(c.cost_30, 0) / GREATEST(COALESCE(a.active_units, 0), 1) AS cost_per_active_unit
FROM resources r
JOIN vendors v ON r.vendor_id = v.vendor_id
LEFT JOIN cost_30 c ON c.resource_id = r.resource_id
LEFT JOIN active_30 a ON a.resource_id = r.resource_id
ORDER BY cost_per_active_unit DESC
LIMIT 50;

Interpretation

  • Sort by cost_per_active_unit to prioritize targets.
  • Filter out infra or reserved services where low activity is expected.
  • Present top N to business owners for human review before deprovisioning.
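To hand the top N to business owners, one option is a short digest message. The row shape (matching the detection query's output columns) and the Slack-style formatting are assumptions:

```python
def format_top_candidates(rows, top_n=5):
    """Render a digest of underuse candidates for a review channel.

    rows: (name, vendor, cost_30, active_units, cost_per_active_unit) tuples,
    already sorted descending by the detection query.
    """
    lines = ["*Top underuse candidates (30d)*"]
    for name, vendor, cost, units, score in rows[:top_n]:
        lines.append(
            f"- {name} ({vendor}): ${cost:,.0f} over 30d, "
            f"{units:.1f} active units, ${score:,.0f}/unit"
        )
    return "\n".join(lines)
```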

Alerting rules and automation playbook

Automate detection → enrich with metadata → notify → open a ticket. Use thresholds and anomaly detection together.

  1. Static threshold rule: cost_per_active_unit > $1000 for two consecutive 7-day windows → create Slack alert to owner channel.
  2. Anomaly rule: sudden 3x drop in active units while cost remains steady → open a ticket (label: investigate underuse).
  3. Human-in-the-loop: every automated action must include a review step before cutting access or cancelling subscriptions. Use a GitHub Issue template with checkboxes for negotiation/backup export.

# alerts/alert_rules.yml (Grafana-style, simplified)
# Assumes a daily rollup view "analytics_cost_per_unit" materialized from the 30-day query above
- name: High cost per active unit (30d)
  query: |
    SELECT resource_id, cost_30 / GREATEST(active_units, 1) AS cost_per_active_unit
    FROM analytics_cost_per_unit WHERE date = current_date
  condition: cost_per_active_unit > 1000
  for: 2d
  notify:
    - slack: '#finops-alerts'
    - pagerduty: 'P123456'
  action: '/scripts/open_issue.sh --resource_id={resource_id} --template=underused'
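The `/scripts/open_issue.sh` action above is a placeholder. A hedged sketch of what it might do, using GitHub's create-issue endpoint (`POST /repos/{owner}/{repo}/issues`); the repo name, labels, and checklist body are assumptions:

```python
import json
import os
import urllib.request


def build_issue(resource_id, template="underused"):
    """Render the issue payload for a flagged resource."""
    return {
        "title": f"[{template}] Review resource {resource_id}",
        "body": (f"Resource `{resource_id}` exceeded the cost-per-active-unit "
                 "threshold for two consecutive windows.\n\n"
                 "- [ ] Confirm with the owning team\n"
                 "- [ ] Export data / backups if cancelling\n"
                 "- [ ] Renegotiate or deprovision"),
        "labels": ["finops", template],
    }


def open_issue(repo, resource_id, token=None):
    """POST the issue to GitHub; repo is 'owner/name'."""
    token = token or os.environ["GITHUB_TOKEN"]
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(build_issue(resource_id)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The checkbox body doubles as the human-review gate described in step 3 above.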

Dashboard essentials (Grafana / Metabase)

Minimum dashboards to include in the template:

  • Overview: 30/90/365-day cost trends by vendor
  • Usage vs cost: scatter plot of cost_per_active_unit
  • Top candidates: table of resources with owner, cost_30, active_units, score, and recent anomalies
  • Alert history: list of triggered alerts and resolution status

Orchestration example: Prefect flow + GitHub Actions

Use Prefect for local dev and graceful retries; schedule via GitHub Actions for small orgs to avoid extra infra.

# orchestrator/flows.py (simplified)
from prefect import flow, task
from connectors.generic_saas_connector import fetch_and_write

@task(retries=3, retry_delay_seconds=60)
def run_connectors():
    fetch_and_write()

@flow
def ingest_flow():
    run_connectors()

if __name__ == '__main__':
    ingest_flow()

# .github/workflows/ci.yml (schedule)
name: scheduled-ingest
on:
  schedule:
    - cron: '0 2 * * *' # daily at 02:00 UTC
  workflow_dispatch: {}

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Run ingest
        env:
          DB_DSN: ${{ secrets.DB_DSN }}
          SaaS_API_KEY: ${{ secrets.SAAS_API_KEY }}
          SaaS_BASE_URL: ${{ secrets.SAAS_BASE_URL }}
        run: python orchestrator/flows.py

Security and operational best practices

  • Secrets management: never commit API keys to the repo. Use GitHub Secrets or Vault and ephemeral tokens where possible.
  • Least-privilege API keys: create read-only keys for billing/usage endpoints.
  • Testing: include unit tests for mapping logic and integration tests with recorded vendor responses (VCR-style cassette files).
  • Monitoring the monitor: track connector success rates, lag, and schema drift. Create an internal heartbeat metric written to Postgres and monitored by Grafana.
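The heartbeat idea in the last bullet can be as small as one extra table and one insert per connector run, with a Grafana alert on stale rows. The table and column names here are assumptions:

```python
from datetime import datetime, timezone

# One-time DDL for the heartbeat table (hypothetical names)
HEARTBEAT_DDL = """
CREATE TABLE IF NOT EXISTS connector_heartbeats (
  connector TEXT,
  status TEXT,
  rows_written INT,
  ran_at TIMESTAMPTZ DEFAULT now()
);
"""


def heartbeat_row(connector, status, rows_written):
    """Build the insert statement and parameters for one heartbeat."""
    sql = ("INSERT INTO connector_heartbeats (connector, status, rows_written, ran_at) "
           "VALUES (%s, %s, %s, %s)")
    return sql, (connector, status, rows_written, datetime.now(timezone.utc))
```

Each connector executes the returned statement at the end of its run; a dashboard query for `max(ran_at)` per connector then shows lag at a glance.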

Case study (hypothetical, realistic)

Team X has 120 SaaS subscriptions across engineering, marketing, and data. After deploying this starter template and running for 6 weeks, they discovered:

  • 10 platforms with cost_per_active_unit > $5,000. After review, 6 were unused sandboxes and were cancelled — saving $22k/month.
  • 3 AI services spiking due to a broken CI job. Automated alerts caught the issue and reduced runaway token spend by 80% in 48 hours.
  • By attributing seat usage to teams, they renegotiated 2 large licenses to team-based access, saving another $10k/year.

Operationalizing detection + human review returned ROI in under one month for this hypothetical org.

Advanced strategies (if you want to level up)

  • Attribution models: attribute costs to projects using tag maps, usage patterns, and SSO identities for chargeback/showback.
  • Anomaly detection with ML: train a simple seasonal model (Prophet or a lightweight LSTM) on usage metrics to detect drop-offs or spikes beyond static thresholds.
  • Automated negotiation workflow: where policies allow, raise a renewal note and capture historical usage to present to vendor reps for discounts.
  • Centralized license pool: for organizations with many one-off tools, create a supervised pool of licenses accessible via approvals to reduce duplication.
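Before reaching for Prophet or an LSTM, a trailing z-score often covers the "sudden drop" rule from the alerting playbook. The window size and threshold below are illustrative:

```python
import statistics


def is_usage_drop(series, window=14, z_threshold=3.0):
    """True if the last point is a large negative deviation from the trailing window."""
    if len(series) < window + 1:
        return False  # not enough history to judge
    history, latest = series[-(window + 1):-1], series[-1]
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest < mean  # perfectly flat history: any drop is anomalous
    return (mean - latest) / stdev > z_threshold
```

Run it per resource over daily `usage_metrics` values and feed positives into the same ticket-opening path as the static rules.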

2026 future predictions and how to plan

  • More granular vendor telemetry: Expect vendors to release more fine-grained usage endpoints (per-feature usage) — design schemas to accept arbitrary metric_name/metric_units now.
  • Shift-left on SaaS procurement: Teams will gate purchases with a spend tag and automated enrollment into the usage detector before approvals.
  • Policy APIs: In 2026 we’ll see broader support for policy-as-code for SaaS purchases, letting you automatically enforce rules from the template.

Onboarding checklist: fork & go

  1. Fork the repo and populate secrets (DB_DSN, vendor API keys) in GitHub Secrets or your CI/CD secrets store.
  2. Run docker-compose up to provision Postgres and Grafana for local testing.
  3. Enable one connector (start with the largest vendor) and run a backfill for 90 days to seed trends.
  4. Install the dashboard JSON and configure alert webhooks.
  5. Run the Prefect flow via GitHub Actions schedule; confirm heartbeats and connector success metrics.
  6. Invite finance and tool owners to review the Top Candidates dashboard weekly and define an SLA for response.

Actionable takeaways

  • Start small: onboard 1–3 high-cost vendors first, measure impact, then scale connectors.
  • Normalize early: store raw payloads plus normalized metrics to make audits straightforward.
  • Automate detection, humanize remediation: auto-open tickets but require human approval before cancelling or revoking access.
  • Measure ROI: track monthly saved cost vs operational hours spent — present this at the next FinOps review.

Where to find the template

The full GitHub-ready template is structured for forking: connectors you can copy, a normalization schema, orchestration examples, dashboard JSONs, and alert rules. Clone, run the onboarding checklist, and adapt metadata fields to your org.

Final notes and call-to-action

In 2026, saving on SaaS is no longer a purely procurement exercise — it’s an operational and engineering problem. This starter template turns scattered billing signals into a reproducible pipeline: ingest, normalize, detect, alert, and remediate. Fork it, run it, and share your custom connectors back to the community to accelerate others.

Ready to fork and run? Download the template on GitHub, run the onboarding checklist, and open an issue in the repo with your biggest blocker — we’ll add a connector or an example to the starter within a week.
