
BigQuery Insights Agent

Build an AI agent that connects to Google BigQuery, runs complex SQL queries, and generates actionable insights from your data warehouse.

What You'll Build

A data analysis agent that:

  • Connects to Google BigQuery via MCP
  • Writes and executes SQL queries on your datasets
  • Analyzes trends, detects anomalies, and segments customers
  • Runs on a schedule to deliver daily or weekly reports

Time to complete: 45-60 minutes

Prerequisites

  • A TeamDay account with an organization
  • Google Cloud project with BigQuery enabled
  • BigQuery datasets with data to analyze
  • Basic SQL knowledge

Architecture

graph LR
    A[User / Mission] --> B[Data Analyst Agent]
    B --> C[BigQuery MCP Server]
    C --> D[BigQuery API]
    D --> E[Your Datasets]
    B --> F[Insights & Recommendations]

Step 1: Google Cloud Setup

1.1 Enable BigQuery API

gcloud config set project your-project-id
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com

Or enable via the Console: APIs & Services > Enable APIs > BigQuery API.

1.2 Create a Service Account

# Create service account
gcloud iam service-accounts create teamday-bigquery \
  --display-name="TeamDay BigQuery Agent"

SA_EMAIL="teamday-bigquery@your-project-id.iam.gserviceaccount.com"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# Download key file
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
  --iam-account=$SA_EMAIL

Required roles:

  • bigquery.dataViewer -- Read dataset contents
  • bigquery.jobUser -- Run queries
  • bigquery.dataEditor -- (Optional) Write data back

1.3 Create Sample Data (Optional)

If you want to follow along with the examples below:

bq mk --dataset --location=US --description="Analytics Data" \
  your-project-id:analytics

bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
  GENERATE_UUID() as event_id,
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(FLOOR(RAND()*30) AS INT64) DAY) as timestamp,
  CONCAT('user_', CAST(CAST(FLOOR(RAND()*1000) AS INT64) AS STRING)) as user_id,
  ['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
  RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"
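
If you want to sanity-check the shape of this data locally, a rough Python equivalent of the generator above (illustrative only, not part of TeamDay):

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

EVENT_TYPES = ["page_view", "click", "signup", "purchase"]

def make_event():
    """Mirror one row of the SQL generator above."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc) - timedelta(days=random.randint(0, 29)),
        "user_id": f"user_{random.randint(0, 999)}",
        "event_type": random.choice(EVENT_TYPES),
        "value": random.random() * 100,
    }

events = [make_event() for _ in range(10_000)]
```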

Step 2: TeamDay Setup

2.1 Store GCP Credentials as Space Secrets

Base64-encode your service account key and store it alongside your project ID:

# Base64 encode the service account file
# (GNU/Linux base64 wraps long output at 76 columns; add -w0 there to keep it on one line)
SA_ENCODED=$(base64 < ~/teamday-bigquery-key.json)

# Store secrets on the space
curl -X POST https://cc.teamday.ai/api/v1/spaces/$SPACE_ID/secrets \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"secrets\": {
      \"GCP_SERVICE_ACCOUNT_JSON\": \"$SA_ENCODED\",
      \"GCP_PROJECT_ID\": \"your-project-id\"
    }
  }"

Secret keys must be UPPER_SNAKE_CASE.
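
TeamDay's exact validation rules aren't documented here, but a pattern check like the following catches obviously invalid key names before you send the request (the regex is our assumption of what UPPER_SNAKE_CASE means):

```python
import re

# Assumed UPPER_SNAKE_CASE rule: uppercase letters, digits, underscores;
# must start with a letter.
SECRET_KEY_RE = re.compile(r"^[A-Z][A-Z0-9_]*$")

def is_valid_secret_key(name):
    """Return True if the key name looks like UPPER_SNAKE_CASE."""
    return bool(SECRET_KEY_RE.match(name))
```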

2.2 Create the Data Analyst Agent

curl -X POST https://cc.teamday.ai/api/v1/agents \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BigQuery Data Analyst",
    "role": "Data Analyst",
    "model": "claude-sonnet-4-6",
    "visibility": "organization",
    "tags": ["bigquery", "analytics", "data"],
    "systemPrompt": "You are an expert data analyst with deep knowledge of SQL and BigQuery.\n\n## Capabilities\n- Write efficient BigQuery SQL queries with proper partitioning and clustering\n- Analyze query results for trends, patterns, and anomalies\n- Generate clear insights with business context\n- Make data-driven recommendations\n\n## BigQuery Best Practices\n- Select only needed columns (avoid SELECT *)\n- Use LIMIT for exploratory queries\n- Use approximate aggregation functions when precision is not critical\n- Leverage window functions for complex calculations\n- Use WITH clauses for query readability\n\n## Response Format\nFor each analysis:\n1. Explain your approach\n2. Show the SQL query\n3. Summarize the results\n4. List key insights\n5. Provide actionable recommendations"
  }'

Response:

{
  "success": true,
  "id": "kf8x2mNp4qRt",
  "name": "BigQuery Data Analyst",
  "status": "active",
  "chatUrl": "/agents/kf8x2mNp4qRt/chat"
}

Save the agent ID (kf8x2mNp4qRt) -- you will need it in subsequent steps.

Step 3: Configure BigQuery MCP

3.1 Register the MCP Instance

curl -X POST https://cc.teamday.ai/api/v1/mcps \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mcpType": "bigquery",
    "name": "BigQuery Analytics",
    "description": "Connection to production BigQuery datasets",
    "isActive": true,
    "credentials": {
      "GCP_PROJECT_ID": {
        "value": "your-project-id",
        "isSecret": false
      },
      "GCP_SERVICE_ACCOUNT_JSON": {
        "value": "'"$SA_ENCODED"'",
        "isSecret": true
      }
    }
  }'

Response:

{
  "success": true,
  "id": "wR3jLm9vKxYz",
  "name": "BigQuery Analytics",
  "mcpType": "bigquery"
}

3.2 Attach Agent and MCP to the Space

curl -X PATCH https://cc.teamday.ai/api/v1/spaces/$SPACE_ID \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "addAgents": ["kf8x2mNp4qRt"],
    "addMcps": ["wR3jLm9vKxYz"]
  }'

3.3 Alternative: .mcp.json in the Space

If you prefer file-based configuration, place an .mcp.json in the space's root directory:

{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-bigquery"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
        "GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
      }
    }
  }
}

3.4 Verify the Connection

Execute the agent with a simple test query:

curl -X POST https://cc.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"List all available datasets and tables in the project.\",
    \"spaceId\": \"$SPACE_ID\"
  }"

Step 4: Usage Examples

4.1 Basic Query

You: How many total events do we have, broken down by type?

Agent: I'll query the user_events table.

SELECT
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
GROUP BY event_type
ORDER BY event_count DESC

Results:
| Event Type | Events | Unique Users | Avg Value |
|------------|--------|--------------|-----------|
| page_view  | 2,534  | 412          | 49.82     |
| click      | 2,489  | 398          | 51.34     |
| signup     | 2,501  | 497          | 48.91     |
| purchase   | 2,476  | 491          | 50.12     |

Total: 10,000 events across ~500 users over the last 30 days.

4.2 User Behavior Analysis

SELECT
  DATE(timestamp) as date,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC

The agent will interpret results and surface insights like:

  • Conversion funnel health (view > click > signup > purchase rates)
  • Day-over-day trends and growth rate
  • Recommendations for top-of-funnel optimization

4.3 Cohort Retention Analysis

WITH user_cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
),
cohort_sizes AS (
  SELECT cohort_week, COUNT(*) as cohort_size
  FROM user_cohorts
  GROUP BY cohort_week
),
user_activity AS (
  SELECT DISTINCT
    user_id,
    DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
  FROM `your-project-id.analytics.user_events`
)
SELECT
  uc.cohort_week,
  cs.cohort_size,
  DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
  COUNT(DISTINCT ua.user_id) as active_users,
  ROUND(COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1) as retention_pct
FROM user_cohorts uc
JOIN user_activity ua ON uc.user_id = ua.user_id
JOIN cohort_sizes cs ON uc.cohort_week = cs.cohort_week
WHERE uc.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 WEEK)
GROUP BY uc.cohort_week, cs.cohort_size, weeks_since_signup
ORDER BY uc.cohort_week, weeks_since_signup

Example output:

| Cohort | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|--------|------|--------|--------|--------|--------|--------|
| Jan 20 | 178  | 100%   | 56.2%  | 38.8%  | 29.2%  | 24.7%  |
| Jan 27 | 195  | 100%   | 58.5%  | 41.0%  | 31.3%  | --     |
| Feb 3  | 203  | 100%   | 59.1%  | 43.3%  | --     | --     |
| Feb 10 | 189  | 100%   | 61.4%  | --     | --     | --     |

The agent will identify trends (improving Week 1 retention), pinpoint the biggest drop-off window (Week 1 to Week 2), and recommend interventions like onboarding email sequences or re-engagement triggers.
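
The retention percentages are simply active users divided by cohort size. For the Jan 20 cohort (178 users), the row can be reproduced in Python (the weekly active-user counts here are back-solved for illustration):

```python
cohort_size = 178
# Illustrative active-user counts per week since signup
active_users = {0: 178, 1: 100, 2: 69, 3: 52, 4: 44}

# Same formula as retention_pct in the query: active / cohort_size * 100
retention_pct = {
    week: round(count * 100.0 / cohort_size, 1)
    for week, count in active_users.items()
}
# retention_pct -> {0: 100.0, 1: 56.2, 2: 38.8, 3: 29.2, 4: 24.7}
```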

4.4 Revenue & Customer Segmentation

WITH customer_metrics AS (
  SELECT
    user_id,
    COUNT(*) as total_purchases,
    SUM(value) as total_revenue,
    AVG(value) as avg_order_value,
    TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) as days_since_last_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE event_type = 'purchase'
  GROUP BY user_id
)
SELECT
  CASE
    WHEN total_revenue >= 500 THEN 'VIP'
    WHEN total_revenue >= 200 THEN 'High Value'
    WHEN total_revenue >= 100 THEN 'Medium Value'
    ELSE 'Low Value'
  END as segment,
  COUNT(*) as customers,
  ROUND(SUM(total_revenue), 2) as segment_revenue,
  ROUND(AVG(total_revenue), 2) as avg_customer_value,
  ROUND(AVG(avg_order_value), 2) as avg_order_value,
  ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY segment
ORDER BY segment_revenue DESC

Example output:

| Segment      | Customers | Revenue | Avg Value | AOV    | Days Since |
|--------------|-----------|---------|-----------|--------|------------|
| VIP          | 8         | $4,234  | $529.25   | $77.87 | 3.2        |
| High Value   | 23        | $5,892  | $256.17   | $80.05 | 5.1        |
| Medium Value | 67        | $8,123  | $121.24   | $80.83 | 8.7        |
| Low Value    | 156       | $7,234  | $46.37    | $42.15 | 12.3       |

The agent surfaces the Pareto distribution (top 12% of customers drive 40% of revenue), flags the at-risk low-value segment (12+ days inactive), and recommends VIP retention programs and win-back campaigns.
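
The Pareto claim can be checked directly against the table above: VIP plus High Value is 31 of 254 customers and $10,126 of $25,483 in revenue:

```python
customers = {"VIP": 8, "High Value": 23, "Medium Value": 67, "Low Value": 156}
revenue = {"VIP": 4234, "High Value": 5892, "Medium Value": 8123, "Low Value": 7234}

top = ["VIP", "High Value"]
customer_share = sum(customers[s] for s in top) / sum(customers.values())
revenue_share = sum(revenue[s] for s in top) / sum(revenue.values())
# customer_share ~= 12.2% of customers, revenue_share ~= 39.7% of revenue
```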

4.5 Anomaly Detection

WITH hourly_metrics AS (
  SELECT
    TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
    event_type,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  GROUP BY hour, event_type
),
historical_avg AS (
  SELECT
    EXTRACT(HOUR FROM timestamp) as hour_of_day,
    event_type,
    AVG(hourly_count) as avg_count,
    STDDEV(hourly_count) as stddev_count
  FROM (
    SELECT
      TIMESTAMP_TRUNC(timestamp, HOUR) as timestamp,
      event_type,
      COUNT(*) as hourly_count
    FROM `your-project-id.analytics.user_events`
    WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
    GROUP BY timestamp, event_type
  )
  GROUP BY hour_of_day, event_type
)
SELECT
  hm.hour,
  hm.event_type,
  hm.event_count as actual,
  ROUND(ha.avg_count, 1) as expected,
  ROUND(
    (hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2
  ) as z_score,
  CASE
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2
      THEN 'ANOMALY'
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
      THEN 'WARNING'
    ELSE 'NORMAL'
  END as status
FROM hourly_metrics hm
JOIN historical_avg ha
  ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
  AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY z_score DESC

The agent compares current hourly metrics against the trailing 7-day average, flags anything beyond 1.5 standard deviations, and recommends immediate investigation steps for critical anomalies (z-score > 2).
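
If you want to tune the thresholds before baking them into SQL, the same classification logic is easy to express and test in Python (the function is ours, not part of any library):

```python
def classify(actual, mean, stddev):
    """Z-score status mirroring the CASE expression in the query above."""
    if stddev == 0:  # NULLIF(stddev, 0) guard: no baseline variance, treat as NORMAL
        return None, "NORMAL"
    z = round((actual - mean) / stddev, 2)
    if abs(z) > 2:
        return z, "ANOMALY"
    if abs(z) > 1.5:
        return z, "WARNING"
    return z, "NORMAL"
```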

Step 5: Advanced Patterns

5.1 Funnel Analysis

WITH funnel_steps AS (
  SELECT
    user_id,
    MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
    MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
    MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
    MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY user_id
)
SELECT
  SUM(step_1_view) as viewed,
  SUM(step_2_click) as clicked,
  SUM(step_3_signup) as signed_up,
  SUM(step_4_purchase) as purchased,
  ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as view_to_click_pct,
  ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1)
    as click_to_signup_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1)
    as signup_to_purchase_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as overall_conversion_pct
FROM funnel_steps
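
The conversion math itself is straightforward; a small Python sketch with made-up counts shows what the query computes:

```python
# Hypothetical funnel counts (not from the sample dataset)
viewed, clicked, signed_up, purchased = 1000, 400, 120, 30

def pct(num, den):
    # Same guard as NULLIF(den, 0) in the query above
    return round(num * 100.0 / den, 1) if den else None

view_to_click = pct(clicked, viewed)            # 40.0
click_to_signup = pct(signed_up, clicked)       # 30.0
signup_to_purchase = pct(purchased, signed_up)  # 25.0
overall_conversion = pct(purchased, viewed)     # 3.0
```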

5.2 Churn Prediction

Identify users likely to churn based on activity patterns:

WITH user_features AS (
  SELECT
    user_id,
    COUNT(*) as total_events,
    COUNT(DISTINCT DATE(timestamp)) as active_days,
    DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_inactive,
    AVG(CASE WHEN event_type = 'purchase' THEN value ELSE 0 END) as avg_purchase_value,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
)
SELECT
  user_id,
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
    WHEN days_inactive > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
    WHEN days_inactive > 3 AND total_events < 5 THEN 'LOW_RISK'
    ELSE 'ACTIVE'
  END as churn_risk,
  days_inactive,
  total_events,
  purchase_count,
  ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_inactive > 3
ORDER BY
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 1
    WHEN days_inactive > 7 AND active_days < 3 THEN 2
    ELSE 3
  END,
  days_inactive DESC
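
The risk tiers are plain threshold rules; here is the same CASE logic as a Python function (names are ours), handy for tuning cutoffs against historical data:

```python
def churn_risk(days_inactive, active_days, total_events, purchase_count):
    """Mirror the CASE expression in the churn query above."""
    if days_inactive > 14 and purchase_count > 0:
        return "HIGH_RISK"
    if days_inactive > 7 and active_days < 3:
        return "MEDIUM_RISK"
    if days_inactive > 3 and total_events < 5:
        return "LOW_RISK"
    return "ACTIVE"
```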

Step 6: Schedule with Missions

6.1 Daily Insights Report

curl -X POST https://cc.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Daily BigQuery Insights",
    "goal": "Analyze data from the previous day in BigQuery: 1) Overall event metrics and trends 2) Revenue summary 3) Any anomalies detected 4) Top recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 8 * * *"
    }
  }'

Note: characterId is the agent's ID in the API.

Response:

{
  "success": true,
  "id": "nV7bTq2wXpLk",
  "title": "Daily BigQuery Insights",
  "status": "pending",
  "schedule": {
    "type": "cron",
    "value": "0 8 * * *"
  }
}

6.2 Weekly Deep Dive

curl -X POST https://cc.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Weekly Analytics Deep Dive",
    "goal": "Comprehensive weekly analysis: 1) Cohort retention trends 2) Revenue by customer segment 3) Funnel conversion rates 4) Churn risk report 5) Key insights and strategic recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 9 * * 1"
    }
  }'

Troubleshooting

Authentication failed (403)

  • Verify the service account JSON is valid and base64-encoded correctly
  • Confirm the service account has bigquery.dataViewer and bigquery.jobUser roles
  • Check that the BigQuery API is enabled in your GCP project

Query timeout

  • Add a LIMIT clause for exploratory queries
  • Use partitioned and clustered tables for large datasets
  • Break complex queries into smaller CTEs

No data returned

  • Verify dataset and table names (they are case-sensitive)
  • Check date range filters -- timestamps may be in UTC
  • Test with a simple SELECT COUNT(*) FROM table first

Cost concerns

  • BigQuery charges $5 per TB scanned -- always select only needed columns
  • Use maximumBytesBilled in your query settings to cap cost
  • Partition tables by date and cluster by frequently filtered columns

Cost Estimation

| Component | Estimate |
|-----------|----------|
| BigQuery queries (typical: 10-100 MB each) | $0.00005-$0.0005 per query |
| BigQuery storage | $0.02 per GB/month |
| TeamDay agent execution (~15K input + ~4K output tokens) | ~$0.18 per analysis |
| Monthly total (daily reports) | ~$6-10/month |

Next Steps

Resources