# Connecting Data Sources

A workspace pulls its CRM, call recordings, meeting notes, comms, docs, support
tickets, and social accounts in through **connectors**. Connecting one means
picking a provider from the catalog and authorizing it; from then on the
platform syncs that provider's data into your workspace.

This guide covers the four connect methods, how to connect (and reconnect) a
source from the Connections page and the REST API, and what MCP clients can
read. It also explains the `health` badge each connection carries.

The full list of connectors (with each one's connect method, ownership, and the
data streams it brings in) is the
[connector catalog reference](../api-reference/connector-catalog.md).

## The four connect methods

Every connector uses exactly one of four flows. The flow is a property of the
connector (its `authMethod` in the catalog), so you do not choose it; the
surface routes on it for you. Knowing which one applies tells you what you need
on hand before you start.

| Method             | When it applies                                              | What you supply                                                                                       |
| ------------------ | ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------- |
| **OAuth redirect** | HubSpot, Salesforce, Gong, Gmail, Outlook, Slack, Notion     | Nothing up front. You are redirected to the provider, you approve, and the callback stores the token. |
| **API key**        | Pipedrive, Fathom, Granola, Fireflies, Grain, Aircall, Pylon | An API key or token from the provider. It is verified, then stored encrypted.                         |
| **Webhook**        | (reserved; no launch connector uses it yet)                  | A webhook URL and signing secret to paste into the provider.                                          |
| **Handle**         | X / Twitter, LinkedIn                                        | A public handle. The handle is resolved to a confirmed account, then tracked.                         |

Ownership is a second axis. Most connections are **workspace** connections: one
org-wide connection that any admin manages. A few (Gmail, Outlook) are
**personal**: each member connects their own mailbox. The catalog records this
per connector.

Direction is a third axis. Almost every connector is **inbound** - it pulls a
provider's data into your workspace. There is one **outbound** connector,
[Notion Knowledge Sync](./notion-knowledge-sync.md), which does the reverse: it
mirrors your Amdahl knowledge base out into a Notion database in your own Notion
workspace. It connects over OAuth like the inbound connectors, but is configured
and monitored on its own surface - see that guide for the full setup and the
sync policy options.

**Not every catalog entry is connectable yet.** Each connector carries a
`connectable` flag. A **coming-soon** connector (e.g. **Circleback**) is shown in
the catalog for discovery but cannot be connected: it has no live puller behind
it, so a connect is refused with a clear `connector_not_connectable` error and
the UI renders a "Coming soon" placeholder instead of a live Connect button.
OAuth connectors whose provider app is not yet configured for your environment
are likewise marked not-connectable. Reading `connection://catalog` (or
`GET /connections/catalog`) returns `connectable` per entry so you can tell which
ones are ready.
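
As a rough illustration of using the flag, the sketch below splits catalog entries into those that are ready and those that are not. The entry shape (a list of objects with `connector_type` and `connectable`) and the `circleback` identifier are assumptions made for the example; check the catalog reference for the real response shape.

```python
from typing import Any


def split_by_connectable(catalog: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Partition catalog entries into (ready, not_ready) connector types.

    Assumes each entry is a dict with `connector_type` and `connectable`.
    A missing flag is treated as not connectable, to stay on the safe side.
    """
    ready: list[str] = []
    not_ready: list[str] = []
    for entry in catalog:
        target = ready if entry.get("connectable") is True else not_ready
        target.append(entry["connector_type"])
    return ready, not_ready


# Illustrative payload only; the real catalog response may be shaped differently.
sample_catalog = [
    {"connector_type": "hubspot", "connectable": True},
    {"connector_type": "circleback", "connectable": False},
]
ready, not_ready = split_by_connectable(sample_catalog)
```

Filtering client-side like this only saves a round trip; the server still refuses a connect on a non-connectable entry with `connector_not_connectable`.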

## Connect a source

### Via the Connections page (universal-ui)

The Connections page in the console (universal-ui) is the primary surface for
managing connectors. It has three parts, each backed by a connections read:

1. **The catalog picker** ("Add a connection") lists every connector you can
   connect, grouped by category. It reads `connection://catalog`.
2. **The inventory** lists what you have already connected - every data source,
   including tracked X / LinkedIn accounts (which are first-party connectors
   too). It reads `connection://list`.
3. **A status badge** polls a connecting / syncing connection until it settles.
   It reads `connection://<id>/status`.

To connect:

1. Open the Connections page and click **Add a connection**.
2. Pick a provider. The page routes on the connector's connect method:
   - **OAuth redirect**: you are sent to the provider to approve, then redirected
     back to the Connections page, which flashes the result and shows the new row.
   - **API key**: paste the provider's key into the form and submit. (A
     Fireflies API key requires a Fireflies **Business**-tier plan - lower tiers
     do not expose API access.)
   - **Handle**: enter the public handle; confirm the resolved account.

   A coming-soon connector shows a disabled "Coming soon" tile instead of a
   connect form.
3. The new connection appears in the inventory. A `pending` / `syncing` badge
   resolves to `connected` once the first sync completes, and a `health` badge
   (see [Connection health and statuses](#connection-health-and-statuses)) shows
   how it is doing from then on.

To disconnect, open a connection and choose **Disconnect**. The row is retained
so you can re-connect later; no synced history is purged.

### Via the REST API

The REST surface lives under the platform base URL
(`https://api.amdahl.com/api/platform/v1`) and authenticates with a bearer
token. See [Authentication](../authentication.md).

#### List the catalog

```
GET /connections/catalog
```

Returns every connector you can connect. Optional `?category=crm|calls|comms|docs|support|social`
narrows the list (use `category=social` for X / LinkedIn). This describes what
_can_ be connected, not what _is_ connected.
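
A minimal sketch of building the catalog URL with the optional filter. The helper name `catalog_url` is hypothetical; the base URL and category values are the ones given in this guide.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.amdahl.com/api/platform/v1"
CATEGORIES = ("crm", "calls", "comms", "docs", "support", "social")


def catalog_url(category: Optional[str] = None) -> str:
    """Build the GET /connections/catalog URL, optionally narrowed by category."""
    url = f"{BASE_URL}/connections/catalog"
    if category is None:
        return url
    # Reject unknown categories locally instead of sending a bad query.
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return f"{url}?{urlencode({'category': category})}"
```

Send the request with your HTTP client of choice and the bearer token described in [Authentication](../authentication.md).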

#### List your connections

```
GET /connections
```

Returns every connection you have established, newest first, as a normalized
list. Each entry carries:

- `id`, `kind` (`first_party`), `connector_type`, `name`, `status`, and `scope`.
- `last_synced_at` and `last_error`.
- `is_syncing`: whether a sync is in flight right now.
- `health`: the derived badge (see
  [Connection health and statuses](#connection-health-and-statuses)).
- `last_run`: a summary of the most recent sync run (its `status`, `started_at` /
  `finished_at`, `streams_total` / `streams_failed`, `records_written`, and
  `error_reason`), or `null` when the connection has never run.
- `owner_user_id`, for personal connections.
- `handle` / `display_name` / `avatar_url`, for the social channels X / LinkedIn.

`followers` is not part of the connection list; read it from the metrics surface
(`social.get_metrics_summary`).
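
One way to use this list is to surface the connections that need a look. The sketch below groups connection names by `health`; which badges count as "needs attention" is a choice made for this example, not something the API defines.

```python
from collections import defaultdict
from typing import Any

# Assumption for this sketch: these badges are the ones worth flagging.
ATTENTION = {"needs_reauth", "error", "rate_limited", "degraded", "stale"}


def needs_attention(connections: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group the names of flagged connections by their `health` badge."""
    flagged: dict[str, list[str]] = defaultdict(list)
    for conn in connections:
        if conn["health"] in ATTENTION:
            flagged[conn["health"]].append(conn["name"])
    return dict(flagged)


# Illustrative entries carrying only the fields this sketch reads.
sample = [
    {"name": "Acme HubSpot", "health": "healthy"},
    {"name": "Acme Fathom", "health": "needs_reauth"},
    {"name": "Acme Gong", "health": "stale"},
]
report = needs_attention(sample)
```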

Sync **cadence** is no longer a per-connection field. It is a platform-admin,
per-connector-type global setting (see [Sync cadence](#sync-cadence) below), so
there is no `sync_interval` / `next_sync_at` on a connection.

#### Connect

```
POST /connections
```

The body always carries `connector_type` plus the per-method params:

- **OAuth redirect** connector (e.g. `hubspot`):

  ```json
  {
    "connector_type": "hubspot",
    "name": "Acme HubSpot",
    "scope": "business",
    "return_to": "https://console.amdahl.co/acme/connections"
  }
  ```

  The response is `{ "mode": "oauth_redirect", "authorize_url": "...", "connection": { ... } }`.
  Redirect the user to `authorize_url`; the OAuth callback finishes the
  connection. The `connection` is created in `pending` status until the callback
  lands. Pass an optional `return_to` (a first-party console URL) and the callback
  redirects the browser there with a `?connected=<type>` or
  `?connection_error=<code>` result instead of rendering its own terminal page;
  non-first-party URLs are ignored, falling back to that page.

- **API key** connector (e.g. `fathom`):

  ```json
  { "connector_type": "fathom", "api_key": "<provider-key>", "name": "Acme Fathom" }
  ```

  The response is `{ "mode": "connected", "connection": { ... } }`. The key is
  stored encrypted; the connection is `connected` immediately.

- **Handle** connector (e.g. `twitter`):

  ```json
  {
    "connector_type": "twitter",
    "account_handle": "acme",
    "external_account_id": "1234567890",
    "display_name": "Acme",
    "followers": 4200
  }
  ```

  Resolve the handle to its `external_account_id` first (the handle alone is not
  enough). The response is `{ "mode": "connected", "connection": { ... } }`, or
  `{ "mode": "already_connected", "existing_id": "..." }` when the account is
  already tracked.

`name` and `scope` (`business` | `personal`) are optional; `scope` defaults from
the catalog. Connect stores the connection but does **not** start a sync.
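
The three request shapes and three response modes above can be sketched as a pair of helpers. The method labels (`"oauth"`, `"api_key"`, `"handle"`) and the function names are hypothetical, chosen for this example; the catalog's real `authMethod` values may be spelled differently, and the sketch assumes the returned `connection` object carries an `id`.

```python
from typing import Any

# Fields each connect method requires beyond `connector_type`.
REQUIRED = {
    "oauth": [],
    "api_key": ["api_key"],
    "handle": ["account_handle", "external_account_id"],
}


def connect_body(connector_type: str, method: str, **params: Any) -> dict[str, Any]:
    """Build a POST /connections body, checking the per-method params."""
    if method not in REQUIRED:
        raise ValueError(f"unknown connect method: {method!r}")
    missing = [key for key in REQUIRED[method] if key not in params]
    if missing:
        raise ValueError(f"missing params for {method}: {missing}")
    scope = params.get("scope")
    if scope is not None and scope not in ("business", "personal"):
        raise ValueError(f"invalid scope: {scope!r}")
    return {"connector_type": connector_type, **params}


def next_step(response: dict[str, Any]) -> tuple[str, str]:
    """Map a connect response onto (action, value) using its `mode`."""
    mode = response.get("mode")
    if mode == "oauth_redirect":
        return ("redirect", response["authorize_url"])
    if mode == "connected":
        return ("connected", response["connection"]["id"])
    if mode == "already_connected":
        return ("existing", response["existing_id"])
    raise ValueError(f"unexpected mode: {mode!r}")
```

For example, `connect_body("fathom", "api_key", api_key="<provider-key>")` yields the API-key body shown above, and a handle connect without `external_account_id` fails locally before any request is sent.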

#### Disconnect

```
DELETE /connections/:id
```

Returns `{ "disconnected": true, "id": "..." }`. The `:id` is any connection
(a CRM / calls / docs source or a tracked X / LinkedIn account - all first-party).
The row is retained for re-connect. A non-existent or cross-tenant id returns a
clean `404` that never reveals whether the id exists in another workspace.

#### Reconnect

```
POST /connections/:id/reconnect
```

Restore a connection that has gone bad - one whose `status` is `error` or
`disconnected`, or whose `health` is `needs_reauth` - **in place** on the
existing row, so its sync history and id survive (no duplicate row is created).
The flow routes on the connector's method, like connect:

- **OAuth redirect**: returns a fresh `{ "mode": "oauth_redirect", "authorize_url": "...", "connection": { ... } }`.
  Redirect the user through the provider again; the callback refreshes the
  credentials on the same row. Pass an optional `return_to` (a first-party
  console URL) just as with connect.
- **API key**: send a replacement key, `{ "api_key": "<new-key>" }`. It is
  re-stored on the existing source and the connection flips back to `connected`;
  the response is `{ "mode": "connected", "connection": { ... } }`.
- **Handle** (X / LinkedIn): no key to replace - the call clears the error state
  and re-triggers the sync, returning `{ "mode": "connected", "connection": { ... } }`.

Reconnect refuses a connection that is already healthy (`connected` or actively
`syncing`) with a clear error - there is nothing to restore, so disconnect it
first if you intend to re-authorize or replace its key. A non-existent or
cross-tenant id returns a clean `404`.
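
The routing above can be sketched as a client-side helper that builds the reconnect request and mirrors the "already healthy" refusal before calling the API. The method labels and the function name are hypothetical, and the local status check is only a convenience; the server remains the authority on whether a reconnect is allowed.

```python
from typing import Any, Optional


def reconnect_request(
    connection: dict[str, Any],
    method: str,
    new_api_key: Optional[str] = None,
    return_to: Optional[str] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (path, body) for POST /connections/:id/reconnect."""
    # Mirror the documented refusal: nothing to restore on a healthy connection.
    if connection["status"] in ("connected", "syncing"):
        raise ValueError("connection is healthy; disconnect it first")
    path = f"/connections/{connection['id']}/reconnect"
    if method == "oauth":
        return path, ({"return_to": return_to} if return_to else {})
    if method == "api_key":
        if not new_api_key:
            raise ValueError("a replacement api_key is required")
        return path, {"api_key": new_api_key}
    if method == "handle":
        # No credential to replace; the call clears the error and re-syncs.
        return path, {}
    raise ValueError(f"unknown connect method: {method!r}")
```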

#### Update a connection (PATCH)

```
PATCH /connections/:id
```

`PATCH` updates a connection's editable fields - its display `name`, and for a
**personal** connector (Gmail / Outlook) its `owner_user_id` (the workspace
member the connection belongs to). It does **not** set sync cadence (cadence is
global - see [Sync cadence](#sync-cadence)):

```json
{ "name": "Acme HubSpot (prod)", "owner_user_id": "<member-user-id>" }
```

Setting `owner_user_id` is only valid on a personal connector and only for a
user who is a member of the workspace; pass `"owner_user_id": null` to clear it.
The response is `{ "updated": true, "connection": { ... } }`. A non-existent or
cross-tenant id returns a clean `404`.
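
Because `null` has a meaning here (clear the owner), a client has to distinguish "field omitted" from "field set to null". A minimal sketch of one way to do that, using a sentinel default; the helper name and the `member_ids` argument are hypothetical.

```python
import json
from typing import Any, Optional

_UNSET: Any = object()  # sentinel: the caller did not pass owner_user_id at all


def patch_body(
    scope: str,
    name: Optional[str] = None,
    owner_user_id: Any = _UNSET,
    member_ids: frozenset = frozenset(),
) -> dict[str, Any]:
    """Build a PATCH /connections/:id body with the documented constraints."""
    body: dict[str, Any] = {}
    if name is not None:
        body["name"] = name
    if owner_user_id is not _UNSET:
        if scope != "personal":
            raise ValueError("owner_user_id is only valid on a personal connector")
        # None means "clear the owner"; any other value must be a workspace member.
        if owner_user_id is not None and owner_user_id not in member_ids:
            raise ValueError("owner_user_id must be a workspace member")
        body["owner_user_id"] = owner_user_id
    if not body:
        raise ValueError("nothing to update")
    return body


clear_owner = json.dumps(patch_body("personal", owner_user_id=None))
```

Here `clear_owner` serializes the owner as JSON `null`, while calling `patch_body("personal", name="Inbox")` leaves `owner_user_id` out of the body entirely.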

#### Sync cadence

Sync cadence - how often a connector re-syncs - is **not** a per-connection
setting. It is an operational decision Amdahl makes once per connector **type**
and applies to every tenant, set through the platform-admin endpoint
`PUT /api/admin/connector-cadence/:type` (presets `1m`, `10m`, `30m`, `1h`,
`6h`, `1d`, or `null` to clear the override and use the connector default). It is
not exposed on a connection or over MCP.

### Via MCP

MCP clients (Claude Desktop, Cursor, custom agents) see the connections surface
as **read-only resources**: the four below, plus the per-connection run history
covered in [Sync history](#sync-history). There is **no** `connections` coarse
tool - the connections MCP surface is reads only:

| Resource                   | What it returns                                                                                                   |
| -------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `connection://catalog`     | The connector catalog (what can be connected), with the `connectable` flag. Optional `kind` / `category` filters. |
| `connection://list`        | Your established connections (what is connected), newest first, each with `health` + `last_run`.                  |
| `connection://<id>`        | One established connection by id.                                                                                 |
| `connection://<id>/status` | Just the sync state of one connection: status, `health`, `is_syncing`, last good sync, last error.                |

The `connection://` reads require the `connections:read` scope (granted to
read-only MCP keys). There are **no** connection writes over MCP: connecting,
disconnecting, reconnecting, and re-attributing a connection are deliberately
kept off the MCP surface. They are workspace configuration, managed from the
console Connections page and the REST API, the same posture as workspace and
team lifecycle. Exposing them through a coarse-tool dispatch would let a leaked
key reconfigure the workspace, which is a confused-deputy risk. Sync cadence is
likewise out of scope for MCP - it is a platform-admin global setting (see
[Sync cadence](#sync-cadence)). An MCP agent can read what is connected and
report on sync health, but connecting and reconnecting are done by a human in
the console or by your app calling the REST endpoints.

## After connecting

Connect stores the connection; it does not start a sync. The first sync is
kicked off separately, and a connection moves through `pending` → `syncing` →
`connected` as data begins to flow. Poll `connection://<id>/status` (MCP) or
`GET /connections/:id/status` (REST) to watch a connection settle. A connection
that needs reauthorization or whose last sync failed surfaces as `error` with a
`last_error` message.
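
A polling loop for this might look like the sketch below. It takes the status read as a callable, so it works with either surface, and treats every status other than `pending` and `syncing` as settled; that split, the function name, and the default interval and timeout are assumptions for the example.

```python
import time
from typing import Any, Callable

# Assumption: anything other than `pending` / `syncing` counts as settled.
SETTLED = {"connected", "error", "paused", "disconnected"}


def wait_until_settled(
    fetch_status: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a status read until the connection leaves pending / syncing."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = fetch_status()
        if snapshot["status"] in SETTLED:
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError("connection did not settle in time")
        sleep(interval_s)
```

Pass a function that performs `GET /connections/:id/status` (or the MCP read) and returns the parsed body. Since connect does not start a sync, a connection can sit in `pending` until its first sync is kicked off, which is why the loop carries a timeout.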

## Connection health and statuses

A connection carries two related fields.

`status` is the coarse lifecycle state: `pending`, `connected`, `syncing`,
`error`, `paused`, or `disconnected`.

`health` is a single **derived** badge that folds the status, the live
`is_syncing` flag, the most recent run's outcome, and how long ago the last
successful sync was into one actionable word. It is computed for you (you never
set it). The values, in precedence order (the first that applies wins):

| `health`       | Meaning                                                                                            |
| -------------- | -------------------------------------------------------------------------------------------------- |
| `disconnected` | You disconnected it. Reconnect to resume.                                                          |
| `needs_reauth` | The provider needs you to re-authorize (an OAuth token expired / was revoked).                     |
| `syncing`      | A sync is running right now (`is_syncing` is true).                                                |
| `rate_limited` | The last run hit the provider's rate limit. It will retry on the next sync.                        |
| `error`        | The connection is errored, or the last run failed outright. See `last_error`.                      |
| `degraded`     | The last run only partially succeeded - some data streams failed.                                  |
| `stale`        | It is connected, but its last successful sync is older than the freshness threshold (default 48h). |
| `healthy`      | Connected, recently synced, and the last run was clean.                                            |

The run-level detail behind `degraded` / `error` / `rate_limited` lives on
`last_run` (the most recent run's `status`, `streams_total` / `streams_failed`,
`records_written`, and bucketed `error_reason`: `auth`, `rate_limit`,
`transient`, `config`, or `unknown`). amdahl-data writes these fields at each
sync-run boundary, so a brand-new connection that has not run yet reports
`last_run: null` and derives its `health` from `status` alone.
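
To make the precedence order concrete, here is a sketch of how such a derivation could be written. It is an illustration, not the platform's implementation: the exact signals are assumptions (in particular, `needs_reauth` is inferred from `error_reason == "auth"`), `last_synced_at` is taken as an already-parsed timezone-aware `datetime`, and statuses the table does not cover (`pending`, `paused`) simply fall through to the last branch.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

FRESHNESS = timedelta(hours=48)  # default freshness threshold from the table


def derive_health(conn: dict[str, Any], now: Optional[datetime] = None) -> str:
    """Return the first badge that applies, in the table's precedence order."""
    now = now or datetime.now(timezone.utc)
    status = conn["status"]
    run = conn.get("last_run") or {}  # null last_run: fall back to status alone
    reason = run.get("error_reason")

    if status == "disconnected":
        return "disconnected"
    if reason == "auth":  # assumption: auth failures map to needs_reauth
        return "needs_reauth"
    if conn.get("is_syncing"):
        return "syncing"
    if reason == "rate_limit":
        return "rate_limited"
    if status == "error" or run.get("status") == "failed":
        return "error"
    if (run.get("streams_failed") or 0) > 0:
        return "degraded"
    last = conn.get("last_synced_at")
    if status == "connected" and last is not None and now - last > FRESHNESS:
        return "stale"
    # Not covered by the table: pending / paused also end up here in this sketch.
    return "healthy"
```

Because the checks run top to bottom, a disconnected connection reports `disconnected` even if `is_syncing` is still true, and an auth failure outranks an in-flight sync. You never need to compute this yourself; read `health` from the API.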

## Sync history

`last_run` is just the latest run. For the full recent activity log of a
connection, read its sync-run history:

```
GET /connections/:id/runs        (REST)
connection://<id>/runs           (MCP resource)
```

An optional `?limit=` query parameter bounds how many runs are returned
(default 25, capped at 100). The response is `{ "runs": [...] }`, newest first,
where each run is:

| Field             | Meaning                                                                                                                    |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `id`              | The run id.                                                                                                                |
| `status`          | `pending`, `running`, `completed`, `failed`, or `skipped_locked`.                                                          |
| `triggered_by`    | What kicked the run off (e.g. `connect`, `manual`, `schedule`).                                                            |
| `started_at`      | When the run started.                                                                                                      |
| `finished_at`     | When it reached a terminal outcome, or `null` while it is still running.                                                   |
| `streams_total`   | Streams the run attempted.                                                                                                 |
| `streams_failed`  | How many of those failed.                                                                                                  |
| `records_written` | Rows written by the run.                                                                                                   |
| `error_reason`    | Bucketed failure reason (`auth` / `rate_limit` / `transient` / `config` / `unknown`), or `null` when the run did not fail. |

When you trigger an on-demand refresh, the ack carries the `sync_run_id` of the
run it dispatched (when available), so you can find that exact run in this list.
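
A short sketch of working with this list: building the path with a clamped `limit`, locating the run behind a refresh ack, and tallying failures by reason. The helper names are hypothetical, and matching `sync_run_id` against a run's `id` is an assumption drawn from the description above.

```python
from collections import Counter
from typing import Any, Optional


def runs_path(connection_id: str, limit: int = 25) -> str:
    """Build the runs path, keeping `limit` within 1..100."""
    clamped = max(1, min(limit, 100))
    return f"/connections/{connection_id}/runs?limit={clamped}"


def find_run(runs: list[dict[str, Any]], sync_run_id: str) -> Optional[dict[str, Any]]:
    """Return the run whose `id` matches the ack's `sync_run_id`, if listed."""
    return next((run for run in runs if run["id"] == sync_run_id), None)


def failure_reasons(runs: list[dict[str, Any]]) -> dict[str, int]:
    """Count failed runs per bucketed `error_reason`."""
    counts = Counter(
        run.get("error_reason") or "unknown"
        for run in runs
        if run["status"] == "failed"
    )
    return dict(counts)
```

If `find_run` returns `None`, the run may simply be older than the `limit` window, so widen the window before concluding the run is missing.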
