
Terry HTTP Fetcher

Purpose

The HTTP fetcher executes data sources with source_type=http within the existing Terry loader chain:

RunIngestionWorkflow -> LoaderActivity.Execute -> loader.NewFetcher -> fetcher/http

The fetcher reads the already persisted runtime config from terry.loader_configs.loader_settings, performs the HTTP request, and returns the JSON payload in fetcher.Result.Body. LoaderActivity saves this payload to raw object storage and returns only LoaderOutput.raw_data and metadata.

Code:

  • internal/worker/loader/fetcher/http/fetcher.go
  • typed config: internal/worker/loader/types.go

Configuration Shape

Minimal persisted config:

{
  "type": "http",
  "timeout": 30000000000,
  "retry_count": 1,
  "http": {
    "url": "https://www.mexc.com/api/financialactivity/financial/products/list/V2",
    "method": "GET",
    "headers": {
      "Accept": "*/*",
      "User-Agent": "Mozilla/5.0 ..."
    },
    "response": {
      "format": "json",
      "items_path": "data"
    }
  }
}

Supported top-level source fields used by HTTP fetcher:

  • type: must be http;
  • timeout: per-request timeout, a Go time.Duration stored in JSONB as integer nanoseconds (30000000000 is 30 seconds);
  • retry_count: legacy per-request retry count, used as a fallback;
  • headers and query: default maps, enabled by default, shared with source.http.
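As a rough illustration of the timeout encoding, the sketch below decodes a persisted config into Go structs. SourceConfig, HTTPSettings, and decodeConfig are hypothetical names chosen for this example; the real typed config in internal/worker/loader/types.go may be shaped differently.

```go
package main

import (
	"encoding/json"
	"time"
)

// HTTPSettings is a hypothetical, partial model of the source.http object.
type HTTPSettings struct {
	URL     string            `json:"url"`
	Method  string            `json:"method"`
	Headers map[string]string `json:"headers"`
}

// SourceConfig is a hypothetical, partial model of loader_settings.
type SourceConfig struct {
	Type       string        `json:"type"`
	Timeout    time.Duration `json:"timeout"` // JSON number of nanoseconds
	RetryCount int           `json:"retry_count"`
	HTTP       HTTPSettings  `json:"http"`
}

// decodeConfig parses a persisted loader_settings document.
// time.Duration is an int64, so encoding/json reads the number directly.
func decodeConfig(raw []byte) (SourceConfig, error) {
	var cfg SourceConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}
```

With the minimal config above, cfg.Timeout would equal 30 seconds.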

Supported source.http fields:

  • url;
  • method: one of GET, POST, PUT, PATCH, DELETE;
  • headers, query;
  • header_params[], query_params[]: lists of {name, value, enabled} entries; disabled entries are not sent and can remove same-name map defaults;
  • body;
  • response;
  • pagination;
  • retry;
  • follow_redirect.
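The interaction between the headers/query maps and the header_params[]/query_params[] entries can be sketched as below. Param and mergeParams are hypothetical names, and the merge order (map defaults first, entries applied afterwards, exact name matching) is an assumption rather than the fetcher's confirmed behavior.

```go
package main

// Param models one {name, value, enabled} entry.
type Param struct {
	Name    string `json:"name"`
	Value   string `json:"value"`
	Enabled bool   `json:"enabled"`
}

// mergeParams copies the map defaults, then applies the entries in order:
// an enabled entry sets or overrides a value, a disabled entry is not sent
// and also removes a same-name default.
func mergeParams(defaults map[string]string, entries []Param) map[string]string {
	out := make(map[string]string, len(defaults)+len(entries))
	for name, value := range defaults {
		out[name] = value
	}
	for _, p := range entries {
		if p.Enabled {
			out[p.Name] = p.Value
		} else {
			delete(out, p.Name)
		}
	}
	return out
}
```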

Request Body

Supported body modes:

  • body.json — raw JSON payload; fetcher validates JSON and sets Content-Type: application/json when no explicit content type is configured;
  • body.raw — raw text payload; fetcher sets Content-Type: text/plain; charset=utf-8 when no explicit content type is configured.

body.form is intentionally rejected in the current implementation.

Only one body mode may be configured at a time.
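The body rules above could be expressed roughly as follows. Body, resolveBody, and the error messages are hypothetical; in particular, the shape of body.form is assumed here only so that the rejection can be shown.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Body is a hypothetical model of source.http.body.
type Body struct {
	JSON json.RawMessage   `json:"json,omitempty"`
	Raw  string            `json:"raw,omitempty"`
	Form map[string]string `json:"form,omitempty"`
}

// resolveBody returns the payload and the Content-Type to send.
// explicitContentType is the configured content type, if any.
func resolveBody(b Body, explicitContentType string) ([]byte, string, error) {
	if len(b.Form) > 0 {
		return nil, "", errors.New("body.form is not supported")
	}
	hasJSON, hasRaw := len(b.JSON) > 0, b.Raw != ""
	var payload []byte
	var contentType string
	switch {
	case hasJSON && hasRaw:
		return nil, "", errors.New("only one body mode may be configured")
	case hasJSON:
		if !json.Valid(b.JSON) {
			return nil, "", errors.New("body.json is not valid JSON")
		}
		payload, contentType = b.JSON, "application/json"
	case hasRaw:
		payload, contentType = []byte(b.Raw), "text/plain; charset=utf-8"
	default:
		return nil, "", nil // no body configured
	}
	if explicitContentType != "" {
		contentType = explicitContentType // explicit configuration wins
	}
	return payload, contentType, nil
}
```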

Response Handling

Supported response formats:

  • json — default;
  • text or raw.

For JSON response with response.items_path, fetcher extracts an array at that dot-separated path and stores a normalized envelope:

{
  "records": [],
  "request_count": 1,
  "page_count": 1
}
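A minimal sketch of the dot-separated items_path lookup, assuming every path segment is a plain object key (no array indexes or escaping). extractItems is a hypothetical helper, not the fetcher's actual function.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// extractItems walks a dot-separated path through a JSON document and
// returns the array found there.
func extractItems(body []byte, itemsPath string) ([]any, error) {
	var cur any
	if err := json.Unmarshal(body, &cur); err != nil {
		return nil, err
	}
	for _, key := range strings.Split(itemsPath, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, errors.New("items_path: parent of " + key + " is not an object")
		}
		next, ok := obj[key]
		if !ok {
			return nil, errors.New("items_path: missing key " + key)
		}
		cur = next
	}
	items, ok := cur.([]any)
	if !ok {
		return nil, errors.New("items_path does not point to an array")
	}
	return items, nil
}
```

For the minimal config shown earlier, items_path is "data", so the array under the top-level data key would become records.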

For JSON response without items_path, fetcher preserves the response JSON as-is for single-request fetches.

For text/raw response, fetcher stores a JSON envelope so the existing raw payload contract remains application/json:

{
  "raw": "...",
  "content_type": "text/plain"
}
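The text/raw envelope could be produced along these lines. textEnvelope and wrapText are hypothetical names; only the raw and content_type keys come from the example above.

```go
package main

import "encoding/json"

// textEnvelope mirrors the JSON envelope used for text/raw responses.
type textEnvelope struct {
	Raw         string `json:"raw"`
	ContentType string `json:"content_type"`
}

// wrapText wraps a non-JSON response body so that the stored payload
// is still valid application/json.
func wrapText(body, contentType string) ([]byte, error) {
	return json.Marshal(textEnvelope{Raw: body, ContentType: contentType})
}
```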

Pagination requires JSON response and response.items_path.

Pagination

Supported modes:

  • disabled or empty — one request;
  • page;
  • offset;
  • cursor.

Common fields:

  • max_requests or max_pages;
  • delay — delay between paginated requests;
  • stop_path and stop_value — optional explicit stop condition evaluated against JSON response.

Page mode:

  • page_param is required;
  • page_start defaults to 1;
  • optional page_size_param and page_size.

Offset mode:

  • offset_param is required;
  • offset_start defaults to 0;
  • offset_step defaults to limit, then page_size, then 1;
  • optional limit_param and limit.
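The offset_step fallback chain can be sketched as a small helper. offsetStep is a hypothetical name, and treating zero or negative values as "not configured" is an assumption.

```go
package main

// offsetStep picks the first configured value in the order
// offset_step, limit, page_size, and falls back to 1.
func offsetStep(step, limit, pageSize int) int {
	for _, v := range []int{step, limit, pageSize} {
		if v > 0 {
			return v
		}
	}
	return 1
}
```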

Cursor mode:

  • cursor_param is required;
  • response.cursor_path is required;
  • first request is sent without cursor;
  • subsequent requests use cursor from previous response;
  • fetcher stops when cursor is missing/empty and rejects unchanged cursor as pagination loop protection.
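The cursor rules above amount to a loop like the following sketch. Page, ErrCursorLoop, and paginateCursor are hypothetical, the fetch callback stands in for one HTTP request, and returning the collected records when maxRequests is reached is an assumption.

```go
package main

import "errors"

// Page is a hypothetical result of one paginated request.
type Page struct {
	Records    []string
	NextCursor string // value read from response.cursor_path
}

// ErrCursorLoop is a hypothetical error for an unchanged cursor.
var ErrCursorLoop = errors.New("pagination cursor did not change")

// paginateCursor sends the first request without a cursor, then follows
// the cursor from each response until it is empty or maxRequests is hit.
func paginateCursor(fetch func(cursor string) (Page, error), maxRequests int) ([]string, error) {
	var records []string
	cursor := ""
	for i := 0; i < maxRequests; i++ {
		page, err := fetch(cursor)
		if err != nil {
			return nil, err
		}
		records = append(records, page.Records...)
		if page.NextCursor == "" {
			return records, nil // missing or empty cursor: done
		}
		if page.NextCursor == cursor {
			return nil, ErrCursorLoop // loop protection
		}
		cursor = page.NextCursor
	}
	return records, nil
}
```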

If no explicit max is configured, fetcher uses an internal safety limit.

Retry And Timeout

source.http.retry.max_attempts overrides source.retry_count.

Retried status codes:

  • 429;
  • 500;
  • 502;
  • 503;
  • 504.

Network errors are retried while the context is active. Ordinary non-2xx responses that are not listed above are not retried.
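A compact sketch of the retry decision described above. retryableStatus and shouldRetry are hypothetical helpers; backoff and attempt counting are left out because the source does not describe them.

```go
package main

import "context"

// retryableStatus reports whether an HTTP status is in the retried set.
func retryableStatus(code int) bool {
	switch code {
	case 429, 500, 502, 503, 504:
		return true
	}
	return false
}

// shouldRetry decides whether another attempt is worthwhile.
// err is the network error from the attempt, if any; status is the
// response status when err is nil.
func shouldRetry(ctx context.Context, status int, err error) bool {
	if ctx.Err() != nil {
		return false // context cancelled or deadline exceeded
	}
	if err != nil {
		return true // network error while the context is still active
	}
	return retryableStatus(status)
}
```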

Temporal activity retry remains the outer retry layer configured by RunIngestionWorkflow.

Auth

auth_settings.type=none is supported.

The following modes are recognized but intentionally fail until Terry secret storage is implemented:

  • bearer;
  • bearer_token;
  • basic;
  • basic_auth;
  • api_key_header;
  • api_key_query;
  • custom_header.

The error is explicit: ErrHTTPAuthCredentialsResolverNotConfigured.
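The auth gate could look roughly like the sketch below. The error name comes from the text above; its message, the checkAuth helper, errUnknownAuthType, and the handling of an empty or unrecognized type are assumptions.

```go
package main

import "errors"

// ErrHTTPAuthCredentialsResolverNotConfigured is named in the docs;
// the message text here is assumed.
var ErrHTTPAuthCredentialsResolverNotConfigured = errors.New("http auth credentials resolver is not configured")

// errUnknownAuthType is hypothetical.
var errUnknownAuthType = errors.New("unknown auth type")

// checkAuth accepts auth_settings.type=none and fails explicitly for
// recognized modes that need secret resolution.
func checkAuth(authType string) error {
	switch authType {
	case "", "none": // treating an empty type as none is an assumption
		return nil
	case "bearer", "bearer_token", "basic", "basic_auth",
		"api_key_header", "api_key_query", "custom_header":
		return ErrHTTPAuthCredentialsResolverNotConfigured
	default:
		return errUnknownAuthType
	}
}
```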

Do not store real tokens, cookies, API keys, or passwords inline in loader_configs. The HTTP fetcher has the seam for future secret resolution, but the secret store itself is out of scope.

SSRF Protection

HTTP fetcher blocks unsafe outbound targets by default:

  • localhost;
  • loopback addresses;
  • private/internal IP ranges;
  • link-local addresses;
  • multicast/unspecified addresses;
  • cloud metadata endpoints on link-local IPs, such as 169.254.169.254.

The check is applied to:

  • initial URL;
  • redirects when follow_redirect=true;
  • dial addresses after DNS resolution.
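An address check of this kind can be sketched with the standard net/netip package. blockedAddr and blockedHost are hypothetical helpers and do not reproduce the fetcher's exact rule set; hostnames other than localhost pass this literal check and would need the dial-time check after DNS resolution.

```go
package main

import (
	"net/netip"
	"strings"
)

// blockedAddr reports whether an IP falls into one of the unsafe ranges.
func blockedAddr(addr netip.Addr) bool {
	addr = addr.Unmap() // treat IPv4-mapped IPv6 as IPv4
	return addr.IsLoopback() ||
		addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() || // includes 169.254.169.254
		addr.IsMulticast() ||
		addr.IsUnspecified()
}

// blockedHost checks a URL host that is either localhost or an IP literal.
func blockedHost(host string) bool {
	if strings.EqualFold(host, "localhost") {
		return true
	}
	addr, err := netip.ParseAddr(host)
	return err == nil && blockedAddr(addr)
}
```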

Metadata

HTTP fetcher adds operational metadata to loader output:

  • http_method;
  • http_target;
  • http_host;
  • http_status;
  • http_request_count;
  • http_page_count;
  • http_record_count;
  • http_response_format.

LoaderActivity adds catalog provenance after fetcher execution:

  • data_source_id;
  • loader_config_id;
  • loader_config_version;
  • loader_config_status;
  • workspace_id;
  • domain_id;
  • page_type;
  • code_name.

Notes

  • Browser-copied exchange requests may omit User-Agent; some CDN-protected endpoints, including the tested MEXC financial products endpoint, return 403 without a browser-like User-Agent.
  • Full response body, request body, authorization headers, cookies, API keys, and tokens should not be logged by default.