Terry HTTP Fetcher
Purpose
The HTTP fetcher executes source_type=http data sources within the existing Terry loader chain.
The fetcher reads the already-persisted runtime config from terry.loader_configs.loader_settings, executes the HTTP request, and returns the JSON payload in fetcher.Result.Body. LoaderActivity stores this payload in raw object storage and returns only LoaderOutput.raw_data and metadata.
Code:
- internal/worker/loader/fetcher/http/fetcher.go
- typed config: internal/worker/loader/types.go
Configuration Shape
Minimal persisted config:
{
  "type": "http",
  "timeout": 30000000000,
  "retry_count": 1,
  "http": {
    "url": "https://www.mexc.com/api/financialactivity/financial/products/list/V2",
    "method": "GET",
    "headers": {
      "Accept": "*/*",
      "User-Agent": "Mozilla/5.0 ..."
    },
    "response": {
      "format": "json",
      "items_path": "data"
    }
  }
}
Supported top-level source fields used by the HTTP fetcher:
- type: must be http;
- timeout: per-request timeout as a Go time.Duration, encoded in JSONB as integer nanoseconds (30000000000 = 30s);
- retry_count: legacy per-request retry count, used as a fallback;
- headers and query: defaults, enabled by default, shared with source.http.
Supported source.http fields:
- url;
- method: GET, POST, PUT, PATCH, DELETE;
- headers, query;
- header_params[], query_params[]: {name, value, enabled} entries; disabled entries are not sent and can remove same-name map defaults;
- body;
- response;
- pagination;
- retry;
- follow_redirect.
Request Body
Supported body modes:
- body.json: raw JSON payload; the fetcher validates the JSON and sets Content-Type: application/json when no explicit content type is configured;
- body.raw: raw text payload; the fetcher sets Content-Type: text/plain; charset=utf-8 when no explicit content type is configured.
body.form is intentionally rejected in the current implementation.
Only one body mode may be configured at a time.
Response Handling
Supported response formats:
- json (default);
- text or raw.
For a JSON response with response.items_path set, the fetcher extracts the array at that dot-separated path and stores a normalized envelope.
For a JSON response without items_path, the fetcher preserves the response JSON as-is for single-request fetches.
For a text/raw response, the fetcher wraps the body in a JSON envelope so that the existing raw payload contract remains application/json.
Pagination requires the JSON response format and response.items_path.
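A minimal sketch of items_path resolution, assuming path segments are plain object keys separated by dots (no array indexes or escaped dots; the real path syntax may support more). extractItems is a hypothetical helper; it stops at the extracted array and does not build the normalized envelope, whose exact shape is defined in fetcher.go:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// extractItems walks a dot-separated items_path (for example "data" or
// "data.list") through nested JSON objects and returns the array found there.
// Assumption: segments are plain object keys; array indexes are not supported.
func extractItems(body []byte, itemsPath string) ([]json.RawMessage, error) {
	cur := json.RawMessage(body)
	for _, key := range strings.Split(itemsPath, ".") {
		var obj map[string]json.RawMessage
		if err := json.Unmarshal(cur, &obj); err != nil {
			return nil, fmt.Errorf("items_path %q: cannot descend into %q: %w", itemsPath, key, err)
		}
		next, ok := obj[key]
		if !ok {
			return nil, fmt.Errorf("items_path %q: key %q not found", itemsPath, key)
		}
		cur = next
	}
	var items []json.RawMessage
	if err := json.Unmarshal(cur, &items); err != nil {
		return nil, fmt.Errorf("items_path %q does not point to an array: %w", itemsPath, err)
	}
	return items, nil
}
```

Keeping each item as json.RawMessage avoids decoding records into Go types the fetcher does not need; it only has to count them and pass them through.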
Pagination
Supported modes:
- disabled or empty: one request;
- page;
- offset;
- cursor.
Common fields:
- max_requests or max_pages;
- delay: delay between paginated requests;
- stop_path and stop_value: optional explicit stop condition evaluated against the JSON response.
Page mode:
- page_param is required;
- page_start defaults to 1;
- optional page_size_param and page_size.
Offset mode:
- offset_param is required;
- offset_start defaults to 0;
- offset_step defaults to limit, then page_size, then 1;
- optional limit_param and limit.
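The offset_step fallback chain is the least obvious rule here. A sketch with hypothetical names (offsetConfig, effectiveStep, offsetQuery), treating a zero value as "not configured":

```go
package main

import (
	"net/url"
	"strconv"
)

// offsetConfig is a hypothetical mirror of the offset-mode pagination fields.
// Zero values mean "not configured".
type offsetConfig struct {
	OffsetParam string // required
	OffsetStart int    // defaults to 0
	OffsetStep  int
	LimitParam  string // optional
	Limit       int
	PageSize    int
}

// effectiveStep applies the documented fallback: offset_step, then limit,
// then page_size, then 1.
func effectiveStep(cfg offsetConfig) int {
	switch {
	case cfg.OffsetStep > 0:
		return cfg.OffsetStep
	case cfg.Limit > 0:
		return cfg.Limit
	case cfg.PageSize > 0:
		return cfg.PageSize
	default:
		return 1
	}
}

// offsetQuery returns query values for the n-th request (n starts at 0).
func offsetQuery(base url.Values, cfg offsetConfig, n int) url.Values {
	q := url.Values{}
	for k, v := range base {
		q[k] = append([]string(nil), v...)
	}
	q.Set(cfg.OffsetParam, strconv.Itoa(cfg.OffsetStart+n*effectiveStep(cfg)))
	if cfg.LimitParam != "" && cfg.Limit > 0 {
		q.Set(cfg.LimitParam, strconv.Itoa(cfg.Limit))
	}
	return q
}
```

For example, with limit=100 and no offset_step, the fourth request (n=3) is sent with offset=300&limit=100.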
Cursor mode:
- cursor_param is required;
- response.cursor_path is required;
- the first request is sent without a cursor;
- subsequent requests use the cursor from the previous response;
- the fetcher stops when the cursor is missing or empty, and rejects an unchanged cursor as pagination loop protection.
If no explicit maximum is configured, the fetcher applies an internal safety limit.
Retry And Timeout
source.http.retry.max_attempts overrides source.retry_count.
Retried status codes:
429, 500, 502, 503, 504.
Network errors are retried while the context is active. Ordinary non-2xx responses that are not listed above are not retried.
Temporal activity retry remains the outer retry layer configured by RunIngestionWorkflow.
Auth
auth_settings.type=none is supported.
The following modes are recognized but intentionally fail until Terry secret storage is implemented:
bearer, bearer_token, basic, basic_auth, api_key_header, api_key_query, custom_header.
The error is explicit: ErrHTTPAuthCredentialsResolverNotConfigured.
Do not store real tokens, cookies, API keys, or passwords inline in loader_configs. The HTTP fetcher has the seam for future secret resolution, but the secret store itself is out of scope.
SSRF Protection
The HTTP fetcher blocks unsafe outbound targets by default:
- localhost;
- loopback addresses;
- private/internal IP ranges;
- link-local addresses;
- multicast/unspecified addresses;
- metadata/link-local IPs.
The check is applied to:
- initial URL;
- redirects, when follow_redirect=true;
- dial addresses after DNS resolution.
Metadata
The HTTP fetcher adds operational metadata to loader output:
http_method, http_target, http_host, http_status, http_request_count, http_page_count, http_record_count, http_response_format.
LoaderActivity adds catalog provenance after fetcher execution:
data_source_id, loader_config_id, loader_config_version, loader_config_status, workspace_id, domain_id, page_type, code_name.
Notes
- Browser-copied exchange requests may omit User-Agent; some CDN-protected endpoints, including the tested MEXC financial products endpoint, return 403 without a browser-like User-Agent.
- Full response bodies, request bodies, authorization headers, cookies, API keys, and tokens should not be logged by default.