Official DuckDB Community Extension

Stop Paying the
Cloud Log Tax.

pfc-jsonl · pfc-fluentbit · pfc-duckdb · pfc-migrate

Archive JSONL logs at 5–13% of their original size. Query any time window with DuckDB, decompressing only the blocks that cover it: no restore, no full decompression. No egress fees, no heavy infrastructure, no lock-in.

5–13% of original size
up to 57% smaller than gzip
<6 s for a 1-hour query via DuckDB
10 enterprise log types
How it works

One Pipeline. Four Steps.

From log collection to time-window SQL queries: a complete, self-hosted stack. No managed services, no vendor dependencies.

01 — COLLECT
📡
Fluent Bit Forwarder
pfc-fluentbit receives log streams via TCP and buffers them into compression windows.
pfc-fluentbit
02 — COMPRESS
⚙️
PFC Core Engine
BWT → MTF → RLE → sparse rANS O2 pipeline compresses each block and writes a timestamp index.
pfc_jsonl binary
03 — STORE
🗄️
Your Storage
.pfc files land on S3, Azure Blob, GCS or a local filesystem, typically 87–95% smaller than raw JSONL.
S3 · Azure · GCS · Local
04 — QUERY
🦆
DuckDB Extension
read_pfc_jsonl() uses the block index to jump directly to your time window. Decompress only what you need.
pfc-duckdb
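How a block-level timestamp index lets a reader skip data can be sketched in a few lines of Python. This is an illustration only: it assumes each index entry stores a byte offset, a compressed length and the min/max timestamp of its block. The names `BlockEntry` and `blocks_for_window` are hypothetical and do not describe the actual .pfc layout or pfc-duckdb internals.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    offset: int   # byte offset of the compressed block (hypothetical layout)
    length: int   # compressed size of the block in bytes
    ts_min: int   # earliest timestamp in the block, epoch seconds
    ts_max: int   # latest timestamp in the block, epoch seconds


def blocks_for_window(index: list[BlockEntry], start: int, end: int) -> list[BlockEntry]:
    """Return the blocks that may hold rows with start <= ts < end.

    Assumes the index is sorted by time and blocks do not overlap in time,
    which holds when logs are written in timestamp order.
    """
    ends = [b.ts_max for b in index]
    i = bisect_left(ends, start)  # first block that ends at or after `start`
    selected = []
    while i < len(index) and index[i].ts_min < end:
        selected.append(index[i])
        i += 1
    return selected


# Ten 10-minute blocks of 4 KiB each; query a 10-minute window across a boundary.
index = [BlockEntry(k * 4096, 4096, 1000 + 600 * k, 1599 + 600 * k) for k in range(10)]
hits = blocks_for_window(index, 1900, 2500)
bytes_read = sum(b.length for b in hits)
```

In this example only 2 of the 10 blocks (8 KiB of 40 KiB) would need to be read and decompressed; the binary search keeps the lookup cheap even for large indexes.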
Coverage

Optimized for the most common enterprise log formats.

Purpose-built patterns for the 10 most common log types. If your team produces it, PFC-JSONL understands it.

🌐 API Access Logs: 12.76% of original size, 28% smaller than gzip
☸️ Kubernetes / Container: 7.97% of original size, 33% smaller than gzip
📱 Application Logs: 10.56% of original size, 28% smaller than gzip
🔐 Auth / Security: 9.13% of original size, 31% smaller than gzip
🖥️ Infrastructure / System: 5.31% of original size, 46% smaller than gzip
📡 Streaming / Kafka: 7.49% of original size, 34% smaller than gzip
⚙️ Ops / CI-CD: 6.54% of original size, 35% smaller than gzip
☁️ Cloud Provider Logs: 10.43% of original size, 30% smaller than gzip
🔥 Network / Firewall: 6.14% of original size, 47% smaller than gzip
🛒 E-Commerce / Transactions: 10.18% of original size, 25% smaller than gzip
Full Benchmark Report — all 10 types, 1 GB datasets ↗
Why switch

Your current setup
is costing you.

Five problems. One pipeline to solve them all.

🔥
Scanning TBs to find 10 minutes of logs
Solution
Block-level timestamp index. DuckDB jumps directly to the relevant time window. Only decompress the blocks you actually need.
💸
S3 egress and scan costs killing your budget
Solution
Smaller files (25–47% smaller than gzip across the benchmarked log types) plus surgical block downloads. You pay for the data you actually read, not the entire archive.
📦
Years of gzip archives you can't afford to query
Solution
pfc-migrate converts existing gz/bz2/zstd/lz4 archives on S3, Azure or GCS in-region. Losslessly verified. No egress. One command.
🖥️
Elasticsearch/Loki clusters eating RAM and budget
Solution
No cluster needed. DuckDB + files. Runs on your laptop or your server. No database maintenance, no phone-home, no surprise bills.
🔒
Vendor lock-in — what if the tool disappears?
Solution
The decompressor is yours and is included in every install. Your .pfc files will still be readable in 10 years: no subscription, no service dependency.
Comparison

Why not just use...?

Every alternative has a hidden cost. Here's the honest breakdown.

Tool | Cost | Setup | Random access | Runs on your infra?
S3 Select | $0.002/GB scanned + egress | AWS-only | ❌ No block index | ❌ AWS lock-in
Athena | $5 per TB scanned | Glue catalog + partitioning | ❌ Full file scan | ❌ AWS lock-in
Parquet + Athena | $5 per TB scanned | Schema upfront, complex pipeline | ⚠️ Row groups only | ❌ AWS lock-in
Elasticsearch / ELK | Cluster cost + 2–3× storage | 3–5 servers, 16 GB+ RAM | ✅ (but expensive) | ✅ Self-hosted
Loki (Grafana) | Cluster or Grafana Cloud fees | Kubernetes sidecar + object store | ❌ No block-level seek | ⚠️ Complex ops
PFC-JSONL Pipeline | Free for personal / OSS | 1 command | ✅ Block-level index | ✅ Runs anywhere

Calculate your real log costs.

Describe your current setup to estimate what you pay today and what you would save with PFC.

Inputs: monthly raw log volume (e.g. 500 GB); how often you query archived logs (~30 queries per month, or an exact number); how far back you typically search (~10% of monthly volume per query); and how long you retain logs.
Today: storage (gzip on S3) plus Athena queries, monthly and annual.
With PFC: storage (~9% of raw size) plus query cost (~$0.00), monthly and annual.
Annual savings: the difference over 1 year.
⚠️ Without fine-grained (hourly) partitioning, Athena scans entire files, so real costs are often 2–5× higher than these estimates. PFC reads only the blocks you need, with no overscanning.

Based on AWS S3 Standard ($0.023/GB/month), Athena ($5/TB scanned) and a PFC compression ratio of 5–13% (API access logs: ~12.8%, infrastructure/system logs: ~5.3%). Same-region access is assumed; internet egress is billed at $0.09/GB.
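The estimator's arithmetic can be approximated as below. This is a sketch under our own assumptions, not the calculator's exact formula: steady-state storage of `retention_months` of data, gzip keeping ~12% of raw size (the gzip -9 benchmark figure), Athena billing $5 per TB of gzip bytes scanned with 1 TB taken as 1024 GB, an optional `overscan` factor for the whole-file scanning noted above, and PFC queries run locally at ~$0. `estimate_costs` and `Cost` are hypothetical names.

```python
from dataclasses import dataclass

S3_USD_PER_GB_MONTH = 0.023   # S3 Standard, per GB-month
ATHENA_USD_PER_TB = 5.0       # Athena, per TB scanned
GB_PER_TB = 1024              # assumption: binary terabyte


@dataclass
class Cost:
    storage: float  # USD per month
    queries: float  # USD per month

    @property
    def monthly(self) -> float:
        return self.storage + self.queries

    @property
    def annual(self) -> float:
        return 12 * self.monthly


def estimate_costs(raw_gb_per_month: float, queries_per_month: int,
                   fraction_per_query: float, retention_months: int,
                   gzip_ratio: float = 0.12, pfc_ratio: float = 0.09,
                   overscan: float = 1.0) -> tuple[Cost, Cost]:
    """Return (today, with_pfc) monthly costs in USD (hypothetical model)."""
    stored_raw_gb = raw_gb_per_month * retention_months  # steady state
    today_storage = stored_raw_gb * gzip_ratio * S3_USD_PER_GB_MONTH
    # Athena bills the compressed bytes it scans; overscan models poor partitioning.
    scanned_gb = (queries_per_month * fraction_per_query * raw_gb_per_month
                  * gzip_ratio * overscan)
    today_queries = scanned_gb / GB_PER_TB * ATHENA_USD_PER_TB
    pfc_storage = stored_raw_gb * pfc_ratio * S3_USD_PER_GB_MONTH
    # PFC queries run in DuckDB on your own machine: no per-scan fee.
    return Cost(today_storage, today_queries), Cost(pfc_storage, 0.0)


today, pfc = estimate_costs(500, 30, 0.10, retention_months=12)
annual_savings = today.annual - pfc.annual
```

With 500 GB/month, 30 queries at 10% each and 12 months of retention, this model gives about $16.56 versus $12.42 per month in storage; raising `overscan` shows how quickly whole-file Athena scans dominate.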

Already have archives? No problem.
pfc-migrate
Convert existing gzip, zstd, bz2 or lz4 archives directly on S3, Azure or GCS. Conversion runs in-region: no re-download, no egress cost. One command, lossless, MD5-verified. Parquet migration coming soon.
pfc-convert
Turn legacy log formats into modern JSONL before compression. Apache, Nginx, Syslog and custom formats — normalized and ready for PFC in one step.
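As an illustration of that normalization step, here is a minimal Python sketch that turns one Apache Common Log Format line into a JSONL record. It is not pfc-convert's implementation; the output field names (`ts`, `host`, `method`, ...) and the choice of epoch-second `ts` are assumptions, and `clf_to_jsonl` is a hypothetical helper.

```python
import json
import re
from datetime import datetime
from typing import Optional

# Apache Common Log Format: %h %l %u %t "%r" %>s %b
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def clf_to_jsonl(line: str) -> Optional[str]:
    """Convert one CLF line to a compact JSON line; None if it doesn't match."""
    m = CLF.match(line.rstrip("\n"))
    if m is None:
        return None
    d = m.groupdict()
    # "10/Oct/2000:13:55:36 -0700" -> timezone-aware datetime -> epoch seconds
    when = datetime.strptime(d["time"], "%d/%b/%Y:%H:%M:%S %z")
    method, _, rest = d["request"].partition(" ")
    path, _, protocol = rest.partition(" ")
    record = {
        "ts": int(when.timestamp()),
        "host": d["host"],
        "user": None if d["user"] == "-" else d["user"],
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(d["status"]),
        "bytes": 0 if d["size"] == "-" else int(d["size"]),  # "-" means no body
    }
    return json.dumps(record, separators=(",", ":"))
```

Emitting a numeric, UTC-based `ts` at conversion time is what later makes time-window filtering on the compressed files straightforward.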
Ecosystem

One format. A complete toolchain.

Every piece of your pipeline — from log ingestion to database archiving to SQL queries — covered by purpose-built tools that all speak PFC.

📡 Ingest
pfc-fluentbit: Fluent Bit TCP output → .pfc
pfc-vector: Vector.dev HTTP sink → .pfc
pfc-telegraf: Telegraf HTTP output plugin → .pfc
pfc-otel-collector: OpenTelemetry OTLP/HTTP → .pfc
pfc-kafka-consumer: Kafka / Redpanda consumer → .pfc
pfc-gateway: HTTP REST, POST /ingest + POST /query

🦆 Query & Visualization
pfc-duckdb: DuckDB Community Extension, SQL queries on .pfc files
pfc-gateway: REST API queries, no DuckDB required
pfc-grafana: Native Grafana datasource plugin
pfc-py: Python client library (PyPI: pfc-jsonl)

🗄️ Archive & Export
pfc-archiver-*: Autonomous archive daemons for CrateDB, QuestDB, InfluxDB, ClickHouse, TimescaleDB
pfc-export-*: One-shot DB table exports for the same 5 databases
pfc-migrate: Convert gzip/zstd/bz2/lz4 → .pfc on S3, Azure, GCS
pfc-convert: Apache CLF, nginx, syslog, CSV → JSONL → .pfc
pfc-ingest-watchdog: Auto-compress when new files arrive (folder or S3)
Under the hood

Built on proven algorithms.

Not another gzip wrapper. A purpose-built compression pipeline for structured log data.

Compression Pipeline
BWT
MTF
RLE
sparse rANS O2

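The Burrows-Wheeler Transform reorders each block so that symbols sharing a context end up next to each other. Move-to-front coding turns those clusters into runs of small numbers, which run-length encoding then collapses. Sparse order-2 (O2) rANS entropy coding compresses the result close to the model's entropy. The block structure enables parallel compression and random access.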
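A toy, pure-Python illustration of the first three stages on a single block is shown below. It is not PFC's implementation: the BWT is computed naively by sorting rotations, the RLE emits plain (symbol, count) pairs, and the rANS entropy coder is omitted entirely. All function names are hypothetical.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform by sorting all rotations.

    O(n^2 log n): fine for a demo, far too slow for real blocks.
    Returns the last column and the row index of the original block.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    rows = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT via the LF mapping (stable sort of the last column)."""
    order = sorted(range(len(last)), key=lambda i: last[i])  # sorted() is stable
    out = bytearray()
    i = primary
    for _ in range(len(last)):
        i = order[i]
        out.append(last[i])
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: recently seen bytes get small codes, repeats become 0."""
    table = list(range(256))
    codes = []
    for b in data:
        idx = table.index(b)
        codes.append(idx)
        table.insert(0, table.pop(idx))
    return codes


def mtf_decode(codes: list[int]) -> bytes:
    table = list(range(256))
    out = bytearray()
    for idx in codes:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)


def rle_encode(symbols: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of equal symbols into (symbol, run_length) pairs."""
    runs: list[list[int]] = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    return [s for s, n in runs for _ in range(n)]


def toy_compress(block: bytes) -> tuple[list[tuple[int, int]], int]:
    last, primary = bwt(block)
    return rle_encode(mtf_encode(last)), primary


def toy_decompress(runs: list[tuple[int, int]], primary: int) -> bytes:
    return inverse_bwt(mtf_decode(rle_decode(runs)), primary)
```

On repetitive JSONL (for example 30 lines that differ only in a counter field), the BWT groups identical contexts, so the MTF output is dominated by runs of zeros and the RLE stage emits far fewer pairs than RLE on the raw bytes would.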

🦆
Official DuckDB Community Extension
Verified and published on the DuckDB Community Hub
INSTALL pfc FROM community;
Compression Benchmarks
Compressor | Size (% of original) | Random access
PFC-JSONL | 5–13% | ✅ Block-indexed
gzip -9 | ~12% | ❌ Stream only
zstd -3 | ~14% | ❌ Stream only
xz -6 | ~10% | ❌ Very slow

Ratio varies by log type. Tested on 1 GB real-world datasets across 10 enterprise log formats. Full benchmark report ↗

Quick Start

Get started in 60 seconds.

Three paths into the pipeline. Pick the one that fits your stack.

-- Install once
INSTALL pfc FROM community;
LOAD pfc;

-- Query with timestamp filtering
-- (ts in epoch seconds: 2026-01-01 00:00 to 01:00 UTC)
SELECT level, message, service
FROM read_pfc_jsonl('logs/2026-01-01.pfc')
WHERE ts >= 1767225600 AND ts < 1767229200
  AND level = 'ERROR';

-- Works on local files or mounted paths
-- Only decompresses matching blocks
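Hand-computing epoch bounds is error-prone. Assuming the logs store `ts` as UTC epoch seconds, a small Python helper can generate the half-open window for any day and hour; `utc_window` is a hypothetical name, not part of pfc-duckdb.

```python
from datetime import datetime, timedelta, timezone


def utc_window(day: str, start_hour: int = 0, hours: int = 1) -> tuple[int, int]:
    """Half-open [start, end) epoch-second bounds for a UTC time window."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start += timedelta(hours=start_hour)
    end = start + timedelta(hours=hours)
    return int(start.timestamp()), int(end.timestamp())


lo, hi = utc_window("2026-01-01", start_hour=0)
sql = (
    "SELECT level, message, service "
    "FROM read_pfc_jsonl('logs/2026-01-01.pfc') "
    f"WHERE ts >= {lo} AND ts < {hi} AND level = 'ERROR'"
)
```

`utc_window("2026-01-01")` returns `(1767225600, 1767229200)`, the 00:00–01:00 UTC hour of the file in the example.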

# Step 1: Install pfc_jsonl binary (writing to /usr/local/bin needs root)
sudo curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && sudo chmod +x /usr/local/bin/pfc_jsonl

# Step 2: Download and start the forwarder
sudo curl -L https://raw.githubusercontent.com/ImpossibleForge/pfc-fluentbit/main/pfc_forwarder.py \
     -o /opt/pfc_forwarder.py
python3 /opt/pfc_forwarder.py

# Step 3: Point Fluent Bit at it (fluent-bit.conf)
# [OUTPUT]
#     Name    tcp
#     Match   *
#     Host    127.0.0.1
#     Port    5170
#     Format  json_lines
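What the forwarder does between Fluent Bit and the compressor (buffering incoming lines into compression windows) can be sketched as below. This is an assumption-laden illustration, not pfc_forwarder.py: the hourly window, the `ts` field name and the `flush` callback (where a real forwarder would hand the batch to pfc_jsonl) are all hypothetical. The handler accepts newline-delimited JSON on a TCP socket, matching the `json_lines` output format configured above.

```python
import json
import socketserver


class WindowBuffer:
    """Group JSON lines into fixed time windows (hypothetical sketch).

    Assumes each line carries an epoch-seconds "ts" field and that lines
    arrive roughly in time order; a late line simply starts a new window.
    """

    def __init__(self, window_seconds: int, flush):
        self.window_seconds = window_seconds
        self.flush = flush  # called as flush(window_start, lines)
        self.current_window = None
        self.lines: list[str] = []

    def add(self, line: str) -> None:
        ts = int(json.loads(line)["ts"])
        window = ts - ts % self.window_seconds
        if self.current_window is not None and window != self.current_window:
            self.close()  # window changed: hand off the finished batch
        self.current_window = window
        self.lines.append(line)

    def close(self) -> None:
        if self.lines:
            self.flush(self.current_window, self.lines)
        self.current_window = None
        self.lines = []


def make_handler(buffer: WindowBuffer):
    """Build a TCP handler that feeds each received line into `buffer`."""

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for raw in self.rfile:  # one JSON document per line (json_lines)
                line = raw.decode("utf-8").strip()
                if line:
                    buffer.add(line)

    return Handler


# Usage (blocks forever, so left commented out here):
# buf = WindowBuffer(3600, lambda start, lines: print(start, len(lines)))
# socketserver.TCPServer(("127.0.0.1", 5170), make_handler(buf)).serve_forever()
```

Cutting batches on time-window boundaries is what allows each compressed block to carry a tight min/max timestamp for the index.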

# Convert existing S3 archives (no egress: runs in-region)
pip install "pfc-migrate[s3]"

pfc-migrate s3 --bucket my-logs --prefix 2025/ --pattern "*.gz"

# Azure Blob Storage
pip install "pfc-migrate[azure]"
pfc-migrate azure --container logs --pattern "*.gz"

# Losslessly verified: MD5 checked before the original is touched
# Supports: .gz / .bz2 / .zst / .lz4  →  .pfc
⚖️
Simple, honest licensing.

Free for personal use and open-source projects.
No account. No signup. No usage limits. No phone-home. Your data stays in your infrastructure.

Personal use & open-source: completely free, forever.

Commercial use? Contact us.

[email protected]

FAQ

Common questions.

Straight answers, no fluff.

How do I query .pfc files that are stored on S3?
The DuckDB extension reads from local file paths, so fetch the relevant .pfc data first (only the blocks covering your time window are needed), then run your query via DuckDB. You can also mount S3 via s3fs or rclone for transparent access. We intentionally avoid vendor-locked, S3-native query APIs.
Is this production-ready?
The PFC binary and DuckDB extension are stable and tested on datasets up to 5 GB. The pipeline has been stress-tested end to end, including lossless migration of all major archive formats. We recommend testing on your own data before rolling out to production.
What happens to my data if ImpossibleForge shuts down?
Nothing. The decompressor binary is included in every install and will keep working. Your .pfc files are yours. There is no cloud dependency, no license server, no phone-home. You can decompress everything back to plain JSONL at any time.
Does pfc-migrate delete my original archives?
No — never without your explicit confirmation. pfc-migrate converts and uploads the .pfc version first, verifies it losslessly via MD5 checksum, and only then asks whether to delete the original. The original is never touched before verification passes.
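The verify, then confirm, then delete order described above can be sketched in Python. It is a minimal illustration, not pfc-migrate's code: it assumes you can stream the decompressed bytes of both the original archive and the converted .pfc file (how the .pfc is decompressed is left to the caller), and `safe_replace`, `md5_of_chunks` and `gzip_chunks` are hypothetical helpers.

```python
import gzip
import hashlib
from typing import Callable, Iterable, Iterator


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """MD5 hex digest of a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def gzip_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Stream the decompressed content of a .gz file in 1 MiB chunks."""
    with gzip.open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


def safe_replace(original: Iterable[bytes], converted: Iterable[bytes],
                 delete_original: Callable[[], None],
                 confirm: Callable[[], bool]) -> bool:
    """Verify first, then ask, then delete; return True only if deleted.

    `original` and `converted` are the decompressed contents of the old
    archive and the new .pfc file.
    """
    if md5_of_chunks(original) != md5_of_chunks(converted):
        return False  # mismatch: the original is never touched
    if not confirm():
        return False  # user declined: keep the original
    delete_original()
    return True
```

Comparing digests of the decompressed content, rather than of the archive files themselves, is what makes the check meaningful across different compression formats.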
Does it work with Kubernetes / Docker?
Yes. The pfc-fluentbit forwarder runs as a standalone process or as a sidecar container. Point your Fluent Bit tcp OUTPUT at localhost, port 5170, and the forwarder handles the rest. Docker images are available for both the forwarder and the CLI.
What log formats are supported besides JSONL?
PFC-JSONL is optimized for structured JSONL logs. For Apache, Nginx or syslog formats, normalize them to JSONL first with pfc-convert, or see pfc-log, a separate tool with pattern sets tuned for traditional web server logs.
Ready?

Your logs.
Your infra.
Your PFC Pipeline.

Free for personal and open-source use. No account, no signup, no limits.

Commercial use? [email protected]