Lakeloop

Postgres analytics without a warehouse: a 2026 guide for small teams

You don't need Snowflake or BigQuery to analyze a few million rows. Here's how to turn Postgres into a cheap, queryable lakehouse on your own S3 using Parquet, Iceberg and DuckDB.

The warehouse tax for small data

If your company runs on Postgres and you have a few million rows you'd like to slice in a BI tool, the conventional advice is to "stand up a warehouse." So you evaluate Snowflake, BigQuery, or Redshift, set up an ETL pipeline with Fivetran or Airbyte, wire up dbt, and three weeks later you have a $900/month bill and a pipeline only one person understands. For a team of five analyzing under a hundred gigabytes, that is wildly disproportionate.

The uncomfortable truth is that most "analytics workloads" at small companies are not big-data problems. They are medium-data problems that a single modern laptop could chew through in seconds. The bottleneck is not compute — it is the operational overhead of getting data out of your transactional database and into a shape that analytical tools enjoy.

What you actually need

Strip the problem down and there are only three moving parts:

  • A way to extract tables from Postgres without hammering your production database.
  • A columnar storage format that analytical engines can scan fast: Parquet files, optionally managed as Apache Iceberg tables (a table format layered on top of Parquet) if you want time travel and schema evolution.
  • A SQL endpoint your BI tool can point at.
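For the first item, one common way to pull a large table out of Postgres without issuing a single long, heavy query is keyset pagination. You read fixed-size batches ordered by primary key, resume after the last key seen, and optionally pause between batches. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The orders table, its columns, and the batch_size and pause_s parameters are hypothetical. Against Postgres you would send the same queries through a driver such as psycopg, ideally against a read replica.

```python
import sqlite3
import time
from typing import Iterator

COLUMNS = "id, customer, amount"  # hypothetical schema for the demo table


def extract_in_batches(
    conn: sqlite3.Connection,
    table: str,
    batch_size: int = 1000,
    pause_s: float = 0.0,
) -> Iterator[list[tuple]]:
    """Yield rows of `table` in primary-key order, `batch_size` rows at a time.

    Keyset pagination (WHERE id > last_id) lets every batch use the primary-key
    index, whereas OFFSET makes the database walk past all skipped rows.
    `table` is interpolated into the SQL, so it must come from a trusted list.
    """
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                f"SELECT {COLUMNS} FROM {table} ORDER BY id LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                f"SELECT {COLUMNS} FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume just after the last key we saw
        if pause_s:
            time.sleep(pause_s)  # optional breather for the source database


# Demo: 2,500 rows come out as three bounded batches.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(i, f"c{i % 3}", float(i)) for i in range(1, 2501)],
)
sizes = [len(batch) for batch in extract_in_batches(demo, "orders")]
print(sizes)  # [1000, 1000, 500]
```

Because each batch is a short, index-driven query, no single statement holds locks or a snapshot for long, and memory stays bounded if you write each batch out as it arrives.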

Notice what's missing: a perpetually running cluster. Analytical queries over Parquet run so fast on a vectorized engine like DuckDB that you don't need a warehouse sitting idle and billing you by the second. You need storage (cheap) plus on-demand compute (cheap).

Why Parquet on S3 beats a warehouse for this size

Parquet is columnar and compressed. A table that takes 200 MB in Postgres can land under 20 MB as Parquet, depending on how repetitive the data is, and because the format is columnar, a query that touches three columns out of forty reads only those three. Stored on S3 Standard, that data costs about $0.023 per GB-month at list price in us-east-1, so a 10 GB lake costs roughly a quarter a month to store. Compare that with warehouse storage markups and always-on compute.
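The storage arithmetic in the paragraph above is easy to check. This sketch uses the $0.023 per GB-month rate quoted there (other regions and storage classes differ) and treats the 10x Postgres-to-Parquet shrink as an assumed ratio, not a guarantee. It also estimates how much of a table a three-of-forty-column query scans, under the simplifying assumption that every column is the same size. All function names are illustrative.

```python
S3_USD_PER_GB_MONTH = 0.023  # rate quoted in this post; varies by region and class


def monthly_storage_usd(parquet_gb: float, rate: float = S3_USD_PER_GB_MONTH) -> float:
    """Monthly S3 storage cost, ignoring request and data-transfer charges."""
    return parquet_gb * rate


def compressed_size(size: float, compression_ratio: float = 10.0) -> float:
    """Estimated Parquet size in the same unit as `size`; 10x is an assumption."""
    return size / compression_ratio


def scanned_fraction(columns_read: int, total_columns: int) -> float:
    """Share of a table a columnar scan reads, assuming equal-width columns."""
    return columns_read / total_columns


print(round(monthly_storage_usd(10), 2))  # 0.23, i.e. roughly a quarter a month
print(compressed_size(200))               # 20.0 (MB, from a 200 MB table)
print(scanned_fraction(3, 40))            # 0.075, i.e. 7.5% of the bytes
```

Real column widths vary a lot (a text column can dwarf an integer one), so treat the scan fraction as an order-of-magnitude estimate.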

Crucially, when the Parquet lives in your own S3 bucket, you are never locked in. Any engine — DuckDB, Trino, Spark, Athena, Polars — can read it. Your vendor can disappear tomorrow and your data is still sitting in open formats you control. This is the single biggest architectural advantage of the lakehouse pattern, and it's available to a five-person team, not just FAANG.

The DIY version (and why it gets annoying)

You can absolutely build this yourself. DuckDB ships a postgres extension that attaches to a live Postgres database, and its COPY … TO 's3://…' (FORMAT PARQUET) statement, backed by the httpfs extension, writes Parquet straight to object storage. A weekend of scripting gets you a one-shot export.
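For reference, here is roughly what that one-shot export looks like. This Python sketch only assembles the DuckDB SQL as text, so it runs without DuckDB installed; you would paste the output into the duckdb CLI or pass it to duckdb.sql(). The connection string, bucket, and table names are hypothetical placeholders. It assumes a DuckDB release recent enough to support CREATE SECRET (0.10 or later) and uses the aws extension's credential_chain provider so that no keys are hard-coded.

```python
def duckdb_export_script(pg_dsn: str, tables: list[str], s3_prefix: str) -> str:
    """Build a DuckDB SQL script that copies Postgres tables to Parquet on S3.

    Table names are interpolated as-is, so they must come from a trusted list.
    """
    dsn = pg_dsn.replace("'", "''")  # escape quotes for a SQL string literal
    prefix = s3_prefix.rstrip("/")
    lines = [
        "INSTALL postgres; LOAD postgres;",
        "INSTALL httpfs; LOAD httpfs;",  # S3 reads and writes
        "INSTALL aws; LOAD aws;",        # needed for the credential_chain provider
        # Pick up AWS credentials from the environment or ~/.aws config.
        "CREATE SECRET (TYPE s3, PROVIDER credential_chain);",
        # READ_ONLY guards against accidental writes to the source database.
        f"ATTACH '{dsn}' AS pg (TYPE postgres, READ_ONLY);",
    ]
    for table in tables:
        lines.append(
            f"COPY pg.public.{table} TO '{prefix}/{table}.parquet' (FORMAT parquet);"
        )
    return "\n".join(lines)


script = duckdb_export_script(
    "host=db.example.internal dbname=app user=readonly",  # hypothetical read-only DSN
    ["orders", "customers"],                              # hypothetical tables
    "s3://my-analytics-bucket/raw/",                      # hypothetical bucket
)
print(script)
```

This is exactly the "weekend script" stage: it re-dumps every table in full each time it runs, which is where the day-two problems start.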

The annoyance starts on day two. You need this to run on a schedule. You need it to re-sync only changed rows, not re-dump five million rows every hour. You need credentials managed safely, a place to run the job, retries when the source is briefly unreachable, and a SQL endpoint your teammates can hit without you handing out raw S3 keys. None of that is hard individually; together it's a small ops project that nobody on a lean team wants to own.
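The "re-sync only changed rows" requirement is often met with a high-water mark: remember the largest updated_at value you have exported and pull only newer rows on the next run. The sketch below pairs that with exponential-backoff retries. It again uses sqlite3 as a stand-in for Postgres and assumes, hypothetically, that the source table has an updated_at column that every write bumps; with psycopg you would catch psycopg.OperationalError instead. Function names are illustrative.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 4, base_delay_s: float = 0.5) -> T:
    """Call fn, retrying on sqlite3.OperationalError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the scheduler
            time.sleep(base_delay_s * 2**attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be at least 1")


def sync_changed_rows(conn: sqlite3.Connection, watermark: str) -> tuple[list[tuple], str]:
    """Return rows updated after `watermark`, plus the new watermark to store."""

    def query() -> list[tuple]:
        return conn.execute(
            "SELECT id, status, updated_at FROM orders "
            "WHERE updated_at > ? ORDER BY updated_at",
            (watermark,),
        ).fetchall()

    rows = with_retries(query)
    new_watermark = rows[-1][2] if rows else watermark  # unchanged if nothing new
    return rows, new_watermark


# Demo with ISO-8601 timestamps, which compare correctly as strings.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "paid", "2026-01-01T10:00:00"), (2, "new", "2026-01-01T11:00:00")],
)
first_rows, watermark = sync_changed_rows(db, "1970-01-01T00:00:00")  # full first run
db.execute("UPDATE orders SET status = 'paid', updated_at = '2026-01-01T12:00:00' WHERE id = 2")
changed, watermark = sync_changed_rows(db, watermark)  # only order 2 this time
print(changed)  # [(2, 'paid', '2026-01-01T12:00:00')]
```

Two caveats: an updated_at watermark never sees hard deletes, and a row written by a long-running transaction can commit with a timestamp older than the watermark and be skipped. Log-based change-data-capture through Postgres logical replication avoids both, at the cost of more setup.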

The managed-but-not-locked-in middle path

This is exactly the gap Lakeloop fills. You paste a read-only Postgres connection string, pick the tables you care about, and point it at your own S3 bucket. Lakeloop runs the connector (daily on the free tier, hourly change-data-capture on paid plans), writes Parquet files or Iceberg tables into your storage, and exposes a SQL endpoint your BI tools already speak. You get the convenience of a managed pipeline without the warehouse bill or the lock-in, because your data lands and stays in your own account.

The pricing reflects the philosophy: storage stays on your AWS bill at raw rates, and you pay a flat connector fee (free for one table, $39/mo for up to ten tables with hourly sync, and $99/mo for unlimited tables plus a real Iceberg catalog). Nobody marks up your bytes.
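To compare plans against a warehouse bill, this sketch encodes the tiers as listed in the paragraph above and adds raw S3 storage at the $0.023 per GB-month rate quoted earlier. The function name and the selection rule (the cheapest tier that covers your table count, sync frequency, and need for the Iceberg catalog) are our own illustration; it treats "ten tables" as a cap and ignores S3 request and transfer charges.

```python
S3_USD_PER_GB_MONTH = 0.023  # raw S3 Standard rate quoted in this post


def lakeloop_monthly_usd(tables: int, hourly: bool, iceberg: bool, storage_gb: float) -> float:
    """Connector fee for the cheapest listed tier that fits, plus raw S3 storage."""
    if iceberg or tables > 10:
        fee = 99.0  # unlimited tables plus the Iceberg catalog
    elif tables > 1 or hourly:
        fee = 39.0  # up to ten tables, hourly sync
    else:
        fee = 0.0   # free tier: one table, daily sync
    return fee + storage_gb * S3_USD_PER_GB_MONTH


cost = lakeloop_monthly_usd(tables=8, hourly=True, iceberg=False, storage_gb=10)
print(f"${cost:.2f}/month")  # $39.23/month
```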

When you should still use a warehouse

To be fair: if you're routinely scanning terabytes, running hundreds of concurrent analyst queries, or you need fine-grained governance across dozens of teams, a real warehouse earns its keep. The lakehouse-on-your-S3 pattern shines specifically for small-to-medium data and small teams — the exact population that gets overcharged today.

Getting started in five minutes

The fastest way to see whether this fits your data is to try it on a real table. Connect a database, pick one table, and run a GROUP BY against the live SQL endpoint. If your dashboards are snappy and your bill is a rounding error, you've found your answer.

Turn your Postgres into a lakehouse on your own S3.
