Big Data, Small Prompt
AGI has brought out the worst in us. We want to take the lazy road to analytics.
I should be able to load up all my time-series data and use natural language to ask it statistical questions and request other analyses.
I hear this often. It’s a nice dream. However, stating what you “should” be able to do is a hypothesis that must also factor in practical boundaries.
Time-series data is typically raw and voluminous. IoT signals are intentionally collected at fine granularity so that real-time perturbations can be detected and corrected. This is especially important for mission-critical processes, where a few missed events sometimes indicate a big problem. But time-series data is also valuable for machine learning. The idyllic goal described above is neither of those - it's the lazy pathway to analytics. And I'm okay with that - I love a good lazy approach - it's how great innovations are made.
IN THIS CASE, the AI “fit” is out of reach (based on my skill set, known approaches, and financial practicalities). Practically speaking, this is a round hole and giant earth mover problem. Putting a mega earth mover in a small hole has one challenge - physics.
AI interfaces (UIs and APIs) are presently limited; they’re tiny holes.
There are indications we'll soon see 100k-token prompt capabilities. But that's nothing compared to extremely granular time-series data - at least at the volume that would produce valid assessments.
Anthropic's Claude is capable of 100,000-token context windows. But analyzing 100,000 text tokens differs greatly from analyzing comparably sized time-series data streams.
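To make the mismatch concrete, here's some back-of-envelope arithmetic. The sample rate and tokens-per-reading figures are illustrative assumptions, not measurements:

```python
# Back-of-envelope: how much of a 100k-token context window does
# raw time-series data consume? All figures below are assumed.

CONTEXT_WINDOW = 100_000        # tokens (Claude-class model)
TOKENS_PER_READING = 8          # timestamp + value rendered as text (assumed)

seconds_per_day = 24 * 60 * 60  # 86,400 readings/day for one 1 Hz sensor
tokens_per_day = seconds_per_day * TOKENS_PER_READING

print(f"tokens for one sensor-day: {tokens_per_day:,}")        # 691,200
print(f"context windows required: {tokens_per_day / CONTEXT_WINDOW:.1f}")  # 6.9
```

One sensor, one day, and you already need roughly seven full context windows - before the model has done any reasoning at all.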
It’s also challenging to take a slice of the series and expect your analytics to be valid; the entire point of analytics is to factor in lots of data. As such, the only rational pathway I can see is to aggregate extremely detailed data sets, then expose the aggregated summaries to the AI model in discrete learner prompts.
Like this:
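Here's a minimal sketch of that aggregate-then-prompt idea, using only the Python standard library. The one-hour window and the summary statistics are my illustrative choices, not a prescription:

```python
# Collapse a raw 1 Hz sensor series into per-hour summary rows small
# enough to fit in a prompt. Window size and statistics are assumptions.
from statistics import mean, pstdev

def hourly_summaries(readings, window_s=3600):
    """Group (second_offset, value) pairs into windows and summarize each."""
    buckets = {}
    for t, v in readings:
        buckets.setdefault(t // window_s, []).append(v)
    rows = []
    for hour in sorted(buckets):
        vals = buckets[hour]
        rows.append(
            f"hour={hour} mean={mean(vals):.2f} min={min(vals)} "
            f"max={max(vals)} std={pstdev(vals):.2f} n={len(vals)}"
        )
    return rows

# One day of 1-second readings collapses to 24 prompt-sized lines.
raw = [(t, float(t % 600)) for t in range(86_400)]
summary = "\n".join(hourly_summaries(raw))
prompt = f"Here are hourly sensor summaries:\n{summary}\nAny anomalies?"
print(len(summary.splitlines()))  # 24 summary rows instead of 86,400 readings
```

The 86,400 raw readings become 24 compact rows - easily within any model's context window - and the detail the model never sees is exactly the detail the aggregation deliberately discards.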