To Data & Beyond

The place for continuous learning on Advanced AI, GenAI, and Data Science

31/05/2026

The model release cycles from Anthropic & OpenAI are genuinely insane. Previously we had 6-12 months between updates; now models are released every 1.5 months.

This is an under-appreciated reason why Anthropic & OpenAI are in the lead. Google's releases are not as fast, sometimes 2-3x as long between major models. We've seen the same from DeepSeek - it took over a year to go from V3 to V4, which meant that they fell behind.

Note: these are only the key model releases. They exclude mini, flash, pro, voice, image, video, and similar models, so every lab ships more than is listed here, but sticking to key releases gives a more consistent picture.

30/05/2026

I bundled my 7 AI courses into one package.

Instead of buying them separately for $120, you can now get the full bundle for $50. That is roughly 58% off.

The bundle includes:

- Designing Multi-Agent Deep Search Systems Course
- Claude Code Skills 101: Build Your First Skill in 1 Hour
- Designing Production-Ready RAG System Crash Course
- Build a Resume That Lands AI Engineering and Data Science Interviews
- Building Agentic RAG Application with DeepSeek R1 Crash Course
- LLM Roadmap Crash Course: From Beginner to Advanced Level
- Building an Outstanding Data Science Portfolio Crash Course

If you want to learn practical AI engineering concepts, build better projects, and improve how you present yourself to employers, this bundle should be useful.

➡️ You can get the full bundle from here:
https://youssefhosni.gumroad.com/l/hvuiwm

29/05/2026

Microsoft has open-sourced SkillOpt: a framework that trains agent skills the way you'd train a neural network, without ever touching model weights.

The idea is to treat a plain markdown skill file as the trainable parameter of a frozen LLM agent, then optimize it with the same discipline you'd use on weights: learning rate, validation gate, batch size, and epoch schedule.

Here's how it works:

▪️The skill document is the parameter
▪️Trajectory-derived edits are the gradient
▪️The edit budget is the learning rate
▪️A held-out split is the validation check

How the loop runs:

▪️ A frozen model runs tasks with the current skill and logs scored trajectories
▪️A separate optimizer model analyzes failures in minibatches, proposes structured add/delete/replace edits, and ranks them under a budget cap
▪️An edit is accepted only if it improves the held-out split; rejected edits are stored so the optimizer stops repeating them
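The loop above can be sketched in a few lines. This is a toy illustration, not SkillOpt's actual code: `score` stands in for running the frozen agent, `propose_edits` stands in for the optimizer model, and every function name, rule string, and default value here is made up for the example.

```python
Skill = tuple[str, ...]   # the "parameter": lines of a skill document
Edit = tuple[str, str]    # ("add" | "delete", rule text)

def apply_edit(skill: Skill, edit: Edit) -> Skill:
    op, rule = edit
    if op == "add" and rule not in skill:
        return skill + (rule,)
    if op == "delete":
        return tuple(r for r in skill if r != rule)
    return skill

def score(skill: Skill, tasks: list[str]) -> float:
    """Toy stand-in for the frozen agent: a task passes if the skill contains its rule."""
    return sum(t in skill for t in tasks) / len(tasks)

def propose_edits(failures: list[str], rejected: set, budget: int) -> list[Edit]:
    """Toy stand-in for the optimizer model: suggest adding the rule each failure needed."""
    candidates = [("add", t) for t in dict.fromkeys(failures)]
    # Skip edits that already failed the gate, then cap at the edit budget.
    return [e for e in candidates if e not in rejected][:budget]

def optimize(skill: Skill, train: list[str], heldout: list[str],
             epochs: int = 3, batch_size: int = 2, budget: int = 1):
    rejected = set()
    best = score(skill, heldout)
    for _ in range(epochs):
        for i in range(0, len(train), batch_size):
            batch = train[i:i + batch_size]
            failures = [t for t in batch if t not in skill]
            for edit in propose_edits(failures, rejected, budget):
                candidate = apply_edit(skill, edit)
                new_score = score(candidate, heldout)
                if new_score > best:          # validation gate on the held-out split
                    skill, best = candidate, new_score
                else:
                    rejected.add(edit)        # remembered so it is not proposed again
    return skill, best, rejected

train = ["cite sources", "use metric units", "cite sources", "avoid jargon"]
heldout = ["cite sources", "avoid jargon"]
best_skill, best_score, rejected = optimize((), train, heldout)
print(best_skill, best_score, rejected)
```

In this toy run, "use metric units" fails often in training but does not help the held-out split, so the gate rejects it and the final skill keeps only two rules.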

Deployment is a single best_skill.md, ~300-2,000 tokens. No weight changes, no extra inference-time calls.

The numbers hold up: best or tied on all 52 (model, benchmark, harness) cells, beating human, one-shot, Trace2Skill, TextGrad, GEPA, and EvoSkill skills.

On GPT-5.5, it adds +23.5 points in direct chat, +24.8 in the Codex loop, and +19.1 in Claude Code over the no-skill baseline.

What got me: the best skills landed with just 1-4 accepted edits across the entire run. The output reads like rules a careful engineer would write after a day with the benchmark, except they were found automatically.

SkillOpt isn't alone here. Hermes Agent reached the same idea independently through skill_manage, Curator, and a GEPA optimization loop that scores, mutates, and promotes skill docs across runs.

Two teams, different architectures, one conclusion: in a frozen-model agent, the skill file is the highest-leverage thing to optimize.

29/05/2026

A lot of people are fine-tuning LLMs.

But the more important question is:

What is actually being updated inside the model?

Because fine-tuning does not always mean changing all the model weights.

With parameter-efficient fine-tuning, the base model often stays frozen, and the adaptation happens through small extra components.

Here are five techniques that show this clearly.

1. LoRA: learn the update, not the original weights

LoRA freezes the pretrained weight matrix `W`.

Instead of updating `W` directly, it trains two small matrices:

A and B

The update is:

ΔW = BA

So the effective weight becomes:

W' = W + BA

The original weights stay untouched. The model adapts through a small, low-rank update.
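Here is a minimal sketch of that update in plain Python, assuming a 64×64 layer and rank 4 (both numbers are picked for illustration). The `matmul` and `add` helpers are only there to keep the example dependency-free; in practice you would use NumPy or PyTorch. B starts at zero, as in the LoRA paper, so the adapted layer begins identical to the pretrained one.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

random.seed(0)
d_out, d_in, r = 64, 64, 4   # illustrative sizes, not from any specific model

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]     # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                    # trainable, d_out x r

delta_W = matmul(B, A)      # the low-rank update, rank at most r
W_eff = add(W, delta_W)     # W' = W + BA; W itself is never modified

full_params = d_out * d_in          # what full fine-tuning would train
lora_params = r * (d_in + d_out)    # what LoRA trains (A plus B)
print(full_params, lora_params)     # 4096 512
```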

2. LoRA-FA: freeze even more

LoRA-FA keeps the same structure, but freezes A too.

So W is frozen.

A is frozen.

Only B is trained.

Same idea as LoRA, but with fewer trainable parameters.

3. VeRA: learn scaling instead of matrices

VeRA goes one step further.

Both A and B are randomly initialized and frozen.

Instead of learning the matrices, VeRA learns small scaling vectors that control how much the frozen low-rank update contributes.

So LoRA learns the update.

VeRA learns how to scale a fixed update.

That makes it extremely parameter-efficient.
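To make the savings concrete, here is a rough per-layer count of trainable parameters. It assumes VeRA's two scaling vectors have sizes r and d_out, which is my reading of the VeRA formulation; the layer size and rank are illustrative, and `trainable_params` is a helper made up for this example.

```python
def trainable_params(method: str, d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for one adapted weight matrix of shape d_out x d_in."""
    if method == "full":
        return d_out * d_in          # every weight is trained
    if method == "lora":
        return r * (d_in + d_out)    # A (r x d_in) and B (d_out x r)
    if method == "lora-fa":
        return d_out * r             # only B; A is frozen
    if method == "vera":
        return r + d_out             # two scaling vectors (assumed sizes r and d_out)
    raise ValueError(f"unknown method: {method}")

for method in ("full", "lora", "lora-fa", "vera"):
    print(method, trainable_params(method, d_out=4096, d_in=4096, r=8))
# full 16777216
# lora 65536
# lora-fa 32768
# vera 4104
```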

4. Delta-LoRA: let the base weights move, but carefully

Delta-LoRA is different because `W` is not fully frozen.

The base weights evolve using the change between low-rank updates across training steps:

W^(t+1) = W^t + c(B_(t+1)A_(t+1) − B_tA_t)

So W changes, but only through low-rank delta propagation.

It sits somewhere between LoRA and full fine-tuning.
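A single step of that rule looks like this on a tiny 2×2 example. The matrices and the value of `c` are made up so the arithmetic is easy to check by hand, and `delta_lora_step` is a name chosen for this sketch.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def delta_lora_step(W, A_old, B_old, A_new, B_new, c):
    """Return W + c * (B_new A_new - B_old A_old) without modifying W in place."""
    new_prod = matmul(B_new, A_new)
    old_prod = matmul(B_old, A_old)
    return [
        [w + c * (n - o) for w, n, o in zip(w_row, n_row, o_row)]
        for w_row, n_row, o_row in zip(W, new_prod, old_prod)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]
A_old, B_old = [[1.0, 2.0]], [[1.0], [0.0]]   # rank-1 factors before the optimizer step
A_new, B_new = [[1.0, 2.0]], [[1.5], [0.5]]   # rank-1 factors after the optimizer step

W_next = delta_lora_step(W, A_old, B_old, A_new, B_new, c=0.5)
print(W_next)  # [[1.25, 0.5], [0.25, 1.5]]
```

If A and B do not change in a step, the difference is zero and W stays where it was.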

5. LoRA+: same LoRA, smarter learning rates

LoRA+ keeps the same structure as LoRA.

W is frozen.

A and B are trained.

The difference is that B gets a larger learning rate than A:

η_B > η_A

Small change, but it can make LoRA training more effective.
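As a sketch, the only change from a plain LoRA training step is that B's update uses a larger step size. The `ratio` argument and all numbers below are illustrative, and plain SGD is used here only to keep the example short.

```python
def sgd_update(M, grad, lr):
    """One plain SGD step on a matrix stored as a list of rows."""
    return [[m - lr * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(M, grad)]

def lora_plus_step(A, B, grad_A, grad_B, lr_A, ratio):
    """Update A with lr_A and B with lr_B = ratio * lr_A, where ratio > 1."""
    lr_B = ratio * lr_A
    return sgd_update(A, grad_A, lr_A), sgd_update(B, grad_B, lr_B)

A, B = [[1.0, 1.0]], [[1.0], [1.0]]
grad_A, grad_B = [[0.5, 0.5]], [[0.5], [0.5]]

A_next, B_next = lora_plus_step(A, B, grad_A, grad_B, lr_A=0.25, ratio=4)
print(A_next)  # [[0.875, 0.875]]
print(B_next)  # [[0.5], [0.5]]
```

With identical gradients, B moves four times as far as A. In PyTorch, the usual way to get this is two optimizer parameter groups with different `lr` values.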

The core idea behind all five methods is the same:

You do not always need to update the full model to adapt an LLM.

- LoRA trains two matrices.

- LoRA-FA trains one.

- VeRA trains scaling vectors.

- LoRA+ trains two matrices with different learning rates.

28/05/2026

It's clear that growth for coding tools such as Claude Code has slowed from the pace seen since the start of the year.

The cause might be compute constraints, or many clients having already blown through their full-year AI budgets.

I think this is normal, since most software engineers, the core customer base, are already using these tools. The true measure should be the number of tokens consumed, which I believe is still increasing, maybe exponentially.

27/05/2026

Eid Al-Adha Mubarak from To Data & Beyond.

Wishing you and your families peace, joy, and blessings during this special time.

May this Eid bring you closer to the people you love and give you space to reflect, recharge, and move forward with gratitude.

26/05/2026

Most "AI research agents" people build are doing one retrieval call and calling it research.

Basic RAG does the same thing for every query: embed, retrieve k chunks, stuff them in the prompt, summarize. That works for "summarize this PDF." It falls apart on anything that looks like real research.

Real research questions need more than retrieval:

- Plan the search before running it. What to look for, in what order, when to stop

- Call multiple tools, not just a vector store. Web search, code, calculators, scrapers

- Validate sources instead of trusting whatever ranked first

- Check freshness when the topic moves week to week

- Reconcile contradictions across sources instead of averaging them out

- Merge evidence into a structured answer with provenance, not a paragraph of summary

That gap is what a deep search agent closes. Same LLM, completely different architecture around it.
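To show how different that architecture is from a single retrieval call, here is a bare-bones skeleton of the pipeline. It is not the design from the series: the function names, the `Evidence` fields, the fake tools, and their data are all placeholders, the planner is a fixed list rather than an LLM call, and contradiction handling is left out.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Evidence:
    claim: str
    source: str
    published: date
    trusted: bool

def plan(question):
    # A real planner would be an LLM call; here the plan is two fixed sub-queries.
    return [f"background: {question}", f"latest: {question}"]

def run_tools(sub_query, tools):
    # Call every tool, not just a vector store.
    return [ev for tool in tools for ev in tool(sub_query)]

def validate(evidence, today, max_age_days):
    # Keep only trusted sources that are fresh enough for the topic.
    return [ev for ev in evidence
            if ev.trusted and (today - ev.published).days <= max_age_days]

def merge(question, evidence):
    # Group sources under each claim so the answer carries provenance.
    by_claim = {}
    for ev in evidence:
        by_claim.setdefault(ev.claim, []).append(ev.source)
    findings = [{"claim": c, "sources": sorted(set(s))} for c, s in by_claim.items()]
    return {"question": question, "findings": findings}

def deep_search(question, tools, today, max_age_days=30):
    evidence = []
    for sub_query in plan(question):
        evidence.extend(run_tools(sub_query, tools))
    return merge(question, validate(evidence, today, max_age_days))

# Fake tools with placeholder data, standing in for web search and a vector store.
def fake_web_search(sub_query):
    return [Evidence("X is supported", "site-a.example", date(2026, 5, 20), True),
            Evidence("X is deprecated", "old-blog.example", date(2024, 1, 1), True)]

def fake_vector_store(sub_query):
    return [Evidence("X is supported", "internal-docs", date(2026, 5, 10), True),
            Evidence("X is free", "forum.example", date(2026, 5, 21), False)]

report = deep_search("status of X", [fake_web_search, fake_vector_store],
                     today=date(2026, 5, 26))
print(report)
```

The stale post and the untrusted forum answer are filtered out, and the surviving claim lists both of its sources.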

We are writing a 12-part series that builds one from scratch. Part 1 is up: it frames the problem and covers the high-level architecture we will build over the series.

Read it below ⬇️

25/05/2026

The tech layoff story is becoming less subtle.

Meta is reportedly cutting thousands of roles while committing tens of billions of dollars to AI infrastructure. Cisco’s CEO described cutting 4,000 jobs as “optimistically low.” Intuit laid off 3,000 employees as part of a restructuring around AI, while still saying publicly that it was “not about AI.”

And according to TrueUp, more than 100,000 tech jobs have already been impacted in 2026, with projections pointing to a much higher number by the end of the year.

The layoffs themselves are not the only important part here. Tech companies have always restructured, shifted priorities, and reduced teams when the market changed.

What feels different now is how openly the tradeoff is being discussed.

Human headcount is increasingly being treated as a budget line that can be redirected toward AI infrastructure, GPU clusters, and automation-heavy operating models.

That part used to be implied.

Now it is becoming part of the strategy.

23/05/2026

RAG vs. CAG, explained simply.

RAG is useful because it lets the model retrieve the right context at query time.

But it also has a practical problem:

Every user query may need to hit the vector database.

That makes sense when the knowledge is changing, but it becomes wasteful when the information is mostly static.

Think about things like:

- Internal documentation
- Product policies
- API references
- Company guidelines
- Course material

If this information does not change often, retrieving it again and again adds unnecessary cost and latency.

This is where Cache-Augmented Generation, or CAG, becomes useful.

The idea is simple:

Instead of retrieving the same static context every time, you cache it so the model can reuse it more efficiently through its KV cache or prompt caching mechanism.

In practice, the strongest setup is not RAG vs. CAG.

It is RAG + CAG.

You split your knowledge into two layers:

- Static knowledge goes into the cache.

- Dynamic knowledge stays in retrieval.

For example:

- Static data like policies, documentation, and stable reference material can be cached.

- Dynamic data like recent updates, live documents, user-specific records, or frequently changing content should still be retrieved.

This gives you a better balance:

- Faster responses.
- Lower cost.
- Less repeated retrieval.
- And a cleaner architecture.

But the important part is being selective.

You should not cache everything.

Cache only the knowledge that is stable, valuable, and repeatedly used.

If you cache too much, you will hit context limits and make the system harder to manage.

A better way to think about it:

Cold knowledge → cache it.
Hot knowledge → retrieve it.

This keeps the system both efficient and reliable.
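Here is a small sketch of that split. It assumes the provider's prompt cache keys on an identical prompt prefix, so the static layer is built once, in a stable order, and always placed first. The documents, function names, and the keyword-overlap retriever (standing in for a vector store) are all placeholders.

```python
STATIC_DOCS = {   # cold knowledge: stable, reused on every request
    "refund_policy": "Refunds are available within 30 days of purchase.",
    "api_reference": "POST /v1/orders creates an order.",
}
DYNAMIC_DOCS = {  # hot knowledge: changes often, retrieved per query
    "order_1042": "Order 1042 shipped yesterday",
    "outage": "Login outage reported this morning",
}

def build_static_prefix(static_docs):
    # Sort by name so the prefix is byte-identical on every request.
    return "\n\n".join(f"## {name}\n{text}" for name, text in sorted(static_docs.items()))

def retrieve(query, dynamic_docs, k=1):
    # Toy retriever: rank documents by how many query words they share.
    q = set(query.lower().split())
    ranked = sorted(dynamic_docs.items(),
                    key=lambda kv: -len(q & set(kv[1].lower().split())))
    return [text for _, text in ranked[:k]]

def build_prompt(query, prefix, dynamic_docs):
    retrieved = "\n".join(retrieve(query, dynamic_docs))
    # Static prefix first, then everything that varies per request.
    return f"{prefix}\n\n## Retrieved\n{retrieved}\n\n## Question\n{query}"

PREFIX = build_static_prefix(STATIC_DOCS)
p1 = build_prompt("where is order 1042", PREFIX, DYNAMIC_DOCS)
p2 = build_prompt("is there a login outage", PREFIX, DYNAMIC_DOCS)
print(p1.startswith(PREFIX) and p2.startswith(PREFIX))  # True
```

Both prompts share the same leading bytes, which is the part a prefix-based cache can reuse; only the retrieved snippet and the question differ.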

And this is not just theoretical.

OpenAI and Anthropic already support prompt caching in their APIs, so you can start applying this pattern today.

22/05/2026

Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources.

Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months.

American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products.

This is how the AI revolution is different from the INDUSTRIAL revolution.

Steam engines made work faster but also cheaper. Machines were expensive to build, but once the factory was running, each product became cheaper to make.

AI is complicated.

AI is making work more productive (arguably), but with token-based pricing, you don’t own the machine. You rent it every time it thinks, writes, edits, debugs, or retries.

If the AI machine produces faster, the bill also grows bigger. The AI revolution may lower labour time, but it can also raise usage cost to the point where the “replacement” becomes more expensive than the work it replaced.
