To Data & Beyond, Helsinki (2026)

20/06/2026

GLM-5.2 Max is now the leading open-weight model on the Artificial Analysis Intelligence Index v4.1.

It scored 51, placing it ahead of:

• MiniMax M3: 44
• DeepSeek V4 Pro Max: 44
• Kimi K2.6: 43

19/06/2026

I thought this was a joke. Meta has now turned 30-50% of the software engineers on its core teams into data labelers.

Their job is "giving human feedback on AI-generated GitHub repos" in an org called Agent Data Optimization.

Maybe we are all training data generators after all.

18/06/2026

Tough day to be a closed-source AI lab

GLM-5.2 is here, and it's open.

Z.ai just put Opus-class frontier intelligence into everyone's hands, with open weights anyone can download, run, and build on:

- Frontier-level reasoning and agentic capability

- 1M token context window

- Agentic-first, built for coding and long-horizon tasks

- Native in Transformers from day one, no remote code, so vLLM, SGLang, and the rest of the ecosystem support it out of the box

Every few weeks the open frontier moves again, and the gap to the best closed models gets harder to see. This is one of those moments.

The future of AI is open. And it's accelerating.

17/06/2026

Google DeepMind recently published an interesting paper: From AGI to ASI.

Most discussions today focus on whether we can build AGI — an AI system that reaches human-level capabilities across a broad range of tasks. This paper focuses on a different question:

What happens after AGI?

The authors argue that once a human-level AI exists, several properties of digital systems may allow progress beyond human intelligence much faster than many people expect.

Unlike humans, AI systems can be copied at essentially zero cost. A single capable AI can become thousands or millions of identical workers, each carrying the same knowledge and skills. These systems can also operate faster than humans and share information almost instantly.

As a result, the transition from AGI to superhuman performance may not require a single “genius” AI. It may emerge from large populations of capable AI systems working together.

The paper discusses four potential drivers of progress:

- Continued scaling of existing approaches.

- New algorithmic breakthroughs when current methods reach limits.

- Recursive improvement, where AI systems help improve future AI systems.

- Collective intelligence, where large groups of AI agents coordinate to solve problems beyond the capabilities of any individual agent.

At the same time, the authors acknowledge several factors that could slow progress:

- Limited availability of high-quality training data.

- Increasing computational and energy requirements.

- The possibility that scientific and engineering progress becomes harder over time.

One of the most interesting questions raised in the paper concerns scientific discovery.

Current AI systems are extremely effective at learning from existing human knowledge and recombining ideas in useful ways. However, it remains unclear whether they can independently generate fundamentally new scientific concepts.

The paper references a thought experiment often discussed by Demis Hassabis:

If an AI were placed in the year 1900 with access to everything Einstein knew at the time, could it independently discover the theory of relativity?

Today, we do not have strong evidence that AI systems can reliably make that kind of breakthrough.

If genuine scientific discovery remains difficult for AI, then progress may ultimately be constrained by the pace of real-world experimentation rather than by improvements in compute alone.

One aspect of the paper that I particularly liked is that it avoids framing the future as a single AGI “moment.”

Instead, it presents a more gradual picture: a sequence of increasingly powerful waves of AI capability that transform different scientific, technical, and economic domains over time.

In that world, one of the most important challenges is not predicting exactly when AGI or ASI will arrive, but building better ways to measure progress so we can understand which stage of the transition we are already experiencing.

One final detail: the paper opens with instructions for AI assistants on how to summarize it. The authors clearly assume that many readers will ask an AI to read the paper before reading it themselves.

17/06/2026

A 3B parameter model called VibeThinker-3B just put up coding benchmark scores in the same league as Claude Opus 4.5.

Only 3 BILLION!

The weights are on Hugging Face; anyone can test it.

It builds on the older Qwen2.5-Coder-3B base and gets this performance from the team's post-training stack.

Here are some of the key pieces of that post-training stack, based on their report:

1. High-signal synthetic data. Use synthetic data where the supervision is actually useful: math problems with credible solutions, code tasks with tests, and examples where correctness can be checked.

2. Multiple reasoning paths per answer. Do not rely on a single solution trace. Generate several reasoning paths for the same final answer so the model sees different valid ways to reach the result.

3. Aggressive filtering. The quality of the dataset matters more than the size. Filter incorrect, weak, noisy, duplicated, or low-value samples repeatedly.

4. Two-stage SFT. Start with broad supervised fine-tuning to build general capability, then run a second stage focused on harder long-reasoning examples.

5. Select checkpoints using target accuracy, not only validation loss. Instead of choosing checkpoints based purely on validation loss, evaluate them using task-level accuracy, such as pass@k, which better reflects the model’s actual reasoning performance.

6. MGPO for RLVR. Use MaxEnt-Guided Policy Optimization, an RLVR method similar in spirit to GRPO, but with an additional weighting mechanism that prioritizes examples that are not too easy and not too hard for the current policy.

7. Single 64k long-context RL stage. Train directly with a 64k context window. In this setup, progressive context expansion hurt performance because early truncation damaged the model’s long-term thinking behavior.

8. Careful RL data ordering. The RL stages are ordered as Math RL → Code RL → STEM RL. This specific order helped improve overall performance.

9. Efficiency optimization after accuracy optimization. After maximizing accuracy, add a final stage that rewards shorter correct trajectories. The goal is to make the model more efficient without sacrificing correctness.
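
To make points 5 and 6 a bit more concrete, here is a minimal Python sketch. The `pass_at_k` function is the standard unbiased pass@k estimator used in code-generation evaluation. Everything else is an assumption made for illustration: `select_checkpoint`, the checkpoint names, and `difficulty_weight` are hypothetical, and the entropy-based weight only mimics the "not too easy, not too hard" idea. It is not the MGPO formula from the report.

```python
import math
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled solutions, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def select_checkpoint(results: dict[str, list[tuple[int, int]]], k: int) -> str:
    """Return the checkpoint with the highest mean pass@k.

    `results` maps a (hypothetical) checkpoint name to a list of
    (n_samples, n_correct) pairs, one pair per evaluation problem.
    """
    def mean_pass(per_problem: list[tuple[int, int]]) -> float:
        return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)

    return max(results, key=lambda name: mean_pass(results[name]))


def difficulty_weight(pass_rate: float) -> float:
    """Illustrative weight: binary entropy (in bits) of the current pass rate.

    Peaks at 1.0 when the policy solves a problem half the time and drops
    to 0.0 when the problem is always or never solved. This is an assumed
    stand-in for a difficulty-aware weighting, not the report's method.
    """
    if pass_rate <= 0.0 or pass_rate >= 1.0:
        return 0.0
    q = 1.0 - pass_rate
    return -(pass_rate * math.log2(pass_rate) + q * math.log2(q))


# Hypothetical evaluation results: two problems, 8 samples each.
example_results = {
    "ckpt_a": [(8, 2), (8, 0)],
    "ckpt_b": [(8, 4), (8, 1)],
}
best = select_checkpoint(example_results, k=1)
```

In this toy setup `ckpt_b` wins on mean pass@1, regardless of which checkpoint has the lower validation loss. The weight function would push training toward problems the current policy solves roughly half the time.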

Model page on Hugging Face in the comments!

16/06/2026

A paper from MIT, Stanford, New York University, and Princeton says AI can make people feel more efficient even when they are not actually becoming much more efficient.

People often use AI for simple tasks because it feels like it saves time and effort, but the measured benefit is often tiny, missing, or even negative.

The biggest point is the feedback loop: once people use AI, they become more likely to use it again, even for easy tasks where doing it themselves would often be just as fast or faster.

In other words, AI dependence can grow from a mistaken feeling of convenience, not just from real productivity gains.

Across three preregistered studies with 2,691 participants, people used AI for basic arithmetic, spelling, recall, and short rewriting at higher rates than they predicted, especially on easy tasks.

They also expected AI to save 55.7 seconds on average, when the measured saving was only 7.5 seconds.

For simple work, the hidden cost is not intelligence but interface friction: writing the prompt, waiting, reading, checking, and deciding whether the answer is acceptable.
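
A quick back-of-the-envelope sketch of that arithmetic in Python. The two savings figures are the ones quoted above. The `net_saving` helper and the per-step timings in the example are hypothetical numbers chosen only to show how friction can wipe out the gain; they are not from the paper.

```python
# Figures quoted in the post (seconds saved per task).
EXPECTED_SAVING_S = 55.7
MEASURED_SAVING_S = 7.5

# How far off the expectation was.
overestimate_ratio = EXPECTED_SAVING_S / MEASURED_SAVING_S
gap_s = EXPECTED_SAVING_S - MEASURED_SAVING_S


def net_saving(manual_s: float, prompt_s: float, wait_s: float,
               read_s: float, check_s: float) -> float:
    """Seconds saved by using AI; negative means doing it yourself was faster."""
    return manual_s - (prompt_s + wait_s + read_s + check_s)


# Hypothetical easy task: 20 s by hand versus 23 s of interface friction.
easy_task = net_saving(manual_s=20, prompt_s=8, wait_s=5, read_s=4, check_s=6)

print(f"Expected saving was {overestimate_ratio:.1f}x the measured one "
      f"({gap_s:.1f} s gap); easy task net saving: {easy_task} s")
```

With the quoted figures, people expected roughly seven times the saving they actually got, and in the made-up easy task the friction steps cost more time than the task itself.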

Once that loop begins, it can feel like effort has been outsourced, even when effort has only been rearranged.

Here’s the key part: the study suggests that AI use can train its own justification.

After using AI on just two tasks, participants became more likely to use it again, even when independent completion was faster.

The danger is not dramatic dependence, but quiet recalibration.

A person who asks AI for a trivial answer today may not become less capable tomorrow, but they may become less accurate at judging when their own mind is already the faster tool.

15/06/2026

One image is worth a thousand words! Key folders in a production GenAI project!

You can read more about what a real production GenAI project looks like in our latest publication. (Link below👇)

14/06/2026

Claude Cowork lets you use Claude as a desktop assistant that integrates with your folders, files, connectors, scheduled tasks, and real workflows.

A new step-by-step guide to Claude Cowork is now up on To Data & Beyond. It takes you from Prompts to Deliverables & Automated Workflows.

The guide walks through the full setup process, including:

- How Claude Cowork differs from Claude Chat and Claude Code
- The essential settings you should configure first
- How to add guardrails before giving Cowork access to files and tools
- How to use global instructions and persistent memory
- How to connect tools like Gmail, Slack, Drive, and custom connectors
- How to write better Cowork briefs instead of vague chat prompts

Then we build two practical workflows:

- Workflow #1: Cleaning and renaming a messy folder on your machine.
- Workflow #2: Creating your first recurring scheduled task, such as a daily Slack summary or weekly inbox digest.

The blog includes more than 50 figures and screenshots, so it is not just a high-level overview. It is designed as a practical walkthrough that you can follow step by step.

Read it from the link in the comments!

13/06/2026

Banning Anthropic's latest model would push everyone to adopt open-source models, which are lagging behind flagship proprietary models by only 3 months.

We can't rely on external models and risk our access being revoked at a whim.

13/06/2026

The US government has reportedly issued an export control directive suspending access to Fable 5 and Mythos 5 for foreign nationals, whether they are inside or outside the United States.

According to Anthropic, this also affects foreign national employees inside the company.

- I do not think this is a good sign, but not mainly because of Anthropic (arguably, they should take responsibility for overhyping it). The bigger concern is how the restriction appears to have been designed.

- Using “foreign national” as the main criterion for model access is a very blunt policy tool. It is difficult to enforce in practice, creates a very wide ban, and does not map cleanly to actual risk.

- There are US citizens who may be hostile to US interests, and there are foreign nationals who are trusted employees, researchers, and engineers. A serious malicious actor would also likely look for ways around this kind of restriction, so the policy may end up hurting legitimate users more than it stops determined misuse.

- The second issue is process. Based on Anthropic’s post, it does not look like the government worked closely enough with the company to understand the actual vulnerability, the mitigation options, or the operational impact before ordering the shutdown. From the outside, this looks less like a carefully scoped safety intervention and more like a rushed reaction to a capability concern.

My guess is that access will recover once the government has a clearer understanding of the situation and the policy is narrowed or clarified.

Anthropic clearly worked hard on this release, and by most accounts, the model is strong. They have also invested more heavily in safety than most labs, perhaps even too heavily in some areas.

There is a fair argument that the “this model is very dangerous” narrative may now be backfiring on them, but this shutdown says more about the government’s policy response than about the quality of the model itself.
