Hubert (Marek) Pyskło

I'm a Computational Sciences & History student at Minerva University (graduating May 2027), currently an Applied Research Intern at Prime Intellect. I work on RL environments, multimodal post-training, and the synthetic data pipelines that feed them. I also co-founded Econverse, the largest student startup incubator in CEE, and sit on the board of AI Consensus. I like 20th-century history (Deng's reforms, the Cold War), poker, skiing, and shooting. I'm an RBF Scholar (2025 cohort).

publications

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation
KDD '26 (32nd ACM SIGKDD), Jeju Island · also EvalEval @ ACL 2026
A benchmarking framework for evaluating agentic LLMs on real-world productivity software API tasks via code execution, using state-diff contracts and containerized replicas of enterprise APIs (Slack, Linear, Box, Google Calendar). It covers 108 endpoints and 224 benchmark tasks, evaluated across nine LLMs. I also ran RL training (Qwen 30B, 0.31 → 0.55) and SFT + GRPO (Ministral 14B, 0.28 → 0.45).
arXiv · DOI · code · dataset

Activation Steering for Tool-Poisoning Defense in Language-Model Agents
AIWILD @ ICML 2026 (Agents in the Wild: Safety, Security, and Beyond), Seoul
We study whether activation steering can defend tool-using LLM agents against tool-poisoning attacks, comparing difference-in-means, SAE-filtered, and input-conditioned hypernet steering on MCPTox. We pair defense-rate numbers with mechanism checks and benign-capability tests.
OpenReview · code

experience

2025	Research Engineer Intern at Wordware (YC S24). Designed evaluation frameworks, scoring metrics, and test suites with LLM-as-judge verification. Built automated Q&A test generation from internal company data and integrated it into the CI/CD pipeline. Benchmarked retrieval architectures (vector DB, SQL, graph, filesystem) for agent memory.
2025	AI Engineer Intern at Samsung Heavy Industries, South Korea. Built a local-inference RAG system for shipbuilding ITT document analysis - 92.5% accuracy across 217 risk factors.
2024–	Board Member at AI Consensus. Previously Lead for Asia: organized one of the largest student AI conferences in Taiwan with the Ministry of Digital Affairs, and ran a responsible AI hackathon in Korea with students from 23 countries - partners included AWS, Perplexity, and Upstage. (Nature, Minerva press release)
2024	Visiting Associate at S20. Due diligence and deal sourcing across e-commerce, circular economy, and AI tools.
2022–25	Co-Founder & VP at Econverse. Built and operated the largest student startup incubator in Central Europe - 3,500+ students across Poland, Czechia, Slovakia, and Hungary, $500K+ raised from Microsoft, ABB, National Development Bank, and Baker McKenzie. (Forbes Poland, Emerging Europe)
2020–22	Co-Founder at Token Studio. Crypto investment analytics - $25K+ in angel funding from execs at Getin Noble Bank and BNP Paribas. Didn't find PMF, shut down.

education

2023–27	Minerva University, San Francisco - B.Sc. Computational Sciences, Minor in History. Benchmark-backed, ~2% acceptance rate. Built around active learning - each semester in a different city (San Francisco, Seoul, Taipei, Berlin, Hyderabad, Buenos Aires, Tokyo).
2020–22	IB World School No. 1349, Poznań - International Baccalaureate, 41/45.

recognition

RBF Scholar (2025 cohort) - full-ride scholarship for entrepreneurial achievement
Laureate & Finalist of the National Economics Olympiad (top 0.3%)

media

XYZ - interview on responsible AI and AI Consensus (2025)
Gazeta Prawna - podcast on Econverse and youth entrepreneurship (2023)
Minerva University - feature on co-founding Econverse (2023)