
The Lithiumflow Paradox: Inside the Hunt for Gemini 3.0 and the Pelican That Broke the Benchmark
1. Introduction: The Ghost in the Leaderboard
On the morning of October 19, 2025, the global artificial intelligence community woke up to a digital ghost story. There was no press release from Mountain View. There were no polished keynotes from Sundar Pichai, no glossy YouTube demos of pianos being played by AI, and no triumphal tweets from Demis Hassabis. The silence from Google was absolute. Yet, in the sprawling, chaotic, and hyper-vigilant digital trenches of the LMSYS Chatbot Arena (LMArena)—the coliseum where Large Language Models (LLMs) battle for supremacy in blind tests—the ground was shifting.
Two unidentified entities had entered the ring. They bore no corporate branding, only cryptic, elemental codenames: "lithiumflow" and "orionmist".
For the uninitiated, LMArena might seem like a niche curiosity, a website where geeks vote on chatbot responses. But in the high-stakes world of generative AI, where billions of dollars in market capitalization swing on the perception of "state-of-the-art" capabilities, it has become the ultimate proving ground. It is the only place where marketing hype dies and raw performance speaks. When a new model climbs the leaderboard, the industry notices. When two mysterious models appear out of nowhere and immediately begin dismantling the reigning champions—OpenAI's GPT-5.1 and Anthropic's Claude Opus 4.5—it is not just an update; it is an event.
The emergence of Lithiumflow and Orionmist triggered a frenzy of digital detective work reminiscent of the Cold War's signal intelligence era. Researchers, developers, and enthusiasts on platforms like Reddit’s r/LocalLLaMA, r/singularity, and r/Bard began triangulating the identity of these ghosts. They probed the models with logic puzzles, grilled them on their training data, and, most famously, subjected them to the most absurdly specific visual reasoning test ever devised: asking a text-based AI to write code for a pelican riding a bicycle.
This report is the definitive chronicle of that moment. Writing from the vantage point of January 2026, we can now fully reconstruct the trajectory of Google’s Gemini 3.0 release. We will peel back the layers of the "Code Red" strategy led by Demis Hassabis, dissect the "mixture-of-experts" architecture hidden behind the "Lithium" moniker, and explore why a flightless water bird became the most important benchmark in the history of artificial general intelligence. This is not just a technical analysis; it is a story about the culture of the AI frontier, where leaks, rumors, and pelicans tell us more about the future than any white paper ever could.
2. The Ecosystem of Shadows: Understanding LMArena and "Stealth Drops"
To understand the significance of Lithiumflow, one must first understand the ecosystem in which it appeared. By late 2025, the AI arms race had evolved into a stalemate. The explosive leaps of 2023 and 2024 had given way to incremental grinding. OpenAI’s GPT-5.1 was a formidable incumbent, holding the top spot on the LMArena leaderboard with a grip that seemed unshakeable. Anthropic’s Claude Opus 4.5 was the darling of the coding and creative writing communities, beloved for its prose and reasoning. Google, meanwhile, was fighting a war on two fronts: battling the perception of being a "fast follower" while trying to integrate the massive, disparate research engines of Google Brain and DeepMind.
2.1 The Philosophy of the Blind Test
LMArena operates on a principle of ruthless neutrality. A user enters a prompt—say, "Explain quantum entanglement using only monosyllabic words"—and two anonymous models generate responses side-by-side. The user votes for Model A, Model B, a Tie, or Both Bad. These votes are aggregated using the Elo rating system, a statistical methodology originally developed by Arpad Elo for chess rankings.
The beauty of the Elo system in this context is that it is immune to the "contamination" of benchmarks. Standardized tests like MMLU (Massive Multitask Language Understanding) or HumanEval are static; once a model has "seen" the questions during training (even accidentally), its score becomes meaningless. LMArena is dynamic. The prompts are infinite, chaotic, and human. To win here, a model cannot just memorize answers; it must generalize.
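To make the mechanics concrete, here is a minimal sketch of the Elo update that powers such a leaderboard. The K-factor and the specific ratings are illustrative assumptions, and LMArena's production statistics are more sophisticated than this; treat it as the textbook version of the idea:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings after one blind-vote 'battle'."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    return rating_a + k * (s_a - e_a), rating_b - k * (s_a - e_a)

# A 25-point gap implies only a ~54% expected win rate per battle, which is
# why thousands of votes are needed before a ranking stabilizes.
print(round(expected_score(1489, 1464), 3))  # 0.536
```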
2.2 The Strategy of the Stealth Drop
Why would Google, a company with a market cap in the trillions, release its flagship product as an anonymous, unbranded checkpoint?
The "stealth drop" has become a strategic necessity for three reasons:
- Calibration against the Frontier: Internal benchmarks are often optimistic. Google needed to know how Gemini 3.0 fared against GPT-5.1 in the wild, on messy, unpredictable human prompts, without the risk of a public PR disaster if it underperformed.
- The "Vibe" Check: AI performance is increasingly subjective. Metrics like "perplexity" or "token accuracy" do not capture the "soul" of a model—its tone, its helpfulness, its refusal rates. By releasing Lithiumflow anonymously, Google could gauge the community's emotional reaction to the model's personality.
- Hype Generation: In the attention economy, mystery is a currency. The speculation surrounding "lithiumflow" generated more organic engagement and discussion on social media than a standard press release ever could. It turned the community into participants in the launch, rather than just passive consumers.
When Lithiumflow and Orionmist appeared, they did not just compete; they dominated. Within 48 hours, they had skyrocketed past the 1350 Elo mark, entering the rarefied air of "super-intelligence" previously occupied only by the most finely tuned versions of GPT-5 and Claude Opus. The community knew immediately: this was not a startup's lucky break. This was the Empire striking back.
3. Decoding the Names: Elemental Flow and Celestial Mist
In the world of technology, codenames are rarely random. They are artifacts of the internal culture that birthed them, offering forensic clues to the product's intent and architecture. The duality of "lithiumflow" and "orionmist" provided the first concrete evidence that Google was bifurcating its AGI strategy.
3.1 Lithiumflow: The Architecture of Efficiency
The name "lithiumflow" is a masterclass in semantic signaling.
- Lithium: The third element on the periodic table. Light, reactive, and the fundamental component of modern energy storage. It implies density of power in a lightweight package.
- Flow: Suggests liquidity, speed, and lack of friction. In AI terms, this translates to high throughput and low latency.
The community's hypothesis, which was later confirmed by the December 17 release, was that Lithiumflow represented Gemini 3.0 Flash. Historically, "Flash" or "Turbo" models were distilled, dumber versions of their "Pro" siblings—cheap, fast, but prone to hallucination. Lithiumflow shattered this paradigm. On LMArena, it was clocking speeds that suggested a lightweight architecture, yet its reasoning scores were rivaling the massive, compute-heavy "Pro" models of the previous generation.
This pointed to a massive leap in Mixture of Experts (MoE) technology. Unlike a "dense" model, where every parameter is activated for every token (imagine a library where you have to read every book to answer a question), an MoE model activates only a tiny slice of its brain for any given task. Lithiumflow appeared to be a "fine-grained" MoE—a swarm of thousands of tiny, hyper-specialized neural networks working in concert. It was "flowing" from expert to expert with zero latency, delivering brilliance without the computational tax.
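To make the inferred architecture concrete, here is a toy sketch of top-k expert routing. The dimensions, router, and experts are invented for illustration and bear no relation to Gemini's actual internals:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Route one token through the top-k of many small expert networks.

    x:       (d,) token representation
    gate_w:  (d, n_experts) router weights
    experts: list of (d, d) weight matrices (toy stand-ins for expert FFNs)
    """
    logits = x @ gate_w                    # router score for every expert
    top_k = np.argsort(logits)[-k:]        # indices of the k best-matching experts
    w = np.exp(logits[top_k] - logits[top_k].max())
    w /= w.sum()                           # softmax over the chosen experts only
    # Only k expert matmuls actually run; every other expert stays "dark".
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top_k))

rng = np.random.default_rng(0)
d, n_experts = 64, 128
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts)          # touches only 2 of 128 experts
```

Because compute scales with the active slice rather than the total parameter count, a fine-grained MoE can carry enormous knowledge while paying a small per-token bill—the property the "flow" half of the codename gestures at.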
3.2 Orionmist: The Hunter in the Cloud
If Lithiumflow was the speedster, "orionmist" was the sage.
- Orion: The Hunter. A constellation that dominates the winter sky. In Greek mythology, Orion was a giant of immense strength. Crucially, "Orion" had been appearing in leaked Google DeepMind papers and internal commit logs for months as the umbrella codename for the Gemini 3 training run.
- Mist: This suffix puzzled the detectives. Some theorized it referred to the "cloud" (Google Cloud Platform). Others, more astutely, linked it to the concept of Grounding.
In AI, "grounding" is the ability of a model to tether its hallucinations to reality by accessing external tools—specifically, Google Search. "Mist" suggests a pervasive, omnipresent layer of information. Early testing on LMArena confirmed this: while Lithiumflow would confidently hallucinate facts (a classic trait of a raw, ungrounded base model), Orionmist showed a sophisticated ability to say, "I don't know," or to pull in context that felt suspiciously up-to-date.
Orionmist was the Gemini 3.0 Pro candidate. It was designed not for speed, but for depth. It featured the rumored "DeepThink" capability—a system-2 reasoning mode where the model would "pause" (generate hidden chain-of-thought tokens) to plan its answer before speaking. This was the heavyweight contender, the model built to dethrone GPT-5.
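As a schematic of how such a grounding-plus-planning layer might look in code—the generate and search callables are hypothetical stand-ins wired to toy stubs, not real Gemini or Google Search APIs:

```python
from typing import Callable, List

def grounded_answer(generate: Callable[[str], str],
                    search: Callable[[str], List[str]],
                    question: str) -> str:
    """Plan privately, fetch sources if the plan flags uncertainty, then answer."""
    plan = generate(f"(hidden scratchpad) Plan an answer to: {question}")
    if "NEEDS_FACTS" in plan:                    # the model admits it is unsure
        sources = "\n".join(search(question))    # tether the reply to retrieved text
        return generate(f"Answer using only these sources:\n{sources}\nQ: {question}")
    return generate(f"Q: {question}")

# Toy stubs so the sketch runs end-to-end:
print(grounded_answer(
    generate=lambda p: "NEEDS_FACTS" if "scratchpad" in p else f"[grounded reply to] {p[-45:]}",
    search=lambda q: [f"snippet 1 about {q}", f"snippet 2 about {q}"],
    question="Who won the 2025 Nobel Prize in Physics?",
))
```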
4. The Pelican on a Bicycle: The Absurdity of Benchmarking
How do you measure the intelligence of a machine that has read the entire internet? You cannot ask it who the first President of the United States was; it has seen that sentence a billion times. To truly test understanding, you must ask it to synthesize disparate concepts into something novel, something it has likely never seen in its training data.
Enter Simon Willison and his "Pelican Riding a Bicycle" benchmark.
4.1 The Anatomy of the Test
The prompt is deceptively simple: "Generate an SVG of a pelican riding a bicycle."
Scalable Vector Graphics (SVG) is a code format. It describes images not as a grid of pixels (like a JPEG), but as a set of mathematical instructions: "Draw a circle at coordinates (10, 10) with a radius of 5." To succeed at this prompt, an LLM must do something extraordinary:
- Mental Visualization: It must "see" a bicycle in its mind's eye. It must understand that a bicycle has two wheels of equal size, a triangular frame connecting them, handlebars at the front, and a seat in the middle.
- Biological Mapping: It must "see" a pelican—large beak, webbed feet, wings.
- Spatial Composition: It must integrate these two distinct schemas. The pelican cannot be floating next to the bike; it must be riding it. The feet must be on the pedals. The wings might be gripping the handlebars (an anatomical impossibility, but a cartoon necessity).
- Code Translation: Finally, it must translate this entire imagined scene into valid XML syntax, calculating the (x, y) coordinates for every curve and line. (A minimal hand-written example of the target format follows this list.)
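To make the target format tangible, here is a minimal hand-written pelican-on-a-bicycle in the same spirit. The coordinates are invented for illustration and are not any model's actual output; Python is used only to emit and save the SVG string:

```python
# Two circles for wheels, a polyline frame, and a crude body, head, and beak.
svg = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <circle cx="50" cy="90" r="20" fill="none" stroke="black"/>   <!-- rear wheel -->
  <circle cx="140" cy="90" r="20" fill="none" stroke="black"/>  <!-- front wheel -->
  <polyline points="50,90 80,60 115,60 140,90" fill="none" stroke="black"/> <!-- frame -->
  <ellipse cx="85" cy="45" rx="18" ry="12" fill="white" stroke="black"/>    <!-- body on the seat -->
  <circle cx="105" cy="30" r="6" fill="white" stroke="black"/>              <!-- head -->
  <path d="M110 28 L135 33 L110 38 Z" fill="orange"/>                       <!-- scooping beak -->
</svg>"""

with open("pelican.svg", "w") as f:   # open pelican.svg in a browser to render it
    f.write(svg)
```

Even this crude sketch requires exactly the spatial commitments the test probes: the wheels must share a baseline, the body must sit above the frame, and the beak must extend forward from the head.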
For years, this prompt was the "graveyard of LLMs." GPT-4 would generate bicycles with square wheels. Claude 3 would draw a bird that looked like a potato. The models understood the words "pelican" and "bicycle," but they lacked the spatial world model to arrange them coherently.
4.2 The Lithiumflow Breakthrough
When the LMArena detectives threw the Pelican Test at Lithiumflow, the result was a shockwave.
The code it spat out was clean, concise, and structured. When rendered in a browser, it didn't just look like a jumble of shapes. It looked like a pelican. On a bicycle.
- Geometric Precision: The wheels were perfect circles, aligned horizontally. The frame was a recognizable diamond shape.
- The Beak: The defining feature—the massive, scooping beak—was rendered with a specific orange path that extended correctly from the head.
- The Interaction: Most impressively, Lithiumflow attempted to place the bird on the seat. It wasn't perfect—sometimes the legs were stick figures that didn't quite reach the pedals—but the intent and the spatial understanding were undeniable.
4.3 Orionmist and "DeepThink"
Orionmist took it a step further. When prompted, especially with "thinking" enabled, it produced an SVG where the pelican had distinct feathers, the bicycle had spokes, and the composition was centered and balanced.
This was not just a funny trick. It was proof that Google had solved a fundamental bottleneck in AGI: multimodal reasoning. The models were not just predicting the next word; they were manipulating abstract spatial concepts. They had learned the "platonic ideal" of a bicycle from their training on millions of images and diagrams, and they could translate that visual understanding into code. The Pelican Test proved that Gemini 3.0 was not just a language model; it was a world model.
5. The Numbers Game: Elo Ratings and the 1400 Barrier
While the Pelican Test won the hearts of the community, the Elo ratings won the minds of the data scientists. The LMArena leaderboard is a brutal, zero-sum game. Every win for one model is a loss for another.
5.1 The Ascent of the Ghosts
As October turned to November, the data began to solidify.
- Gemini 3.0 Pro (Orionmist): Stabilized at an Elo of 1489. To put this in context, GPT-5.1 (the previous king) sat around 1464. Under the Elo model, a 25-point gap means the higher-rated model is expected to win roughly 54% of head-to-head matchups—a small per-battle edge, but a statistically decisive one across tens of thousands of votes. Orionmist was not just edging out GPT-5.1; it was statistically superior.
- Gemini 3.0 Flash (Lithiumflow): This was the bigger surprise. It settled at 1471. This score placed the "efficiency" model above the previous flagship models like GPT-4o and Claude 3.5 Sonnet. Google had effectively democratized intelligence. The "cheap" model of 2026 was smarter than the "smart" model of 2024.
5.2 Category Dominance: Coding and Math
The detailed breakdown of the Elo scores revealed where Google had spent its compute budget.
- WebDev Arena: In coding tasks, specifically web development, Lithiumflow was a monster. It could generate entire React applications, with CSS styling and state management, in a single pass. Users on Reddit reported that it could "one-shot" complex prompts like "Build a Tetris clone with glassmorphism UI," whereas older models required constant debugging.
- Humanity's Last Exam (HLE): This new benchmark, designed to be "un-googleable," tested advanced reasoning. Orionmist scored 37.5%, a massive leap over the single-digit scores typical of the GPT-4 era. This validated Demis Hassabis's long-term bet on "reasoning" over mere "pattern matching."
5.3 The "Nerfed" Controversy
No release is without its conspiracy theories. As the models moved from the anonymous "Lithiumflow" checkpoint to the official API release, a vocal contingent of users on r/LocalLLaMA claimed the models had been "nerfed."
- The Quantization Theory: Users speculated that the LMArena version was the uncompressed, FP16 (16-bit floating point) version of the model—massive, expensive, and brilliant. The API version, they argued, was a heavily quantized Int8 or Int4 version, compressed to save money on inference costs. (A sketch of what quantization does to weights follows this list.)
- The Safety Tax: Others pointed to the Reinforcement Learning from Human Feedback (RLHF) safety filters. The raw Lithiumflow was described as "witty," "edgy," and "creative." The official Gemini 3.0 Flash was described as "safe," "sanitized," and slightly more prone to refusals. This "safety tax" is a recurring theme in AI, where corporate caution dulls the sharp edges of a model's intelligence.
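For context on what the theory alleges, here is a minimal sketch of symmetric int8 weight quantization. The tensor and error figures are illustrative, not measurements of any Gemini checkpoint:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one byte per weight plus a scale."""
    scale = np.abs(w).max() / 127.0            # map the largest weight to +/-127
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale        # cheap to undo, but lossy

w = np.random.default_rng(0).normal(size=4096).astype(np.float32)
q, scale = quantize_int8(w)
print(f"mean rounding error: {np.abs(w - dequantize(q, scale)).mean():.5f}")
# Int4 halves the memory again while doubling the rounding noise -- exactly
# the brilliance-for-cost trade the "nerfed" theory accuses the API of making.
```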
6. Demis Hassabis and the "Code Red" Legacy
To understand the urgency and the scale of the Gemini 3.0 release, we must look at the man behind the curtain: Demis Hassabis.
6.1 The Merger of Titans
In early 2023, following the shock of ChatGPT, Google did the unthinkable: it merged Google Brain (the team that invented the Transformer) and DeepMind (the team that solved Go and Protein Folding). These two organizations had been bitter internal rivals for a decade, fighting for resources and prestige.
Hassabis was named CEO of the unified Google DeepMind. His mandate was clear: end the infighting, consolidate the scattered research projects, and build a single, unified product to crush OpenAI. Gemini was that product.
6.2 The Scientific Path to AGI
Hassabis has always been a scientist first. His vision for AGI is not just a chatbot that can write poems; it is a system that can cure cancer and solve climate change. This DNA is evident in Gemini 3.0.
- AlphaProof Integration: Rumors persist that Gemini 3.0 was trained using synthetic data generated by AlphaProof, a system designed for formal mathematical reasoning. This explains the model's exceptional performance on the "Humanity's Last Exam" benchmark. It wasn't just trained on the internet; it was trained on truth—verified mathematical proofs and logic chains.
- The "Slow Thinking" Strategy: The "DeepThink" capability of Orionmist is a direct descendant of AlphaGo's Monte Carlo Tree Search. It brings the concept of "simulation" to language models, allowing them to explore multiple paths of reasoning before committing to an answer.
6.3 The Release Timeline Poker
The confusion over the release date—was it October? November? December?—was likely a deliberate tactical obfuscation.
- The Stealth Phase (Oct 19): Release on LMArena to validate performance and generate buzz.
- The Announcement Phase (Nov 18): Official blog posts confirming the "Gemini 3" nomenclature.
- The Deployment Phase (Dec 17): Full API availability and integration into consumer products.
By stretching the launch over three months, Hassabis kept Google in the news cycle constantly, disrupting the narrative dominance of OpenAI.
7. Technical Specifications: The Anatomy of Gemini 3.0
While Google guards its exact architecture like a state secret, the leaks and the LMArena data allow us to construct a high-fidelity profile of the models.
7.1 Parameter Counts and the MoE Revolution
The era of the monolithic "trillion-parameter dense model" is over. It is simply too expensive to run. Gemini 3.0 is built on Sparse Mixture of Experts (MoE).
- Gemini 3.0 Pro (Orionmist):
  - Total Parameters: Estimated 1.2 to 1.5 trillion.
  - Active Parameters: Likely only 50-80 billion per token.
  - This architecture explains how a model with such vast knowledge can still respond in under a second. It has a "brain" the size of a warehouse, but it only turns on the lights in one aisle at a time (see the back-of-envelope sketch after this list).
- Gemini 3.0 Flash (Lithiumflow):
  - Total Parameters: Estimated <500 billion.
  - Focus: Extreme quantization and token throughput. This model is designed to be the "workhorse" of the AI economy, cheap enough to read entire books for a few cents.
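Plugging the estimates above into the standard rule of thumb that a transformer forward pass costs roughly two FLOPs per active parameter per token shows why the active count, not the total, sets the inference bill. All figures here are the community estimates, not confirmed specs:

```python
total_params  = 1.4e12   # ~1.4T total (midpoint of the 1.2-1.5T estimate)
active_params = 65e9     # ~65B active per token (midpoint of 50-80B)

flops_dense  = 2 * total_params    # if every parameter fired on every token
flops_sparse = 2 * active_params   # what a sparse MoE actually pays

print(f"dense:   {flops_dense:.1e} FLOPs/token")
print(f"sparse:  {flops_sparse:.1e} FLOPs/token")
print(f"savings: {flops_dense / flops_sparse:.0f}x")   # ~22x cheaper per token
```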
7.2 The "Nano Banana" and the Apple Connection
One of the most intriguing leaks involved the codename "Nano Banana". Hidden in iOS beta code, references to com.google.gemini_3_nano suggested a massive partnership.
- On-Device AI: "Nano Banana" is the Gemini 3.0 Nano model. Unlike Pro and Flash, which live in data centers, Nano is distilled down to <10 Billion parameters to run directly on the Neural Processing Units (NPUs) of smartphones.
- Apple Intelligence: The leak confirmed that Apple was using Google's cloud TPUs to train its own models, but also potentially routing complex Siri queries to a private instance of Gemini 3.0 Pro. This partnership validates the quality of the model; Apple, arguably the most privacy-conscious and quality-obsessed tech company, chose Gemini over OpenAI for its backend infrastructure.
8. User Experience: Tales from the Reddit Trenches
The raw numbers of Elo ratings tell one story, but the anecdotes from users tell another. The "feel" of a model is often what determines its adoption.
8.1 The "Lithium" Voice
On r/SillyTavernAI and r/CreativeWriting, users praised Lithiumflow for escaping the "slop" of modern RLHF.
- Escaping the Cliché: GPT-4 is notorious for certain phrases: "a testament to," "tapestry of," "shivers down the spine." Lithiumflow had a distinct, drier voice. Users described it as "witty," "cynical," and "human."
- Quote: One user noted, "I didn't know it was Lithiumflow until I checked the logs, but it was the first time an AI actually surprised me with a plot twist instead of giving me the moralizing lecture I expected."
8.2 The Coding Prodigy
On r/LocalLLaMA, developers were obsessed with Lithiumflow's context handling.
- The "Needle in a Haystack": Users would upload 50 files of obscure documentation and ask the model to debug a race condition. Lithiumflow didn't just find the bug; it rewrote the library.
- Quote: "It feels like pair programming with a senior engineer who has memorized the documentation, rather than a junior who is just Googling things for me."
8.3 The Hallucination Problem
It wasn't all perfect. The "Flash" nature of Lithiumflow meant that without search grounding, it would confidently lie.
- The Rhyme Test: When asked to rhyme "eeny meeny miny mo" with a company name, it produced a clever rhyme but claimed to be made by "Xiaomi." This highlights the danger of high-temperature sampling in creative tasks; the model prioritizes the structure (the rhyme) over the fact, as the sampling sketch below illustrates.
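The mechanism is easy to demonstrate. The sketch below applies a temperature-scaled softmax to an invented next-token distribution for the sentence "made by ___"; the logits are made up for illustration:

```python
import numpy as np

def softmax_t(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled softmax: higher T flattens the distribution."""
    z = logits / temperature
    p = np.exp(z - z.max())
    return p / p.sum()

tokens = ["Google", "Xiaomi", "Sony"]    # factual answer vs. rhyme-friendly options
logits = np.array([2.0, 1.5, 0.2])

for t in (0.2, 1.0, 1.5):
    p = softmax_t(logits, t)
    print(f"T={t}: " + ", ".join(f"{tok}={pi:.2f}" for tok, pi in zip(tokens, p)))
# At T=0.2 the factual token is near-certain; at T=1.5 the rhyme-friendly
# alternative wins often enough to produce confident nonsense.
```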
9. Conclusion: The Pelican Point
As we analyze the fallout of the Lithiumflow/Orionmist release, one image remains stuck in the collective consciousness: the pelican on the bicycle.
It is a silly image. It is absurd. But it is also profound.
For seventy years, computer science has chased the dream of a machine that can reason about the world. We built chess computers that could beat grandmasters but couldn't understand a joke. We built search engines that indexed the web but couldn't understand a story.
Gemini 3.0, represented by the ghost of Orionmist, proved that we have crossed a threshold. We now have machines that can take two unrelated, complex concepts—biology and mechanics—and synthesize them into a coherent visual structure, expressed in code. They can reason about space, physics, and anatomy. They can "think" before they speak.
Google’s strategy of "stealth drops," confusing codenames, and massive architectural shifts has paid off. They have weathered the storm of OpenAI's dominance and reasserted themselves as the heavyweights of deep research. The Elo scores prove they are winning the battles today. But the Pelican proves they are building the mind of tomorrow.
The era of the Chatbot is ending. The era of the Reasoning Engine has begun. And it rides a bicycle.
10. Appendix: Data Tables and Keyword Synthesis
Table 1: The Gemini 3.0 Family Tree (Reconstructed)
| Codename | Official Name | Role | Key Characteristics | Estimated Parameters | Elo (Text) |
|---|---|---|---|---|---|
| Lithiumflow | Gemini 3.0 Flash | Efficiency / Speed | High throughput, fine-grained MoE, cost-effective. Excellent coder. | < 500B (total) | ~1471 |
| Orionmist | Gemini 3.0 Pro | Reasoning / Depth | DeepThink capability, native Search grounding ("Mist"), multimodal. | 1.2T-1.5T (total) | ~1489 |
| Nano Banana | Gemini 3.0 Nano | On-Device | Distilled for mobile NPUs (Pixel/iPhone). Privacy-focused. | < 10B | N/A |
Table 2: The Battle of the Benchmarks (Nov 2025)
| Benchmark | Gemini 3.0 Pro (Orionmist) | GPT-5.1 (OpenAI) | Claude Opus 4.5 (Anthropic) | Note |
|---|---|---|---|---|
| LMArena Elo | 1489 (#1) | 1464 | 1468 | Blind preference voting. |
| Pelican SVG | High fidelity | Moderate | Moderate | Qualitative "vibe" check for spatial reasoning. |
| Humanity's Last Exam | 37.5% | ~25% | ~30% | Un-googleable reasoning tasks. |
| WebDev Arena | 1467 | 1477 | 1510 | Claude still holds the edge in pure web coding. |
Keyword Analysis Synthesis
- "Lithiumflow" / "Orionmist": Confirmed as the LMArena stealth codenames for Gemini 3.0 Flash and Pro, respectively.
- "Pelican riding a bicycle": The viral benchmark created by Simon Willison to test SVG generation and spatial reasoning. Lithiumflow was the first to "solve" it convincingly.
- "Elo score lmarena": Gemini 3.0 Pro achieved ~1489, taking the top spot. Flash achieved ~1471, beating previous Pro models.
- "Demis Hassabis": The architect of the "Code Red" and the unified DeepMind strategy that led to these models.
- "AI model size parameters": Shift to Sparse Mixture of Experts (MoE). Pro is likely ~1.5T total parameters; Flash is significantly smaller but highly optimized.
- "Gemini 3.0 release date": A rolling launch: Oct 19 (Stealth), Nov 18 (Announce), Dec 17 (Flash Rollout).