With the launch of Gemini, a new generative AI platform, Google is attempting to create an uproar. Gemini, however, is lacking in several areas even as it seems promising in others. So what exactly is a Gemini? In what way is it useful? Furthermore, how does it compare to the competition?
We’ve put up this helpful guide to make it easier to stay up to date with the newest Gemini advancements. It will be updated as new Gemini models and features are available.
What is Gemini?
Google’s next generation of generative AI models, called Gemini, has been in the works for a while now. DeepMind and Google Research are the AI research divisions of Google. Three types are available:
- Gemini Ultra, the flagship Gemini models
- Gemini Pro, a “lite” Gemini models
- Gemini Nano, a more compact and “distilled” variant that functions on smartphones such as the Pixel 8 Process
Every Gemini model was educated to be “natively multimodal,” or capable of utilizing and interacting with media other than text. A wide range of music, pictures, videos, codebases, and text in several languages were used for pre-training and fine-tuning.
That differentiates Gemini from models like Google’s own huge language model LaMDA, which was trained solely on text data. LaMDA cannot interpret or create anything other than text (such as essays, email drafts, and so on), whereas Gemini models can. Their capacity to grasp visuals, sounds, and other modalities remains limited, but it is better than nothing.
What is the Difference Between Bard and Gemini?
Google once again demonstrated its lack of branding skills by failing to make it obvious from the start that Gemini is different and distinct from Bard. Bard is merely an interface that allows access to certain Gemini models think of it as an app or client for Gemini and other generations of AI models. In contrast, Gemini is a family of models rather than an app or frontend. There is no solitary Gemini experience, and there will most likely never be. If you compare it to OpenAI’s products, Bard relates to ChatGPT, the company’s popular conversational AI tool, and Gemini refers to the language model that powers it, which in the case of ChatGPT is GPT-3.5 or 4.
In addition, Gemini is completely independent of Imagen-2, a text-to-image model that may or may not fit into the company’s broader AI plan. Don’t worry; you’re not alone in your confusion!
What can Gemini do?
Because Gemini models are multimodal, they may theoretically do a variety of tasks, including voice transcription, picture and video captioning, and artwork generation. Few of these features have yet to be released as products (more on that later), but Google promises that all of them and more will be available shortly.
Of course, it is difficult to believe the company’s claims.
Google badly underperformed with the first Bard launch. More recently, it stirred eyebrows with a film professing to demonstrate Gemini’s capabilities, which turned out to be extensively doctored and more or less aspirational. Gemini is, to the tech titan’s credit, available in some form today, albeit in a restricted capacity.
Still, if Google is more or less accurate in its claims, here’s what the various tiers of Gemini models will be able to perform once they’re released:
So yet, only a “select set” of consumers from a handful of Google products and services have had access to Gemini Ultra, the “foundation” model around which the rest are constructed. That won’t change until later this year when Google’s biggest model is released more freely. Most of the information regarding Ultra comes from Google-led product demos, so take it with a grain of salt.
According to Google, Gemini Ultra may be used to assist with physics homework, answering problems step by step on a worksheet, and pointing out potential errors in previously filled-in solutions. Gemini Ultra may also be used for activities like locating scientific publications relevant to a certain topic, extracting information from those papers, and “updating” a chart by creating the formulae required to reproduce the chart with more recent data.
As previously mentioned, Gemini Ultra allows picture creation. However, Google says that capacity will not be included in the model’s productized version when it launches — maybe because the method is more sophisticated than how applications like ChatGPT produce photos. Rather than feeding suggestions to a picture generator (like DALL-E 3 does in ChatGPT), Gemini produces graphics “natively” without an intermediary step.
Gemini Pro, unlike Gemini Ultra, is available to the general public today. However, its capabilities are unclear since they vary depending on where it is employed.
Google claims that in Bard, where Gemini Pro was initially released in text-only format, the model outperforms LaMDA in terms of thinking, planning, and comprehending. A separate investigation by Carnegie Mellon and BerriAI researchers discovered that Gemini Pro outperforms OpenAI’s GPT-3.5 in handling longer and more complicated reasoning chains.
However, the study discovered that, like other big language models, Gemini Pro has difficulties with maths issues requiring several numbers, and users have provided numerous examples of poor reasoning and blunders. It made several factual inaccuracies for simple questions such as who won the current Oscars. Google has promised changes, but it is unclear when they will occur.
Gemini Pro is also available through the API in Vertex AI, Google’s fully managed AI developer platform that receives text as input and produces text as output. Gemini Pro Vision, an extra endpoint, can interpret text and images (including photographs and video) and produce text similar to OpenAI’s GPT-4 with Vision model.
Gemini Pro may be fine-tuned or “grounded” to certain situations and use cases inside Vertex AI by developers. Gemini Pro may also be connected to other, third-party APIs to accomplish certain tasks.
Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they may alter the model temperature to manage the output’s creative range, offer examples to give tone and style guidelines and fine-tune the safety parameters.
Gemini Nano is a significantly smaller version of the Gemini Pro and Ultra variants, and it is efficient enough to do tasks directly on (certain) phones rather than transmitting them to a server. So far, it enables two Pixel 8 Pro features: summarise in Recorder and smart reply on Gboard.
Gemini Nano is now available on Gboard, Google’s keyboard software, as a developer preview. It enables a function called Smart Reply, which suggests what you should say next while you’re conducting a discussion in a messaging app. The function is now only available on WhatsApp, but it will be added to additional applications in 2024, according to Google.
Is Gemini Better than OpenAI’s GPT-4?
There’s no way to tell how the Gemini family stacks up until Google launches Ultra later this year, but the firm has claimed improvements over the current state of the art, which is often OpenAI’s GPT4.
Google has repeatedly emphasized Gemini’s advantage in benchmarking, saying that Gemini Ultra outperforms current state-of-the-art findings on “30 of the 32 widely used academic benchmarks used in large language model research and development. According to the business, Gemini Pro outperforms GPT-3.5 in activities like content summarization, ideation, and writing.
Leaving aside the question of whether benchmarks imply a superior model, Google’s scores appear to be just marginally better than OpenAI’s similar models. And, as previously said, some early impressions have been negative, with users and academics claiming that Gemini Pro frequently gets fundamental information wrong, has problems with translations, and provides poor code advice.
How much will Gemini cost?
Gemini Pro is currently free to use in Bard, as well as AI Studio and Vertex AI.
When Gemini Pro exits preview in Vertex, the model costs $0.0025 per character, but the output costs $0.00005 for each character. Vertex clients pay per 1,000 characters (about 140 to 250 words) or, in the case of models such as Gemini Pro Vision, each picture ($0.0025).
Where you can try Gemini?
Gemini Pro is most easily experienced in Bard. A fine-tuned version of Pro is currently addressing text-based Bard inquiries in English in the United States, with more languages and countries to follow later.
Gemini Pro is also available in preview on Vertex AI via an API. The API is now free to use “within limits” and supports 38 languages and locations, including Europe, as well as features such as chat capabilities and filters.
Alternatively, Gemini Pro may be accessed in AI Studio. Developers may use the service to refine prompts and Gemini-based chatbots before receiving API keys to utilize them in their apps or export the code to a more feature-rich IDE.
The Pixel 8 Pro has Gemini Nano, which will be available on additional devices in the future. Developers who want to include the model in their Android apps may join up for a preliminary peek.