Gemini Multimodal Web App Codelab

Welcome to the Codelab! 👋

This codelab walks you through a complete development journey. You will start with a simple command-line Python script that sends an image and a text prompt to Gemini, then evolve it into an interactive web application with a front-end and a back-end.

What You'll Build:

A Python script that analyzes a local image and a text prompt
A web app where users can upload an image, ask a question, and see Gemini's response in their browser

⏱️ Time to Complete: 20-25 minutes

Prerequisites

Python 3.9+ installed on your system
A Google Gemini API Key: Get your free key from Google AI Studio
A code editor like VS Code

Part 1: The Core Logic (Command-Line App)

First, we'll prove the concept with a simple script. This confirms that the Gemini call works before we build the web interface around it.

Step 1: Set Up Your Project Folder

Open your terminal and create a clean workspace.

# Create a new folder for our project
mkdir gemini-web-app

# Navigate into the new folder
cd gemini-web-app

Step 2: Create and Activate a Python Virtual Environment

A virtual environment keeps this project's libraries separate from your system-wide Python packages.

# Create a virtual environment named 'venv'
python -m venv venv

Now, activate it:

On macOS / Linux: source venv/bin/activate
On Windows: .\venv\Scripts\activate

Your terminal prompt should now start with (venv).
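
If the prompt does not change, you can also confirm activation from inside Python. The sketch below uses only the standard library; the function name in_virtualenv is our own choice for illustration, not part of any tool.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

Run it with the same python command you plan to use for the rest of the codelab, since a different interpreter may give a different answer.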

Step 3: Install Core Libraries

Use pip to install the packages for calling Gemini, opening images, and loading our API key.

pip install google-generativeai pillow python-dotenv

What we just installed:

google-generativeai: The official Google client library
Pillow: For opening and handling images in Python
python-dotenv: To manage our secret API key
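
To double-check the install, a short script can ask Python whether each package is findable. This is a minimal sketch; the helper name is_installed is hypothetical, and note that two import names differ from the pip names (Pillow is imported as PIL, python-dotenv as dotenv).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. "google") is missing.
        return False


if __name__ == "__main__":
    # These are the import names, not the pip package names.
    for name in ("google.generativeai", "PIL", "dotenv"):
        print(f"{name}: {'ok' if is_installed(name) else 'MISSING'}")
```

If any line reports MISSING, make sure (venv) is active and run the pip command again.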

Step 4: Add an Image and Your API Key

Download an Image: Find an image of a famous landmark (e.g., Eiffel Tower) and save it inside your gemini-web-app folder as landmark.jpg
Create .env file: In the same folder, create a new file named .env

Add your key: Open the .env file and add your Gemini API key like so:

GEMINI_API_KEY="YOUR_API_KEY_HERE"
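
To make the file format less mysterious, here is a simplified sketch of how a KEY="VALUE" line can be turned into a Python dictionary. It is an illustration only, not the actual python-dotenv implementation, and the function name parse_env_text is hypothetical.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines; a simplified stand-in for a .env reader."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = '# secrets\nGEMINI_API_KEY="YOUR_API_KEY_HERE"\n'
    print(parse_env_text(sample))
```

Because the file holds a secret, it is a good idea to keep .env out of version control, for example by listing it in a .gitignore file.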

Step 5: Create the First Python Script

Create a Python file named app.py
Paste the following code into it. This is our initial command-line version.

import os
import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv

# --- Load the API Key ---
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("Gemini API key not found. Please set it in the .env file.")

# --- Configure the Gemini Client ---
genai.configure(api_key=api_key)

# --- Create the Model ---
print("Loading Gemini model...")
model = genai.GenerativeModel('gemini-2.5-flash')

# --- Prepare Image and Prompt ---
image = PIL.Image.open("landmark.jpg")
prompt = "What are three interesting facts about this landmark?"

# --- Generate Content ---
print("Asking Gemini...")
response = model.generate_content([prompt, image])

# --- Display the Result ---
print("\n--- Gemini's Response ---")
print(response.text)
print("-------------------------\n")

Step 6: Run and Verify

Execute the script from your terminal:

python app.py

✅ Success! You should see a text-only response from Gemini printed directly in your terminal. This confirms that the core Gemini call works.
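
The script lets any failure surface as a raw traceback. A network problem, an invalid key, or a blocked response can all raise an exception, so one option is to wrap the call in a small helper. The sketch below is an assumption about how you might structure that; ask_about_image and FakeModel are hypothetical names, and FakeModel only exists so the example runs without an API key.

```python
from types import SimpleNamespace


def ask_about_image(model, prompt, image):
    """Send a prompt and an image to the model; return (ok, message)."""
    try:
        response = model.generate_content([prompt, image])
        return True, response.text
    except Exception as exc:  # broad on purpose: this is a top-level guard
        return False, f"Gemini request failed: {exc}"


class FakeModel:
    """Hypothetical stand-in so the sketch runs without an API key."""

    def generate_content(self, parts):
        prompt, _image = parts
        return SimpleNamespace(text=f"(fake answer to: {prompt})")


if __name__ == "__main__":
    ok, message = ask_about_image(FakeModel(), "What is this?", image=None)
    print(ok, message)
```

In app.py you would pass the real model and image objects instead of FakeModel() and None.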

Part 2: Level Up to a Web Application

Now that the core works, let's wrap it in a user-friendly web interface using Flask.

Step 7: Update Project Structure and Install Flask

Install Flask:

pip install Flask

Create new folders: By default, Flask looks for HTML templates in templates/ and serves CSS and JavaScript files from static/.

mkdir templates
mkdir static

Your project folder should now look like this:

gemini-web-app/
├── venv/
├── static/
├── templates/
├── .env
├── app.py
└── landmark.jpg
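
If you want to verify the layout before moving on, a few lines of Python can do it. This is a rough sketch: the EXPECTED tuple and the missing_entries helper are our own, and we chose to leave out venv/ and landmark.jpg because the Flask code in Part 2 does not reference them directly.

```python
from pathlib import Path

# Names taken from the layout above (venv/ and landmark.jpg omitted by choice).
EXPECTED = ("static", "templates", ".env", "app.py")


def missing_entries(root, expected=EXPECTED):
    """Return the expected names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```

Run it from inside the gemini-web-app folder so that "." points at the project root.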

Step 8: Create the Front-End (HTML, CSS, JS)

We'll create three files for the user interface.

HTML (templates/index.html): Inside the templates folder, create index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Multimodal Analyzer</title>
    <link rel="stylesheet" href="/static/style.css" />
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Image + Text Analyzer 🧠</h1>
      <p>Upload an image and ask Gemini a question about it.</p>
      <form id="analyzer-form">
        <div class="form-group">
          <label for="image-upload">Upload Image</label>
          <input
            type="file"
            id="image-upload"
            name="image"
            accept="image/*"
            required
          />
        </div>
        <div class="form-group">
          <label for="prompt-input">Ask a Question</label>
          <input
            type="text"
            id="prompt-input"
            name="prompt"
            placeholder="e.g., Suggest a funny caption for this"
            required
          />
        </div>
        <button type="submit" id="analyze-btn">
          <span class="btn-text">Analyze!</span>
          <span class="spinner" style="display: none;"></span>
        </button>
      </form>
      <div id="result-container">
        <h2>Gemini's Answer:</h2>
        <p id="result-text">Your answer will appear here...</p>
      </div>
    </div>
    <script src="/static/script.js"></script>
  </body>
</html>

CSS (static/style.css): Inside the static folder, create style.css

/* Base Styles */
body {
    font-family: sans-serif;
    background-color: #f4f4f9;
    color: #333;
}

.container {
    max-width: 600px;
    margin: 40px auto;
    padding: 20px;
    background: #fff;
    border-radius: 8px;
    box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
}

h1 {
    color: #444;
}

/* Form Styles */
.form-group {
    margin-bottom: 20px;
}

input[type="file"],
input[type="text"] {
    width: 100%;
    padding: 10px;
    border-radius: 4px;
    border: 1px solid #ddd;
    box-sizing: border-box;
}

/* Button Styles */
button {
    width: 100%;
    padding: 10px;
    border: none;
    background-color: #5c67f2;
    color: white;
    border-radius: 4px;
    cursor: pointer;
    font-size: 16px;
    transition: opacity 0.3s ease;
}

button:hover {
    background-color: #4a56e2;
}

button:disabled {
    opacity: 0.7;
    cursor: not-allowed;
}

/* Spinner Animation */
.spinner {
    border: 3px solid rgba(255, 255, 255, 0.3);
    border-top: 3px solid white;
    border-radius: 50%;
    width: 16px;
    height: 16px;
    animation: spin 0.8s linear infinite;
    display: inline-block;
    vertical-align: middle;
}

@keyframes spin {
    0% {
        transform: rotate(0deg);
    }
    100% {
        transform: rotate(360deg);
    }
}

/* Result Container */
#result-container {
    margin-top: 30px;
    padding-top: 20px;
    border-top: 1px solid #eee;
}

#result-text {
    line-height: 1.6;
}

/* Markdown Styling */
#result-text h1,
#result-text h2,
#result-text h3 {
    margin-top: 1em;
    margin-bottom: 0.5em;
}

#result-text p {
    margin-bottom: 1em;
}

#result-text ul,
#result-text ol {
    margin-left: 20px;
    margin-bottom: 1em;
}

#result-text code {
    background-color: #f4f4f9;
    padding: 2px 6px;
    border-radius: 3px;
    font-family: monospace;
}

#result-text pre {
    background-color: #f4f4f9;
    padding: 10px;
    border-radius: 4px;
    overflow-x: auto;
}

#result-text blockquote {
    border-left: 4px solid #5c67f2;
    padding-left: 15px;
    margin: 1em 0;
    color: #666;
}

JavaScript (static/script.js): Inside static, create script.js

document.getElementById('analyzer-form').addEventListener('submit', async function (event) {
    event.preventDefault();
    const formData = new FormData(event.target);
    const resultText = document.getElementById('result-text');
    const analyzeBtn = document.getElementById('analyze-btn');
    const btnText = analyzeBtn.querySelector('.btn-text');
    const spinner = analyzeBtn.querySelector('.spinner');

    // Show spinner and disable button
    btnText.style.display = 'none';
    spinner.style.display = 'inline-block';
    analyzeBtn.disabled = true;
    resultText.innerHTML = "<em>Analyzing... please wait.</em>";

    try {
        const response = await fetch('/analyze', {
            method: 'POST',
            body: formData
        });

        const data = await response.json();

        // Render markdown response
        if (data.text) {
            resultText.innerHTML = marked.parse(data.text);
        } else {
            resultText.innerHTML = `<span style="color: #e74c3c;">${data.error}</span>`;
        }
    } catch (error) {
        resultText.innerHTML = `<span style="color: #e74c3c;">Error: ${error.message}</span>`;
    } finally {
        // Hide spinner and re-enable button
        btnText.style.display = 'inline';
        spinner.style.display = 'none';
        analyzeBtn.disabled = false;
    }
});
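
The script relies on a small JSON contract with the server: a successful reply carries a text field, and a failed one carries an error field. The Python sketch below mirrors that branch so the contract is easy to see in isolation; interpret_response is a hypothetical name, and the fallback message for a reply with neither field is our assumption.

```python
import json


def interpret_response(body: str):
    """Mirror the front-end branch: return ("text", ...) or ("error", ...)."""
    data = json.loads(body)
    if data.get("text"):
        return "text", data["text"]
    # Fall back to a generic message if the server sent neither field.
    return "error", data.get("error", "Unknown error")


if __name__ == "__main__":
    print(interpret_response('{"text": "It is the Eiffel Tower."}'))
    print(interpret_response('{"error": "No image file provided"}'))
```

Keeping both sides of the app on this same shape means the front-end never has to guess what the back-end sent.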

Step 9: Build the Back-End (Update app.py)

Replace the entire contents of your app.py file with this new Flask server code.

import os
import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv
from flask import Flask, request, jsonify, render_template

# --- CONFIGURATION ---
load_dotenv()
try:
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
except KeyError:
    raise ValueError("GEMINI_API_KEY not found in .env file.")

# --- MODEL INITIALIZATION ---
model = genai.GenerativeModel('gemini-2.5-flash')

# --- FLASK APP ---
app = Flask(__name__)

# --- ROUTES ---
@app.route('/')
def index():
    """Renders the main HTML page."""
    return render_template('index.html')

@app.route('/analyze', methods=['POST'])
def analyze():
    """Handles the image and prompt submission for analysis."""
    if 'image' not in request.files:
        return jsonify({'error': 'No image file provided'}), 400

    image_file = request.files['image']
    prompt = request.form.get('prompt', 'Describe this image.')

    try:
        image = PIL.Image.open(image_file.stream)
        response = model.generate_content([prompt, image])
        return jsonify({'text': response.text})
    except Exception as e:
        return jsonify({'error': f'An error occurred: {e}'}), 500

# --- RUN THE APP ---
if __name__ == '__main__':
    app.run(debug=True)
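
The /analyze route accepts whatever file the browser sends and relies on the try/except block to catch anything Pillow cannot open. One optional hardening step is to reject obviously wrong uploads earlier by checking the file extension. The sketch below is an assumption, not part of the original app: the ALLOWED_EXTENSIONS set and the is_allowed_image helper are hypothetical and should be adjusted to the formats you want to accept.

```python
from pathlib import PurePath

# Assumed allow-list; adjust to the formats you want to accept.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_allowed_image(filename: str) -> bool:
    """Return True if the filename ends with an allowed image extension."""
    if not filename:
        # An empty filename usually means no file was selected.
        return False
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ("landmark.jpg", "notes.txt", ""):
        print(repr(name), is_allowed_image(name))
```

Inside analyze(), you could call is_allowed_image(image_file.filename) right after reading the upload and return a 400 JSON error when it is False. An extension check is only a first filter; the existing try/except around PIL.Image.open still handles files that are not valid images.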

Step 10: Launch the Web Application! 🚀

Go to your terminal (ensure (venv) is active)

Run the Flask app: