Built by Ifihan

🧠 Gemini Multimodal Web App

From Python Script to Web Application


Welcome to the Codelab! 👋

This codelab walks you through a complete development journey. You will start by building a simple command-line Python script to understand the core logic of sending an image and a text prompt to Gemini, and then evolve it into a fully interactive web application with a front end and a back end.

What You'll Build:

  1. A Python script that analyzes a local image and a text prompt
  2. A web app where users can upload an image, ask a question, and see Gemini's response in their browser
⏱️ Time to Complete: 20-25 minutes

Prerequisites

  1. Python 3.9+ installed on your system (the google-generativeai package requires Python 3.9 or later)
  2. A Google Gemini API Key: Get your free key from Google AI Studio
  3. A code editor like VS Code

Part 1: The Core Logic (Command-Line App)

First, we'll prove the concept with a simple script. This ensures the AI part works before we build the web interface.

Step 1: Set Up Your Project Folder

Open your terminal and create a clean workspace.

# Create a new folder for our project
mkdir gemini-web-app

# Navigate into the new folder
cd gemini-web-app

Step 2: Create and Activate a Python Virtual Environment

A virtual environment keeps this project's libraries separate from your system-wide Python packages.

# Create a virtual environment named 'venv'
python -m venv venv

Now, activate it:

  • On macOS / Linux: source venv/bin/activate
  • On Windows: .\venv\Scripts\activate
Your terminal prompt should now start with (venv)
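
If the prompt does not change, or you are unsure which interpreter is in use, a small check like the one below can help. It is a minimal sketch that compares sys.prefix with sys.base_prefix, which differ inside an environment created by python -m venv. The helper name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```

Run it with the same python command you plan to use for the rest of the codelab.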

Step 3: Install Core Libraries

We'll use pip to install the packages for calling Gemini, opening images, and loading our API key.

pip install google-generativeai pillow python-dotenv

What we just installed:

  • google-generativeai: The official Google client library
  • Pillow: For opening and handling images in Python
  • python-dotenv: To manage our secret API key
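
To confirm the install worked before moving on, the sketch below checks that each package can be found by the current interpreter. Note that the import names differ from the pip names (for example, pillow is imported as PIL). The helper name missing_packages and the REQUIRED_MODULES mapping are illustrative, not part of any of these libraries.

```python
from importlib.util import find_spec

# pip package name -> module name used in import statements
REQUIRED_MODULES = {
    "google-generativeai": "google.generativeai",
    "pillow": "PIL",
    "python-dotenv": "dotenv",
}


def missing_packages(required=None):
    """Return the pip names of packages whose modules cannot be located."""
    required = REQUIRED_MODULES if required is None else required
    missing = []
    for package, module in required.items():
        try:
            found = find_spec(module) is not None
        except ImportError:
            # Raised when a parent package (such as "google") is absent.
            found = False
        if not found:
            missing.append(package)
    return missing


print("Missing packages:", missing_packages() or "none")
```

If anything is listed as missing, check that (venv) is active and run the pip install command again.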

Step 4: Add an Image and Your API Key

  1. Download an Image: Find an image of a famous landmark (e.g., Eiffel Tower) and save it inside your gemini-web-app folder as landmark.jpg
  2. Create .env file: In the same folder, create a new file named .env

  3. Add your key: Open the .env file and add your Gemini API key like so:

GEMINI_API_KEY="YOUR_API_KEY_HERE"
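
A common slip is leaving the placeholder text in place. The sketch below shows one way to catch that early. It reads the key from a mapping (os.environ by default) and rejects empty or placeholder values. The function name read_api_key and the PLACEHOLDER constant are assumptions made for this example.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"  # the sample value shown in the .env example


def read_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from env, or raise if it is missing or a placeholder."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(f"{name} is missing or still set to the placeholder value.")
    return key
```

In a script, you would call load_dotenv() from python-dotenv first so that the values in .env are copied into os.environ, and then call read_api_key().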

Step 5: Create the First Python Script

  1. Create a Python file named app.py
  2. Paste the following code into it. This is our initial command-line version.
import os

import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv

# --- Load the API Key ---
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("Gemini API key not found. Please set it in the .env file.")

# --- Configure the Gemini Client ---
genai.configure(api_key=api_key)

# --- Create the Model ---
print("Loading Gemini model...")
model = genai.GenerativeModel('gemini-2.5-flash')

# --- Prepare Image and Prompt ---
image = PIL.Image.open("landmark.jpg")
prompt = "What are three interesting facts about this landmark?"

# --- Generate Content ---
print("Asking Gemini...")
response = model.generate_content([prompt, image])

# --- Display the Result ---
print("\n--- Gemini's Response ---")
print(response.text)
print("-------------------------\n")

Step 6: Run and Verify

Execute the script from your terminal:

python app.py
✅ Success! You should see a text-only response from Gemini printed directly in your terminal. This confirms our core AI logic is working perfectly!

Part 2: Level Up to a Web Application

Now that the core works, let's wrap it in a user-friendly web interface using Flask.

Step 7: Update Project Structure and Install Flask

Install Flask:

pip install Flask

Create new folders: We need folders to organize our HTML and CSS files.

mkdir templates
mkdir static

Your project folder should now look like this:

gemini-web-app/
├── venv/
├── static/
├── templates/
├── .env
├── app.py
└── landmark.jpg
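
Flask looks for templates and static files in folders with exactly these names, so a misplaced or misspelled folder is a frequent cause of errors later. The sketch below lists any expected entries that are missing from the project folder. The helper name missing_entries and the EXPECTED list are assumptions for this example, and the venv folder is left out because its name and location can vary.

```python
from pathlib import Path

# Entries from the layout above (venv/ is intentionally not checked).
EXPECTED = ["static", "templates", ".env", "app.py", "landmark.jpg"]


def missing_entries(root="."):
    """Return the expected files and folders that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


print("Missing from project folder:", missing_entries() or "nothing")
```

Run it from inside gemini-web-app so that the default root of "." points at the project folder.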

Step 8: Create the Front-End (HTML, CSS, JS)

We'll create three files for the user interface.

HTML (templates/index.html): Inside the templates folder, create index.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Multimodal Analyzer</title>
  <link rel="stylesheet" href="/static/style.css" />
  <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
</head>
<body>
  <div class="container">
    <h1>Image + Text Analyzer 🧠</h1>
    <p>Upload an image and ask Gemini a question about it.</p>

    <form id="analyzer-form">
      <div class="form-group">
        <label for="image-upload">Upload Image</label>
        <input type="file" id="image-upload" name="image" accept="image/*" required />
      </div>

      <div class="form-group">
        <label for="prompt-input">Ask a Question</label>
        <input type="text" id="prompt-input" name="prompt" placeholder="e.g., Suggest a funny caption for this" required />
      </div>

      <button type="submit" id="analyze-btn">
        <span class="btn-text">Analyze!</span>
        <span class="spinner" style="display: none;"></span>
      </button>
    </form>

    <div id="result-container">
      <h2>Gemini's Answer:</h2>
      <p id="result-text">Your answer will appear here...</p>
    </div>
  </div>

  <script src="/static/script.js"></script>
</body>
</html>

CSS (static/style.css): Inside the static folder, create style.css

/* Base Styles */
body {
  font-family: sans-serif;
  background-color: #f4f4f9;
  color: #333;
}

.container {
  max-width: 600px;
  margin: 40px auto;
  padding: 20px;
  background: #fff;
  border-radius: 8px;
  box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
}

h1 {
  color: #444;
}

/* Form Styles */
.form-group {
  margin-bottom: 20px;
}

input[type="file"],
input[type="text"] {
  width: 100%;
  padding: 10px;
  border-radius: 4px;
  border: 1px solid #ddd;
  box-sizing: border-box;
}

/* Button Styles */
button {
  width: 100%;
  padding: 10px;
  border: none;
  background-color: #5c67f2;
  color: white;
  border-radius: 4px;
  cursor: pointer;
  font-size: 16px;
  transition: opacity 0.3s ease;
}

button:hover {
  background-color: #4a56e2;
}

button:disabled {
  opacity: 0.7;
  cursor: not-allowed;
}

/* Spinner Animation */
.spinner {
  border: 3px solid rgba(255, 255, 255, 0.3);
  border-top: 3px solid white;
  border-radius: 50%;
  width: 16px;
  height: 16px;
  animation: spin 0.8s linear infinite;
  display: inline-block;
  vertical-align: middle;
}

@keyframes spin {
  0% { transform: rotate(0deg); }
  100% { transform: rotate(360deg); }
}

/* Result Container */
#result-container {
  margin-top: 30px;
  padding-top: 20px;
  border-top: 1px solid #eee;
}

#result-text {
  line-height: 1.6;
}

/* Markdown Styling */
#result-text h1,
#result-text h2,
#result-text h3 {
  margin-top: 1em;
  margin-bottom: 0.5em;
}

#result-text p {
  margin-bottom: 1em;
}

#result-text ul,
#result-text ol {
  margin-left: 20px;
  margin-bottom: 1em;
}

#result-text code {
  background-color: #f4f4f9;
  padding: 2px 6px;
  border-radius: 3px;
  font-family: monospace;
}

#result-text pre {
  background-color: #f4f4f9;
  padding: 10px;
  border-radius: 4px;
  overflow-x: auto;
}

#result-text blockquote {
  border-left: 4px solid #5c67f2;
  padding-left: 15px;
  margin: 1em 0;
  color: #666;
}

JavaScript (static/script.js): Inside static, create script.js

document.getElementById('analyzer-form').addEventListener('submit', async function (event) {
  event.preventDefault();

  const formData = new FormData(event.target);
  const resultText = document.getElementById('result-text');
  const analyzeBtn = document.getElementById('analyze-btn');
  const btnText = analyzeBtn.querySelector('.btn-text');
  const spinner = analyzeBtn.querySelector('.spinner');

  // Show spinner and disable button
  btnText.style.display = 'none';
  spinner.style.display = 'inline-block';
  analyzeBtn.disabled = true;
  resultText.innerHTML = "<em>Analyzing... please wait.</em>";

  try {
    const response = await fetch('/analyze', {
      method: 'POST',
      body: formData
    });
    const data = await response.json();

    // Render markdown response
    if (data.text) {
      resultText.innerHTML = marked.parse(data.text);
    } else {
      resultText.innerHTML = `<span style="color: #e74c3c;">${data.error}</span>`;
    }
  } catch (error) {
    resultText.innerHTML = `<span style="color: #e74c3c;">Error: ${error.message}</span>`;
  } finally {
    // Hide spinner and re-enable button
    btnText.style.display = 'inline';
    spinner.style.display = 'none';
    analyzeBtn.disabled = false;
  }
});

Step 9: Build the Back-End (Update app.py)

Replace the entire contents of your app.py file with this new Flask server code.

import os

import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv
from flask import Flask, request, jsonify, render_template

# --- CONFIGURATION ---
load_dotenv()
try:
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
except KeyError:
    raise ValueError("GEMINI_API_KEY not found in .env file.")

# --- MODEL INITIALIZATION ---
model = genai.GenerativeModel('gemini-2.5-flash')

# --- FLASK APP ---
app = Flask(__name__)


# --- ROUTES ---
@app.route('/')
def index():
    """Renders the main HTML page."""
    return render_template('index.html')


@app.route('/analyze', methods=['POST'])
def analyze():
    """Handles the image and prompt submission for analysis."""
    if 'image' not in request.files:
        return jsonify({'error': 'No image file provided'}), 400

    image_file = request.files['image']
    prompt = request.form.get('prompt', 'Describe this image.')

    try:
        image = PIL.Image.open(image_file.stream)
        response = model.generate_content([prompt, image])
        return jsonify({'text': response.text})
    except Exception as e:
        return jsonify({'error': f'An error occurred: {e}'}), 500


# --- RUN THE APP ---
if __name__ == '__main__':
    app.run(debug=True)
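
The /analyze route accepts whatever file the browser sends and relies on PIL.Image.open to fail on anything that is not an image. As an optional extra, the sketch below shows a filename check that could reject obviously wrong uploads earlier with a 400 response. The helper name is_allowed_image and the extension list are assumptions for this example, not something the codelab's server code requires.

```python
from pathlib import Path

# Hypothetical allow-list; adjust it to the formats you want to accept.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def is_allowed_image(filename):
    """Return True if the filename ends in one of the allowed image extensions."""
    if not filename:
        return False
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


print(is_allowed_image("landmark.jpg"))
print(is_allowed_image("notes.txt"))
```

Inside analyze(), one way to use it would be to test is_allowed_image(image_file.filename) right after reading request.files['image'] and return a JSON error with status 400 when it is False. This only inspects the name, so the existing try/except around PIL.Image.open is still needed to handle files with a misleading extension.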

Step 10: Launch the Web Application! 🚀

  1. Go to your terminal (ensure (venv) is active)
  2. Run the Flask app:

python app.py

  3. The terminal will show a URL, typically http://127.0.0.1:5000
  4. Open this URL in your web browser
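
If the page does not load, it can help to check from a second terminal whether the server is answering at all. The sketch below uses only the standard library and treats any connection failure or HTTP error as "not up". The function name is_up is made up for this example, and the default URL assumes Flask's usual development address shown above.

```python
from urllib.request import urlopen


def is_up(url="http://127.0.0.1:5000/", timeout=2.0):
    """Return True if a GET request to url succeeds with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (refused connection, timeout, 4xx/5xx)
        # both derive from OSError.
        return False
```

Calling is_up() while python app.py is running in another terminal should return True. A False result usually means the server is not running or is listening on a different port.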

Your UI should look like this:

[Screenshot: Web Application UI]

🎉 Success! You will now see your web application. Upload an image, type a question, and get a response from Gemini, all within a user-friendly interface!

Congratulations! 🎊

You have successfully taken a multimodal application from a simple script to a working web app. Along the way, you combined Python, Gemini, Flask, and basic web technologies to create an AI-driven tool.