Welcome to the Codelab! 👋
This codelab will guide you through a complete development journey. You will start by building a simple command-line Python script to understand the core logic of calling Gemini with an image and a text prompt, and then evolve it into a fully interactive web application with a front-end and back-end.
What You'll Build:
- A Python script that analyzes a local image and a text prompt
- A web app where users can upload an image, ask a question, and see Gemini's response in their browser
Prerequisites
- Python 3.8+ installed on your system
- A Google Gemini API Key: Get your free key from Google AI Studio
- A code editor like VS Code
Part 1: The Core Logic (Command-Line App)
First, we'll prove the concept with a simple script. This ensures the AI part works before we build the web interface.
Step 1: Set Up Your Project Folder
Open your terminal and create a clean workspace.
# Create a new folder for our project
mkdir gemini-web-app
# Navigate into the new folder
cd gemini-web-app
Step 2: Create and Activate a Python Virtual Environment
A virtual environment isolates our project's libraries from the system-wide Python installation.
# Create a virtual environment named 'venv'
python -m venv venv
Now, activate it:
- On macOS / Linux:
source venv/bin/activate
- On Windows:
.\venv\Scripts\activate
Once activated, your terminal prompt should start with (venv).
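To confirm from inside Python that a virtual environment is active, one option is to compare sys.prefix with sys.base_prefix, which differ when the interpreter runs inside a venv. The snippet below is a minimal sketch; the helper name in_virtual_env is hypothetical and not part of the codelab's app.py.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when this interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_env())
```

If this prints False, re-run the activation command for your operating system before installing any packages.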
Step 3: Install Core Libraries
We'll use pip to install the packages for calling Gemini, opening images, and loading our API key.
pip install google-generativeai pillow python-dotenv
What we just installed:
- google-generativeai: The official Google client library
- Pillow: For opening and handling images in Python
- python-dotenv: To manage our secret API key
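As a quick sanity check, the sketch below reports which of the three packages cannot be imported in the current environment. It assumes the import names google.generativeai, PIL, and dotenv (the same names app.py imports); the helper names is_importable and missing_modules are hypothetical.

```python
import importlib.util


def is_importable(name: str) -> bool:
    """Return True if a module with this dotted name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ImportError when a parent package
        # (for example "google") is itself missing.
        return False


def missing_modules(names):
    """Return the subset of names that cannot be imported here."""
    return [name for name in names if not is_importable(name)]


if __name__ == "__main__":
    # Import names for the packages installed with pip in this step.
    required = ["google.generativeai", "PIL", "dotenv"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, check that (venv) is active and run the pip install command again.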
Step 4: Add an Image and Your API Key
- Download an Image: Find an image of a famous landmark (e.g., the Eiffel Tower) and save it inside your gemini-web-app folder as landmark.jpg
- Create .env file: In the same folder, create a new file named .env
- Add your key: Open the .env file and add your Gemini API key like so:
GEMINI_API_KEY="YOUR_API_KEY_HERE"
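To make the .env format less mysterious, here is a simplified sketch of how a single KEY="VALUE" line can be split into a name and a value. It is an illustration only and not python-dotenv's actual implementation, which handles more cases; the function name parse_env_line is hypothetical.

```python
def parse_env_line(line: str):
    """Parse one KEY=VALUE line; return (key, value), or None if not an assignment."""
    line = line.strip()
    # Skip blank lines, comments, and lines without an assignment.
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    key = key.strip()
    value = value.strip()
    # Remove one pair of matching surrounding quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return key, value


if __name__ == "__main__":
    print(parse_env_line('GEMINI_API_KEY="YOUR_API_KEY_HERE"'))
```

In the codelab itself you do not need this function: load_dotenv() reads the file and os.getenv("GEMINI_API_KEY") returns the value.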
Step 5: Create the First Python Script
- Create a Python file named app.py
- Paste the following code into it. This is our initial command-line version.
import os
import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv
# --- Load the API Key ---
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("Gemini API key not found. Please set it in the .env file.")
# --- Configure the Gemini Client ---
genai.configure(api_key=api_key)
# --- Create the Model ---
print("Loading Gemini model...")
model = genai.GenerativeModel('gemini-2.5-flash')
# --- Prepare Image and Prompt ---
image = PIL.Image.open("landmark.jpg")
prompt = "What are three interesting facts about this landmark?"
# --- Generate Content ---
print("Asking Gemini...")
response = model.generate_content([prompt, image])
# --- Display the Result ---
print("\n--- Gemini's Response ---")
print(response.text)
print("-------------------------\n")
Step 6: Run and Verify
Execute the script from your terminal:
python app.py
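If the script stops immediately with a file error, a likely cause is that landmark.jpg is not in the folder you are running from. The sketch below shows one way to check for the image and fail with a clearer message before calling PIL.Image.open; it is an optional addition, and the helper name resolve_image is hypothetical.

```python
from pathlib import Path


def resolve_image(path: str = "landmark.jpg") -> Path:
    """Return the image path, or raise FileNotFoundError with a helpful hint."""
    image_path = Path(path)
    if not image_path.is_file():
        raise FileNotFoundError(
            f"Could not find {image_path}. Save the image in the folder "
            f"you run the script from ({Path.cwd()})."
        )
    return image_path


if __name__ == "__main__":
    try:
        print("Found image:", resolve_image())
    except FileNotFoundError as error:
        print(error)
```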
Part 2: Level Up to a Web Application
Now that the core works, let's wrap it in a user-friendly web interface using Flask.
Step 7: Update Project Structure and Install Flask
Install Flask:
pip install Flask
Create new folders: We need folders to organize our HTML, CSS, and JavaScript files.
mkdir templates
mkdir static
Your project folder should now look like this:
gemini-web-app/
├── static/
├── templates/
├── venv/
├── .env
├── app.py
└── landmark.jpg
Step 8: Create the Front-End (HTML, CSS, JS)
We'll create three files for the user interface.
HTML (templates/index.html): Inside the templates folder, create index.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Multimodal Analyzer</title>
    <link rel="stylesheet" href="/static/style.css" />
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Image + Text Analyzer 🧠</h1>
      <p>Upload an image and ask Gemini a question about it.</p>
      <form id="analyzer-form">
        <div class="form-group">
          <label for="image-upload">Upload Image</label>
          <input
            type="file"
            id="image-upload"
            name="image"
            accept="image/*"
            required
          />
        </div>
        <div class="form-group">
          <label for="prompt-input">Ask a Question</label>
          <input
            type="text"
            id="prompt-input"
            name="prompt"
            placeholder="e.g., Suggest a funny caption for this"
            required
          />
        </div>
        <button type="submit" id="analyze-btn">
          <span class="btn-text">Analyze!</span>
          <span class="spinner" style="display: none;"></span>
        </button>
      </form>
      <div id="result-container">
        <h2>Gemini's Answer:</h2>
        <p id="result-text">Your answer will appear here...</p>
      </div>
    </div>
    <script src="/static/script.js"></script>
  </body>
</html>
CSS (static/style.css): Inside the static folder, create style.css
/* Base Styles */
body {
  font-family: sans-serif;
  background-color: #f4f4f9;
  color: #333;
}
.container {
  max-width: 600px;
  margin: 40px auto;
  padding: 20px;
  background: #fff;
  border-radius: 8px;
  box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
}
h1 {
  color: #444;
}
/* Form Styles */
.form-group {
  margin-bottom: 20px;
}
input[type="file"],
input[type="text"] {
  width: 100%;
  padding: 10px;
  border-radius: 4px;
  border: 1px solid #ddd;
  box-sizing: border-box;
}
/* Button Styles */
button {
  width: 100%;
  padding: 10px;
  border: none;
  background-color: #5c67f2;
  color: white;
  border-radius: 4px;
  cursor: pointer;
  font-size: 16px;
  transition: opacity 0.3s ease;
}
button:hover {
  background-color: #4a56e2;
}
button:disabled {
  opacity: 0.7;
  cursor: not-allowed;
}
/* Spinner Animation */
.spinner {
  border: 3px solid rgba(255, 255, 255, 0.3);
  border-top: 3px solid white;
  border-radius: 50%;
  width: 16px;
  height: 16px;
  animation: spin 0.8s linear infinite;
  display: inline-block;
  vertical-align: middle;
}
@keyframes spin {
  0% {
    transform: rotate(0deg);
  }
  100% {
    transform: rotate(360deg);
  }
}
/* Result Container */
#result-container {
  margin-top: 30px;
  padding-top: 20px;
  border-top: 1px solid #eee;
}
#result-text {
  line-height: 1.6;
}
/* Markdown Styling */
#result-text h1,
#result-text h2,
#result-text h3 {
  margin-top: 1em;
  margin-bottom: 0.5em;
}
#result-text p {
  margin-bottom: 1em;
}
#result-text ul,
#result-text ol {
  margin-left: 20px;
  margin-bottom: 1em;
}
#result-text code {
  background-color: #f4f4f9;
  padding: 2px 6px;
  border-radius: 3px;
  font-family: monospace;
}
#result-text pre {
  background-color: #f4f4f9;
  padding: 10px;
  border-radius: 4px;
  overflow-x: auto;
}
#result-text blockquote {
  border-left: 4px solid #5c67f2;
  padding-left: 15px;
  margin: 1em 0;
  color: #666;
}
JavaScript (static/script.js): Inside static, create script.js
document.getElementById('analyzer-form').addEventListener('submit', async function (event) {
  event.preventDefault();
  const formData = new FormData(event.target);
  const resultText = document.getElementById('result-text');
  const analyzeBtn = document.getElementById('analyze-btn');
  const btnText = analyzeBtn.querySelector('.btn-text');
  const spinner = analyzeBtn.querySelector('.spinner');
  // Show spinner and disable button
  btnText.style.display = 'none';
  spinner.style.display = 'inline-block';
  analyzeBtn.disabled = true;
  resultText.innerHTML = "<em>Analyzing... please wait.</em>";
  try {
    const response = await fetch('/analyze', {
      method: 'POST',
      body: formData
    });
    const data = await response.json();
    // Render markdown response
    if (data.text) {
      resultText.innerHTML = marked.parse(data.text);
    } else {
      resultText.innerHTML = `<span style="color: #e74c3c;">${data.error}</span>`;
    }
  } catch (error) {
    resultText.innerHTML = `<span style="color: #e74c3c;">Error: ${error.message}</span>`;
  } finally {
    // Hide spinner and re-enable button
    btnText.style.display = 'inline';
    spinner.style.display = 'none';
    analyzeBtn.disabled = false;
  }
});
Step 9: Build the Back-End (Update app.py)
Replace the entire contents of your app.py file with this new Flask server code.
import os
import google.generativeai as genai
import PIL.Image
from dotenv import load_dotenv
from flask import Flask, request, jsonify, render_template
# --- CONFIGURATION ---
load_dotenv()
try:
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
except KeyError:
    raise ValueError("GEMINI_API_KEY not found in .env file.")
# --- MODEL INITIALIZATION ---
model = genai.GenerativeModel('gemini-2.5-flash')
# --- FLASK APP ---
app = Flask(__name__)
# --- ROUTES ---
@app.route('/')
def index():
    """Renders the main HTML page."""
    return render_template('index.html')
@app.route('/analyze', methods=['POST'])
def analyze():
    """Handles the image and prompt submission for analysis."""
    if 'image' not in request.files:
        return jsonify({'error': 'No image file provided'}), 400
    image_file = request.files['image']
    prompt = request.form.get('prompt', 'Describe this image.')
    try:
        image = PIL.Image.open(image_file.stream)
        response = model.generate_content([prompt, image])
        return jsonify({'text': response.text})
    except Exception as e:
        return jsonify({'error': f'An error occurred: {e}'}), 500
# --- RUN THE APP ---
if __name__ == '__main__':
    app.run(debug=True)
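The /analyze route accepts whatever file the browser sends and relies on PIL.Image.open to fail on non-images. One optional hardening step, sketched below, is to reject unexpected file extensions before opening the upload. The allow-list and the helper name allowed_file are assumptions for illustration, not part of the codelab's code.

```python
import os

# Hypothetical allow-list; adjust it to the image formats you want to accept.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in an allowed image extension."""
    if not filename:
        return False
    # splitext returns ("name", ".ext"); compare case-insensitively.
    extension = os.path.splitext(filename)[1].lower()
    return extension in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ["landmark.jpg", "PHOTO.PNG", "notes.txt"]:
        print(name, "->", allowed_file(name))
```

Inside analyze(), such a check could run right after image_file is read, for example by returning a 400 JSON error when allowed_file(image_file.filename) is False. An extension check only filters obvious mistakes; the file contents are still validated when PIL opens the image.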
Step 10: Launch the Web Application! 🚀
- Go to your terminal (ensure (venv) is active)
- Run the Flask app:
python app.py
- The terminal will show a URL, typically http://127.0.0.1:5000
- Open this URL in your web browser
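Your UI should look like this: an "Image + Text Analyzer" heading above an image upload field, a question box, and an "Analyze!" button, with Gemini's answer shown underneath.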

Congratulations! 🎊
You have successfully built a multimodal application from a simple script to a full-fledged web app. You've combined Python, Gemini, Flask, and basic web technologies to create a powerful AI-driven tool.