Using Google’s Gemini to manage photos in G-Drive

Using Google’s Gemini (specifically, a vision-capable model such as Gemini 1.0 Pro Vision, accessed through Google AI Studio or the API) to manage photos in Google Drive is a powerful idea. Gemini’s multimodal capabilities mean it can understand the content of your images, not just their filenames.

Here’s a comprehensive guide on how to approach this, from the basic concept to a practical Python example.

The Core Idea: How It Works

You will use a two-pronged approach:

Google Drive API: To list, search, and manage files (move, delete, rename).
Google Generative AI API (Gemini): To analyze the content of the images and generate descriptions, categories, or other metadata based on what it “sees.”

You’ll write a script that:

Fetches images from a specific Drive folder.
For each image, downloads it (or a thumbnail) temporarily.
Sends the image data to the Gemini model with a prompt.
Interprets Gemini’s response to perform an action in Drive (e.g., move to a new folder, add a description, rename the file).
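
The four steps above can be sketched as a small loop that is independent of any particular API. This is a minimal sketch, not the script itself: `process_images` and its `download`, `analyze`, and `act` parameters are hypothetical names introduced here, and the real Drive and Gemini calls would be passed in as those callables.

```python
from typing import Callable, Dict, Iterable


def process_images(
    files: Iterable[dict],
    download: Callable[[str], bytes],
    analyze: Callable[[bytes], str],
    act: Callable[[str, str], None],
) -> Dict[str, str]:
    """Run download -> analyze -> act for each file.

    Returns a mapping of file ID to the label produced by `analyze`,
    or to an "error: ..." string if any step failed for that file.
    """
    results: Dict[str, str] = {}
    for f in files:
        file_id = f["id"]
        try:
            data = download(file_id)   # step 2: fetch the image bytes
            label = analyze(data)      # step 3: ask the model about the image
            act(file_id, label)        # step 4: apply the result in Drive
            results[file_id] = label
        except Exception as exc:
            # One bad file should not stop the whole run.
            results[file_id] = f"error: {exc}"
    return results
```

Keeping the loop separate from the API calls makes it possible to dry-run the logic with stand-in functions before touching real files.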

Step-by-Step Implementation Guide

Prerequisites:

Python Environment: Set up Python on your computer.
Install Libraries:
  pip install google-generativeai google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client pillow
API Keys & Enable APIs:
- Google AI Studio API Key: Go to Google AI Studio, sign in, and create an API key for the Gemini API.
- Google Cloud Project & Credentials:
  1. Go to the Google Cloud Console.
  2. Create a new project or select an existing one.
  3. Enable the “Google Drive API”.
  4. Enable the “Generative Language API” (the display name in the console may differ).
  5. Go to Credentials and create an OAuth 2.0 Client ID of type “Desktop app”. Download the client JSON file and save it as credentials.json next to the script. The Drive API needs it to access your personal files.
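
Rather than pasting the AI Studio key into a source file, one option is to read it from an environment variable. The sketch below assumes a variable named `GEMINI_API_KEY`; that name and the helper `load_gemini_api_key` are illustrative choices, not something the Gemini library requires.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AI Studio API key."
        )
    return key
```

The returned string can then be passed wherever the script expects the key, which keeps the secret out of version control.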

Example 1: Categorize & Organize Photos by Content

This script will:

Ask Gemini what’s in a photo.
Create a folder for that category (e.g., “Cats”, “Beaches”, “Documents”).
Move the photo into that folder.

Python Code (gemini_drive_manager.py):

import io
import os

import google.generativeai as genai
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

# ===== CONFIGURATION =====
# Gemini API Key from AI Studio
GEMINI_API_KEY = 'YOUR_GEMINI_API_KEY_HERE'

# Vision-capable model name. Model names are retired over time, so check the
# current model list and substitute a newer multimodal model if needed.
GEMINI_MODEL = 'gemini-1.0-pro-vision'

# Define the scopes for Drive API (full access is needed to move files)
SCOPES = ['https://www.googleapis.com/auth/drive']

# ID of the Drive folder where your unsorted photos are
SOURCE_FOLDER_ID = 'your_source_folder_id_here'

# Authenticate and create services
def authenticate():
    # Drive API Authentication (OAuth)
    creds = None
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Reuse the saved refresh token instead of opening the browser again
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    drive_service = build('drive', 'v3', credentials=creds)

    # Gemini API Configuration (API Key)
    genai.configure(api_key=GEMINI_API_KEY)
    gemini_model = genai.GenerativeModel(GEMINI_MODEL)

    return drive_service, gemini_model

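def get_file_list(service, folder_id):
    # List image files in the source folder, skipping anything in the trash
    query = f"'{folder_id}' in parents and mimeType contains 'image/' and trashed = false"
    files = []
    page_token = None
    while True:
        # Drive returns results in pages, so follow nextPageToken until it runs out
        results = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token
        ).execute()
        files.extend(results.get('files', []))
        page_token = results.get('nextPageToken')
        if not page_token:
            return files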

def download_file(service, file_id, file_name):
    # Download a file from Drive
    request = service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
    fh.seek(0)
    return fh

def create_folder(service, folder_name, parent_id=None):
    # Create a new folder in Drive, optionally inside a parent folder
    file_metadata = {
        'name': folder_name,
        'mimeType': 'application/vnd.google-apps.folder'
    }
    if parent_id:
        file_metadata['parents'] = [parent_id]
    folder = service.files().create(body=file_metadata, fields='id').execute()
    print(f'Created folder: {folder_name}')
    return folder.get('id')

def move_file(service, file_id, new_folder_id):
    # Move a file to a new folder
    # First, get the current parents to remove later
    file = service.files().get(fileId=file_id, fields='parents').execute()
    previous_parents = ",".join(file.get('parents', []))

    # Update the file to add the new parent and remove the old one
    service.files().update(
        fileId=file_id,
        addParents=new_folder_id,
        removeParents=previous_parents,
        fields='id, parents'
    ).execute()
    print(f'Moved file {file_id} to folder {new_folder_id}')

def analyze_image_with_gemini(model, image_bytes, mime_type):
    # Prompt Gemini to analyze the image
    prompt = "Analyze this image and suggest a single, simple category name (e.g., 'Cat', 'Beach Vacation', 'Receipt', 'Diagram'). Just return the category name, nothing else."

    image_part = {
        "mime_type": mime_type,
        "data": image_bytes.getvalue() # Pass the image bytes
    }

    response = model.generate_content([prompt, image_part])
    return response.text.strip().lower() # e.g., "cat"

def main():
    # Authenticate with both services
    drive_service, gemini_model = authenticate()

    # Get list of files to process
    files = get_file_list(drive_service, SOURCE_FOLDER_ID)
    print(f"Found {len(files)} files to process.")

    # Create a cache for folder IDs so we don't create duplicates
    category_folder_ids = {}

    for file in files:
        file_id = file['id']
        file_name = file['name']
        mime_type = file['mimeType']
        print(f"\nProcessing: {file_name}")

        try:
            # 1. Download the file
            file_bytes = download_file(drive_service, file_id, file_name)

            # 2. Analyze it with Gemini
            category = analyze_image_with_gemini(gemini_model, file_bytes, mime_type)
            print(f"Gemini says: '{category}'")

            # 3. Check if we've already created a folder for this category
            if category not in category_folder_ids:
                # Create the new folder inside our source folder
                new_folder_id = create_folder(drive_service, category, SOURCE_FOLDER_ID)
                category_folder_ids[category] = new_folder_id

            # 4. Move the file to its category folder
            move_file(drive_service, file_id, category_folder_ids[category])

        except Exception as e:
            print(f"Failed to process {file_name}: {e}")

    print("\nFinished organizing photos!")

if __name__ == '__main__':
    main()
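
Gemini’s reply is free text, so it may arrive with quotes, trailing punctuation, or extra lines even when the prompt asks for a bare category name. One way to guard against that before using the reply as a folder name is a small clean-up step. The helper `normalize_category`, its `fallback` value, and the 40-character limit are assumptions made for this sketch.

```python
import re


def normalize_category(raw: str, fallback: str = "uncategorized", max_len: int = 40) -> str:
    """Turn a free-text model reply into a tidy, lowercase folder name."""
    # Use only the first non-empty line of the reply.
    first_line = next((ln for ln in raw.splitlines() if ln.strip()), "")
    # Drop quotes, asterisks, and other punctuation; keep word characters, spaces, hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", first_line)
    # Collapse runs of whitespace and underscores into single spaces.
    cleaned = re.sub(r"[\s_]+", " ", cleaned).strip().lower()
    # Fall back to a catch-all name if nothing usable is left.
    return cleaned[:max_len].strip() or fallback
```

Passing the model’s reply through a function like this before the folder lookup means “Cat”, “'cat'.”, and “**Cat**” all land in the same folder.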

Important Notes for this Script:

Cost: The Gemini API is not free for high usage. Charges depend on the size of the input (image data counts toward it) and of the output. Check the pricing page for the current rates and billing units.
Safety: Always test with a copy of your photos, not your originals!
Rate Limiting: The free tier has quotas. For large libraries, you will need to add delays (time.sleep()) between requests.
Prompt Engineering: The prompt ("Analyze this image and suggest a single, simple category name...") is crucial. You can change it to anything:
- "Is this a photo of a dog or a cat? Answer only 'dog' or 'cat' or 'other'."
- "Extract the text from this document." (for OCR)
- "Describe the mood of this photo in one word (e.g., happy, serene, chaotic)."
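
For the rate-limiting note above, a fixed `time.sleep()` works, but retrying with a growing delay is a common alternative. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, and it retries on any exception for simplicity, whereas a real script would probably retry only on quota or rate-limit errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponentially growing delays on failure.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last failure is re-raised once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In the script, the Gemini call could be wrapped as `call_with_backoff(lambda: analyze_image_with_gemini(gemini_model, file_bytes, mime_type))`.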

Example 2: Generate Descriptions for Images (Metadata)

Instead of moving files, you can use Gemini to generate descriptions and then write them to the file’s description field in Drive.

Modified analyze_image_with_gemini function:

def analyze_image_with_gemini(model, image_bytes, mime_type):
    prompt = "Describe this image in a single, concise sentence suitable for alt-text or an image description."
    image_part = {
        "mime_type": mime_type,
        "data": image_bytes.getvalue()
    }
    response = model.generate_content([prompt, image_part])
    return response.text.strip()

# Then, instead of move_file, use this to update the description:
def update_file_description(service, file_id, description):
    body = {'description': description}
    service.files().update(fileId=file_id, body=body).execute()
    print(f"Updated description for {file_id}")

Advanced Ideas & Next Steps

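Smart Search: Once your images have descriptions, Drive’s search can match them, because its full-text search covers the description field as well as the filename. Searching for words such as "dog playing in the park" can then find the relevant images, even if their filenames are IMG_1234.jpg. The match is on the words in the description, so consistent wording in your prompt helps.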
Face Grouping (Conceptual): While Gemini isn’t a dedicated facial recognition API, you could prompt it with: "Does this photo contain a person? If so, describe their appearance (e.g., 'man with beard and glasses', 'child with red hair')" and use that description to group photos of the same person.
Duplicate Finding: Download small thumbnails of all images, ask Gemini to describe them concisely, and then find files with identical or very similar descriptions.
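
The duplicate-finding idea can be sketched with Python’s standard `difflib` module, which scores how similar two strings are. This is only an illustration: `group_similar` and the 0.9 threshold are assumptions, and two photos with near-identical descriptions are candidates for review, not confirmed duplicates.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def group_similar(descriptions: Dict[str, str], threshold: float = 0.9) -> List[List[str]]:
    """Group file names whose descriptions are nearly the same text.

    `descriptions` maps a file name to the description generated for it.
    Only groups with more than one file are returned.
    """
    groups: List[dict] = []
    for name, text in descriptions.items():
        # Normalize case and whitespace so trivial differences do not matter.
        norm = " ".join(text.lower().split())
        for group in groups:
            # Compare against the first description seen for each group.
            if SequenceMatcher(None, norm, group["text"]).ratio() >= threshold:
                group["files"].append(name)
                break
        else:
            groups.append({"text": norm, "files": [name]})
    return [g["files"] for g in groups if len(g["files"]) > 1]
```

A lower threshold finds more candidates at the cost of more false matches, so it is worth checking the groups by eye before deleting anything.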

By combining these two powerful Google APIs, you can move from managing photos based on filenames to managing them based on their actual content.
