What is a sitemap

A sitemap is a file you submit when registering a site with search engines such as Google and Yahoo. It is the file that Search Console (Google's search registration service) asks for. It mainly lists the pages you want to appear in search results.
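
For reference, a sitemap is an XML file in the sitemaps.org format: a urlset element holding one url entry per page. The sketch below builds a minimal one by hand with Ruby's standard library, only to show the shape of the file that sitemap_generator produces. The helper name build_sitemap and the sample URLs are made up for illustration.

```ruby
require 'cgi'

# Hypothetical helper: builds a minimal sitemap document from a list of
# page hashes. Only :loc is required; the other fields are optional.
def build_sitemap(pages)
  entries = pages.map do |page|
    fields = ["    <loc>#{CGI.escapeHTML(page[:loc])}</loc>"]
    fields << "    <lastmod>#{page[:lastmod]}</lastmod>" if page[:lastmod]
    fields << "    <changefreq>#{page[:changefreq]}</changefreq>" if page[:changefreq]
    fields << "    <priority>#{page[:priority]}</priority>" if page[:priority]
    "  <url>\n#{fields.join("\n")}\n  </url>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *entries,
    '</urlset>'
  ].join("\n")
end

# Sample pages (illustrative values only).
xml = build_sitemap([
  { loc: 'http://www.example.com/', changefreq: 'weekly', priority: 0.9 },
  { loc: 'http://www.example.com/about?ref=a&b', changefreq: 'weekly', priority: 0.5 }
])
puts xml
```

Note that characters such as & must be escaped inside loc, which is why the helper passes each URL through CGI.escapeHTML.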

Add sitemap_generator

Add the gem to the Gemfile. The GitHub repository is at https://github.com/kjvarga/sitemap_generator


gem 'sitemap_generator'

Install the gem

$ bundle install

Create sitemap.rb

Run the following command to generate config/sitemap.rb.

$ rails sitemap:install

Edit the generated config/sitemap.rb. Set SitemapGenerator::Sitemap.default_host to the production host. Inside the SitemapGenerator::Sitemap.create block, list the pages you want search engines to index.


require 'rubygems'
require 'sitemap_generator'

SitemapGenerator::Sitemap.default_host = "http://www.example.com"
SitemapGenerator::Sitemap.create do
  # Static pages
  add '/', changefreq: 'weekly', priority: 0.9
  add '/about', changefreq: 'weekly', priority: 0.5

  # One entry per user; find_each loads the records in batches
  User.find_each do |user|
    add user_path(user), lastmod: user.updated_at
  end
end

Try running it locally.

$ rails sitemap:refresh

Refreshing the sitemap also notifies the search engines of the update. If you don't want to notify them, use the no_ping task instead.

$ rails sitemap:refresh:no_ping

After running it, you can see that public/sitemap.xml.gz has been generated. You can download it from http://localhost:3000/sitemap.xml.gz.
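
To check what ended up in the generated file, you can decompress it and list its URLs. The following is a minimal sketch using only Ruby's standard library; the helper name sitemap_locs is hypothetical, and the sample document is compressed in memory so the snippet runs on its own.

```ruby
require 'zlib'

# Hypothetical helper: decompresses gzipped sitemap bytes and returns
# the URLs found in its <loc> elements.
def sitemap_locs(gzipped_bytes)
  xml = Zlib.gunzip(gzipped_bytes)
  xml.scan(%r{<loc>(.*?)</loc>}m).flatten
end

# Self-contained demo: compress a tiny sample document in memory.
sample_xml = '<urlset><url><loc>http://www.example.com/</loc></url>' \
             '<url><loc>http://www.example.com/about</loc></url></urlset>'
locs = sitemap_locs(Zlib.gzip(sample_xml))
puts locs

# Against the real file (assumes it was generated as described above):
#   sitemap_locs(File.binread('public/sitemap.xml.gz'))
```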

[GCP] cron job settings

sitemap.xml.gz needs to be updated whenever a User is created, so it is regenerated once a day by a cron job. Add an endpoint for the job to call.


class CronJobsController < ApplicationController
  # Regenerates the sitemap; called by the cron job
  def refresh
    logger.info `bundle exec rails sitemap:refresh`
    head :ok
  rescue StandardError => e
    logger.error e.full_message
    head :internal_server_error
  end
end
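
One caveat with backticks: they return the command's output but do not raise when the command exits with a non-zero status, so a failed rake task would not reach the rescue branch. A possible alternative, sketched here with Open3 from the standard library, checks the exit status explicitly. The helper name run_command! is hypothetical, and the demo runs the current Ruby interpreter in place of the rake task.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs a command, returns its combined stdout and
# stderr, and raises if the exit status is non-zero.
def run_command!(*command)
  output, status = Open3.capture2e(*command)
  raise "command failed (exit #{status.exitstatus}): #{output}" unless status.success?

  output
end

# Demo: a command that succeeds.
ok_output = run_command!(RbConfig.ruby, '-e', 'puts "refreshed"')

# Demo: a command that fails, so the helper raises.
failure_message =
  begin
    run_command!(RbConfig.ruby, '-e', 'warn "boom"; exit 1')
    nil
  rescue RuntimeError => e
    e.message
  end
```

In the controller this could be used as logger.info run_command!('bundle', 'exec', 'rails', 'sitemap:refresh'), so that a failing task ends up in the rescue branch and returns a 500.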


Rails.application.routes.draw do
  # Matches the url in cron.yaml and dispatches to CronJobsController#refresh
  get '/cron_jobs/sitemaps', to: 'cron_jobs#refresh'
end

Add the job to cron.yaml.


cron:
- description: sitemap
  url: /cron_jobs/sitemaps
  timezone: Asia/Tokyo
  schedule: every day 03:00
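
Because the endpoint is reachable by URL, it is worth rejecting calls that did not come from the cron service. App Engine adds the header X-Appengine-Cron: true to requests issued by cron and strips that header from external requests. A minimal sketch of the check in plain Ruby follows; the helper name app_engine_cron_request? is hypothetical.

```ruby
# Hypothetical helper: returns true only when the headers carry the
# marker that App Engine adds to cron-issued requests.
def app_engine_cron_request?(headers)
  headers['X-Appengine-Cron'] == 'true'
end

# Demo with plain hashes standing in for request headers.
cron_headers    = { 'X-Appengine-Cron' => 'true' }
browser_headers = { 'User-Agent' => 'Mozilla/5.0' }

puts app_engine_cron_request?(cron_headers)    # true
puts app_engine_cron_request?(browser_headers) # false
```

In a Rails controller the same check could run in a before_action against request.headers, responding with head :forbidden when it returns false.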

Deploy the settings for cron jobs.

$ gcloud app deploy cron.yaml --project=target-project

This completes the settings for periodic execution by cron jobs.

[GCP] Upload sitemap.xml.gz to GCS

GAE in the production environment runs 3 instances for scale-out. If a file is generated dynamically and stored on one instance, a request has only a 1/3 chance of reaching the instance that has the file.

In the first place, keeping generated files on an instance in a PaaS environment is an anti-pattern. It is better to upload them to external storage (GCS). This is not a problem if you are using Compute Engine or similar.

Add the GCS settings to config/sitemap.rb as described in the sitemap_generator documentation.
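
The 1/3 figure can be reproduced with a simplified model, assuming requests are spread evenly over the instances and the file exists only on the instance that ran the rake task. The numbers below are illustrative, not measurements.

```ruby
# Simplified model: requests are distributed round-robin over the
# instances, and only one instance holds the generated file.
INSTANCE_COUNT = 3
instance_with_file = 0

requests = 300
hits = (0...requests).count { |i| i % INSTANCE_COUNT == instance_with_file }
hit_rate = Rational(hits, requests)
puts hit_rate # 1/3 under this even-distribution assumption
```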

"SitemapGenerator::GoogleStorageAdapter: Uses Google::Cloud::Storage to upload to Google Cloud storage. You must require 'google/cloud/storage' in your sitemap config before using this adapter. An example of using this adapter in your sitemap configuration with options:"

Quoted from https://github.com/kjvarga/sitemap_generator#upload-sitemaps-to-a-remote-host-using-adapters


require 'rubygems'
require 'sitemap_generator'
require 'google/cloud/storage'

SitemapGenerator::Sitemap.default_host = ENV['BASE_URL']
# Public URL under which the uploaded sitemap files are served
SitemapGenerator::Sitemap.sitemaps_host = "https://storage.googleapis.com/#{ENV['GOOGLE_BUCKET']}"
# Upload the generated files to GCS instead of keeping them on the instance
SitemapGenerator::Sitemap.adapter = SitemapGenerator::GoogleStorageAdapter.new(
  credentials: ENV['GOOGLE_CREDENTIAL'],
  project_id: ENV['GOOGLE_PROJECT_ID'],
  bucket: ENV['GOOGLE_BUCKET']
)
SitemapGenerator::Sitemap.create do
  add '/', changefreq: 'weekly', priority: 0.9
  add '/about', changefreq: 'weekly', priority: 0.5

  # One entry per user; find_each loads the records in batches
  User.find_each do |user|
    add user_path(user), lastmod: user.updated_at
  end
end
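
This configuration depends on four environment variables. If one of them is unset, ENV[...] returns nil and the problem only shows up later, for example as a sitemaps_host that ends in a bare slash. A small pre-flight check can catch this early. The sketch below is plain Ruby; the helper names missing_sitemap_env and gcs_sitemap_url and the sample values are hypothetical.

```ruby
# Variable names taken from the configuration above.
REQUIRED_SITEMAP_ENV = %w[BASE_URL GOOGLE_BUCKET GOOGLE_CREDENTIAL GOOGLE_PROJECT_ID].freeze

# Hypothetical helper: returns the names of required variables that are
# missing or blank in the given environment (ENV or a plain hash).
def missing_sitemap_env(env)
  REQUIRED_SITEMAP_ENV.select { |name| env[name].to_s.strip.empty? }
end

# Hypothetical helper: the public URL the sitemap will have on GCS.
def gcs_sitemap_url(bucket, filename = 'sitemap.xml.gz')
  "https://storage.googleapis.com/#{bucket}/#{filename}"
end

# Demo with a sample hash in place of the real ENV.
sample_env = { 'BASE_URL' => 'https://www.example.com', 'GOOGLE_BUCKET' => 'example-bucket' }
missing = missing_sitemap_env(sample_env)
url = gcs_sitemap_url(sample_env['GOOGLE_BUCKET'])

puts "missing: #{missing.join(', ')}"
puts url
```

Calling missing_sitemap_env(ENV) at the top of config/sitemap.rb and raising when the result is not empty would make a misconfigured deployment fail fast.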

Add a route that redirects requests for https://domain/sitemap.xml.gz to the sitemap.xml.gz on GCS.


Rails.application.routes.draw do
  # Permanent redirect to the sitemap stored on GCS
  get '/sitemap.xml.gz', to: redirect("https://storage.googleapis.com/#{ENV['GOOGLE_BUCKET']}/sitemap.xml.gz", status: 301)
end

Once you redeploy the GAE app, the setup is complete. After the cron job has run, you can download sitemap.xml.gz from https://domain/sitemap.xml.gz.


