I have just finished migrating my long-neglected blog from a self-hosted Wordpress installation on my own VPS to a static site hosted on the Amazon S3 storage service using its website hosting feature.
This post captures the main steps involved, in case anyone else is interested in doing the same thing.
Why Migrate from Wordpress?
My blog has run well enough on a self-hosted Wordpress set-up for years, first at a cheap site hosting company (where performance was terrible) then on my own Linode VPS server (where performance has been great).
However, I have never been very happy relying on the PHP-based Wordpress app due to the frequent security updates it requires and my gut-level distrust of PHP itself. I know it well enough not to trust it, especially when my blog can lie neglected for months at a time when I'm busy. It made me nervous knowing I could be one tardy security-update away from a hacked site and server.
Besides, this blog is small enough that any powerful blogging system is overkill for my needs.
Static Site Generation
Because my blog is simple, it is a perfect candidate to be served as a static site instead of a dynamically-generated one. Instead of Wordpress and PHP code running on my server to generate HTML pages when someone lands on a page, why not just serve HTML pages directly? This is about as simple as it can get.
But even a simple blog like this is far too complex to hand-code all the HTML and other files necessary to do things like display a post, list all posts, provide RSS/Atom syndication feeds for blog reader software, provide sitemap XML files for Google, and so on. So I need a middle ground: a static site generator.
At this stage it's worth saying explicitly that using static site generators, while "simpler" from a web-server point of view, requires much more knowledge about how web sites work and how to host them. As the rest of this post will make clear, generating and hosting your own static site is an involved and deeply nerdy process best suited to people who are web programmers, or interested in becoming one.
Pelican
There are roughly fourteen bazillion site generator projects around, with the number growing constantly.
I chose Pelican for my blog because:
- It seems relatively well-established and mature in the field.
- It allows posts to be written in Markdown (or other human-friendly markup languages)
- It's written in Python, a language I like and that I can easily hack on if I want to contribute fixes or improvements to the project.
- It has plugins that do things like generating sitemap files and other niceties that would be tedious to build myself.
I installed a very recent code version (pre 3.3) of Pelican into a pre-prepared virtualenv Python environment. I used the latest code because it has improvements that are particularly useful for migrating existing Wordpress sites. I also installed some additional requirements:
# Pelican pre 3.3 code version, with slug and pages import improvements
pip install -e git+git@github.com:jmurty/pelican.git@675d6c81#egg=pelican-dev
# Markdown for migrating and authoring posts in this markup language
pip install Markdown==2.3.1
# BeautifulSoup is required to migrate Wordpress posts
pip install BeautifulSoup
After installation I ran the pelican-quickstart command to create initial configuration files and some helper scripts.
Be warned though: the generated scripts have their own copies of configuration settings you choose while running the quickstart, so if you subsequently change your Pelican configuration files and run these scripts your config changes will have no effect. I found this annoying enough that I ended up removing the helper scripts and use the explicit Pelican commands instead.
Migrate Data from Wordpress
Migrate Your Posts
Migrating my existing Wordpress blog posts was somewhat fiddly, but to be honest it was easier than I expected:
- Manually install the additional Pandoc universal document converter tool per the Pelican Import documentation.
- Export and download your Wordpress posts and comments as an XML file. I saved this file as site.wordpress.xml
- Run the pelican-import command to convert the Wordpress posts into Markdown-formatted files written to the content directory:
pelican-import --wpfile --dir-page \
    --output content --markup markdown \
    site.wordpress.xml
At this stage you will hopefully have a collection of Markdown files that correspond very closely to your Wordpress posts. Take a look at some of the files and note the metadata included at the top; fields like Title and Slug capture important information about the original posts.
Unfortunately I found the conversion process was imperfect so I needed to look closely at many of the post files to check for quirks and then fix these issues across all posts.
I also decided to use Tag instead of Category groupings for my posts, which I should have done from the very beginning in Wordpress, but changing this was straight-forward with some global find-and-replace changes in the markdown files.
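For anyone who would rather script this than do it in an editor, here is a minimal sketch of the same change in Python. It is not what I actually ran: the function names are made up for illustration, and it assumes each post starts with a block of `Key: value` metadata lines followed by a blank line, with category metadata on a `Category:` line and tags on a `Tags:` line.

```python
from pathlib import Path


def category_to_tags(text):
    """Fold a 'Category:' metadata line into the 'Tags:' line of one post.

    Assumes the metadata header is separated from the body by a blank line.
    """
    header, sep, body = text.partition("\n\n")
    tags = []
    kept = []
    for line in header.split("\n"):
        key, colon, value = line.partition(":")
        if colon and key.strip().lower() in ("category", "tags"):
            # Collect category and tag values, splitting comma-separated lists
            tags += [t.strip() for t in value.split(",") if t.strip()]
        else:
            kept.append(line)
    if tags:
        # dict.fromkeys removes duplicates while preserving order
        unique = list(dict.fromkeys(tags))
        kept.append("Tags: " + ", ".join(unique))
    return "\n".join(kept) + sep + body


def convert_directory(content_dir):
    """Rewrite every Markdown file under content_dir in place."""
    for path in Path(content_dir).rglob("*.md"):
        original = path.read_text(encoding="utf-8")
        updated = category_to_tags(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Run `convert_directory("content")` on a copy of your posts first, and check the results by eye before trusting it.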
Migrate Your Comments
One drawback of static sites is that there is no way to do server-side processing of comments submitted by site visitors. At least, not without defeating the point of making your site static in the first place.
If you wish to allow comments on your static site you will need to use a javascript-based commenting system, such as the one provided by Disqus.
I am using Disqus on the now-static site, as you will see if you view or leave comments below. It was quite easy to set this up: I created an account on the service, added my site details, and set the DISQUS_SITENAME setting in Pelican's publishconf.py configuration file.
Because I wanted to keep the comments from my existing Wordpress blog site, I also exported these comments by:
- installing the Disqus Comment System plugin on my Wordpress site.
- configuring my new Disqus site account in the plugin
- exporting the comments from Wordpress to Disqus via the Export Comments button.
- monitoring progress of the export on the Disqus dashboard.
The export process took a long time (days) and a few of my blog's comments didn't survive the process, which isn't a big deal for my site but might be a problem for others. Hopefully you will have better luck with this.
Generate the Static Site HTML
Now that your content is in place you can generate a static site to check how it all looks. Here is the command to convert the markdown files in content/ into static files in the html/ directory:
# Add --debug flag to see exactly what is happening
pelican content/ -o html -s pelicanconf.py --delete-output-directory
You can now look directly at the generated files in the html/ directory, or run Pelican in server mode so you can view your blog in a web browser at http://localhost:8000/ using the develop_server.sh helper script generated by the "quickstart" process, or with the following explicit commands:
# Run Python's built-in web server on html/ as a background job
# (this is the Python 2 module name; on Python 3 use: python -m http.server)
cd html/
python -m SimpleHTTPServer &
Customise the Generated Site Layout
Once you can view the generated site in a browser you can customise a number of things in your pelicanconf.py configuration file to make the site work exactly the way you want.
In my case I wanted the new static site to replace the original Wordpress one without breaking all the links, because Cool URIs don't change. To do this I configured Pelican very carefully to generate URL paths matching the datestamp + slug format I used in Wordpress, for example:
# Post/Article URL links have clean paths with date stamp + slug ...
ARTICLE_URL = '{date:%Y}/{date:%m}/{date:%d}/{slug}/'
# ... while the file is index.html to be auto-served from the dir location
ARTICLE_SAVE_AS = '{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html'
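To see what these patterns produce, here is a small sketch that expands them with Python's ordinary string formatting. It only approximates what Pelican does with the settings internally, and the date and slug are made-up example values.

```python
from datetime import datetime

ARTICLE_URL = '{date:%Y}/{date:%m}/{date:%d}/{slug}/'
ARTICLE_SAVE_AS = '{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html'


def expand(pattern, date, slug):
    """Fill in a URL pattern; {date:%Y} is formatted strftime-style."""
    return pattern.format(date=date, slug=slug)


# Hypothetical post metadata, for illustration only
example_date = datetime(2013, 9, 1)
example_slug = 'hello-world'

print(expand(ARTICLE_URL, example_date, example_slug))
# 2013/09/01/hello-world/
print(expand(ARTICLE_SAVE_AS, example_date, example_slug))
# 2013/09/01/hello-world/index.html
```

Because the saved file is an index.html inside a directory named after the slug, the web server can serve it from the clean directory-style URL.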
I also customised where RSS/Atom feeds are saved, whether pages are displayed, and other things. Check out the Pelican documentation to see what you can do.
Theme
I created my own theme for the blog. You can tell because it's very simple, and very ugly (I'm no designer).
It was pretty easy to do this by following the Pelican theming documentation and stealing ideas and code snippets from the example themes available in the Pelican Themes GitHub repository.
One particularly cool thing I will mention is how to use the excellent Solarized colour themes for code snippets, so at least that part of my blog isn't ugly. Generate the necessary CSS files like so:
# Install Pygments and solarized style
pip install Pygments pygments-style-solarized
# Generate CSS files of light and dark solarized colours
pygmentize -S solarizeddark -f html > solarizeddark.css
pygmentize -S solarizedlight -f html > solarizedlight.css
You can then add these CSS files to your template to get solarized code highlighting in your posts.
Note: I found I had to set the overall background colour to get things to look right. Here's what I did:
/* Add this line to the top of solarizedlight.css */
pre { background: #fdf6e3; }
/* Add this line to the top of solarizeddark.css */
pre { background: #002b36; }
Pelican Plugins
The final Pelican-related tweaks I made were to add some existing plugins to do useful work like generating a sitemap.xml file and making extra information available to help with site navigation, such as lists of next/previous and related posts.
To use the plugins I added the pelican-plugins codebase to my project directory:
git submodule add https://github.com/getpelican/pelican-plugins
git submodule update --init
Then I configured Pelican to find and use the relevant plugins:
PLUGIN_PATH = 'pelican-plugins'
PLUGINS = ['sitemap', 'neighbors', 'related_posts']
SITEMAP = {
    'format': 'xml',
    'changefreqs': {
        'pages': 'daily',
    }
}
Site Hosting with Amazon S3 in Website Hosting Mode
With a static blog site instead of a dynamic one, it was no longer necessary to host it on my VPS server. By moving the site entirely to a static-file-serving platform I could free up my VPS to be the testing ground I had originally intended.
With its relatively recently-added support for website hosting, the Amazon S3 storage service made an attractive choice. It's pretty simple, extremely reliable, fairly priced, and I'm very familiar with it.
The process for setting this up was made a bit complex because I had some extra requirements:
- My blog should be accessible at both the james.murty.co root domain and the www.james.murty.co subdomain.
- The old RSS feed endpoint at http://www.james.murty.co/feed/ shouldn't change, so that subscribers continue receiving my blog posts (like this one).
To serve the site from S3 I needed to generate the site in a publishable form, set up my S3 account to store and serve it, and tweak my domain's DNS management to hook everything up correctly.
Generate Site for Publishing
By default the Pelican quickstart process produces two configuration files, pelicanconf.py and publishconf.py. The former stores most of your settings and is intended for use when developing or testing your site, while the latter contains settings useful only for the published site.
In my case the publishconf.py file has two extra settings:
- Relative URLs are disabled, to ensure in-post links work properly when read in an RSS feed reader.
- Disqus is configured to handle comments.
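As a rough sketch, a publishconf.py carrying those two settings might look like the following. The SITEURL and DISQUS_SITENAME values are placeholders I have assumed for illustration, not my real configuration.

```python
# publishconf.py (sketch): settings that apply only to the published site.
# The quickstart-generated file typically begins by importing everything from
# pelicanconf.py; that import is left as a comment here so the snippet runs
# on its own.
# from pelicanconf import *

# Assumed value: the public address of the site.
SITEURL = 'http://james.murty.co'

# Emit absolute links so in-post links still work inside feed readers.
RELATIVE_URLS = False

# Placeholder: replace with the shortname registered with Disqus.
DISQUS_SITENAME = 'example-shortname'
```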
Before you publish your site, make sure you are using the correct configuration file in the -s switch to the pelican command. For example:
pelican content/ -o html -s publishconf.py --debug
Set Up S3 Buckets
Here are the steps I took to set up S3 hosting (using the S3 console website):
- Create S3 buckets with names corresponding to my site's domain names: james.murty.co and www.james.murty.co
- Upload the generated site files in the html/ directory to the bucket james.murty.co. I used my Synchronize application, but any S3-compatible file copying program will do.
- Visit the bucket's exact Endpoint URL as shown in the Static Website Hosting properties area to make sure everything looks and works properly.
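Whichever upload tool you use, the important thing is that each file's object key in the bucket matches its path relative to html/, and that it is stored with a sensible content type. The sketch below only builds that mapping using the Python standard library, leaving the actual upload to your tool of choice. The function name is hypothetical and this is not how Synchronize works internally.

```python
import mimetypes
import os


def plan_uploads(output_dir):
    """Return (local_path, s3_key, content_type) for every generated file.

    Keys are paths relative to output_dir, always with forward slashes.
    """
    plan = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, output_dir).replace(os.sep, "/")
            # Fall back to a generic binary type when the extension is unknown
            content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
            plan.append((local_path, key, content_type))
    # Sort by key so the plan is stable between runs
    return sorted(plan, key=lambda item: item[1])
```

For example, a post saved as html/2013/09/01/hello-world/index.html would be planned under the key 2013/09/01/hello-world/index.html.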
To configure the www.james.murty.co subdomain bucket to redirect requests to the authoritative james.murty.co bucket:
- Edit the bucket Properties and open the Static Website Hosting section.
- Select Redirect all requests to another host name
- Enter the root domain: james.murty.co
Because my original blog used /feed/ as the RSS feed endpoint, I also needed to make this URL path point to the new RSS feed location /feeds/rss.xml. This was easy enough to do with a custom routing rule specified with the following XML snippet in the Edit Redirection Rules section of the Enable Web Hosting properties:
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>feed/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyWith>feeds/rss.xml</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
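To make the behaviour of this rule concrete, here is a rough model of it in Python. It is only my simplified reading of how a KeyPrefixEquals condition combined with ReplaceKeyWith behaves, not Amazon's implementation, and the function name is invented for the example.

```python
def apply_routing_rule(key, prefix="feed/", replacement="feeds/rss.xml"):
    """Return the redirect target for a requested key, or None if no redirect.

    Models one rule: any key starting with `prefix` is redirected to
    `replacement`, which replaces the whole key rather than just the prefix.
    """
    if key.startswith(prefix):
        return replacement
    return None


print(apply_routing_rule("feed/"))          # feeds/rss.xml
print(apply_routing_rule("feeds/rss.xml"))  # None
```

Note that the new feed location does not itself match the rule, because "feeds/rss.xml" does not start with "feed/", so there is no redirect loop. In this model a request for "feed" without the trailing slash would not match either.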
Set Up DNS using Amazon Route 53
To serve websites from S3 using a root domain, that is, a domain without a leading subdomain prefix like www or blog, you must use Amazon Route 53 as a DNS server for the site. Unfortunately this adds to the complexity and cost of S3-hosted websites, but I decided it is worth the hassle.
The necessary steps are covered in detail in Amazon's website hosting instructions, but in brief here's what I did to set up the domain-to-bucket mappings:
- Create a hosted zone for your domain in Route 53
- Note the nameservers assigned to that hosted zone (called Delegation Set in Amazon's console)
- Create a record set for the hosted zone
- Add an A record alias (Type: A, Alias: Yes) to the record set to map my root domain james.murty.co to the Amazon URL for the authoritative S3 bucket, in my case james.murty.co.s3-website-us-east-1.amazonaws.com. The target bucket should be selectable in the Alias Target drop down list.
- Add a CNAME record to the record set to map the www.james.murty.co subdomain to the root domain james.murty.co.
Because I was migrating existing domains, at this point I also copied some additional DNS settings from my original DNS provider over to Route 53, such as MX mail records.
Once you are happy that the Route 53 settings are correct, set up (or switch over) the nameservers at your domain name registrar to point to the Delegation Set endpoints you noted above.
Before long, your domain names should resolve to the appropriate S3 buckets and your static site should be available.