How to save your Windows Live Blog!

NOTE: If you did not choose to back up your Windows Live Space blog before migrating to WordPress, you can still download it from http://myspaceurl/migration/space.zip, where you replace myspaceurl with your space address. It should look something like myspace.spaces.live.com. If you don't know or remember your space address, log in to profile.live.com and use the first part of the URL, which should look like cid-{numbers and letters}.profile.live.com. The URL to download your space should then be something like http://cid-{numbersandletters}.spaces.live.com/migration/space.zip, where you replace {numbersandletters} with the actual letters and numbers from your profile URL.
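If you would rather build that download link programmatically, here is a minimal Python sketch of the URL rule from the note (Python for illustration; the actual tool is written in C#). The function name `backup_url` is hypothetical, and it assumes the profile host always has the form cid-{numbersandletters}.profile.live.com.

```python
PROFILE_SUFFIX = ".profile.live.com"
SPACES_SUFFIX = ".spaces.live.com"

def backup_url(address: str) -> str:
    """Build the space.zip download URL from a space or profile address.

    Hypothetical helper: accepts "myspace.spaces.live.com" or
    "cid-{numbersandletters}.profile.live.com", with or without http://.
    """
    host = address.strip()
    # Drop a scheme and any path that was pasted along with the host name.
    if "://" in host:
        host = host.split("://", 1)[1]
    host = host.split("/", 1)[0]
    # A profile host becomes the matching spaces host (the cid part is kept).
    if host.lower().endswith(PROFILE_SUFFIX):
        host = host[: -len(PROFILE_SUFFIX)] + SPACES_SUFFIX
    return f"http://{host}/migration/space.zip"

print(backup_url("cid-1a2b3c.profile.live.com"))
# http://cid-1a2b3c.spaces.live.com/migration/space.zip
```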

If you are one of the people frustrated by the Windows Live Team's move to force every Windows Live blog over to WordPress, you are not alone!

Follow these few simple steps to migrate your blog to blogspot.com instead:

  1. Download your Windows Live blog using the backup option provided, so you have a local copy of your blog (see the note above)
  2. Extract the downloaded archive to a folder
  3. Create your blogspot.com account
  4. Make one new blog post on your new blogspot.com account and attach all the images from your exported Windows Live blog to it (they are in the «img» folder inside the folder you extracted to in step 2; upload at most about 200 at a time)
  5. Export your blogspot.com blog using Settings->Basic->Export blog
  6. Download, extract/install and start my program. Point it to your exported blogspot.com file from step 5 and to the folder you extracted to in step 2, and finally choose a location to save the results (the output directory, anywhere you like)
  7. Hit the start button
  8. Go to blogspot.com Settings->Basic->Import blog and select the output.xml file that my program generated in the output directory you chose in step 6.

A little more background information and details of the migration process for those that may be interested…

I was using my Live Space blog as a private blog for friends and family, and this worked well because they could sign in with their Windows Live ID to read it. That, however, is NOT possible with a WordPress blog, even though a Windows Live ID SDK exists and someone has written a plugin using it for the open-source version of WordPress. You would think the Windows Live Team would have considered private bloggers, but that was not the case: if you had a private blog, everyone who wants to read it now has to create their own WordPress account.

To top that off, WordPress allows a maximum of 35 private readers. If you want more than 35, you need to pay a yearly fee.

I do not give up that easily. Luckily, I had made a backup using the "download my blog" option the Windows Live Team offered, so I decided to try to build my own migration option!

I took a look at blogspot.com, which uses Google accounts for authentication. Let's face it, most people today already have a Google account, just like they have a Live ID; at the very least, a LOT more people have Google accounts than WordPress accounts. blogspot.com also allows 100 private readers, which is a lot better than 35.

Blogspot.com can both export and import your blog using an XML format. The first thing I did was create a blog entry on my new blogspot.com account, upload pictures and add some comments. Then I exported the blog and got to work dissecting the XML file.

I split it up into three different templates:

  1. EmptyBlogTemplate.xml
  2. BlogEntryTemplate.xml
  3. BlogCommentTemplate.xml

In each of these files I replaced the text that referred to the blog, names, IDs and such with FIELDS I could search for and replace with the content I wanted, namely the data from my old blog.
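The search-and-replace idea can be sketched in a few lines of Python. This is not the tool's C# code: the template fragment and the marker names (`POSTIDFIELD`, `TITLEFIELD`, `CONTENTFIELD`) are hypothetical, since the real templates are not shown here. Values are XML-escaped so characters like `&` and `<` from the old blog cannot break the output file.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical fragment of BlogEntryTemplate.xml; the real FIELD names differ.
ENTRY_TEMPLATE = (
    "<entry><id>POSTIDFIELD</id>"
    "<title type='text'>TITLEFIELD</title>"
    "<content type='html'>CONTENTFIELD</content></entry>"
)

def fill_template(template: str, fields: dict[str, str]) -> str:
    """Replace every FIELD marker with XML-escaped data from the old blog.

    All markers are replaced in one pass, so a value that happens to contain
    another marker's name is not replaced a second time.
    """
    if not fields:
        return template
    # Try longer markers first so a short marker cannot shadow a longer one.
    markers = sorted(fields, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    return pattern.sub(lambda m: escape(fields[m.group(0)]), template)

print(fill_template(ENTRY_TEMPLATE, {
    "POSTIDFIELD": "42",
    "TITLEFIELD": "Fish & chips",
    "CONTENTFIELD": "<p>Hi</p>",
}))
```

A single regex pass was chosen over chained `str.replace` calls so that data copied from one field can never be mistaken for a marker later on.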

I then wrote a C# program that parses my old blog in the backup format the Windows Live Team provided, which was basically just a bunch of HTML and JPG files. The program has to read these HTML files, pick up the blog post titles, text, images, post times, comments, comment timestamps, names and so forth, and produce an XML file that blogspot.com will import successfully.

One of the bigger challenges was getting the images across, and I opted for a manual step here, because the URL an uploaded image gets cannot be predicted; it contains random parts. I made a new blog post on blogspot.com and uploaded all the images from my old blog to it. This was rather easy, because all the images from the old blog were in one folder called img in the backup. At first I tried to upload them all in one go, but the blog editor's image uploader seemed to hit a limit at around 200 pictures; adding up to 200 at a time seemed to work fine.
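Working around the roughly 200-picture limit just means splitting the img folder into batches. A small Python sketch follows; the function names and the default size of 200 are assumptions based on what I observed in the editor, not a documented Blogger limit.

```python
from pathlib import Path

def batches(names: list[str], size: int = 200) -> list[list[str]]:
    """Split a list of file names into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [names[i:i + size] for i in range(0, len(names), size)]

def image_batches(img_dir: str, size: int = 200) -> list[list[str]]:
    """List the files in the backup's img folder, grouped for uploading."""
    names = sorted(p.name for p in Path(img_dir).iterdir() if p.is_file())
    return batches(names, size)
```

For example, `image_batches("C:/backup/img")` (a hypothetical path) returns the file names in groups you can select one group at a time in the upload dialog.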

Once all the images were uploaded, I published this blog entry, went to Settings->Basic->Export blog and saved the file on my computer. Let's call it export.xml for the rest of this article.

My C# program goes through export.xml and picks out all http URLs. While parsing old blog entries (HTML files) that contain pictures, it compares each image filename with the URLs from export.xml. Luckily the URLs end with the exact filename, so each old image can be translated into the new «random» URL it got after uploading. My program also takes care of formatting the image insertion in each post, as this differs quite a bit from the old blog format.

Comments in each old blog entry must also be parsed one by one, and the number of comments on a blog entry must be added to the entry itself in the new XML format. The comments themselves are added to the XML file as entries of a slightly different format, each containing a reference to the blog entry it belongs to.
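Here is a Python sketch of how a post and its comments could be linked. It assumes the export uses the Atom Threading Extensions (RFC 4685): `thr:total` for the comment count on the post and `thr:in-reply-to` with a `ref` attribute on each comment. Blogger's export contains more elements than shown, and the function name and tuple layout are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
THR = "http://purl.org/syndication/thread/1.0"  # Atom Threading Extensions
ET.register_namespace("", ATOM)
ET.register_namespace("thr", THR)

def post_with_comments(post_id: str, title: str,
                       comments: list[tuple[str, str, str]]) -> list[ET.Element]:
    """Return the post entry followed by one entry per comment.

    comments: (comment_id, author_name, text) tuples parsed from the old entry.
    """
    post = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(post, f"{{{ATOM}}}id").text = post_id
    ET.SubElement(post, f"{{{ATOM}}}title").text = title
    # The post itself carries the number of comments that belong to it.
    ET.SubElement(post, f"{{{THR}}}total").text = str(len(comments))
    entries = [post]
    for comment_id, author, text in comments:
        c = ET.Element(f"{{{ATOM}}}entry")
        ET.SubElement(c, f"{{{ATOM}}}id").text = comment_id
        author_el = ET.SubElement(c, f"{{{ATOM}}}author")
        ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
        ET.SubElement(c, f"{{{ATOM}}}content", type="html").text = text
        # Each comment points back at the post it belongs to.
        ET.SubElement(c, f"{{{THR}}}in-reply-to", ref=post_id)
        entries.append(c)
    return entries
```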

My program takes care of all this parsing, and the end result is that all your blog posts, along with their images and comments, are exported to an XML file that can be imported into your blogspot.com account.

My program will try to extract the following information from the export.xml file:

  • Domain; typically myblog.blogspot.com
  • Blog title
  • Blog ID
  • Blog author profile URI
  • Blog author name
  • Blog author email
  • Image URLs

After parsing the export.xml file the program will ask you to verify the correctness of this information. If the program failed to extract the information correctly, here is your chance to correct those mistakes before continuing.
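Extracting those fields could look roughly like this Python sketch (not the tool's C# code). It assumes export.xml is an Atom feed whose top-level `id` looks like `tag:blogger.com,1999:blog-<digits>`, whose `link rel="alternate"` points at the blog, and whose first `author` element describes the blog owner; `blog_info` is a hypothetical name. Image URLs are left out here because they are gathered separately.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"a": "http://www.w3.org/2005/Atom"}

def blog_info(export_xml: str) -> dict[str, str]:
    """Pull the blog-level fields out of the export for the user to verify."""
    feed = ET.fromstring(export_xml)

    def text(path: str) -> str:
        el = feed.find(path, NS)  # direct children of <feed> only
        return (el.text or "").strip() if el is not None else ""

    feed_id = text("a:id")
    alternate = feed.find("a:link[@rel='alternate']", NS)
    href = alternate.get("href", "") if alternate is not None else ""
    return {
        "domain": urlparse(href).netloc,
        "title": text("a:title"),
        "blog_id": feed_id.rsplit("blog-", 1)[-1] if "blog-" in feed_id else "",
        "author_uri": text("a:author/a:uri"),
        "author_name": text("a:author/a:name"),
        "author_email": text("a:author/a:email"),
    }
```

An empty string in the result is a hint that this field needs correcting by hand before continuing.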

When the program is finished, you will find two files in the output directory you chose:

  1. output.xml
  2. report.txt

output.xml is the file you import at blogspot.com, while report.txt contains some debugging information you can use for troubleshooting if something went wrong.

So now all you need to do to migrate to blogspot.com is go to Settings->Basic->Import blog and browse to output.xml.

I provide a precompiled version of this program and also post the full source code. It is not pretty and it's not polished, I admit; it's only a few hours of work. But it works for me, and I hope it can be of use to someone else as well.

NOTE: Requires the .NET Framework to be installed. Also make sure you keep the three template files in the same directory as the executable.

Known limitations:

PS. I cannot guarantee this program will work for you. It is provided as is; use at your own risk.

If you have any questions, please feel free to leave a comment!

v0.2 bug fix release

  • If an old Spaces blog entry's «title» contained characters not suitable for a hyperlink, that entry and all subsequent entries would not be imported by Blogger.
  • Had a second go at detecting the date formatting. The program now tries to find a date format that works for both blog and comment dates. If the date scheme is consistent throughout the blog, it will succeed, giving you correct dates for all blog entries. If there are inconsistencies, it will use the Windows default and replace any dates it cannot parse with the current date.

v0.1 first bug fix release

  • PostIDs would in some cases go negative, adding a minus sign in the file, because I was not using an unsigned variable
  • Date formatting can differ in the Spaces backup data. The application will now first try the Windows default formatting on the computer it runs on, and if that does not work, it tries a number of different formats
  • Files that are not HTML files, as well as index.html, are now excluded from the list of files the application tries to parse
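Both v0.1 fixes are easy to express in a short Python sketch. The file filter follows the rule above (only .html files, minus index.html). The ID scheme is hypothetical, since the post does not say how PostIDs are generated: it hashes the file name and masks the result to 63 bits, so the value stays non-negative even if stored in a signed 64-bit type.

```python
import hashlib
from pathlib import PurePath

def files_to_parse(names: list[str]) -> list[str]:
    """Keep only .html files from the backup, skipping index.html."""
    keep = []
    for name in names:
        p = PurePath(name)
        if p.suffix.lower() == ".html" and p.name.lower() != "index.html":
            keep.append(name)
    return sorted(keep)

def post_id(file_name: str) -> int:
    """Derive a stable, never-negative post ID from a file name (hypothetical)."""
    digest = hashlib.sha1(file_name.encode("utf-8")).digest()
    # Mask off the top bit so the ID also fits a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF
```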
Posted in Computers and Internet