importing rss feed from msn spaces to movable type
Yueyue is migrating her blog from MSN Spaces to Movable Type. It is a really difficult job. MSN Spaces limits the number of posts in its RSS feed, and Yueyue did not find any option to lift this limit, so it is hard to get the entire feed for importing. Here's a discussion (in Chinese) of this issue that gives a workaround: get the RSS feed, remove the posts in that feed from the space, get the feed again and append it to the previous one, then remove again... until you have the entire feed. It works, but it really takes time if you have a long list of posts. Anyhow, you will have it! Now we go to Movable Type.
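If each round of that workaround is saved as its own file, the snapshots still have to be joined into one list of posts. Here is a minimal Python sketch of that step, not part of the original workflow. It assumes every snapshot is a well-formed RSS 2.0 document, and the function name merge_feed_snapshots is made up for illustration. Posts are matched by guid, falling back to link and then title.

```python
import xml.etree.ElementTree as ET


def merge_feed_snapshots(snapshots):
    """Combine the <item> elements of several RSS 2.0 snapshots.

    `snapshots` is a list of XML documents (str or bytes) in the order
    they were saved. An item already seen in an earlier snapshot is skipped.
    """
    seen = set()
    merged = []
    for xml_text in snapshots:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            # Identify a post by <guid>, else <link>, else <title>.
            key = (item.findtext("guid") or item.findtext("link")
                   or item.findtext("title"))
            if key in seen:
                continue
            seen.add(key)
            merged.append(item)
    return merged
```

The returned elements keep the order in which the posts were first seen, so the newest snapshot's posts come first if you pass the snapshots in the order you saved them.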
MT has a built-in import/export feature, but it only works with its own format; RSS x.0 is not supported yet, so we cannot import the MSN RSS feed directly. I have a solution for it: I wrote a macro file for UltraEdit that reads an MSN RSS feed file and converts it to the MT import/export format. After the conversion, you can save the output as a new file to use for importing.
AUTHOR, ALLOW COMMENTS, CONVERT BREAKS and ALLOW PINGS are set to default values; you can change them before importing. DATE might not match your requirements if MSN Spaces and Movable Type have different timezone settings. Another point: in this case the same charset (UTF-8 here) is assumed on both sides, Movable Type and MSN Spaces. If the two use different settings, you must do an additional charset conversion on the feed file. This can be done within UltraEdit. Unfortunately, some conversions are only available from the UltraEdit menu and not in macros, so we have to do them manually.
Output example:
AUTHOR: #authorname
TITLE: importing rss feed from msn spaces to movable type
ALLOW COMMENTS: 1
CONVERT BREAKS: 1
ALLOW PINGS: 1
PRIMARY CATEGORY: essay
DATE: 11/23/2006 03:42:07
-----
BODY:
<p>Yueyue is migrating her blog from ...
-----
--------
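For readers without UltraEdit, a record in the layout of the output example can also be produced with a few lines of Python. This is only a sketch: the function name format_mt_entry and its parameters are assumptions for illustration, and the fixed values (1 for the three flags, #authorname as a placeholder author) simply mirror the example.

```python
def format_mt_entry(title, body, category, date, author="#authorname"):
    """Render one post in the layout of the output example.

    `date` is expected to be already formatted as MM/DD/YYYY hh:mm:ss.
    """
    return "\n".join([
        f"AUTHOR: {author}",
        f"TITLE: {title}",
        "ALLOW COMMENTS: 1",
        "CONVERT BREAKS: 1",
        "ALLOW PINGS: 1",
        f"PRIMARY CATEGORY: {category}",
        f"DATE: {date}",
        "-----",        # end of the metadata section
        "BODY:",
        body,
        "-----",        # end of the body section
        "--------",     # end of the entry
    ]) + "\n"
```

Concatenating the strings returned for each post gives a file with one record per post.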
UltraEdit is required to run this macro file. The macro can be used to migrate other RSS 2.0 feeds to Movable Type with little change.
msn_feed2mt.mac
InsertMode
ColumnModeOff
HexOff
UnixReOff
Top
Find RegExp "<generator>Microsoft Spaces v[0-9.]+</generator>"
IfNotFound
ExitMacro
EndIf
Top
Find RegExp "^n"
Replace All ""
Find RegExp "<^?xml*</cf:listinfo>"
Replace All ""
Find RegExp "<item><title>Photo Album:*</item>"
Replace All ""
Find RegExp "<item><title>Custom List:*</item>"
Replace All ""
Find RegExp "<item><title>Music List:*</item>"
Replace All ""
Find RegExp "<item><title>Blog List:*</item>"
Replace All ""
Find RegExp "</channel></rss>"
Replace All ""
Find RegExp "<link>*</link>"
Replace All ""
Find RegExp "<guid*</guid>"
Replace All ""
Find RegExp "<comments>*</comments>"
Replace All ""
Find RegExp "<slash:comments>*</slash:comments>"
Replace All ""
Find RegExp "<msn:type>*</msn:type>"
Replace All ""
Find RegExp "<live:type>*</live:type>"
Replace All ""
Find RegExp "<live:typelabel>*</live:typelabel>"
Replace All ""
Find RegExp "<dcterms:modified>*</dcterms:modified>"
Replace All ""
Find "<item>"
Replace All ""
Find "</item>"
Replace All ""
Find "&lt;"
Replace All "<"
Find "&gt;"
Replace All ">"
Find RegExp "<title>^(*^)</title><description>^(*^)</description><category>^(*^)</category><pubDate>^(*^)</pubDate>"
Replace All "^nAUTHOR: #authorname^nTITLE: ^1^nALLOW COMMENTS: 1^nCONVERT BREAKS: 1^nALLOW PINGS: 1^nPRIMARY CATEGORY: ^3^nDATE: ^4^n-----^nBODY: ^n^2^n-----^n--------"
Find RegExp "%DATE: *, ^([0-9]+^) ^([a-zA-Z]+^) ^([0-9]+^) ^(*^) GMT$"
Replace All "DATE: ^2/^1/^3 ^4"
Find RegExp "%DATE: Jan*/"
Replace All "DATE: 01/"
Find RegExp "%DATE: Feb*/"
Replace All "DATE: 02/"
Find RegExp "%DATE: Mar*/"
Replace All "DATE: 03/"
Find RegExp "%DATE: Apr*/"
Replace All "DATE: 04/"
Find RegExp "%DATE: May/"
Replace All "DATE: 05/"
Find RegExp "%DATE: Jun*/"
Replace All "DATE: 06/"
Find RegExp "%DATE: Jul*/"
Replace All "DATE: 07/"
Find RegExp "%DATE: Aug*/"
Replace All "DATE: 08/"
Find RegExp "%DATE: Sep*/"
Replace All "DATE: 09/"
Find RegExp "%DATE: Oct*/"
Replace All "DATE: 10/"
Find RegExp "%DATE: Nov*/"
Replace All "DATE: 11/"
Find RegExp "%DATE: Dec*/"
Replace All "DATE: 12/"
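The last part of the macro rewrites the GMT pubDate into the MM/DD/YYYY hh:mm:ss form but cannot shift the time zone. As a rough sketch of how both could be done outside UltraEdit, the Python below uses the standard library's email.utils.parsedate_to_datetime. The function name pubdate_to_mt and the offset_hours parameter are made up for illustration, and a whole-hour offset is an assumption.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def pubdate_to_mt(pub_date, offset_hours=0):
    """Convert an RSS pubDate such as 'Thu, 23 Nov 2006 03:42:07 GMT'
    to 'MM/DD/YYYY hh:mm:ss', shifted by a whole number of hours."""
    moment = parsedate_to_datetime(pub_date) + timedelta(hours=offset_hours)
    return moment.strftime("%m/%d/%Y %H:%M:%S")
```

For example, pubdate_to_mt("Thu, 23 Nov 2006 03:42:07 GMT", 8) would give the time as seen in a GMT+8 blog setting.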
To get the RSS feed from MSN Spaces, you need to turn the option Syndicate this space On in the MSN Spaces settings.
Yueyue has another blog, on blogcn, that needs to be migrated. The RSS feed from blogcn is cleaner than the one from MSN Spaces, but it is in GB2312. I changed the macro file as below so that it works for blogcn's RSS feed. The following step must be taken after the macro has run: File -> Conversions -> ASCII to UTF-8 (Unicode Editing), then save. Or you may have another way to do the charset conversion.
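One such other way is a short Python script. This is a sketch that assumes the whole file really is valid GB2312; the function names and file paths are hypothetical.

```python
def gb2312_to_utf8(raw):
    """Re-encode GB2312 bytes as UTF-8 bytes."""
    return raw.decode("gb2312").encode("utf-8")


def convert_file(src_path, dst_path):
    """Read a GB2312 file and write a UTF-8 copy of it."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(gb2312_to_utf8(data))
```

Decoding raises UnicodeDecodeError if the file contains bytes that are not valid GB2312, which is a useful hint that the feed is in some other encoding.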
InsertMode
ColumnModeOff
HexOff
DosToUnix
UnixReOff
Top
Find RegExp "<rss version="
IfNotFound
ExitMacro
EndIf
Top
Find RegExp "^p"
Replace All ""
Find RegExp "<^?xml*</dc:language>"
Replace All ""
Find RegExp "</channel></rss>"
Replace All ""
Find RegExp "<link>*</link>"
Replace All ""
Find RegExp "<guid*</guid>"
Replace All ""
Find RegExp "<comments>*</comments>"
Replace All ""
Find "<item>"
Replace All ""
Find "</item>"
Replace All ""
Find RegExp "<author>^(*^)</author>*<title><!^[CDATA^[^(*^)]]></title>*<pubDate>^(*^)</pubDate>*<description>*<!^[CDATA^[^(*^)]]>*</description>"
Replace All "^nAUTHOR: ^1^nTITLE: ^2^nALLOW COMMENTS: 1^nCONVERT BREAKS: 1^nALLOW PINGS: 1^nPRIMARY CATEGORY: #blogcn^nDATE: ^3^n-----^nBODY: ^n^4^n-----^n--------"
Find RegExp "%DATE: ^([0-9]+^)-^([0-9]+^)-^([0-9]+^) ^(*^)$"
Replace All "DATE: ^2/^3/^1 ^4"
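The final Find/Replace pair of this macro reorders a year-month-day date into month/day/year. For comparison, the same rewrite might look like this as a Python regular expression. It is a sketch that assumes the blogcn dates have the form YYYY-MM-DD hh:mm:ss, as the macro's pattern suggests; the function name is made up.

```python
import re

# Matches a whole "DATE: YYYY-MM-DD <time>" line anywhere in the text.
_BLOGCN_DATE = re.compile(r"^DATE: (\d+)-(\d+)-(\d+) (.*)$", re.MULTILINE)


def reorder_blogcn_dates(text):
    """Rewrite 'DATE: YYYY-MM-DD hh:mm:ss' lines as 'DATE: MM/DD/YYYY hh:mm:ss'."""
    return _BLOGCN_DATE.sub(r"DATE: \2/\3/\1 \4", text)
```

Lines that do not start with DATE: are left untouched, so the function can be run over the whole converted file.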


