How to Migrate from DNN to Astro
We recently migrated our primary marketing website, engagesoftware.com, from DNN Platform to Astro. You can read more about our motivation in the post “Why We Moved Our Marketing Site From DNN to Astro.” This post covers the technical details of that migration.
Astro Layouts
In Astro, layouts are components which contain the “shell” of a page, generally including its common CSS and HTML, such as shared navigation and footers. DNN themes are a similar concept, so we started by copying the CSS and markup from our DNN theme and creating an Astro component out of it. We had a new theme and a legacy theme, so we ended up with a <BaseLayout> component that had common structure and logic, and then used that <BaseLayout> within the <Layout> and <LegacyLayout> components that wrapped each of the migrated pages.
Migrating Pages
For the main pages using our new theme, it was straightforward to copy the HTML into a <Layout> component and everything just worked. Over time, there were a few HTML patterns that I pulled out into their own components, to ensure consistency and simplify future editing, but nothing that had to be done to make the primary pages of the site work.
However, the long tail of the site was a collection of old pages that we didn’t want to lose, since they still come up in search results. To migrate these pages, I wrote a SQL query to list every DNN page that was publicly accessible but had not already been migrated. The query included the Tab Path (e.g. //WorkingWithEngage//CommittedToGrowth), which I could convert into a URL, and a list of Tab/Page URLs to use to set up redirects. After exporting the query results to CSV, I wrote a small C# program (using LINQPad) which visited each page, took note of its final URL, and wrote a new Astro page with a path matching that URL. I used AngleSharp to process the downloaded HTML and extract the title, description, and contents of the <main> element on the page, wrapping that main content in a <LegacyLayout> component and discarding everything else (hoping that what we had in the layout would be sufficient).
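To illustrate the Tab Path conversion on its own, here is a minimal C# sketch. The class and method names (TabPaths.ToUri) are hypothetical and exist only for this example; the doubled-slash replacement and the site root are taken from the script shown later in this post.

```csharp
using System;

public static class TabPaths
{
    // Hypothetical helper: DNN Tab Paths use doubled slashes,
    // e.g. "//WorkingWithEngage//CommittedToGrowth".
    public static Uri ToUri(Uri siteRoot, string tabPath)
    {
        // Collapse each "//" into "/" to get a root-relative path.
        var relativePath = tabPath.Replace("//", "/");

        // Resolve that path against the site root to get an absolute URL to request.
        return new Uri(siteRoot, relativePath);
    }
}
```

Calling TabPaths.ToUri(new Uri("https://engagesoftware.com/"), "//WorkingWithEngage//CommittedToGrowth") yields https://engagesoftware.com/WorkingWithEngage/CommittedToGrowth. Because HttpClient follows redirects by default, the request URI on the response can then be read to find the page's final URL.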
After viewing these new pages, it became clear that this was close, but there were a variety of issues to clean up before I could call this complete. First, the images needed to be moved in addition to the HTML. For this step, I used a simple regular expression to find any paths that started with /Portals/1/ and copied those files from my local copy of the website to the public folder of the Astro site. I typically prefer to parse HTML rather than use regular expressions, but since I was looking for image paths, PDF links, and background images in CSS, the regular expression was the pragmatic choice this time.
However, it wasn’t as easy as just copying the files. Since we were moving from Windows-based hosting to Linux-based hosting, the file system was becoming case-sensitive. Clicking around on the site made it clear that mismatched casing was going to be a major issue. So, in addition to using the above regular expression to find files to copy, I also used it to replace those paths in the HTML content with their lowercase versions.
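The lowercasing step can be sketched in isolation as follows. This is a simplified illustration, not the pattern used in the migration: the class name PortalPaths is hypothetical, and the regular expression assumes that a path ends at the next quote, whitespace character, or closing parenthesis.

```csharp
using System.Text.RegularExpressions;

public static class PortalPaths
{
    // Assumed pattern: a /Portals/1/ path runs until a quote, whitespace, or ")".
    // IgnoreCase lets it match "/portals/1/" and "/PORTALS/1/" as well.
    private static readonly Regex PortalPath = new Regex(
        @"/Portals/1/[^'""\s)]+",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the HTML with every matched path replaced by its lowercase form.
    public static string Lowercase(string html) =>
        PortalPath.Replace(html, match => match.Value.ToLowerInvariant());
}
```

This version lowercases the whole match, file name included, so the copied files would need lowercase names as well for the links to resolve on a case-sensitive file system. Paths that contain unencoded spaces would be cut short by the assumed pattern.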
The next major issue observed on the site was scripts that weren’t working. Astro automatically processes script content, which is usually what you want if you’re starting by writing Astro, but can get in the way if you’re migrating decade-old content. The simple fix here was to use AngleSharp to add the is:inline directive to each <script> element, which tells Astro to leave them alone.
The final step was coming to terms with the inadequacy of my “hope” mentioned above that ignoring the header and footer content would just be okay. In reality, there were some pages which added extra CSS or JavaScript, and I needed to find those instances and copy them into the <LegacyLayout> component so that everything kept looking and working like before.
In the end, the LINQPad script looked like this:
// LINQPad script. It relies on the CsvHelper and AngleSharp NuGet packages and these namespaces:
// System.Globalization, System.Net.Http, System.Text.Json, System.Text.RegularExpressions,
// AngleSharp.Dom, AngleSharp.Html.Parser, CsvHelper, CsvHelper.Configuration.Attributes.
async Task Main()
{
    // Start from the redirects that already exist so that new entries are merged in.
    var redirects = JsonSerializer.Deserialize<Dictionary<string, string>>(File.ReadAllText(@"D:\code\engagesoftware.com\redirects.json"));
    using (var httpClient = new HttpClient())
    using (var fileStream = File.OpenText(@"C:\Users\BrianDukes\Downloads\Public Engage Pages.csv"))
    using (var reader = new CsvReader(fileStream, CultureInfo.InvariantCulture))
    {
        foreach (var page in reader.GetRecords<Page>())
        {
            if (string.IsNullOrWhiteSpace(page.TabPath)) {
                continue;
            }

            // Tab Paths use doubled slashes (e.g. //WorkingWithEngage//CommittedToGrowth).
            var url = new Uri(new Uri("https://engagesoftware.com/"), page.TabPath.Replace("//", "/"));
            var response = await httpClient.GetAsync(url);
            if (response.StatusCode != System.Net.HttpStatusCode.OK)
            {
                response.ReasonPhrase.Dump(url.PathAndQuery);
                continue;
            }

            // HttpClient follows redirects by default, so this is the page's final URL.
            var destination = response.RequestMessage.RequestUri.LocalPath;
            foreach (var redirect in page.RedirectUrls)
            {
                // Skip redirects that already exist or that would point a URL at itself.
                if (!redirects.ContainsKey(redirect) && !string.Equals(redirect, destination, StringComparison.OrdinalIgnoreCase)) {
                    redirects.Add(redirect, destination);
                }
            }

            var parser = new HtmlParser();
            var document = await parser.ParseDocumentAsync(await response.Content.ReadAsStreamAsync());

            // The template lines are intentionally not indented, since they are written to the file verbatim.
            var contents = $@"---
import LegacyLayout from ""@layouts/LegacyLayout.astro"";
---
<LegacyLayout
pageTitle=""{document.Title}""
description=""{document.GetElementById("MetaDescription")?.GetAttribute("content")}""
>
{CleanHtml(document.QuerySelector("main"))}
</LegacyLayout>
";
            var astroFileName = Path.Combine(@"D:\code\engagesoftware.com\src\pages\", destination.TrimStart('/') + ".astro");
            Directory.CreateDirectory(Path.GetDirectoryName(astroFileName));
            File.WriteAllText(astroFileName, contents);
        }
    }

    File.WriteAllText(@"D:\code\engagesoftware.com\redirects.json", JsonSerializer.Serialize(redirects));
}

string CleanHtml(IElement element)
{
    // Mark every script as is:inline so that Astro leaves it alone.
    var scriptTags = element.QuerySelectorAll("script");
    foreach (var script in scriptTags)
    {
        script.Attributes.SetNamedItem(new AngleSharp.Dom.Attr("is:inline", null));
    }

    // Group 1 is the folder path; group 2 is the last segment (the file name plus any
    // trailing ';' or ')' picked up from CSS).
    return Regex.Replace(element.OuterHtml, @"/Portals/1/([^'""]+)(/[^'""/]+)", match =>
    {
        var path = match.Groups[1].Value;
        var filename = new Uri(new Uri("http://localhost"), match.Groups[2].Value.TrimEnd(';', ')')).LocalPath.TrimStart('/');
        var filenameSuffix = match.Groups[2].Value.Substring(match.Groups[2].Value.TrimEnd(';', ')').Length);
        var destinationDir = Path.Combine(@"D:\code\engagesoftware.com\public\portals\1\", path.ToLowerInvariant());
        if (!Directory.Exists(destinationDir)) {
            Directory.CreateDirectory(destinationDir);
        }

        // Copy the file from the local copy of the DNN site, unless it was already copied.
        if (!File.Exists(Path.Combine(destinationDir, filename)))
        {
            var sourcePath = Path.Combine(@"D:\wwwroot\engage.local\Website\Portals\1\", path, filename);
            if (!File.Exists(sourcePath)) {
                sourcePath.Dump("MISSING FILE");
            }
            else {
                File.Copy(sourcePath, Path.Combine(destinationDir, filename));
            }
        }

        // The folder path is lowercased; the file name keeps the casing used in the HTML,
        // which is also the name given to the copied file.
        return "/portals/1/" + path.ToLowerInvariant() + "/" + filename + filenameSuffix;
    }, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}

// One row of the CSV exported from the SQL query.
public class Page
{
    public string TabPath { get; set; }

    [Name("RedirectUrls")]
    public string RedirectUrlsRaw { get; set; }

    // The column holds a quoted list separated by `", "`; split it into individual URLs.
    public IEnumerable<string> RedirectUrls => this.RedirectUrlsRaw.Trim('"').Split(new[] { "\", \"" }, StringSplitOptions.RemoveEmptyEntries);
}
Migrating Blog Posts
Now we had a site with more than 60 pages, but we only had a few recent blog posts. We also needed to migrate all of the old blog content, including old URLs with author, tag, and category IDs. For this site, we’re keeping all of the content in source control, rather than using an external CMS, so each blog post is a Markdown file. Similar to the approach above, I queried the authors, tags, categories, and posts, exporting those results as CSV files. For the posts, I then processed that CSV file and turned each post into a Markdown file (since HTML is valid Markdown, I didn’t need to do any transformations to get this to just work).
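As a rough sketch of that conversion, the C# below turns one exported row into the text of a Markdown file. Everything named here is an assumption made for illustration: the BlogPost shape and its column names, the PostWriter class, and the title and pubDate front matter fields are not taken from the actual export or from the site's content schema.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical shape of one exported row; the real column names are not given in this post.
public record BlogPost(string Title, string Slug, DateTime PublishedOn, string Html);

public static class PostWriter
{
    // Hypothetical naming scheme: one Markdown file per post, named after its slug.
    public static string FileName(BlogPost post) => post.Slug + ".md";

    public static string ToMarkdown(BlogPost post)
    {
        // Double-quoted YAML scalars need backslashes and double quotes escaped.
        var safeTitle = post.Title.Replace("\\", "\\\\").Replace("\"", "\\\"");
        var date = post.PublishedOn.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        var builder = new StringBuilder();
        builder.Append("---\n");
        builder.Append("title: \"" + safeTitle + "\"\n"); // assumed front matter field
        builder.Append("pubDate: " + date + "\n");        // assumed front matter field
        builder.Append("---\n\n");

        // The body is the post's original HTML, written out unchanged.
        builder.Append(post.Html.Trim());
        builder.Append('\n');
        return builder.ToString();
    }
}
```

Each result could then be written with File.WriteAllText, using PostWriter.FileName(post) combined with whichever folder holds the blog content.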
For the authors, tags, and categories, I added those CSV files to the source code and created content collections from each of them. This gave us a centralized place to manage the full list of tags or authors, including their old IDs, which I could then use to generate pages at the old ID-based URLs that point to the new version of each page without an ID. Once the blog content was set up, we could use that content in a dynamic route to create each page. We also enabled an RSS feed (and configured a redirect from the old RSS URL to the new one), as well as a sitemap.
Conclusion
Moving a site with decades of history to a new platform was a daunting prospect, but once we got into the nuts and bolts of the process, it was clear that it was a positive move and that no part of it was going to be insurmountable for our team. We’re excited to continue connecting with the Astro community and to extend our expertise into simple, static sites, embracing their constraints in order to reap the benefits without compromising on functionality.