The Web Snapshots add-on allows agencies to capture their website records in their archive each time pages on the site are updated. Using an agency’s XML or HTML sitemap, the add-on captures and stores changes, which become searchable alongside social media records.
Important Note
The Social Media Archiving solution (formerly ArchiveSocial) recommends setting up Web Snapshots with an XML sitemap that includes up-to-date last modified dates, so that changes to a website are captured more accurately.
What is an XML sitemap?
An XML sitemap is a file that lists URLs for a site along with additional metadata about each URL. It is an easy way for webmasters to inform search engines about pages on their sites and is used by search engines to more intelligently crawl the site. XML sitemaps are used by the Social Media Archiving solution (formerly ArchiveSocial) to detect newly added or removed URLs and to use the last updated date for meaningful versioning.
These sitemaps should include the following for each URL:
- Location: the absolute (not relative) URL, which must begin with the protocol (such as HTTP) and end with a trailing slash if the web server requires one
- When it was last updated
- How often it usually changes
- How important it is compared to other URLs on the site
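To make these fields concrete, the following is a minimal Python sketch that reads a small XML sitemap and compares it with an earlier list of entries to find added, removed, and updated URLs. The sample sitemap, the example.gov domain, and the function names read_sitemap and compare_snapshots are all hypothetical, and the comparison is only an illustration of the idea, not a description of how the Social Media Archiving solution works internally.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; example.gov is a placeholder domain.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.gov/</loc>
    <lastmod>2024-03-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.gov/news/</loc>
    <lastmod>2024-03-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry; missing optional fields become None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", default=None, namespaces=NS),
            "priority": url.findtext("sm:priority", default=None, namespaces=NS),
        })
    return entries


def compare_snapshots(previous, current):
    """Illustrative comparison: which URLs were added, removed, or updated."""
    old = {entry["loc"]: entry["lastmod"] for entry in previous}
    new = {entry["loc"]: entry["lastmod"] for entry in current}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A URL counts as updated when its last modified date differs.
    updated = sorted(u for u in new.keys() & old.keys() if new[u] != old[u])
    return added, removed, updated


# Hypothetical entries from an earlier read of the same sitemap.
earlier = [
    {"loc": "https://www.example.gov/", "lastmod": "2024-02-01"},
    {"loc": "https://www.example.gov/old/", "lastmod": "2024-01-01"},
]
added, removed, updated = compare_snapshots(earlier, read_sitemap(SAMPLE_SITEMAP))
```

In this sketch a page is only reported as updated when its last modified date changes, which illustrates why a sitemap with current dates gives more meaningful versioning.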
What is an HTML sitemap?
An HTML sitemap is built to help humans navigate around the site. It is a collection of links, but unlike an XML sitemap it does not carry any additional information about those links. It is not recognized by search engines as a sitemap with a valid format. The Social Media Archiving solution (formerly ArchiveSocial) can detect the URLs from the HTML sitemap but cannot rely on it for versioning information.
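As a rough illustration of the difference, the Python sketch below collects the links from a small HTML sitemap. The sample markup, the example.gov addresses, and the names LinkCollector and extract_links are hypothetical. Note that the result is only a list of URLs: there is no last updated date to compare between captures.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical HTML sitemap fragment; example.gov is a placeholder domain.
SAMPLE_HTML = """
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="/news/">News</a></li>
  <li><a href="https://www.example.gov/contact/">Contact</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


links = extract_links(SAMPLE_HTML, "https://www.example.gov/sitemap.html")
```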
How do I find my sitemap?
- There are many free tools available for identifying your agency’s sitemap. The XML Sitemap Validator is one suggested tool.
- If you have a Website Administrator, they should know where to find the proper sitemap.
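Another place a Website Administrator may check is the site's robots.txt file, which under the sitemaps.org protocol can list sitemap locations on lines beginning with "Sitemap:". Not every site does this, so treat the following Python sketch as one possible starting point. The sample file contents, the example.gov address, and the function name sitemap_urls_from_robots are hypothetical.

```python
# Hypothetical robots.txt contents; example.gov is a placeholder domain.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
Sitemap: https://www.example.gov/sitemap.xml
"""


def sitemap_urls_from_robots(robots_text):
    """Return the URLs listed on 'Sitemap:' lines, if any."""
    urls = []
    for line in robots_text.splitlines():
        # Split on the first colon only, so the URL's own colons are kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


found = sitemap_urls_from_robots(SAMPLE_ROBOTS)
```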
In-Article Glossary
Review the Web Snapshot Glossary of Terms, a comprehensive explanation of the acronyms, abbreviations, and solution-specific terminology. The terms located in this section are listed alphabetically:
- HTML: Hypertext Markup Language
- HTTP: Hypertext Transfer Protocol
- URL: Uniform Resource Locator
- XML: Extensible Markup Language