Subscribing to a journal page

This document describes a convention for subscribing to a journal written in HTML without using a full-fledged syndication technology like Atom or RSS. It is a lightweight alternative to the hAtom microformat. The convention is less powerful than technologies such as Atom and is not well suited to use cases other than maintaining a journal. Nothing prevents authors from also publishing an Atom feed if they wish, and the convention can make generating such a feed easier.

The remainder of this document describes how to interpret a single text/html document as if it were an Atom feed with all required elements present. This demonstrates how simple it is to generate an Atom feed automatically.

Feed elements

The URL from which the text/html document is fetched serves as the feed's "id" element and the recommended "link" element.

The content of the first h1 tag in the page serves as the feed's required "title" element. Authors are encouraged to use titles that provide their own context, e.g. "m15o's journal" rather than "My journal".

A feed's required "updated" element should be set equal to the most recent of its entries' required "updated" elements. If no entries can be extracted from the document, then the feed is empty, and the feed's "updated" element should be set equal to the time the document was fetched.
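The rule above can be sketched in a few lines of Python. The function name feed_updated and its two parameters are illustrative assumptions, not part of this convention.

```python
from datetime import datetime, timezone

def feed_updated(entry_times, fetched_at):
    """Return the latest entry timestamp, or the fetch time for an empty feed."""
    return max(entry_times) if entry_times else fetched_at

# Hypothetical values, used only for illustration.
entry_times = [
    datetime(2022, 6, 8, 12, 0, tzinfo=timezone.utc),
    datetime(2022, 6, 9, 12, 0, tzinfo=timezone.utc),
]
fetched_at = datetime(2022, 6, 10, 8, 30, tzinfo=timezone.utc)

print(feed_updated(entry_times, fetched_at).isoformat())  # 2022-06-09T12:00:00+00:00
print(feed_updated([], fetched_at).isoformat())           # 2022-06-10T08:30:00+00:00
```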

Entry elements

A feed's entry elements are derived from a subset of the journal's article tags, if any are present.

Each article tag with a child h2 whose first 10 characters form a date in ISO 8601 calendar format (i.e. YYYY-MM-DD) represents a single entry. article tags that do not meet this criterion are ignored.
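One possible way to test an h2's text against this rule is sketched below. The helper name leading_date is an assumption, as is the choice to reject impossible dates such as 2022-13-40; the convention itself only names the YYYY-MM-DD format.

```python
import re
from datetime import date

# Exactly four digits, a hyphen, two digits, a hyphen, two digits.
DATE_PREFIX = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def leading_date(h2_text):
    """Return the date at the start of h2_text, or None if there is none."""
    match = DATE_PREFIX.match(h2_text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Right shape, but not a real calendar date (assumed to be ignored).
        return None

print(leading_date("2022-06-09"))             # 2022-06-09
print(leading_date("2022-06-09 A new page"))  # 2022-06-09
print(leading_date("About this site"))        # None
```

An article whose h2 yields None here would be skipped when building the feed.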

An entry's required "title" element is equal to the text content of its h2 tag.

An entry's required "id" element is equal to the concatenation of the feed's "id", a # character, and the entry's title (e.g. feed_id#title). Any space character in the title should be converted to "-".
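As a minimal sketch, the id can be built with plain string operations. The function name entry_id and the example.org URL are placeholders chosen for illustration.

```python
def entry_id(feed_id, title):
    """Join the feed id and the entry title with '#', replacing spaces with '-'."""
    return feed_id + "#" + title.replace(" ", "-")

print(entry_id("https://example.org/journal.html", "2022-06-09 New stack page"))
# https://example.org/journal.html#2022-06-09-New-stack-page
```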

An entry's required "updated" element is noon UTC on the day indicated by the 10-character date stamp at the beginning of the corresponding h2 tag's text.
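The timestamp can be derived as sketched below. The function name entry_updated is an assumption, and the sketch expects text that has already passed the date check described earlier.

```python
from datetime import datetime, timezone

def entry_updated(h2_text):
    """Return noon UTC on the day given by the first 10 characters of h2_text."""
    day = datetime.strptime(h2_text[:10], "%Y-%m-%d")
    return day.replace(hour=12, tzinfo=timezone.utc)

print(entry_updated("2022-06-09").isoformat())  # 2022-06-09T12:00:00+00:00
```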

An entry's "content" element is equal to the inner HTML of the enclosing article tag, from which the h2 node has been removed. It should be of type "html".
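The extraction rules in this section could be implemented with Python's standard html.parser module, as in the rough sketch below. The class name JournalParser is hypothetical. The sketch assumes well-formed markup with no nested article tags, treats the first h2 found inside an article as its heading (without checking that it is a direct child), and checks only the shape of the date. The second article in the sample page is invented to show an entry being skipped.

```python
import re
from html import escape
from html.parser import HTMLParser

DATE_PREFIX = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")  # format check only

class JournalParser(HTMLParser):
    """Collect the first h1 and, per article, its h2 text and remaining inner HTML."""

    def __init__(self):
        super().__init__()        # character references arrive already decoded
        self.title = None         # text of the first h1
        self.entries = []         # (h2 text, content HTML) pairs
        self._capture = None      # "h1" or "h2" while reading heading text
        self._text = []
        self._in_article = False
        self._heading = None
        self._content = []

    def handle_starttag(self, tag, attrs):
        if tag == "article" and not self._in_article:
            self._in_article, self._heading, self._content = True, None, []
        elif tag == "h1" and self.title is None and self._capture is None:
            self._capture, self._text = "h1", []
        elif (tag == "h2" and self._in_article
              and self._heading is None and self._capture is None):
            self._capture, self._text = "h2", []   # the h2 itself is not copied
        elif self._in_article and self._capture is None:
            self._content.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self._in_article and self._capture is None:
            self._content.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = "".join(self._text)
            if tag == "h1":
                self.title = text
            else:
                self._heading = text
            self._capture = None
        elif tag == "article" and self._in_article:
            if self._heading is not None and DATE_PREFIX.match(self._heading):
                self.entries.append((self._heading, "".join(self._content).strip()))
            self._in_article = False
        elif self._in_article and self._capture is None:
            self._content.append(f"</{tag}>")

    def handle_data(self, data):
        if self._capture is not None:
            self._text.append(data)
        elif self._in_article:
            self._content.append(escape(data, quote=False))  # re-escape &, <, >

page = """<h1>m15o's Journal</h1>
<article>
  <h2>2022-06-08</h2>
  <p>Wrote a page about <a href="small-net.html">the small net</a>.</p>
</article>
<article><h2>About</h2><p>No date, so this one is skipped.</p></article>"""

parser = JournalParser()
parser.feed(page)
parser.close()
for heading, content in parser.entries:
    print(heading, content)
```

Only the first article is reported, with the h2 removed from its content.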

Example

Here's an extract from this site's journal:

<h1>m15o's Journal</h1>

<article>
  <h2>2022-06-09</h2>
  <p>Just added a page about the technical <a href="stack.html">stack</a> I'm using to build my <a href="projects.html">projects</a>.</p>
</article>

<article>
  <h2>2022-06-08</h2>
  <p>Wrote a page about <a href="small-net.html">the small net</a>.</p>
</article>
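To illustrate how the extract above maps to an Atom feed, here is a hedged sketch that serializes it with Python's xml.etree.ElementTree. The function name build_feed, the example.org URL, and the fetch time are assumptions. The entries are passed in as (h2 text, content HTML) pairs that have already been extracted and filtered, fetched_at is assumed to be in UTC, and only the elements defined in this document are emitted.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"

def _el(parent, name, text=None, **attrs):
    """Append a namespaced Atom element to parent and return it."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}", attrs)
    node.text = text
    return node

def build_feed(feed_url, feed_title, entries, fetched_at):
    """Serialize the feed; entries are (h2 text, content HTML) pairs."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _el(feed, "id", feed_url)
    _el(feed, "link", href=feed_url)
    _el(feed, "title", feed_title)
    # Noon UTC on each entry's day; same-format stamps sort chronologically.
    stamps = [h2[:10] + "T12:00:00Z" for h2, _ in entries]
    fallback = fetched_at.strftime("%Y-%m-%dT%H:%M:%SZ")  # used for an empty feed
    _el(feed, "updated", max(stamps, default=fallback))
    for (h2, content), stamp in zip(entries, stamps):
        entry = _el(feed, "entry")
        _el(entry, "id", feed_url + "#" + h2.replace(" ", "-"))
        _el(entry, "title", h2)
        _el(entry, "updated", stamp)
        _el(entry, "content", content, type="html")
    return ET.tostring(feed, encoding="unicode")

entries = [
    ("2022-06-09", '<p>Just added a page about the technical <a href="stack.html">stack</a> '
                   'I\'m using to build my <a href="projects.html">projects</a>.</p>'),
    ("2022-06-08", '<p>Wrote a page about <a href="small-net.html">the small net</a>.</p>'),
]
xml_text = build_feed("https://example.org/journal.html", "m15o's Journal",
                      entries, datetime(2022, 6, 10, 8, 30, tzinfo=timezone.utc))
print(xml_text)
```

ElementTree escapes the markup inside each "content" element, which is what type "html" expects.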

Credits

This spec is heavily inspired by gmisub.