SGML and XML

The Standard Generalized Markup Language (SGML) is a standard for defining generalized markup languages for documents. Its two postulates are that markup should be:

  1. Declarative: it describes a document's structure independently of any processing specifics, which is deemed more future-proof.
  2. Rigorous: it defines data objects that can be processed and parsed automatically.

SGML is a superset of XML and predates it. With features like implicitly closed tags, SGML is friendlier to write by hand, but harder to parse.

Office XML

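MIME Type                                                                   File Extensions
application/vnd.oasis.opendocument.text                                     .odt, .fodt
application/vnd.oasis.opendocument.spreadsheet                              .ods, .fods
application/vnd.oasis.opendocument.presentation                             .odp, .fodp
application/vnd.oasis.opendocument.graphics                                 .odg, .fodg
application/vnd.openxmlformats-officedocument.wordprocessingml.document     .docx, .docm
application/vnd.openxmlformats-officedocument.presentationml.presentation   .pptx, .pptm
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet           .xlsx, .xlsm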

Some formats, including the first extension of each Office pair in the table, as well as .epub (XML-based e-reader books), are really ZIP archives of several XML files (mostly XHTML in the case of EPUB).
OpenDocument three-letter extensions have four-letter alternatives (e.g. .odt => .fodt) that mark a document as flat, i.e. stored in a single XML file.
Office Open XML alternatives are not equivalent: substituting m for x marks the addition of macro support (e.g. .docx => .docm). The macro-enabled variants are still ZIP archives, registered under their own MIME types.
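The zip-versus-flat distinction can be checked programmatically. Below is a minimal Python sketch, assuming the file has already been read into bytes; describe_package is a hypothetical helper, and the tiny archive and flat document built for the demo are stand-ins, not complete OpenDocument files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def describe_package(data: bytes) -> str:
    """Classify raw bytes as a zipped package or a flat XML document."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        # Zipped formats (.odt, .docx, .epub, ...): list the archive members.
        with zipfile.ZipFile(buf) as zf:
            return "zip: " + ", ".join(sorted(zf.namelist()))
    # Otherwise parse the bytes as one flat XML document (.fodt, ...).
    root = ET.fromstring(data)
    return "flat xml: " + root.tag


# Hypothetical minimal stand-ins, not complete OpenDocument files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("content.xml", f'<office:document-content xmlns:office="{OFFICE_NS}"/>')
flat = f'<office:document xmlns:office="{OFFICE_NS}"/>'.encode()

print(describe_package(archive.getvalue()))  # zip: content.xml, mimetype
print(describe_package(flat))  # flat xml: {urn:oasis:names:tc:opendocument:xmlns:office:1.0}document
```

The same check works on .docx or .epub bytes, since only the ZIP container is inspected.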

XHTML and HTML

XHTML is a dialect of XML, i.e. HTML expressed in XML syntax, specified by the W3C (XHTML 1.0 and 1.1; the XHTML2 effort was abandoned in 2009). HTML, maintained by the WHATWG as a Living Standard, is a parallel specification descended from the SGML-based HTML 4; since HTML5 it defines its own parsing rules instead of being an SGML application, and unlike XHTML it need not be well-formed XML. The schism stems from XML's well-formedness strictness versus the relaxed, SGML-style pragmatism favoured by the WHATWG browser vendors (Apple, Google, Mozilla, and Microsoft).

NB. Well-formed XHTML is generally parsed and rendered correctly as HTML, so there is little drawback to writing XHTML: it eases parsing without taking anything away from the HTML feature set. For example, well-formed XML can be parsed with the Python standard library, which ships with most Linux distributions, while HTML requires either manual work with the event-based HTML parser or an external dependency, both of which are weaker alternatives.
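To illustrate the point, the sketch below parses a small well-formed XHTML snippet with the standard xml.etree.ElementTree module, shows the same parser rejecting tag soup that HTML parsers accept, and then counts paragraphs with the event-based html.parser module instead. The snippets and the ParagraphCounter class are hypothetical examples.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
         '<p>One<br/>line</p><p>Two</p></body></html>')
html = "<p>One<br>line<p>Two"  # accepted by HTML parsers, but not well-formed XML

# Well-formed XHTML: the standard XML parser builds a full element tree.
root = ET.fromstring(xhtml)
paragraphs = list(root.iter(XHTML_NS + "p"))
print(len(paragraphs))  # 2

# The same parser rejects the HTML snippet (unclosed <br> and <p> tags).
try:
    ET.fromstring(html)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True


class ParagraphCounter(HTMLParser):
    """html.parser only reports events; any structure must be rebuilt by hand."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.count += 1


counter = ParagraphCounter()
counter.feed(html)
counter.close()
print(counter.count)  # 2
```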

DHTML and SHTML

Both formats are artifacts from the past. DHTML (Dynamic HTML) is a historical umbrella term, popularized by Microsoft, for JavaScript + HTML + CSS, nowadays simply known as HTML. SHTML is a file extension that tells the web server the file should be processed using Server Side Includes (SSI). SSI directives specify markup files to include within other files; they are expanded by the web server when the page is served, not by the browser. SSI is mostly historical, having been superseded by modern server templating tools, Node modules, and similar static site generation techniques. SSI directives use HTML comments as their syntax:

    <!--#include virtual="top.shtml" -->
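The following Python sketch mimics what the server does with such a directive. It is only an illustration: it handles just the include virtual form, resolves paths from a hypothetical in-memory files mapping rather than the filesystem, and ignores the other directives (echo, exec, conditionals) that real implementations such as Apache's mod_include support.

```python
import re

# Matches directives such as <!--#include virtual="top.shtml" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(page: str, files: dict, depth: int = 0) -> str:
    """Replace include directives with the named file's (expanded) contents."""
    if depth > 10:
        raise RecursionError("include nesting too deep")

    def substitute(match):
        # Included files may contain directives themselves, so recurse.
        return expand_includes(files[match.group(1)], files, depth + 1)

    return INCLUDE.sub(substitute, page)


files = {"top.shtml": "<header>Site</header>"}  # hypothetical file contents
page = '<!--#include virtual="top.shtml" -->\n<main>Hello</main>'
print(expand_includes(page, files))
```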
  

Sitemaps

A machine-readable index of a site's pages, specified by the Sitemaps protocol (sitemaps.org). Search engines use sitemaps to speed up page discovery on top of normal crawling.

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
  </urlset> 
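Because every sitemap element lives in the sitemaps.org namespace, a consumer has to qualify its lookups. A minimal Python sketch with the standard library, assuming the whole sitemap fits in memory; read_sitemap is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace declared on <urlset> by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(data: bytes) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(data)
    return [
        (url.findtext("sm:loc", namespaces=NS).strip(),
         url.findtext("sm:lastmod", namespaces=NS))
        for url in root.findall("sm:url", NS)
    ]


SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
  </url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

print(read_sitemap(SITEMAP))
# [('http://www.example.com/', '2005-01-01'), ('http://www.example.com/about', None)]
```

An unqualified root.findall("url") would find nothing here, since the elements are in the sitemaps.org namespace rather than the empty one.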
  

XBRL

XBRL is a markup dialect for reporting company financial statement information. It was phased in as a requirement for reports filed with the sec.gov EDGAR system in . iXBRL, or Inline XBRL, adapts the formal XBRL with the goal of being easily readable by both machines and humans. Unlike plain XBRL, which is distributed as separate XML documents, iXBRL is embedded directly inside the report's (X)HTML. For example: AAPL 10-K, . Extracting core metrics from a financial statement is simple: collect the ix:nonFraction elements. They carry mandatory, useful attributes, such as name, a US GAAP concept key (e.g. us-gaap:NetIncomeLoss), and unitRef, a reference to a unit that is typically an ISO 4217 currency (e.g. iso4217:USD). The elements of the ix namespace are listed below.

ix:continuation
ix:denominator
ix:exclude
ix:footnote
ix:fraction
ix:header
ix:hidden
ix:nonFraction
ix:nonNumeric
ix:numerator
ix:references
ix:relationship
ix:resources
ix:tuple
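A minimal Python sketch of that extraction, assuming the report is well-formed XHTML and uses the Inline XBRL 1.1 namespace (http://www.xbrl.org/2013/inlineXBRL). The embedded snippet, its context id and its unit id are hypothetical. It returns the raw display text only: turning "1,234" into a number would also require the format, scale and sign attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Assumed Inline XBRL 1.1 namespace; ElementTree uses the {uri}local form.
IX = "{http://www.xbrl.org/2013/inlineXBRL}"

REPORT = b"""<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body><p>Net income:
    <ix:nonFraction name="us-gaap:NetIncomeLoss" contextRef="FY2023"
        unitRef="usd" decimals="-6" scale="6">1,234</ix:nonFraction>
  </p></body></html>"""


def extract_facts(data: bytes) -> List[Dict[str, Optional[str]]]:
    """Collect every ix:nonFraction fact with its key attributes and text."""
    root = ET.fromstring(data)
    return [
        {
            "name": el.get("name"),
            "contextRef": el.get("contextRef"),
            "unitRef": el.get("unitRef"),
            "value": "".join(el.itertext()).strip(),
        }
        for el in root.iter(IX + "nonFraction")
    ]


for fact in extract_facts(REPORT):
    print(fact)
# {'name': 'us-gaap:NetIncomeLoss', 'contextRef': 'FY2023', 'unitRef': 'usd', 'value': '1,234'}
```

Real filings are far larger and also hide facts in ix:hidden, but root.iter walks the whole tree, so those are collected too.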

RSS

Really Simple Syndication (RSS) is an XML dialect that provides feeds for websites whose pages change often. A channel lists item elements, whose fields are given in the table; all are optional, but an item must contain at least a title or a description.

Element      Description
title        The title of the item.
link         The URL of the item.
description  The item synopsis.
author       Email address of the item's author.
category     Includes the item in one or more categories.
comments     URL of a page for comments relating to the item.
enclosure    Describes a media object attached to the item.
guid         A unique identifier for the item.
pubDate      The publication date of the item.
source       The RSS channel the item came from.

  <?xml version="1.0" encoding="UTF-8" ?>
  <rss version="2.0">    
  <channel>
    <title>Sample Channel</title>
    <link>https://example.com</link>
    <description>Channel Desc.</description>
    <item>
      <title>Sample Item</title>
      <link>https://example.com</link>
      <description>Item Desc.</description>
    </item>
    <item>...</item>
  </channel>    
  </rss> 
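Producing such a feed amounts to building the element tree and serializing it. A minimal Python sketch using only the standard library; build_feed is a hypothetical helper, and item fields are passed as a plain tag-to-text mapping. The pubDate value is formatted with email.utils.format_datetime, since RSS 2.0 expects RFC 822 dates.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List


def build_feed(title: str, link: str, description: str,
               items: List[Dict[str, str]]) -> str:
    """Serialize an RSS 2.0 channel; each item maps tag names to text."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for fields in items:
        item = ET.SubElement(channel, "item")
        for tag, text in fields.items():
            # ElementTree escapes &, < and > in text automatically.
            ET.SubElement(item, tag).text = text
    return ET.tostring(rss, encoding="unicode")


published = datetime(2024, 1, 1, tzinfo=timezone.utc)
feed = build_feed("Sample Channel", "https://example.com", "Channel Desc.", [
    {"title": "Sample Item", "link": "https://example.com",
     "description": "Item Desc.",
     "pubDate": format_datetime(published, usegmt=True)},
])
print(feed)
```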
  

AppStream

AppStream files provide metadata about desktop applications and other software in Linux distributions, for consumption by package managers, software centers and catalogs such as GNOME Software or KDE Discover. They are located in /usr/share/metainfo (or the legacy /usr/share/appdata) and suffixed .metainfo.xml (or the legacy .appdata.xml). The specification is maintained by freedesktop.org.

  <?xml version="1.0" encoding="UTF-8"?>
  <component type="desktop">
    <id>samplesoft.desktop</id>
    <update_contact>info@example.com</update_contact>
    <metadata_license>CC0-1.0</metadata_license>
    <project_license>CC0-1.0</project_license>
    <name>Sample Software</name>
    <summary>...</summary>
    <description><p>...</p></description>
    <url type="homepage">https://example.com</url>
    <screenshots>
      <screenshot type="default">
        <image>https://example.com/thumbnail.jpg</image>
        <caption>...</caption>
      </screenshot>
    </screenshots>
    <translation/>
    <developer_name>Sample Dev.</developer_name>
    <url type="bugtracker">https://example.com/bugs</url>
    <url type="help">https://example.com/help</url>
  </component>
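Software centers read these fields programmatically. As a rough Python sketch: summarize_component is a hypothetical helper, and real tools such as appstreamcli do far more, including validation and choosing among translated xml:lang variants, which this sketch ignores by taking the first matching element.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def summarize_component(data: bytes) -> Dict[str, object]:
    """Pull a few top-level fields out of an AppStream component file."""
    root = ET.fromstring(data)
    if root.tag != "component":
        raise ValueError("not an AppStream <component> document")
    # Several <url> elements are distinguished by their type attribute.
    urls = {u.get("type"): (u.text or "").strip() for u in root.findall("url")}
    return {
        "id": root.findtext("id"),
        "name": root.findtext("name"),
        "license": root.findtext("project_license"),
        "urls": urls,
    }


METAINFO = b"""<?xml version="1.0" encoding="UTF-8"?>
<component type="desktop">
  <id>samplesoft.desktop</id>
  <name>Sample Software</name>
  <project_license>CC0-1.0</project_license>
  <url type="homepage">https://example.com</url>
  <url type="bugtracker">
    https://example.com/bugs</url>
</component>"""

info = summarize_component(METAINFO)
print(info["urls"]["bugtracker"])  # https://example.com/bugs
```

Stripping the URL text matters because, as in the bugtracker entry, values are often wrapped onto their own line.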