The Standard Generalized Markup Language
is a standard for defining generalized markup languages for documents.
The two postulates are that markup should be:
Declarative: it describes the structure,
agnostic of processing specifics,
which is deemed more future-proof.
Rigorous: it defines data/objects that may
be automatically processed / parsed.
SGML is a superset of XML, predating it.
With features like implicitly closed tags,
it is friendlier to write by hand,
but harder to parse.
Some formats, including the first Office extension of each pair in the table,
alongside others like .epub (e-reader books),
are really zip archives of several XML files (XHTML, in the case of .epub).
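As a rough illustration of the zip-of-XML layout, the sketch below builds a small archive in memory and reads it back with the Python standard library. The member names content.xml and meta.xml, and their contents, are made up for the example; real packages follow the layout defined by their own specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_archive() -> bytes:
    """Build a tiny in-memory zip that mimics a zipped document package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member names and content, for illustration only.
        archive.writestr("content.xml", "<doc><p>Hello</p><p>World</p></doc>")
        archive.writestr("meta.xml", "<meta><title>Sample</title></meta>")
    return buffer.getvalue()


def list_xml_members(data: bytes) -> list[str]:
    """Return the sorted names of the XML files stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


def read_paragraphs(data: bytes, member: str) -> list[str]:
    """Parse one XML member and return the text of its <p> elements."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read(member))
    return [p.text or "" for p in root.iter("p")]


sample = build_sample_archive()
print(list_xml_members(sample))
print(read_paragraphs(sample, "content.xml"))
```

The same two steps (open the archive, parse a member) apply to a file on disk by passing its path to zipfile.ZipFile instead of a BytesIO buffer.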
OpenDocument three-letter extensions have
four-letter alternatives
(e.g. .odt => .fodt)
that mark documents as encoded Flat,
i.e. in a single XML file.
Office Open XML alternatives are not equivalent.
The substitution of m for the final x marks the addition
of Macro support (e.g. .docx => .docm).
XHTML and HTML
XHTML is a dialect of XML.
It was specified by the W3C (XHTML 1.0 and 1.1 are Recommendations; XHTML2 was abandoned as a draft). The HTML Living Standard is a parallel specification
from the WHATWG,
with a syntax that descends from SGML but, unlike XHTML, is not necessarily
well-formed XML.
The schism stems from the strictness of well-formed XML versus the relaxed, pragmatic SGML-style parsing favored by the
WHATWG vendors (Apple, Google, Mozilla, and Microsoft).
NB. XHTML is parsed and rendered correctly as HTML,
so there is no real drawback to writing XHTML: it helps parsing
without taking away from the feature set of HTML.
For example, well-formed XML can be parsed with the XML modules of the Python standard library,
which ships with most Linux distributions,
while HTML requires either some manual work with the standard html.parser module
or an external dependency, both of which are weaker alternatives.
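A minimal sketch of that difference, using only the standard library. The sample page, the loose variant, and the helper names (links_via_xml, LinkCollector, links_via_html) are made up for the illustration; the XHTML route needs one call to get a tree, while the HTML route needs a hand-written event handler.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Well-formed XHTML: every tag is closed, every attribute is quoted.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    '<body><p>First</p><p>Second <a href="/next">link</a></p><br/></body>'
    "</html>"
)

# Loose HTML: implicitly closed <p>, unquoted attribute value.
loose = "<p>First<p>Second <a href=/next>link</a>"


def links_via_xml(source: str) -> list:
    """Parse the whole document into a tree, then query it."""
    root = ET.fromstring(source)
    return [a.get("href") for a in root.iter(f"{{{XHTML_NS}}}a")]


class LinkCollector(HTMLParser):
    """Event-based parsing: state has to be tracked by hand."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for key, value in attrs if key == "href")


def links_via_html(source: str) -> list:
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return collector.links


print(links_via_xml(page))
print(links_via_html(page))
print(links_via_html(loose))

try:
    links_via_xml(loose)
except ET.ParseError as error:
    print("XML parser rejected the loose markup:", error)
```

The XML parser refuses the loose variant outright, which is the strictness the schism above is about.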
DHTML and SHTML
Both formats are artifacts from the past:
DHTML is the historical
Microsoft name for JavaScript+HTML+CSS, nowadays just known as HTML.
SHTML
is a file extension that lets the web server know the file
should be processed using SSI.
Server-Side Includes (SSI) specify markup files to include within other files,
resolved on the server side before the response is sent (not by the browser).
It is mostly historical, being superseded by modern server templating tools,
Node modules, and similar static site generation techniques.
SSI leverages comments as a syntax:
<!--#include virtual="top.shtml" -->
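To make the mechanism concrete, here is a toy expander for the include directive shown above. It is a sketch under stated assumptions, not how any particular web server implements SSI: files live in a plain dictionary, only the include virtual form is recognized, and the names expand_includes and MAX_DEPTH are hypothetical.

```python
import re

# Matches the directive form shown above, e.g. <!--#include virtual="top.shtml" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

MAX_DEPTH = 10  # arbitrary guard against files that include each other


def expand_includes(name: str, files: dict, depth: int = 0) -> str:
    """Return the content of `name` with include directives replaced
    by the (recursively expanded) content of the referenced files."""
    if depth > MAX_DEPTH:
        raise RecursionError(f"include depth exceeded while expanding {name}")

    def replace(match):
        return expand_includes(match.group(1), files, depth + 1)

    return INCLUDE.sub(replace, files[name])


# Hypothetical in-memory "site", standing in for files on disk.
site = {
    "index.shtml": '<!--#include virtual="top.shtml" -->\n<p>Body</p>',
    "top.shtml": "<h1>Site</h1>",
}

print(expand_includes("index.shtml", site))
```

Because the directive is an ordinary comment, a file that is served without SSI processing still renders; the include is simply ignored by the browser.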
Sitemaps
A sitemap is a site page index for machines, specified by the sitemaps.org Protocol.
Search engines use it to speed up page discovery on top of normal crawling.
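A short sketch of reading such an index with the Python standard library. The sample document and its example.com URLs are made up; the namespace is the one used by the sitemaps.org schema, and page_entries is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap with two pages; only <loc> is required per <url>.
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def page_entries(source: str) -> list:
    """Return (location, last modification date or None) for each page."""
    root = ET.fromstring(source)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in page_entries(sitemap):
    print(loc, lastmod)
```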
XBRL
XBRL (eXtensible Business Reporting Language) is a markup dialect for reporting company financial statement information.
It was phased in as a requirement for sec.gov EDGAR system reports in .
iXBRL, or Inline XBRL,
adapts the formal XBRL with the goal of being easily readable by both machines and humans.
Unlike the former, which is distributed as separate XML documents,
iXBRL is embedded directly inside the report HTML.
For example: AAPL 10-K, .
Parsing a financial statement for core metrics is simple:
extract the ix:nonFraction elements.
They carry mandatory, useful attributes, like
name, a GAAP key (e.g.
us-gaap:NetIncomeLoss), and
unitRef, a reference to the unit of measure, typically an ISO 4217 currency.
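The extraction can be sketched as follows, assuming the report is well-formed XHTML. The sample report, its figures, and the helper names are made up; the Inline XBRL namespace URI is declared only so the sample parses, and the code matches on the local element name rather than on that URI. The optional sign and format attributes are ignored here, so this is not a complete reader.

```python
import xml.etree.ElementTree as ET

# Made-up report fragment; values and context ids are illustrative only.
report = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
<body>
<p>Net income:
  <ix:nonFraction name="us-gaap:NetIncomeLoss" contextRef="FY"
                  unitRef="usd" scale="6">1,234</ix:nonFraction> million</p>
<p>Revenue:
  <ix:nonFraction name="us-gaap:Revenues" contextRef="FY"
                  unitRef="usd" scale="6">5,678</ix:nonFraction> million</p>
</body>
</html>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def numeric_facts(source: str) -> dict:
    """Map each fact name to (value, unit reference)."""
    root = ET.fromstring(source)
    facts = {}
    for element in root.iter():
        if local_name(element.tag) != "nonFraction":
            continue
        # Displayed text uses thousands separators; scale is a power of ten.
        raw = "".join(element.itertext()).replace(",", "").strip()
        value = float(raw) * 10 ** int(element.get("scale", "0"))
        facts[element.get("name")] = (value, element.get("unitRef"))
    return facts


for name, (value, unit) in numeric_facts(report).items():
    print(name, value, unit)
```

A filing that is not well-formed XML would need the HTML parser route instead, where attribute names arrive lower-cased (unitref).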
The elements of the namespace are listed in the table.
ix:continuation
ix:denominator
ix:exclude
ix:footnote
ix:fraction
ix:header
ix:hidden
ix:nonFraction
ix:nonNumeric
ix:numerator
ix:references
ix:relationship
ix:resources
ix:tuple
RSS
Really Simple Syndication is an XML dialect
that provides feeds for websites whose pages
change often. Channels list item elements,
with further fields given in the table.
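A minimal reading sketch with the Python standard library. The feed below is made up, and only the title and link fields are read; channel_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed with one channel and two items.
feed = """<rss version="2.0">
<channel>
  <title>Example Feed</title>
  <link>https://example.com/</link>
  <description>Recent posts</description>
  <item>
    <title>First post</title>
    <link>https://example.com/1</link>
  </item>
  <item>
    <title>Second post</title>
    <link>https://example.com/2</link>
  </item>
</channel>
</rss>"""


def channel_items(source: str):
    """Return the channel title and a list of (title, link) per item."""
    root = ET.fromstring(source)
    channel = root.find("channel")
    title = channel.findtext("title")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return title, items


title, items = channel_items(feed)
print(title)
for item_title, item_link in items:
    print(item_title, item_link)
```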
AppStream files provide metadata for desktop applications
and other software in Linux distributions, consumed by package managers,
software centers, and catalogs like GNOME Software or KDE Discover.
They are located in /usr/share/metainfo (or the legacy /usr/share/appdata)
and are suffixed .metainfo.xml (or the legacy .appdata.xml).
The specification is maintained by freedesktop.org.