Using org-roam has helped me organize my thoughts and jot down whatever comes to mind in the moment, freeing my feeble mind to care for what's most important in the day. As I've been using it to take more notes, I'd like some of those notes (like this one) to become blog posts.
I've found pandoc to be a really good way to export, mainly for reasons of simplicity. The only issue with using org-roam and pandoc together is that org-roam's internal links don't translate to pandoc html pages. That's where pandoc's filters come into the picture.
Exporting a single page
To test exporting a single page with custom css, I've saved bettermotherfuckingwebsite.com's css declarations in a file aptly called style.css and use pandoc to export a single page.
$ pandoc -f org -t html5 --css=style.css --standalone note.org -o note.html
And lo and behold, it already looks the way I want it to.
But a proper website needs a header and a footer, so we create two files, header.html and footer.html, and add them to the final page.
$ pandoc -f org -t html5 --css=style.css --include-before-body=header.html --include-after-body=footer.html --standalone note.org -o note.html
And now we have each page following a proper website template with header, body, and footer.
Sprinkle of internal links
To include links properly, I'll be using pandoc's lua filters to look up the link in org-roam's sqlite database and modify pandoc's AST to replace the id:xxx link with a proper href.
Before I can decide what filter to write, I need to see pandoc's generated AST.
$ pandoc --standalone -t native note.org
which gives me the following output
Pandoc (Meta {unMeta = fromList [("title",MetaInlines [Str "The",Space,Str "Grand",Space,Str "Unified",Space,Str "Theory",Space,Str "of",Space,Str "Everything"])]})
[Header 1 ("setting-up-org-roam",[],[]) [Str "Setting",Space,Str "up",Space,Code ("",[],[]) "org-roam"]
...
What we're interested in is
Link ("",[],[]) [Str "school"] ("id:e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f","")
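Which, according to the module documentation and Text.Pandoc.Definition, means it's a Link element with no attributes, link text of "school", and a target of "id:…". A first throwaway lua filter that replaces every link with its target as plain text looks like this: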
function Link(elem)
  return pandoc.Str(elem.target)
end
Switching gears to python
org-roam stores note references with IDs in an sqlite database that by default sits under $HOME/.emacs.d/org-roam.db. To access this, I'd need the sql extension for lua, which is not installed on many systems. Python has both json and sqlite as part of its batteries-included standard library, so I'll use that instead.
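Here's a minimal sketch of that lookup using only the standard library. The table name, the columns, and the fact that values are stored wrapped in double quotes are taken from the query in the filter further down; the helper name lookup_note_file, the in-memory database, and the sample path are made up for illustration.

```python
import sqlite3

def lookup_note_file(db, node_id):
    """Return the file path stored for an org-roam node ID, or None."""
    # Assumption: values in the nodes table are wrapped in double quotes,
    # so the ID has to be quoted before comparing.
    row = db.execute(
        "select file from nodes where id = ?", (f'"{node_id}"',)
    ).fetchone()
    if row is None:
        return None
    return row[0][1:-1]  # strip the surrounding quotes

# In-memory stand-in for org-roam.db so the sketch runs anywhere.
demo_db = sqlite3.connect(":memory:")
demo_db.execute("create table nodes (id, file, title)")
demo_db.execute(
    "insert into nodes values (?, ?, ?)",
    (
        '"e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f"',
        '"/home/me/notes/school.org"',
        '"school"',
    ),
)
print(lookup_note_file(demo_db, "e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f"))
```

Passing the ID as a query parameter instead of formatting it into the SQL string avoids quoting surprises if an ID ever contains odd characters.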
We can use pandoc's json api and write the filter which parses, modifies, and prints json.
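To give a feel for that plumbing, here's a rough sketch of the raw json approach with nothing but the standard library. The document fragment is hand-written to resemble pandoc's json output, and the function name rewrite_links and the "id:abc" target are placeholders, not anything pandoc provides.

```python
import json

def rewrite_links(node, rewrite):
    """Walk pandoc's JSON AST in place, passing every Link URL through rewrite()."""
    if isinstance(node, list):
        for child in node:
            rewrite_links(child, rewrite)
    elif isinstance(node, dict):
        if node.get("t") == "Link":
            # A Link's content is [attr, inlines, [url, title]].
            target = node["c"][2]
            target[0] = rewrite(target[0])
        for child in node.values():
            rewrite_links(child, rewrite)
    return node

# Hand-written fragment shaped like pandoc's JSON for a paragraph with one link.
doc = json.loads("""
{"pandoc-api-version": [1, 22], "meta": {},
 "blocks": [{"t": "Para", "c": [
   {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": "school"}], ["id:abc", ""]]}
 ]}]}
""")
rewrite_links(doc, lambda url: "school.html" if url == "id:abc" else url)
print(json.dumps(doc["blocks"][0]["c"][0]["c"][2]))
```

A real filter would read the document with json.load(sys.stdin) and write it back with json.dump(doc, sys.stdout), since pandoc hands the AST to the filter on stdin and reads the result from stdout.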
But there's a better way! The pandocfilters and panflute modules are available for python and take care of the plumbing for us. They are also available on pypi, which means they can be installed easily with pip. I've chosen to work with panflute for no particular reason.
The filters can be used with the --filter argument.
$ pandoc -f org -t html5 --standalone --filter myfilter.py note.org -o note.html
so the final line will be
$ pandoc -f org -t html5 --css=style.css --include-before-body=header.html --include-after-body=footer.html --standalone --filter myfilter.py note.org -o note.html
Filtering effectively
I've named the filter sanitize_links.py.
#!/usr/bin/env python3
import os
import sqlite3
import urllib.parse

import panflute as pf

#### CHANGE THESE ####
ORG_ROAM_DB_PATH = "~/.emacs.d/org-roam.db"
#### END CHANGE ####

db = None


def sanitize_link(elem, doc):
    # Only touch links that point at an org-roam ID.
    if not isinstance(elem, pf.Link):
        return None
    if not elem.url.startswith("id:"):
        return None

    file_id = elem.url.split(":", 1)[1]

    # IDs are stored wrapped in double quotes, so quote the value we look up.
    cur = db.cursor()
    cur.execute("select id, file, title from nodes where id = ?", (f'"{file_id}"',))
    data = cur.fetchone()
    if data is None:
        # Unknown ID: leave the link untouched.
        return None

    # data contains strings that are quoted, we need to remove the quotes
    file_path = data[1][1:-1]
    file_name = urllib.parse.quote(os.path.splitext(os.path.basename(file_path))[0])

    elem.url = f"{file_name}.html"
    return elem


def main(doc=None):
    return pf.run_filter(sanitize_link, doc=doc)


if __name__ == "__main__":
    # expanduser resolves the leading "~"; abspath alone would not.
    db = sqlite3.connect(os.path.expanduser(ORG_ROAM_DB_PATH))
    main()
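The step that turns a note's path into a link target is easy to check on its own. This is a small sketch of just that step; the function name html_href and the sample path are made up for the example.

```python
import os
import urllib.parse

def html_href(org_path):
    """Map a note's .org path to the relative URL of its exported page."""
    stem = os.path.splitext(os.path.basename(org_path))[0]
    # Percent-encode so spaces and other special characters survive in an href.
    return urllib.parse.quote(stem) + ".html"

print(html_href("/home/me/notes/20210101120000-my school.org"))
```

Only the basename is kept, which matches a flat output directory where every exported page sits next to the others.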
A note on versions!
I'm using Ubuntu 20.04 LTS, which means some of the packages are outdated. It appears older pandoc versions didn't have great error messages, making debugging difficult. Since I've updated pandoc with packages available on their release page, I've had better luck.
Worth noting the python3-pandocfilters package in the repos is also outdated, so using pip is recommended.
Publishing the right files
Some of my notes are to be published, but some I'd like to keep private. To do that, I have set up my notes to have a tag of "publish" for ones I want to, well, publish, by adding it to filetags.
#+filetags: publish
Then my build.sh script filters files that have a publish tag. Here's the entirety of my build.sh script. A Makefile would be more appropriate.
#!/bin/sh
CSS=org.css
mkdir -p html/
rm -f html/*
for note in $(grep -iRE '^#\+filetags:.*?publish' --color=never --files-with-matches); do
  echo "processing ${note}"
  pandoc -s -t html5 -f org --css="$CSS" --include-before-body=header.html --include-after-body=footer.html --filter fix_roam_links.py "$note" -o html/"$(echo $note | sed -e 's/\.org$/\.html/')"
done
index_file=$(grep -iR -l 'grand unified theory of everything' html | head -n 1)
echo "setting index file"
cp "$index_file" html/index.html
echo "copying $CSS"
cp "$CSS" html/
The output files go to the html directory.
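If the build ever moves from shell to python, the tag check could look roughly like this. It's a sketch under a few assumptions: it only looks at *.org files (the grep in build.sh scans every file), and the names PUBLISH_RE and notes_to_publish as well as the demo files are invented for the example.

```python
import re
import tempfile
from pathlib import Path

# Same idea as the grep in build.sh: a #+filetags line that mentions "publish".
PUBLISH_RE = re.compile(r"^#\+filetags:.*publish", re.IGNORECASE | re.MULTILINE)

def notes_to_publish(root):
    """Return the .org files under root whose filetags include publish, sorted."""
    return sorted(
        path
        for path in Path(root).rglob("*.org")
        if PUBLISH_RE.search(path.read_text(encoding="utf-8"))
    )

# Throwaway notes directory for demonstration.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "public.org").write_text(
    "#+title: Public\n#+filetags: publish\n", encoding="utf-8"
)
(demo_dir / "private.org").write_text(
    "#+title: Private\n#+filetags: diary\n", encoding="utf-8"
)
print([p.name for p in notes_to_publish(demo_dir)])
```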
And I publish by simply rsync'ing the files to my public directory.
Here's the one-liner for upload.sh.
#!/bin/sh
rsync --progress html/* server:/srv/www/
Now it's time to add the publish tag to this file! With this setup, every time I add a new post, all I need to do is add a link to it to the homepage and run ./build.sh && ./upload.sh.
Footnotes
I change the title of my notes frequently, which means the filename and title go out of sync. To prevent this, I have come to appreciate having date and IDs as filenames. Here's a one-liner that converts the default filenames to "<date>-<id>.org" format.
for f in *.org; do mv "$f" "$(echo "$f" | grep -Po '^\d+')-$(grep ID "$f" | tr -s '\t ' ' ' | cut -d' ' -f2).org"; done
Having said that, my notes are now only named by date and time to make it easier for org-roam to generate filenames.
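For reference, the "<date>-<id>.org" renaming rule from the one-liner above can be written as a small python function that's easier to test before touching any files. This is only a sketch: it assumes the ID lives on an ":ID:" property line, which is stricter than the plain grep for "ID", and the name dated_id_name and the sample note are made up.

```python
import re

def dated_id_name(old_name, org_text):
    """Build "<date>-<id>.org" from a filename's leading digits and the note's :ID:."""
    date = re.match(r"\d+", old_name)
    node_id = re.search(r"^\s*:ID:\s+(\S+)", org_text, re.MULTILINE)
    if date is None or node_id is None:
        return None  # leave files that don't fit the pattern alone
    return f"{date.group(0)}-{node_id.group(1)}.org"

note = (
    ":PROPERTIES:\n"
    ":ID:       e0e3eed4-d1ec-4e76-9244-cfbf22ba5a6f\n"
    ":END:\n"
    "#+title: school\n"
)
print(dated_id_name("20210101120000-school.org", note))
```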