Turn any RSS feed into a newsletter or notification bot
feedletter is a service that
- watches RSS (or Atom!) feeds with great care
  - works great with feeds generated by static-site generators!
  - distinguishes between new items and older stuff or stuff already seen that flakily reappears
  - awaits "finalization" of items, meaning their stabilization (and nondeletion) over specified time intervals
- lets you define a wide variety of subscriptions to those feeds
  - Over different media
    - Post to Mastodon
    - SMS (coming soon!)
    - etc
  - In different arrangements
    - each item as newsletter
    - daily or weekly digests
    - compilations of every n posts
    - etc
  - which are formatted via rich, customizable untemplates
  - which are managed via a web API for easy subscription, confirmation, and unsubscription by users
The application requires a Java 17+ JVM and a Postgres database.
Typical installations will proxy the web API behind e.g. nginx, and run the daemon as a systemd service.
A (very) detailed tutorial on setting up, configuring, and customizing a feedletter instance is available here.
1. A feed is added.
2. One or more subscribables are defined against the feed.
3. One or more destinations subscribe to the feed.
4. Items are observed in the feed, and are added in the Unassigned state.
5. Each item is assigned, in a single transaction, to all the collections (assignables) to which it will ever belong.
   (Steps 4 and 5 can repeat arbitrarily as new items come in.)
6. Separately, collections (assignables) are periodically marked "complete" and, in the same transaction, forwarded to subscribers.
7. Complete assignables are deleted, along with their assignments.
8. Items that are already assigned, and that no longer belong to any not-yet-completed assignable, can drop their cached contents and then move into the Cleared state.
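The item lifecycle described above can be sketched as a small in-memory model. This is only an illustration of the state transitions, not feedletter's actual (Scala, Postgres-backed) implementation; the `Store` class and all of its method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Assignability(Enum):
    UNASSIGNED = "Unassigned"
    ASSIGNED = "Assigned"
    CLEARED = "Cleared"


@dataclass
class Item:
    guid: str
    cached_content: Optional[str]
    state: Assignability = Assignability.UNASSIGNED


@dataclass
class Store:
    """Hypothetical stand-in for the database tables."""
    items: Dict[str, Item] = field(default_factory=dict)
    # (subscribable_name, within_type_id) -> guids assigned to that collection
    assignables: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def observe(self, guid, content):
        # New items enter as Unassigned; a guid already seen is not re-added.
        self.items.setdefault(guid, Item(guid, content))

    def assign(self, guid, collections):
        # Assign the item to every collection it will ever belong to, in one step.
        item = self.items[guid]
        if item.state is not Assignability.UNASSIGNED:
            return
        for key in collections:
            self.assignables.setdefault(key, set()).add(guid)
        item.state = Assignability.ASSIGNED

    def complete(self, key):
        # Completing a collection yields its members for forwarding and deletes
        # the assignable together with its assignments.
        return sorted(self.assignables.pop(key))

    def clear_finished(self):
        # Assigned items no longer referenced by any open collection drop their
        # cached contents and move to Cleared.
        still_referenced = set().union(*self.assignables.values())
        for item in self.items.values():
            if item.state is Assignability.ASSIGNED and item.guid not in still_referenced:
                item.cached_content = None
                item.state = Assignability.CLEARED
```

An item assigned to two collections stays Assigned after the first completes, and is Cleared only after the second one has completed as well.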
I want to sketch the not-so-obvious db schema I've adopted for this project while I still understand it.
First there are feeds:
CREATE TABLE feed(
  id INTEGER,
  url VARCHAR(1024),
  min_delay_minutes INTEGER NOT NULL,
  await_stabilization_minutes INTEGER NOT NULL,
  max_delay_minutes INTEGER NOT NULL,
  assign_every_minutes INTEGER NOT NULL,
  added TIMESTAMP NOT NULL,
  last_assigned TIMESTAMP NOT NULL, -- we'll start at added
  PRIMARY KEY(id)
)

Feeds must be defined before subscriptions can be created against them. They are defined by a URL, and they define what it means for a feed to "finalize", in the sense of being ready for notification.
Feeds are permanent and basically unchanging until (when someday I implement this) they are manually removed.
(last_assigned changes, but so far it's just informational and has no role in the application.)
Next there are items:
CREATE TABLE item(
  feed_id INTEGER,
  guid VARCHAR(1024),
  single_item_rss TEXT,
  content_hash INTEGER, -- ItemContent.contentHash
  link VARCHAR(1024),
  first_seen TIMESTAMP NOT NULL,
  last_checked TIMESTAMP NOT NULL,
  stable_since TIMESTAMP NOT NULL,
  assignability ItemAssignability NOT NULL,
  PRIMARY KEY(feed_id, guid),
  FOREIGN KEY(feed_id) REFERENCES feed(id)
)

- feed_id and guid identify an item.
- single_item_rss caches the RSS item. We want to cache this in case, by the time we get around to notifying, the item is no longer available in the feed.
- content_hash is a hash based on the prior five fields. We use it to identify whether an item has changed.
- link may eventually be used as a neurotic double-check so we never notify the same human-perceived item twice.
- first_seen, last_checked, and stable_since are pretty self-explanatory timestamps. We use these to calculate whether an item has stabilized and so can be "assigned". (See below.)
- assignability: items can be in one of four states
  - Unassigned: The item has not yet been assigned to the collections (including single member collections) to which it will eventually belong, but is eligible for assignment.
  - Assigned: The item has been assigned to all the collections (including single member collections) to which it will eventually belong. The application may not be done assigning to those collections, and the items may not yet be distributed to subscribers.
  - Cleared: This is the terminal state for an item. The item has been assigned to all collections, and has already been distributed to subscribers. The cache fields (title, author, article, publication_date, and link) should all be cleared in this state. Cleared items are not deleted, but retained indefinitely, so that we don't renotify if the item (an item with the same guid) reappears in the feed.
  - Excluded: Items which are marked to always be ignored. Items are marked Excluded only upon initial insert. Items can be manually updated from Excluded to Unassigned (timestamps should be reset to the time of the update), to cause Excluded posts to be published.
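To make the role of the timestamps concrete, here is a minimal sketch of how a "has this item finalized?" check could combine an item's first_seen and stable_since with the feed's min_delay_minutes, await_stabilization_minutes, and max_delay_minutes. The exact rule is not spelled out in this document, so the one below is an assumption: wait at least the minimum delay and until the content has been stable long enough, but never longer than the maximum delay. Resetting stable_since when content_hash changes is likewise assumed. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def is_assignable(now, first_seen, stable_since,
                  min_delay_minutes, await_stabilization_minutes, max_delay_minutes):
    """Assumed finalization rule; not taken from feedletter's source."""
    age = now - first_seen
    # Past the maximum delay, stop waiting for the item to settle.
    if age >= timedelta(minutes=max_delay_minutes):
        return True
    waited = age >= timedelta(minutes=min_delay_minutes)
    stable = (now - stable_since) >= timedelta(minutes=await_stabilization_minutes)
    return waited and stable


def recheck(now, stored_hash, stable_since, new_hash):
    """Return the (content_hash, stable_since) pair to store after re-reading the feed.

    Assumption: a changed hash restarts the stabilization clock.
    """
    if new_hash != stored_hash:
        return new_hash, now
    return stored_hash, stable_since
```

Under this rule, an item that keeps being edited is held back until it has been quiet for the stabilization interval, and goes out at the maximum delay regardless.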
Next there is subscribable, which represents the definition of a subscription by which parties will be
notified of items or collections of items.
CREATE TABLE subscribable(
  subscribable_name VARCHAR(64),
  feed_id INTEGER NOT NULL,
  subscription_manager_json JSONB NOT NULL,
  last_completed_wti VARCHAR(1024),
  PRIMARY KEY (subscribable_name),
  FOREIGN KEY (feed_id) REFERENCES feed(id)
)

A subscribable maps a name to a feed and a SubscriptionManager. For our purposes here,
the main role of a SubscriptionManager (a serialization of a Scala ADT) is to

- Generate for items a within_type_id, which is really just a collection identifier. All items in a collection of items that will be distributed will share the same within_type_id.
- Determine whether a collection (identified by its within_type_id) is "complete", that is, no further items need be assigned the same within_type_id.

When a collection has been notified, it is deleted from the database. However, some SubscriptionManagers need to maintain a sequence of within_type_id identifiers. So for each subscribable, the last_completed_wti is retained.
SubscriptionManager determines how collections are compiled, to what kind of destination (e-mail,
Mastodon, mobile message, whatever) notifications will be sent, and how they will be formatted.
Names are scoped on a per-feed-URL basis. Users subscribe to a (feed_url, subscribable_name)
pair.
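As an illustration of the two jobs described above (generating a within_type_id and deciding completeness), here is a hedged sketch of two hypothetical subscription managers: a weekly digest and a "compilation of every n posts". The identifier formats and function names are invented for this example and are not feedletter's. The second one shows why a subscribable would need to remember last_completed_wti.

```python
from datetime import datetime


def weekly_within_type_id(when: datetime) -> str:
    """Hypothetical id for a weekly digest: ISO year plus zero-padded ISO week."""
    iso = when.isocalendar()
    return f"{iso[0]}-week-{iso[1]:02d}"


def weekly_is_complete(within_type_id: str, now: datetime) -> bool:
    """A week's collection is complete once 'now' falls in a later week.

    Zero-padding makes plain string comparison follow chronological order.
    """
    return weekly_within_type_id(now) > within_type_id


def every_n_within_type_id(last_completed_wti, open_wti, open_count, n):
    """Hypothetical id for 'every n posts': a running integer sequence.

    open_wti / open_count describe the collection currently being filled, if any.
    When nothing is open, the next id continues from last_completed_wti, which is
    the only trace left once completed collections have been deleted.
    """
    if open_wti is not None and open_count < n:
        return open_wti
    previous = open_wti if open_wti is not None else last_completed_wti
    return "1" if previous is None else str(int(previous) + 1)
```

A manager that sends each item individually could simply use the item's guid as the within_type_id and treat every collection as complete as soon as it has one member.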
Next there is assignable, which represents a collection. Assignables essentially map
subscribables (subscription definitions) to within_type_ids (the collections
generated by the subscription definition and notified to subscribers).
CREATE TABLE assignable(
  subscribable_name VARCHAR(64),
  within_type_id VARCHAR(1024),
  opened TIMESTAMP NOT NULL,
  PRIMARY KEY(subscribable_name, within_type_id),
  FOREIGN KEY(subscribable_name) REFERENCES subscribable(subscribable_name)
)

opened is the timestamp of the first assignment to the collection.
Once an assignable has been notified ("completed"), it is simply deleted from the database.
For each subscribable, the within_type_id of only the most recently completed
assignable is retained (see subscribable table above).
Next there is assignment, which represents an item in an assignable, i.e. a collection.
It's pretty self-explanatory I think.
CREATE TABLE assignment(
  subscribable_name VARCHAR(64),
  within_type_id VARCHAR(1024),
  guid VARCHAR(1024),
  PRIMARY KEY( subscribable_name, within_type_id, guid ),
  FOREIGN KEY( subscribable_name, within_type_id ) REFERENCES assignable( subscribable_name, within_type_id )
)

Next there is subscription, which just maps a destination to a subscribable.
The destination is a JSON blob that can refer to a variety of things: e-mail addresses, SMS numbers, Mastodon instances, etc.
Each SubscriptionManager works with a destination subtype.
CREATE TABLE subscription(
  subscription_id BIGINT,
  destination_json JSONB NOT NULL,
  destination_unique VARCHAR(1024) NOT NULL,
  subscribable_name VARCHAR(64) NOT NULL,
  confirmed BOOLEAN NOT NULL,
  added TIMESTAMP NOT NULL,
  PRIMARY KEY( subscription_id ),
  FOREIGN KEY( subscribable_name ) REFERENCES subscribable( subscribable_name )
)

Since destinations can have ornamentation (an e-mail address, for example, might have a personal part, e.g. Buffy in "Buffy [email protected]"), insisting that the JSON entities be unique is not sufficient to prevent multiple subscriptions. So destinations declare a unique core, whose uniqueness within a subscribable the database enforces:

CREATE UNIQUE INDEX destination_unique_subscribable_name ON subscription(destination_unique, subscribable_name)

That's it for the base schema! There are also tables that convert destinations specific to subscription types into their various queues for notification. I'm omitting those for now.
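The "unique core" idea can be sketched as follows. This is an assumption-laden illustration, not feedletter's code: the JSON field names, the "e-mail:" prefix, the lowercasing of the address, and the example.com address are all invented here, and the `SubscriptionTable` class merely imitates what the unique index would enforce.

```python
import json
from email.utils import parseaddr


def email_destination(address: str) -> dict:
    """Split an e-mail destination into its full JSON form and a unique core."""
    display_name, addr_spec = parseaddr(address)
    full = {"type": "Email", "address": addr_spec, "displayName": display_name or None}
    return {
        "destination_json": json.dumps(full),
        # Only the bare address takes part in uniqueness (lowercasing is an assumption).
        "destination_unique": "e-mail:" + addr_spec.lower(),
    }


class SubscriptionTable:
    """Imitates the (destination_unique, subscribable_name) unique index."""

    def __init__(self):
        self._index = set()

    def insert(self, destination_unique: str, subscribable_name: str) -> None:
        key = (destination_unique, subscribable_name)
        if key in self._index:
            raise ValueError("already subscribed: %s to %s" % key)
        self._index.add(key)
```

Here "Buffy <buffy@example.com>" and "buffy@example.com" produce different destination_json values but the same destination_unique, so a second subscription to the same subscribable is rejected while a subscription to a different subscribable is still allowed.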
There are two layers of templating in feedletter:
Many notifications are rendered via untemplates. However, what untemplates render goes to all subscribers of a subscribable. We generate one "form letter" for all recipients, and store it only once.
But since we may want to customize our notifications on a per-recipient basis, the output of the untemplates
can take the form of a trivial template
with case-insensitive, percentage-delimited %Fields% that get filled in separately for each recipient.
We try to refer to the former, initial, shared templates as untemplates (because that's the technology that underlies them), and the last-minute substitution templates that are generated by the untemplates as mere templates.
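The second, per-recipient layer is simple enough to sketch. The following is a minimal illustration of filling case-insensitive, percentage-delimited fields, assuming field names are plain identifiers and that unknown fields are left untouched; neither detail is specified in this document, and the field names in the usage note are made up.

```python
import re

# A field is an identifier wrapped in percent signs, e.g. %Name%.
_FIELD = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def fill(template: str, values: dict) -> str:
    """Substitute %Field% placeholders, matching field names case-insensitively."""
    lowered = {key.lower(): value for key, value in values.items()}

    def substitute(match):
        # Leave placeholders we have no value for exactly as written.
        return lowered.get(match.group(1).lower(), match.group(0))

    return _FIELD.sub(substitute, template)
```

For example, `fill("Hello %Name%! To unsubscribe: %REMOVELINK%", {"name": "Buffy", "RemoveLink": "https://example.com/u"})` fills both fields despite the differing case, so the shared "form letter" is stored once and personalized only at send time.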