With a bit of Python, lynx, and tidy I was able to pull very clean plain text versions of my WordPress posts. The sparse HTML can be found at http://tokyogringo.myjp.net and the markdown-like text version can be found on my gopher site at gopher://sdf.org:70/0/users/tokyogringo/

How did I do it? This site has full text RSS for everyone’s enjoyment. No one has to actually visit https://www.prjorgensen.com in order to consume the high value content I generate. The feed contains everything needed for this plain text life. How to make use of it?

I fumbled through my first Python script in a long time, relying heavily on the very powerful feedparser module.
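A minimal sketch of the idea, not the actual script: it pulls the title, link, categories, and full-text body out of each item in an RSS 2.0 feed. To stay runnable without installing anything it uses the standard library's xml.etree.ElementTree in place of feedparser. The sample feed, the function name extract_posts, and the dictionary keys are all made up for illustration, and it assumes the full text sits in the content:encoded element.

```python
import xml.etree.ElementTree as ET

# Namespace of the <content:encoded> element that carries the full post body.
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Tiny stand-in feed so the example runs offline (hypothetical data).
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post/</link>
      <category>Meta</category>
      <content:encoded><![CDATA[<p>Hello, plain text life.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""


def extract_posts(feed_xml):
    """Return one dict per <item> with title, link, categories, and HTML body."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        body = item.find("content:encoded", CONTENT_NS)
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "categories": [c.text for c in item.findall("category")],
            "html": body.text if body is not None else "",
        })
    return posts


for post in extract_posts(SAMPLE_FEED):
    print(post["title"], post["link"])
```

The HTML body of each post is what would then be handed on to other tools for cleanup and conversion.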

This Just In: Python’s documentation is terse almost to the point of incomprehension. While accurate, it does not help beginning (and maybe middling) Python coders get to solving problems. Oddly, Reddit and the StackExchange sites are also of limited utility, as the answers there often point back to or copy the documentation.

Anyway, taking a very Unix approach, I decided not to do everything in Python. I know tidy for making valid HTML. I know lynx for terminal-based web browsing, and the ‘-dump’ option produces plain text renderings of web pages that read a lot like markdown.
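Roughly, handing the work to tidy and lynx from Python can look like the sketch below. The flags are my assumption of a reasonable set (tidy's -q, -asxhtml, and -utf8; lynx's -dump, -stdin, and -force_html), not necessarily what the real script passes, and the helper names pipe_through and html_to_text are hypothetical.

```python
import subprocess

# Assumed command lines; the options used by the real script are not shown.
TIDY_CMD = ["tidy", "-q", "-asxhtml", "-utf8"]
LYNX_CMD = ["lynx", "-dump", "-stdin", "-force_html"]


def pipe_through(cmd, text):
    """Feed text to cmd on stdin and return whatever it writes to stdout."""
    result = subprocess.run(
        cmd,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # tidy exits with status 1 on mere warnings
    )
    return result.stdout.decode("utf-8", errors="replace")


def html_to_text(raw_html):
    """Clean the HTML with tidy, then render it as plain text with lynx."""
    clean_html = pipe_through(TIDY_CMD, raw_html)
    plain_text = pipe_through(LYNX_CMD, clean_html)
    return clean_html, plain_text
```

Calling html_to_text on a post body returns both the tidied HTML and the text dump, provided tidy and lynx are installed and on the PATH. Keeping each tool behind one small function means either can be swapped out later without touching the rest.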

Once I got the script to the point of providing the website data in a reliable and eventually parse-able way, I turned to getting all my posts.

I cranked the item limit on the prjorgensen.com RSS feed up to 20,000 so that the feed briefly included all of my posts. I moved my parsing script to my MacBook Pro because I didn’t want to choke the sdf.org servers with my madness. I installed modules and adapted the script to run on the MBP.

I ran the script. I checked my email. I then got up to … hmmm. The script finished in under two minutes. Suddenly I had all of my posts back to 2011 in both very clean HTML and in plain text. I synced them to their proper home. I reset my website feed back to a more reasonable number.

There are any number of improvements I can make:

  • My script does not grab images
  • I capture categories and tags from WordPress but don’t do anything useful with them
  • I need to include modifying my gophermap and my index.html (as appropriate)
  • A full text RSS feed of the plain HTML site
  • A full text RSS feed of the gopher site
  • Maybe use a static web site generator like Jekyll for the plain HTML site
  • Maybe use this for tokyogringo.com and PVCSec.com? If so, then I need to handle …
  • Media enclosures
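As a sketch of the gophermap item above: a gophermap is a plain text file, and as I understand the format each menu entry is an item type character followed by the display string, a tab, and the selector. The code below assumes the server treats lines without tabs as informational text and fills in host and port for relative selectors; the function name and sample data are hypothetical.

```python
def gophermap_lines(posts, heading="Posts"):
    """Build gophermap text: a heading line, then one type-0 entry per post."""
    lines = [heading, ""]
    for title, filename in posts:
        # Tabs separate fields, so keep them out of the display string.
        display = title.replace("\t", " ")
        # Item type 0 marks a plain text file.
        lines.append("0{}\t{}".format(display, filename))
    return "\n".join(lines) + "\n"


print(gophermap_lines([("Plain text life", "plain-text-life.txt")]), end="")
```

Writing the returned string to a file named gophermap next to the text files would be the remaining step.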

Watch this space for the link to my script on GitHub. Which is here!

El entierro by Francisco Proaño

I want to start by clearly stating that I am not “rage quitting” my social media. I am reducing my footprint and reliance on them. To that end …

I removed the share buttons for social media from my site. I do not like how they call home even if you choose not to share my post. It’s rude. And the row of buttons on my posts is ugly. I might reverse my decision at any point.

I removed the automatic post sharing on social media. Blasting my posts across social media with no tailoring is inconsiderate at best. While all my written gems are exactly that, it is not my place – after this post – to demand your attention for every utterance. I might reverse my decision at any point.

“How will I follow your crazy shenanigans?”, you ask.

The best way to track my site is to subscribe to the full text RSS feed. If you’re not interested in every one of my whims, each category has its own RSS feed as well. If you’re my parents or siblings, your email updates will not change.

If you, Dear Reader, would like an email newsletter of some kind, please leave a comment on this post to let me know.

Another note, for those more technically interested in website hosting: I am playing with removing CloudFlare from my site. I experienced odd behaviors that I thought my CDN caused. It turns out that my site was no longer redirecting to the HTTPS page, maybe due to the service.

In other news, all requests are once again going to the HTTPS version of my site. I don’t know when, why, or how my old .htaccess file changed, but it is again serving up the correct version of my site.
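A quick way to confirm that kind of redirect, sketched with only the standard library: send a HEAD request over plain HTTP and look at the status code and Location header. The function names are hypothetical and the host in the comment is a placeholder.

```python
import http.client


def is_https_redirect(status, location):
    """True when the response is a redirect whose target uses https."""
    return status in (301, 302, 307, 308) and (location or "").startswith("https://")


def check_host(host, path="/"):
    """Send a HEAD request over plain HTTP; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Usage (needs network access; www.example.com is a placeholder):
#   status, location = check_host("www.example.com")
#   print(status, location, is_https_redirect(status, location))
```

Run periodically, a check like this would flag a rewrite rule that has quietly stopped working.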

If you have questions, comments, or concerns, please leave a comment on this post.

See issue #216, “When posting I get ‘Lisp error: (wrong-type-argument listp t)’”, for the history on this issue.

For fun, here are some Unicode characters: ” ‘ & 🗾 😄

Here is my current slim emacs config to get org2blog working:

;; Prefer newer .el sources over stale byte-compiled files.
(setq load-prefer-newer t)

;; Set up package.el with the GNU ELPA and MELPA archives (over HTTPS).
(package-initialize)
(add-to-list 'package-archives
             '("gnu" . "https://elpa.gnu.org/packages/"))
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)
(package-refresh-contents)
(unless (package-installed-p 'package+)
  (package-install 'package+))

(add-to-list 'load-path "~/src/org-mode/lisp")
(add-to-list 'load-path "~/src/metaweblog")
(add-to-list 'load-path "~/src/org2blog")
(add-to-list 'load-path "~/src/xml-rpc-el")
(add-to-list 'load-path "~/src/pretty-mode")
(add-to-list 'load-path "~/src/use-package")

(require 'org)
(global-set-key "\C-cl" 'org-store-link)
(global-set-key "\C-ca" 'org-agenda)
(global-set-key "\C-cc" 'org-capture)
;; Newer Org releases name this command `org-switchb'.
(global-set-key "\C-cb" 'org-iswitchb)

(require 'xml-rpc)

(require 'metaweblog)

(require 'org2blog-autoloads)

(require 'auth-source)

;; Read credentials from the GPG-encrypted authinfo file.
(setq auth-sources '("~/.authinfo.gpg")
      epa-file-cache-passphrase-for-symmetric-encryption t
      auth-source-debug 'trivia)

;; Blog definition; username and password are looked up via auth-source.
(setq org2blog/wp-blog-alist
      `(("PRJ"
         :url "https://www.prjorgensen.com/xmlrpc.php"
         :username ,(car (auth-source-user-and-password "prjorgensen.com"))
         :password ,(cadr (auth-source-user-and-password "prjorgensen.com"))
         :default-title "Hello, World!"
         :default-categories ("Uncategorized" "org2blog"))))

(require 'use-package)

(use-package htmlize
  :ensure t)

I created local git clones for xml-rpc-el, org-mode, org2blog, metaweblog, pretty-mode, and use-package.

This is done in order to post a draft of this blog. Then I will publish it.

Wish me good times!


While Medium.com works for many, it does not work for me. My several-month test of the platform left me less than impressed.

While it is nice when someone else takes care of the plumbing, I am not comfortable with the exchange.

Thus I bring my musings and ditherings and blatherings and occasional cogent insights back to my personal site.

Share and Enjoy!

Shibuya 1-chome, Tokyo, Japan


I find myself spending time consuming content – the Web via RSS feeds (yes, they’re still in use), streaming video, and on-line radio (real radio stations and not streaming audio).

I don’t find myself creating content – on my various web sites or presentations or in social media – as much as I’d like. Sure, I podcast & tweet & G+ update & so on.

“What’s the ratio?”

I asked myself this today out of nowhere. It’s a simple question: “What’s my ratio of content consumed to content created?”

What’s reasonable? 50/50 is absurd. Maybe 10% my content to 90% consumption is a workable fraction?

Even a 1/9 ratio is absurd based on my RSS feeds. I receive over 1000 posts a day. At 1/9 I would owe the world more than 100 posts per day, and there’s no way I can generate even 10.

Of those 1K posts, there are maybe 200 in which I show an interest. Of those, how many are tweet-able because I find them interesting but not interesting enough to write a post? Let’s say 180.
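Running the figures from this post through a few lines of Python makes the gap concrete. This is only back-of-the-envelope arithmetic on the round numbers above; the variable names are mine.

```python
consumed_per_day = 1000   # posts arriving via RSS each day
interesting = 200         # posts that catch my interest
tweetable = 180           # interesting, but only worth a tweet

# Output needed to create one post for every nine consumed.
needed_at_1_to_9 = consumed_per_day / 9

# Interesting posts left over that might deserve a written response.
post_worthy = interesting - tweetable

print(round(needed_at_1_to_9), post_worthy)
```

So even on these rough numbers, the pool of post-worthy material is a small fraction of what a 1/9 ratio would demand.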

That being the case, is my issue more process than content?

I don’t have an answer. Not yet, anyway.

Thoughts?

Welcome to Detroit-Arsenal.com, a site about Detroit and the Metro area including our cousins across the Detroit River in lovely Windsor, Ontario, Canada.

The title comes from the nickname Detroit earned during World War II: The Arsenal of Democracy. It also just so happens that I’m an Arsenal football (soccer) fan.

Posts here will cover happenings in the area. Content will cross from Gravy Boat Beer and some of my other sites.

Please let me know what you think about the site. Comments are always welcome.

Thanks!

Augustus