all 41 comments

[–]wolfredditor 3 points4 points  (0 children)

This is serendipitous.

I spent several pleasant hours over the last weekend reading and rereading parts of the collected, uncollected and scattered works of dredmorbius in the lair, on ello, and at the other place, including several posts on this general topic.

So let's rethink what's possible, taking into account a bunch of changes in both technology and better practices in Open Source collaboration. Together they make what used to be hard much easier, and what used to be impossible, quite achievable.

For example: consider the difference between organizing material with a PoIC versus a web interface. Web wins for searchability and portability. But PoIC wins because it so nicely matches the human interface of eye, hand, and flexible use of space--both on the card and on the sorting table.

It was either/or before displays started getting touch-sensitive and became both bigger and cheaper and smaller and handier--with cameras for taking pictures of book pages among other things. Now the dichotomy between text and PoICs might be a false one. One might build something that combines PoICs with character recognition for searchability (while keeping the image), turns clipped articles into cards and lets you sort through them and lets you swipe and drag them into piles on any device bigger than a smartphone--and even there.

u/gwern has a public Evernote notebook where he has been saving webpages and segments of pages since 2009 here (currently nearly 38,000 notes).

There's no tagging, but Evernote does a decent job of searching and for many topics of interest to me gwernrank is better than pagerank.

I've added Gwern's notebook to my own badly organized Evernote notebooks, which deserve to be flattened. (I think that Google has shown us that one big pile + search is better than laboriously constructing a Yahoo.) Evernote lets me search my notebooks OR gwern's, but not both.

I'm imagining an open source project that builds something better than Pocket and Evernote and the other things out there. One that incorporates the ideas of workflow and makes use of new web technologies for building (and rebuilding) interfaces that take full advantage of new hardware options. And it's one that lets people collaborate--combine their reading collections and also their tags and annotations. And then there could be agents running behind the scene to help organize material. It's going to get cheaper and easier to use.

And it can be delivered on the web or self-hosted.

Wallabag is an Open Source project in this domain. It lets you self-host, and import from Pocket, Readability and Instapaper (also Firefox and Chrome, which I assume may mean bookmarks).

I pulled their docker image, spun it up, and was using it after about ten minutes of active time (scanning the docs and copy-pasting two commands) and a little time waiting for the pull to complete. It seems capable and full of good ideas, but right now I would not swap it for Evernote, and since it's written in PHP I would not want to hack on it.

But there are a couple of open source implementations of the core algorithms for grabbing content -- not to mention Wallabag, if someone wants to translate the PHP into something comprehensible. Versions of Readable/readability are here, here and here. (The problem with the bookmarklets is that they won't work with https, but there are other ways to inject the code.) I think something quite workable could be spun up quickly with the right design and a bunch of interested people.

So: I'm interested in collaborating on the design and pitching in on the implementation. You seem to be a guy who has thought more about this than I have, and I'm happy to follow your lead, if you want to lead.

On the Dev side, I've been putting together a bunch of tools with a VERY smooth workflow for prototyping and building hot-swappable client and server-side components that could be stitched together (or mixed and matched to serve individual needs and preferences). I've been using gomix, a new service from the guys who built StackOverflow and Trello, for some of my experiments.

[–]akaleeroy 2 points3 points  (10 children)

I've identified many of the same pain points over the years snooping around for answers to this problem. I've seen people's frustrations, ideas and concerns converge to an eerie extent with my own (such as this time), confirming that this space is indeed wide open. Despite the cornucopian mantra of exponential technological improvement, basic infrastructure remains poorly addressed.

The difference is that in my journey I didn't even bother exploring these Sillycon Valley services. I had a Springpad account once, which was pretty good until it went under. Evernote didn't do the trick either. As pretty as the UI looks, its whole perspective just didn't lend itself to covering my needs in a convincing way.

I decided an "expert system" is needed, and as I became enamoured with the UX and philosophy of Sublime Text, I started to draft requirements for how such a system would be built. Years later, whaddaya know, that research is mired in a swamp of unorganized files, URLs and text on my hard drive.

Meanwhile I cobbled together just enough features to keep me unenthusiastic about startup offerings. Several bookmarklets (oh how serenely people overlook: browser extensions are a constant RAM hog!), a clipboard manager, Sublime Text extensions for Markdown and filesystem management, various little scripts and OS customizations.

Regarding readability, just 20 hours ago I shared this with a friend:

Readable Bookmarklet - dark theme for bedtime reading

It's not a Pocket alternative. It doesn't save (unless you print to PDF), it doesn't tag, doesn't do anything except let you read in peace. I got a laser pico-projector and realized I could use it to read articles off the ceiling, lying in bed. (I also use eReader Prestigio on Android to listen to .epubs with Google's TTS. Anything to limit the number of hours staring close-range.)

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (9 children)

The Readability Bookmarklet was actually among my starting points on this journey. I'd started with that several years ago, then the service. The Bookmarklet started as a freestanding bit of code (Javascript, I think) for figuring out how to parse websites. The problem is generally an impossible one, though it devolves to two general cases:

  1. Website formats which are known and parseable.
  2. Website formats which aren't.

Expanding the set of the first, and knocking off members of the latter, should be possible with time. The additional component of change over time is another stickler.

If I could wire up the bookmarklet (or something equivalent) to dump pages to a standard decrufted format, that would be a start. The ability to add in additional metadata (or restructure what's there) in an at least semi-automated way would be another plus. I've already done this manually to a considerable extent. It's a surprisingly effective way to actively read a document....
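As a rough illustration of the "dump pages to a standard decrufted format" idea, here is a minimal Python sketch using only the standard library. It is not how the Readability bookmarklet works: the tag lists, the `Decrufter` class, the `decruft` function and the three-line metadata header are all assumptions made up for this example, and real pages would need far smarter heuristics.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as cruft in this sketch (an assumption,
# not the Readability bookmarklet's actual rules).
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "form"}
# Tags whose text is kept as one output block each (also an assumption).
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "blockquote"}


class Decrufter(HTMLParser):
    """Collects the page title and the text of simple block elements."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._skip_depth = 0    # >0 while inside a cruft element
        self._in_title = False
        self._current = None    # text fragments of the open block, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS and self._current is not None:
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and self._current is not None:
            self._current.append(data)


def decruft(html, source_url, saved_on):
    """Return plain text: a small metadata header, then one block per paragraph."""
    parser = Decrufter()
    parser.feed(html)
    parser.close()
    header = [
        f"Title: {parser.title.strip()}",
        f"Source: {source_url}",
        f"Saved: {saved_on}",
    ]
    return "\n".join(header) + "\n\n" + "\n\n".join(parser.blocks) + "\n"
```

The header lines are deliberately plain `Key: value` pairs so that more metadata can be added by hand or by a later script without changing the format.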

I'm inclined to see the Web turn to some simpler format, something like the ePub format (a simplified subset of HTML and CSS), or better, just a set of standard page templates. These could be added to with time, but the idea would be to very strongly encourage standardisation.

As for solution systems: I keep eyeing Emacs, Org-mode, Pandoc, and some related items. Possibly Lisp as well.

Emacs because it's fundamentally a text-processing engine, and the information I'm trying to access is itself fundamentally text. Org-mode as a notational engine. Pandoc as the closest thing I've found to a universal document interchange format.

I've also been making use of the PoIC system, a Pile of Index Cards, as previously noted. That's got some pretty strong positives, though figuring out how to manage working sets is a bit of a challenge. I'm looking for some sort of vertical wall-mounted card-file holder, say, a set of 3x8 to 4x16 slots, each good for 50-100 cards, into which I could slot active projects. With a hinge mount and a set of possibly four double-sided "pages", that would give me up to 512 active slots for current work, which is probably as much as I could keep in my head at a time.
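The slot arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes "3x8" and "4x16" mean rows by columns of slots on one face of a page, and the function names are made up for the example.

```python
def active_slots(rows, cols, pages=4, sides=2):
    """Slots across all pages, counting both sides of each page."""
    return rows * cols * pages * sides


def card_capacity(rows, cols, per_slot=(50, 100)):
    """(min, max) cards held if every slot takes 50-100 cards."""
    slots = active_slots(rows, cols)
    return tuple(slots * n for n in per_slot)


print(active_slots(3, 8))    # smaller layout: 192 slots
print(active_slots(4, 16))   # larger layout: 512 slots
print(card_capacity(4, 16))  # (25600, 51200) cards
```

So the 512 figure corresponds to the larger 4x16 layout: 64 slots per face, times 8 faces.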

(I'm also thinking a lot about data scale, and human interactions with same. What happens when you've got 1 of something, 2, 4, 8, 16, 32, ... Data has weight, of a fashion, and the interactions vary.)

There's also Zotero, though I keep finding its GUI design getting in my way.

[–]akaleeroy 2 points3 points  (1 child)

I'd just add that this bookmarklet, Readable, performed better than Readability. Over possibly thousands of pages I've only had a major content-parsing hiccup twice! And it was understandable – those pages were really a mess.

Standardised formats for the web seem ever on the cusp of their big breakthrough, but so far only as a side-effect of social media giants. Maybe with shit hitting the fan there can be a renewed interest in them from the frugal and practical perspective that would make a difference. A Victory Garden -type effort to grow critical knowledge. Then there'd be evolutionary pressure to dispense with visual fluff and computational overhead, and people would more easily fall into adhering to standards and style guides when they see they're geared for effectively transmitting knowledge.

And web text templates aren't that much of a challenge. Rich media is. Were this crisis to also impose a need for mesh networks instead of the lavish Internet we currently use, then we ought better charter some new low-bandwidth standards already:

Media             Huge   Better                  Best?
Images            JPEG   BPG                     SVG sketches?
Animated how-tos  GIF    WEBM, h265              animated SVG sketches?
3D files          STL    compressed 3D formats   OpenSCAD?
Audio             MP3    OGG/OPUS                TTS'd text?
Charts            Excel  Graphing libraries      CSV data + high-level language interpreted locally by browser?

So my ideas about cutting down bandwidth rely on authoring in standardized high-level languages that browsers know how to interpret. Get the interpreter first, then get the smallest possible set of instructions over the wire/air and voila! you're following along a complex tutorial that would have taken GBs of YouTube streaming.

If there was a library of 3D character models with rigging plus models of a bunch of common household items you could codify 80% of practical /r/LifeProTips with short animation scripts. With a bit of helping text maybe. Bring in OpenSCAD and you can do even more, still within a few KB. Sure it won't be high-fidelity but it would get the job done.

[–]akaleeroy 0 points1 point  (0 children)

Oh and as motivation for this vision:

Animated 3D models because they're the most expressive (orbit them, pause them, explode them). Expressive because slogging through text descriptions of visual things in books you learn too slow. And, finally, learning fast because the decline of industrial civilisation will present us with an ever-changing landscape of new problems needing an ever-changing list of skills. And we'll get to study them under stress, working several jobs or on the move as climate refugees.

[–]prolesarefree 0 points1 point  (6 children)

Your mention of the PoIC system (as noted elsewhere) is proving to be invaluable, although, as I am visually orientated, I prefer using color-coded cards over trying to decide on a "system of marking."

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (5 children)

Advantage of colour-coded cards: readily visually distinguishable.

Disadvantage: not extraordinarily extensible, and you need that colour card available when needed.

I'm using coloured cards as various "cover" indicators: topics, writing projects, planning themes, etc. That seems to work fairly well.

I don't really have a marking convention as yet, as I've been waiting for that structure to emerge. Though I do try to date every card.

Generally:

<Title>                                   <Date>

<Text>



<Source / Reference>

[–]prolesarefree 0 points1 point  (4 children)

I am still thinking about the formatting (your example above is close to my own). One thing I also considered comes from observations/experiences using Evernote Moleskine (now an abandoned effort).

Part of the observable downside of PoIC is the lack of transportability [i.e. If I had a readable or scanned (OCR) copy on OwnCloud, then I could ...? ]

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (3 children)

I think of the PoIC as a capture device, with organisational capabilities. It's not especially a mobile reference.

You can do the Hipster PDA based on a smaller stack (and the PoIC has its own alternative). I just stuff a set of 50-100 cards in my usual satchel and use that for notes on the road. That's not a high-use case, and I can add more cards if needed.

What I most find myself wanting are blank, unlabelled tabbed dividers. My organisation schema is generally not alphabetic. Rather, it's ... whatever idiosyncratic model I've latched onto for the current project.

(There are exceptions: organisations and people/biographies tend to be alphabetic.)

There are gummed tabs. I might even start using mini post-its for temporary references. Preferably multicolored, not so much for a structured use as simply to create another dimension of distinction amongst items. "Oh yeah, the green one...".

What would you have within a mobile PoIC if you had one?

[–]prolesarefree 0 points1 point  (2 children)

As far as mobile (or portable) goes, I tend to think that some sort of duplication method would be helpful. OCR is one approach (similar to what Evernote/Moleskine attempts).

However, another option I am considering is making use of QR Code generation. The "universality" of both generating and reading strikes me as well established (but personally untested on any scale). Storage of QR-generated images, and retrieval at some later time, might be [highly?] effective as well - depending on the naming methodology deployed (?).

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (1 child)

How would you make use of a QR code system? Unless you've got some way of printing those and affixing them to items, I don't particularly see a win.

QR Codes seem to match the use case of "present data or a data hook uniformly on a large number of objects", e.g., something you'd apply in a manufacturing or large-batch printing process. Not ... something to affix to individually-created index cards. Or Post-Its ;-)

[–]prolesarefree 1 point2 points  (0 children)

Now that a few days have passed (in retrospect), I am not really sure "what I was thinking" when it came to QR codes - with the exception of being able to encode larger files/data and store the info (as a sticker/or 'pre-printed') on a PoIC (card).

[–]neurocroc 2 points3 points  (1 child)

I understand your frustrations. I actually solved this problem for myself. Perhaps you will find this method useful too.

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (0 children)

Nice! I'll poke around at this, though I'm not sure it's an integrated or integrable solution.

I've actually suggested to Pocket that they utilise labels, and possibly a standard labelling schema, to provide mind-maps of users' interest space. I know that I would find this phenomenally useful.

Increasingly I think that an established model such as the Library of Congress Classification System actually makes a hell of a lot of sense. Whilst a first response might be that it's excessive or overkill or insufficiently flexible, I see it as an established, considered, and time-tested option, which starts relatively shallow and goes as deep as need be, covering millions of catalogued items. That's a pretty solid proof-of-concept.

[–]prolesarefree 1 point2 points  (1 child)

Re: "The more I use Pocket, the worse it gets."

Unfortunately, this is the onus of most software solutions (as they fail to meet the desired "need" in one capacity or another). Yet your question of "the problem ... people working on it ... commercialization ... or ?" is only compounded by individual habit(s) and/or preferences, no?

Mind you, I only come to that observation after searching out "color-coded" index cards (to replace Post-It notes) and then attempting to devise a[n easy] system of transferring "information" (usually in Markup) into a printable template which can be combined with hand written notes/addendum.

I sure would be interested in "How to devise an email storage/reference system." (?)

fn. I used to rely heavily on Pocket - but "...not so much now."

[–]dredmorbiusIdditor in Chief[S] 1 point2 points  (0 children)

On the nature of the problem: Information management and research is intrinsically idiosyncratic. If you're researching something, you're starting from mystery and attempting to find order. So your system's got to support that. If it imposes too much order of its own, or too much by way of administrative overhead (whether in the form of paper-shuffling, heading to and from libraries, or finding oneself yak-shaving computerised systems with too much frequency), well, that's no good.

I suspect that of the people who attempt to solve such problems, simple regression to the mean and uneven skills-distribution means that the best researchers are ... involved with their research, and the best programmers ... aren't good system designers, and the good systems-designers understand neither programming nor research sufficiently well.

See Ted Nelson and Project Xanadu....

The best systems are kind of clunky, but mostly work, offer a lot of flexibility, get out of the way, provide a great many places to affix tape, or baling wire, or paste on labels.

What I'm wondering more generally is if a generalised system is simply impossible because there are too many moving parts, and too few common standards and tools -- you're going to annoy someone no matter what choices you make.

On an email-based system: it helps a lot if you're familiar with mutt, or the mh system.

Mutt is a console-mode, full-screen (in the same sense that vim or emacs are full screen) application for reading and posting email. mh is a set of commandline tools for interacting with a particular mail archive.

The useful thing about email, or Usenet (you could also create this as some sort of locally-accessed news-spool) is that you have:

  • An intrinsic document structure -- the mailbox. I'd strongly suggest maildir, though other options exist.
  • Intrinsic and extensible metadata on the messages themselves. RFC 2822 and friends give you a From: To: Date: and Subject:, and you can add pretty much anything you want as an X-Header.
  • There is an extensive set of tools for searching, indexing, and manipulating the data.
  • Mutt in particular has exceptionally good filtering and searching tools itself, and scales to tens or hundreds of thousands of messages, potentially far more than that. I've definitely worked with it at those scales, including effectively dealing with very large document corpora.
  • Message bodies can be any arbitrary type, and can have multiple components. The wiring to, say, dump a WAR archive into an email format should be fairly straightforward.
  • Since Maildir exists as files-on-disk, shell or webserver tricks could be used to present a Web interface to all of this (leveraging the tools described above).
  • Mutt also offers threading. So annotations, drafts, or articles could be created simply as replies to parent articles.
  • Messages themselves can be edited in place.
  • Editing is done through your local system editor of preference (emacs, vim, etc.), so you've got full power there as well.
  • Similarly, it's possible to send a message (or multiple messages) to another destination. Sending multiple messages within a single Mutt message effectively creates a mini-thread within that message consisting of individual email messages themselves.
    A particular writing project becomes another email address. These could be maintained locally within a single system, inside an organisation, or more generally available.
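To make the maildir-plus-headers idea concrete, here is a minimal sketch using Python's standard `mailbox` and `email` modules. The function names, the placeholder addresses, and the `X-Source-URL` and `X-Tags` header names are assumptions invented for this example, not an established convention; Mutt or any other maildir-aware tool would simply see ordinary messages.

```python
import mailbox
from email.message import EmailMessage
from email.utils import formatdate


def stash_article(maildir_path, title, url, body, tags):
    """Store one saved article as an email message in a Maildir."""
    msg = EmailMessage()
    msg["From"] = "stash@localhost"        # placeholder address (assumption)
    msg["To"] = "reading@localhost"        # one address per project, per the list above
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = title
    msg["X-Source-URL"] = url              # hypothetical custom header
    msg["X-Tags"] = ", ".join(tags)        # hypothetical custom header
    msg.set_content(body)

    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)                # returns the key of the new message
    finally:
        box.close()


def find_by_tag(maildir_path, tag):
    """Return (key, subject) pairs for messages whose X-Tags include tag."""
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        hits = []
        for key, msg in box.items():
            tags = [t.strip() for t in msg.get("X-Tags", "").split(",")]
            if tag in tags:
                hits.append((key, msg["Subject"]))
        return hits
    finally:
        box.close()
```

A linear scan like `find_by_tag` is fine for a sketch; at the scales mentioned above, an external indexer or Mutt's own search would be the more realistic choice.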

Versioning strikes me as a potential sticking point, though there are a few options here. A message type of versioned document or such might itself be backed up against a git repo (or the whole mailspool could be).

[–]ephrion 1 point2 points  (8 children)

If I were a Pocket product manager I'd love this post. If I were looking to create a competing product, I'd love it even more.

[–]dredmorbiusIdditor in Chief[S] 2 points3 points  (7 children)

Pocket ... have been made aware of its existence. Others are more than welcome to read this as well.

The less coding I have to do to get what I want, the better ;-)

[–][deleted] 0 points1 point  (6 children)

Very curious about what Pocket had to say about this.
Maybe with the recent change in leadership (Mozilla taking over) and it (supposedly) going open source, it could be possible to implement much of what has been discussed here in a fork that caters to more uses than just a dumping ground.
That is something that we should keep in mind, right?

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (0 children)

Nothing, to date.

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (4 children)

I've re-submitted the issue. Still no word, not even an acknowledgement (other than the auto-ack).

[–][deleted] 1 point2 points  (3 children)

Would a petition or any other show of numbers be of any use?
Maybe if they knew just how many people are interested in this?

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (2 children)

That's a thought. Any suggestions as to what that petition might list, and/or where to post it?

I'm not entirely sure that Change.org is an appropriate venue.

An alternative would be for a significant purchasing entity to express an interest. I cannot move the market much myself, but a research organisation or large company looking for a references-management system might.

[–][deleted] 0 points1 point  (1 child)

Sorry about the delayed reply. While I too don't know what channel to use, could you please consider looking into Wallabag as an alternative.

Not in its current state but what we could mould it into. The developers are a lot more accessible than Pocket's and are pretty active too. I know that this would mean a significant change in your workflow and a serious investment of time and energy but if this leads to a lasting and true solution, wouldn't it be worth pursuing?

It might also be easier to rally people around this project.

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (0 children)

Thanks for the reminder. Taking a look again....

[–]woozalia 1 point2 points  (15 children)

The clincher for me, when I first heard about this, is that it isn't software you can self-serve and modify.

I have too many specific ideas about what I want to do with stashed links to wait for some centralized service to get around to not-ever-implementing them (or opening an API to make it possible for me to retrieve my data later). Been there...

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (14 children)

Do I ever hear you.

It's a place for me to stash things, for now. I can at least extract the list of URLs.

I may lose forever the tags, the content itself (if it's fallen off the Web), and access to the particular formatting.

I am so coming to hate the Web for written matter, generally.

[–]woozalia 0 points1 point  (13 children)

A link-stasher is high on my list of tools, because I need one for Issuepedia v2 (and HTYP v2, for that matter). Keeping track of links has been one of my areas of lowest efficiency for some time now.

Looking just at the tool level, I think it's #2 or maybe #3 in line if you count the revised event logger (which is pretty much working):

  1. event logging (serves as revision-tracker for content)
  2. wiki-like system (a simple wiki page is just a form of semantic record: each page has a title and content, and possibly children)
  3. link tracker (can be built on the same semantic data system)

I already have a hierarchical topic system (for tagging links), though I need to figure out how it's going to work for multiple users. I'd like each user to be able to maintain their own hierarchy (so you're not stuck with a hierarchy you hate -- that lacks important distinctions, leaves no room for certain things, has topics that are poorly described, etc.) but also for common hierarchies to emerge in order to minimize duplication of effort and also so different users can file things under the same topic.

...and then there's working out how to optimize the calculation of indexing within parent topics (i.e. if item X belongs to C which belongs to B which belongs to A, we'd like to be able to find X in a complete listing of A's members).
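One common way to handle the "find X in a complete listing of A" case is to do the work at filing time: when an item is filed under a topic, also record it under every ancestor. A minimal Python sketch follows; it assumes a single parent per topic, and the `TopicIndex` class and its method names are invented for the example, not taken from any existing system.

```python
from collections import defaultdict


class TopicIndex:
    """Topic tree where each topic remembers items filed in it or below it."""

    def __init__(self):
        self.parent = {}                  # topic -> parent topic (None for roots)
        self.members = defaultdict(set)   # topic -> items in it or any descendant

    def add_topic(self, topic, parent=None):
        self.parent[topic] = parent

    def _ancestors(self, topic):
        """Yield topic, then its parent, and so on up to the root."""
        seen = set()
        while topic is not None and topic not in seen:   # guard against cycles
            seen.add(topic)
            yield topic
            topic = self.parent.get(topic)

    def file_item(self, item, topic):
        """File item under topic and under every ancestor of topic."""
        for t in self._ancestors(topic):
            self.members[t].add(item)

    def listing(self, topic):
        """Complete listing for topic, including items from all subtopics."""
        return sorted(self.members.get(topic, ()))


idx = TopicIndex()
idx.add_topic("A")
idx.add_topic("B", parent="A")
idx.add_topic("C", parent="B")
idx.file_item("X", "C")
print(idx.listing("A"))   # ['X']
```

The trade-off is that listings become a single lookup while filing costs one write per ancestor, and moving a topic to a new parent means rebuilding the affected sets. Per-user hierarchies could be modelled as one `TopicIndex` per user, though merging them into common hierarchies is a separate problem this sketch does not touch.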

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (12 children)

If you build that Link-Tracker: LoCCS and MARC.

Do not roll your own.

[–]woozalia 0 points1 point  (3 children)

I very much doubt that those taxonomies will include all the things I need -- but they might be a good place to start.

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (2 children)

I'm not finding this response particularly reassuring, either.

[–]woozalia 0 points1 point  (1 child)

Do you have taxonophobia? Ontological anxiety?

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (0 children)

The thought that one person, in a limited number of person-hours, might improve on a century-plus of work, seems perhaps implausible.

That that person thinks so, seriously: worrisome.

[–]woozalia 0 points1 point  (6 children)

There's also OpenCyc's knowledge base: http://opencyc.org/ (click on "About Cyc", it's the first entry).

(Dangit, can't reply to myself to keep these in order...)

Also, there's the fact that I already have rolled no less than three taxonomies, for different purposes -- two of them via the MediaWiki category system on Issuepedia and HTYP, the other being topics for the VBZ catalog.

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (5 children)

I'm not finding this response particularly reassuring.

I'd need to actually see OpenCyc's KB to assess it. Not a fuzzy PNG.

[–]woozalia 0 points1 point  (4 children)

I downloaded it once, about a decade ago, but then didn't have time to do anything with it. I didn't see a download link over there when I was verifying the URL; I wonder if they've made Cyc less "Open".

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (3 children)

It appears that way.

Do you have the earlier copy anywhere?

[–]woozalia 1 point2 points  (0 children)

Possibly. I think I'd need to write my long-needed inter-media file-indexing program to find it, though. It was several generations of hard drives ago, and it may or may not have been backed up to a more recent drive.

[–]woozalia 1 point2 points  (1 child)

I found where they're keeping the goods now: http://dev.cyc.com/downloads/index.html

They have a whole "developer center", with APIs and stuff.

[–]dredmorbiusIdditor in Chief[S] 0 points1 point  (0 children)

Thanks!

[–]PrasantaShee 0 points1 point  (0 children)

In addition to pocketcloud, you may also try R-HUB remote support servers for all your remote desktop needs. It works from behind the firewall, hence better security.