Francois Marier: Keeping up with noisy blog aggregators using PlanetFilter
I follow a few blog aggregators (or "planets") and it's always a struggle to keep up with the number of posts that some of these get. The best strategy I have found so far is to filter them so that I remove the blogs I am not interested in, which is why I wrote PlanetFilter.
In my opinion, the first step in starting a new free software project should be to look for a reason not to do it. So I started by looking for another approach and by asking people around me how they dealt with the firehoses that are Planet Debian and Planet Mozilla.
It seems like a lot of people choose to "randomly sample" planet feeds and only read a fraction of the posts that are sent through there. Personally, however, I find there are a lot of authors whose posts I never want to miss, so this option doesn't work for me.
A better option that other people have suggested is to skip the planet feeds and instead subscribe to each of the author feeds separately, pruning them as you go. Unfortunately, this whitelist approach is a high-maintenance one since planets constantly add and remove feeds. I decided that I wanted to follow a blacklist approach instead.
PlanetFilter is a local application that you can configure to fetch your favorite planets and filter the posts you see.
If you get it via Debian or Ubuntu, it comes with a cronjob that looks at all configuration files in /etc/planetfilter.d/ and outputs filtered feeds in /var/cache/planetfilter/.
You can either add file:///var/cache/planetfilter/planetname.xml to your local feed reader, or serve it locally (e.g. http://localhost/planetname.xml) using a webserver.
The software will fetch new posts every hour and overwrite the local copy of each feed.
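To make the directory layout concrete, here is a rough sketch of how configuration files could map to output feeds. The helper names (output_path, planned_outputs) are hypothetical, and the rule that each output file is named after its configuration file is an assumption for illustration, not something taken from PlanetFilter's source.

```python
from pathlib import Path

CONFIG_DIR = Path("/etc/planetfilter.d")
OUTPUT_DIR = Path("/var/cache/planetfilter")


def output_path(config_file, output_dir=OUTPUT_DIR):
    """Return where the filtered feed for config_file would be written.

    Assumption: the output is named after the configuration file,
    with its extension replaced by .xml.
    """
    return output_dir / (Path(config_file).stem + ".xml")


def planned_outputs(config_dir=CONFIG_DIR):
    """Map every configuration file in config_dir to its output path."""
    config_dir = Path(config_dir)
    if not config_dir.is_dir():
        return {}
    return {
        path: output_path(path)
        for path in sorted(config_dir.iterdir())
        if path.is_file()
    }
```

Under that assumption, a file such as /etc/planetfilter.d/debian.conf would produce /var/cache/planetfilter/debian.xml, which is the path you would then hand to your feed reader.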
A basic configuration file looks like this:
[feed]
url = http://planet.debian.org/atom.xml
[blacklist]
There are currently two ways of filtering posts out. The main one is by author name:
[blacklist]
authors =
  Alice Jones
  John Doe
and the other one is by title:
[blacklist]
titles =
  This week in review
  Wednesday meeting for
In both cases, if a blog entry contains one of the blacklisted authors or titles, it will be discarded from the generated feed.
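The filtering rule can be sketched in a few lines of Python. This is only an illustration of the idea, not PlanetFilter's actual code: the function names are hypothetical, entries are modelled as plain dictionaries rather than parsed feed items, and matching by substring is an assumption based on the word "contains" above. It reads the configuration with the standard configparser module, which expects the continuation lines of a multi-line value to be indented.

```python
import configparser

CONFIG_TEXT = """
[feed]
url = http://planet.debian.org/atom.xml

[blacklist]
authors =
  Alice Jones
  John Doe
titles =
  This week in review
  Wednesday meeting for
"""


def read_blacklist(config_text):
    """Return (authors, titles) lists from a PlanetFilter-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    def lines(option):
        # Multi-line values come back as one string; split it into items.
        raw = parser.get("blacklist", option, fallback="")
        return [line.strip() for line in raw.splitlines() if line.strip()]

    return lines("authors"), lines("titles")


def is_blacklisted(entry, authors, titles):
    """Assumption: substring match on the entry's author and title."""
    return (any(a in entry["author"] for a in authors)
            or any(t in entry["title"] for t in titles))


def filter_entries(entries, authors, titles):
    """Keep only the entries that match no blacklisted author or title."""
    return [e for e in entries if not is_blacklisted(e, authors, titles)]
```

With this sketch, a post titled "Wednesday meeting for the release team" would be dropped because it contains a blacklisted title fragment, while posts by authors not on the list pass through untouched.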
Since blog updates happen asynchronously in the background, they can work very well over Tor.
In order to set that up in the Debian version of planetfilter:
Set the following in /etc/polipo/config:
proxyAddress = "127.0.0.1"
proxyPort = 8008
allowedClients = 127.0.0.1
allowedPorts = 1-65535
proxyName = "localhost"
cacheIsShared = false
socksParentProxy = "localhost:9050"
socksProxyType = socks5
chunkHighMark = 67108864
diskCacheRoot = ""
localDocumentRoot = ""
disableLocalInterface = true
disableConfiguration = true
dnsQueryIPv6 = no
dnsUseGethostbyname = yes
disableVia = true
censoredHeaders = from,accept-language,x-pad,link
censorReferer = maybe
Tell planetfilter to use the polipo proxy by adding the following to /etc/default/planetfilter:
export http_proxy="localhost:8008"
export https_proxy="localhost:8008"
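If you want to check that these variables are visible to a Python program, the snippet below is one way to do it. It assumes the program relies on the standard library's urllib for its HTTP requests, which I have not verified for PlanetFilter itself; urllib.request.getproxies() simply reports the proxy settings found in the environment.

```python
import os
import urllib.request

# Same values as the two export lines above, set here for the demonstration.
os.environ["http_proxy"] = "localhost:8008"
os.environ["https_proxy"] = "localhost:8008"

# getproxies() collects proxy settings from the *_proxy environment variables.
proxies = urllib.request.getproxies()
print("http proxy: ", proxies.get("http"))
print("https proxy:", proxies.get("https"))
```

If either line prints None when the variables are exported in your shell rather than set in the script, they are not reaching the process.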
The source code is available on repo.or.cz.
I've been using this for over a month and it's been working quite well for me. If you give it a go and run into any problems, please file a bug!
I'm also interested in any suggestions you may have.
http://feeding.cloud.geek.nz/posts/keeping-up-with-noisy-blog-aggregators-using-planetfilter/