23 Feb 2005 @ 21:34, by ming. Internet
Last week Google offered to host part of Wikipedia's content. Yesterday Wikipedia was brought off line for some hours by a power failure and subsequent database corruption.
Now, I pay attention to those things not just because Wikipedia is a great resource that needs to be supported. But also because I'm working on a clone of it, and I've been busy downloading from it recently.
The intriguing thing is, first of all, that this is even a possible and acceptable thing to do. Lots of people have put a lot of work into Wikipedia, and it is generously offered as a free service by a non-profit foundation. And not only that, you can go and download the whole database, and the software needed for running a local copy of it on another server. Because it is all based on free licenses and open source. And, for that matter, it is in many ways a good thing if people serve up copies of it. It takes a load off of their servers, and potentially others might find new things to do with it.
Anyway, that's what I'm trying to do. At this point I've mostly succeeded in making it show what it should show, which is a good start.
Even though the parts in principle are freely available, it is still a pretty massive undertaking. Just the database of current English language articles is around 2GB. And then there are the pictures. They offer in one download the pictures that are considered in the "Commons" section, i.e. they're public domain. That's around 2GB there too. But most of the pictures are considered Fair Use, meaning they're just being used without particularly having gotten a license for it. So, they can't just share them the same way. But I can still go and download them, of course, just one at a time. I set up a program to pick up about 1 per second. That is considered decent behavior in that kind of matter. Might sound like a lot, but it shouldn't be much of a burden for the server. For example, the Ask Jeeves/Teoma web spider hits my own server about once per second, all day, every day, and that's perfectly alright with me. Anyway, the Wikipedia picture pickup took about a week like that, adding up to something like 20GB.
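For what it's worth, the "about 1 per second" pickup can be sketched roughly like this. It is not the program described above, just a minimal illustration of the idea; the function name `fetch_politely` and its parameters are made up for the example, and the actual download function is passed in so the pacing logic stays separate from the networking.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Fetch URLs one at a time, starting at most one request per min_interval seconds."""
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous request began.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

In real use one might pass something like `lambda u: urllib.request.urlopen(u).read()` as `fetch`. Measuring from the start of each request means a slow download doesn't add extra delay on top of the one-second spacing.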
Okay, that's the data. But what the database contains is wiki markup. And what the wikipedia/mediawiki system uses is pretty damned extensive markup, with loads of features, templates, etc. Which needs to be interpreted to show it as a webpage. My first attempt was to try the mediawiki software which wikipedia runs on. Which I can freely download and easily enough install. But picking out pieces of it is quite a different matter. It is enormously complex, and everything is tied to everything else. I tried just picking out the parsing module. Which happened to be missing some other modules, which were missing some other modules, and pretty soon it became unwieldy, and I just didn't understand it. Then I looked for some of the other pieces of software one can download which are meant to produce static copies of wikipedia. They're very nice work, but either didn't do it quite like I wanted, or didn't work for me, or were missing something important, like the pictures. So I ended up mostly doing it from scratch, based on the wikipedia specs for the markup. Although I also learned a number of things from wiki2static, a Perl program which does an excellent job of parsing the wikipedia markup, in a way I actually can understand. It still became a very sizable undertaking. I had a bit of a head start in that I've previously made my own wiki program, which actually uses a subset of wikipedia's markup.
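To give a feel for what "interpreting the markup" means, here is a very small sketch that handles only a sliver of it: bold, italics, level-two headings and simple internal links. It is an assumption-laden toy, not the code described above and nowhere near what mediawiki or wiki2static do (no templates, tables, images or nesting). The function name `wiki_to_html` and the `/wiki/` URL prefix are invented for the example.

```python
import html
import re


def wiki_to_html(line: str) -> str:
    """Convert one line of a tiny subset of wiki markup to HTML."""
    # Escape &, < and > but leave apostrophes alone, since they carry the bold/italic markup.
    text = html.escape(line, quote=False)

    # == Heading == on a line of its own becomes <h2>.
    heading = re.fullmatch(r"==\s*([^=]+?)\s*==", text.strip())
    if heading:
        return f"<h2>{heading.group(1)}</h2>"

    # Handle '''bold''' before ''italic'' so triple quotes are not misread as italics.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    # [[Target]] and [[Target|label]] become links.
    def link(match):
        target = match.group(1)
        label = match.group(2) or target
        return f'<a href="/wiki/{target.replace(" ", "_")}">{label}</a>'

    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)
```

Even this much shows why order matters: the rules interact, and every feature added to the markup is another case the parser has to get right.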
As it says on the wikipedia download site: "These dumps are not suitable for viewing in a web browser or text editor unless you do a little preprocessing on them first." A "little preprocessing", ha, that's good. Well, a few thousand lines of code and a few days of non-stop server time seem to do it.
Anyway, it is a little mindblowing how cool it is that masses of valuable information are freely shared, and that with "a little preprocessing" one can use it in different contexts, build on top of it, and do new and different things, without having to reinvent the wheel first.
But the people who make these things available in the first place need support. Volunteers, contributors, bandwidth, money.
2 Feb 2005 @ 18:37, by ming. Internet
I'm doing various little programming contract jobs at the moment. And it is remarkable to notice how much effort apparently is being spent on trying to abuse various shared internet resources. I.e. getting around the way something was intended to be used, for the sake of self-promotion. Like, somebody just asked me to do a program doing what Blogburner is doing. I said no, and gave the guy a piece of my mind, but I'm sure he'll find somebody else to do it. "Blog and Ping" they call it. It is essentially that you automatically set up a number of fake blogs at a site like Blogger and you automatically post a large number of regular web pages to them, pinging the blog update sites as you do it, pretending that you just posted something new on your blog. Of course exploiting the somewhat favored status that blogs have in search engines, and attracting traffic. Under false pretenses.
And that's just one of many similar project proposals I see passing by. There are obviously many people getting various kinds of spamming programs made. You know, stuff like spidering the web for forums and then auto-posting ads to them. Or automatic programs that sign up for masses of free accounts in various places. Or Search Engine Optimization programs that create masses of fake webpages to try to show better in the search engines. I don't take any of those kinds of jobs, but it is a bit disturbing to see how many of them there are.
It is maybe even surprising how well the net holds up and how the many freely shared resources that are available can be viable. Another example. You know, there's the whois system that one uses to check the registration information for a domain, who owns it, when it expires, etc. Now, there's a business in trying to grab attractive domain names that for one reason or another expire. So there are people who set up servers that do hundreds of thousands of whois lookups every hour, in order to catch domains right when they expire, in order to re-register them for somebody else. Or any of a number of variations on that scheme. To do that you'll want to do maybe 100 whois lookups every second. And most whois servers will try to stop you from doing that, by having some kind of quota of how many you can do, which is much less. So, you spread the queries over many IP numbers and many proxy servers, in order to fool them. And the result is inevitably that a large amount of free resources is being spent, in order for somebody to have a little business niche.
At the same time I can see that part of what makes the net work in good ways is indeed that one can build on somebody else's work with few barriers. That one can quote other people's articles, borrow their pictures, play their music, link to their sites, use their web services, etc. And add a little value as one does so. And I suppose the benefit of generative sharing will outweigh the problems with self-serving abuse of what is shared. But it seems it also involves a continuous struggle to try to hinder abusive use of freely accessible resources.
Like, in my blog here. An increasing number of visits are phoney, having bogus referrer information, just to make a site show up in my referrer logs. No very good solution to that, other than if I spend server resources on spidering all the sites to see if they really have a link to here.
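The "see if they really have a link to here" check might look something like the sketch below. It only covers the part after the referring page has been downloaded: it parses the HTML and looks for an anchor pointing back at my site. The names `page_links_to` and `my_site_prefix` are made up for illustration, and a real version would also need fetching, caching and timeouts, which is exactly where the server resources go.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own address.
                    self.links.add(urljoin(self.base_url, value))


def page_links_to(page_html: str, page_url: str, my_site_prefix: str) -> bool:
    """Return True if the referring page really contains a link into my site."""
    collector = _LinkCollector(page_url)
    collector.feed(page_html)
    return any(link.startswith(my_site_prefix) for link in collector.links)
```

A referrer whose page comes back `False` could then simply be left out of the referrer list.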
25 Jan 2005 @ 20:25, by ming. Internet
So, I continue to have a bit of fun with that webcam thing I did. In part because there still are several thousand people coming by looking at it every day. So I add a few improvements once in a while.
Mikel Maron made the nice suggestion that one could establish the more precise location of the different cams collaboratively, and then one could maybe do fun things like having them pop up on a world map or something. So, I added forms for people to correct or expand the information on each location. Like, if they know the city, or the name of the building, company, bridge, or whatever, they can type it in. And while I was at it, I added a comment feature.
OK, so, presto, instant collaboration. Within a couple of hours lots of helpful (or maybe bored) visitors had figured out where a bunch of these places were, and they had typed them in.
But, at the same time, what is going on is that these webcams seem terribly interesting to Chinese or Japanese speaking people. 70,000 people came from just one Japanese softcore porn news site who for some reason linked to it.
But then there's a slight, eh, communication problem here. Or language problem. Or character set problem. See, I've set it up so that the forms where you leave comments or update the info can take Unicode characters. So if somebody wants to type a comment in Japanese, they should be able to do that. And some people do. But the explanatory text on my page is in English. And it seems that a large number of people don't really have any clue what any of it says, but they have a certain compulsion to type things into any field that they see. So, if there's a button that leads to a form where you can correct the city of the camera, they'll click on it, and they'll enter (I suppose) their own information. Or they say Hi or something. See, I find it very mysterious what they actually are writing. It is for sure nothing like English. But it isn't what will appear as Chinese or Japanese characters either. Rather, it looks to me like what one would type if one was just entering some random test garbage, by quickly running one's fingers over a few adjacent keys. But the strange thing is that dozens and dozens of different people (with different IPs) are entering either very similar, or exactly the same, text. This kind of thing:
Facility: fdsfdfdsdsfd
City: dsfdsf
Yeah, I can type that with 3 fingers without moving my hand from the keyboard. But why would multiple people type exactly the same thing?? Does it say something common in Chinese?
Now, we have a bit of a cryptographic puzzle here. Notice that "Facility" (the name of the field) has twice as many letters as "City". And "fdsfdfdsdsfd" has twice as many letters as "dsfdsf". Consider the possibility that somebody might think they're supposed to enter the exact word they see into the field. Like some kind of access verification. And they use some kind of foreign character input method that encodes Latin characters as one and a half bytes. If so, I can't quite seem to decode the system.
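The arithmetic behind that hunch is easy to check. This little snippet only counts letters in the two examples quoted above; it doesn't decode anything, and the variable names are just for illustration.

```python
# Field label -> what visitors typed into that field (from the examples above).
pairs = {"Facility": "fdsfdfdsdsfd", "City": "dsfdsf"}

# Typed characters per letter of the label.
ratios = {label: len(typed) / len(label) for label, typed in pairs.items()}

# Which distinct keys were used in each entry.
keys_used = {label: sorted(set(typed)) for label, typed in pairs.items()}

print(ratios)
print(keys_used)
```

Both entries come out at one and a half typed characters per letter of the field name, and both use only the three adjacent keys d, f and s.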
Or, are we dealing with some kind of Input Method Editor (IME) that lets people form Chinese symbols by repetitive use of keys on a QWERTY keyboard? Anybody know?
This is a bit like receiving signals from some alien civilization. Where's the signal in the noise? How might these folks have encoded their symbols, and what strange things might they be referring to? Are they friendly? dsfdsf?
Otherwise, if anybody here actually speaks Chinese or Japanese, could you give me a translation, preferably into the proper character set, of a sentence like: "This is the information for the camera location. Please do not enter your own personal information here!"
2 Dec 2004 @ 14:56, by ming. Internet
As envisioned by the Rand Corporation 50 years ago. Yeah, they were pretty spot on. Except for I don't have that big steering wheel thing.
... Later: see comments. It is actually a fake photoshopped picture, pieced together from a submarine control panel and a few other items from other sources. But, hey, splendid work.
30 Nov 2004 @ 15:16, by ming. Internet
Robin Good suggests a directory of freely re-distributable RSS feeds. Which is a fine idea. I'm not aware of any being in existence. Well, there are some nice directories of feeds, like NewsIsFree and Syndic8 where one can subscribe oneself to thousands of feeds and make one's own personal news portal. But can one mix and match from them to offer one's own feeds? Is the content really licensed for re-distribution? Mostly that's left vague. One might assume that if anybody is offering an RSS or Atom feed it is because they don't mind that one does whatever one feels like with them, but that isn't generally the case. The content is still in principle copyrighted, and various kinds of licenses might be implied.
Robin had a bit of an argument recently with another news site, as he took the liberty of creating an RSS feed of their articles as a service. Articles which are really just assembled from other public news sources. And they felt he was somehow depriving them of income by stealing their content without asking. But why shouldn't he?
The answer should really be a directory of feeds with clear Creative Commons types of licenses. I.e. people would state explicitly whether it is public domain, whether they need credit, whether it can be used for non-commercial purposes, etc. Which cuts through a lot of red tape as you right away will know what you can do with it. And it opens the door for better tools for constructing custom feeds out of other feeds. The Algebra of Feeds, as Seb Paquet called it.
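Here is a rough sketch of how explicit licenses would make mixing and matching mechanical. Everything in it is hypothetical: the license labels are shorthand loosely modelled on Creative Commons terms, and `Item`, `reusable`, `union` and `credit_line` are invented names, not part of any existing feed directory or tool.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Illustrative license labels, not an official vocabulary.
COMMERCIAL_OK = {"public-domain", "by"}
NONCOMMERCIAL_OK = COMMERCIAL_OK | {"by-nc"}


@dataclass(frozen=True)
class Item:
    published: str  # ISO 8601 date, so plain string sorting is chronological
    title: str
    feed_url: str
    license: str


def reusable(items: Iterable[Item], commercial: bool) -> List[Item]:
    """Keep only the items whose stated license allows the intended use."""
    allowed = COMMERCIAL_OK if commercial else NONCOMMERCIAL_OK
    return [item for item in items if item.license in allowed]


def union(*feeds: Iterable[Item]) -> List[Item]:
    """Merge several feeds into one, newest first, dropping exact duplicates."""
    merged = {item for feed in feeds for item in feed}
    return sorted(merged, key=lambda item: (item.published, item.title), reverse=True)


def credit_line(item: Item) -> str:
    """Public domain items need no credit; everything else names its source."""
    return "" if item.license == "public-domain" else f"Source: {item.feed_url}"
```

With something like this, a custom feed is just `reusable(union(feed_a, feed_b), commercial=False)`, and anything without a clear license never makes it in.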
7 Nov 2004 @ 09:16, by swanny. Internet
CENTER STAGE
AHHHHHhhhhhhhh......
All the internets a stage..... an escape from the fetters of reality......
I...... we here be but some distant illusion and deception to enchant
and deceive and steal thy affection and coin and then leave thee
with some cold kiss..........
Our time eclipsed by time itself.
Enchant us you say.......
Save us from our dreary and dull existences and tempt us with song and
dance and wine and woman and visions divine.
For what I ask?
Mere coin and affection...... mere time and attention......
For I too will fall to the axe of time and soon be no more.....
What would you have me do...... The Grand Hari Kari..... for it seems you
crave drama and insanity. This seems no place or stage of reason......
For by its very nature and inception seems stood to deceive and delude in the
blink of an eye every and all to whom it beckons.
For master it has none. Who is the master of this stage...... Let them speak
but now that we might hear........ Who has called this stage into existence....
Pray speak now............. that we might "know"......
.............................
..........................
................. What is its purpose ...... this stage ........ this stage.....
Would but this stage could but speak and reveal its designs then what story
and tale might it tell....... STAGE!!!!!! Speak stage that all might hear and know.....
Oh stage oh stage where fore art thou stage.....
Be you but dumb wood and nail and wire knotted and bell and whistles.....
Speak you then for the world doth listen........listen...... nay not all......
if everyone were listening ........ Stage..... I....... we seek from thee perhaps
a gift or two ...... Hath thee then gifts for us........ and L O V E.........
Oh stage.... oh stage hath thee a name at least then .......
then pray tell it to us now....... Stage what be thy name......
We are humanity and thou art stage but why dost thou not speak......
thou stigmergic philosophers stone and object of...... this discourse....... here there
and everywhere with feet on ground and stage on ground...... alas.....
Ahhhhhhhhhhhhhhhhhhhhh
sighs and exits.......
November 7th 2004
swanlake
Earth
17 Aug 2004 @ 12:53, by ming. Internet
"Microcontent" seems to be one of the buzzwords now. So, what is that, really?
Jakob Nielsen, interface guru, used it (first?) in 1998 about stuff like titles, headlines and subject lines. The idea being that first you might see just a clickable title, or a subject line of an e-mail, that you then might or might not decide to open. So, that title needs to be representative of the full thing, or you might not click it, or you'll be disappointed when you do. Microcontent (the title) needs to match macrocontent (the page, e-mail, article).
Now, that doesn't quite seem to be how "microcontent" is used nowadays. OK, on to 2002, Anil Dash says this, talking about a client for microcontent: "Microcontent is information published in short form, with its length dictated by the constraint of a single main topic and by the physical and technical limitations of the software and devices that we use to view digital content today. We've discovered in the last few years that navigating the web in meme-sized chunks is the natural idiom of the Internet. So it's time to create a tool that's designed for the job of viewing, managing, and publishing microcontent. This tool is the microcontent client. For the purposes of this application, we're not talking about microcontent in the strict Jakob Nielsen definition that's now a few years old, which focused on making documents easy to skim.
"Today, microcontent is being used as a more general term indicating content that conveys one primary idea or concept, is accessible through a single definitive URL or permalink, and is appropriately written and formatted for presentation in email clients, web browsers, or on handheld devices as needed. A day's weather forecast, the arrival and departure times for an airplane flight, an abstract from a long publication, or a single instant message can all be examples of microcontent." Oh, and an absolutely excellent article it is. It calls for the building of a client, a program that will allow us to consume and create microcontent easily. Not just aggregate it, but allow us to use it in meaningful ways. I.e. seeing the information how we want to see it, without having to put up with different sites' different user interface quirks. Good examples he gives at the time are Sherlock or Watson on Macs. You can browse pictures, movies, flight schedules, ebay auctions and more, all from the same interface, and without having to go to the sites they actually come from. But we're still not quite talking open standards for all that.
What is needed is the semantic web, of course. Where all content has a uniform format, and is flagged with pieces of meaning that can be accessed and collected by machines. Isn't there yet. Many smart people are playing with pieces of it, like Jon Udell, or Sam Ruby. Or, look at Syncato. All stuff mostly for hardcore techies at this point. But the target is of course to eventually let regular people easily do what they find meaningful with any data that's available on the net.
28 Jul 2004 @ 14:59, by ov. Internet
SAGE stands for Simulation and Advanced Gaming Environments for Learning. This is a recently launched collaborative research initiative by Simon Fraser University in Vancouver, Canada. Most of the initial work is for the purpose of collecting a foundation of information for using the latest computer technology and applying it towards the creation of learning environments. The expectations are that this will create a new industry in which Canada will be able to play a leading role.
SAGE is a wisdom project. We have evolved beyond facts, beyond knowledge, and now we seek wisdom on a mass basis. Wisdom to know what to do, and why and how. Wisdom which can be learned but not taught, earned but not bought, elusive but essential. Perhaps in virtual environments we can try things on for size and see how they fit before we commit. This is an option we really haven't had before.
SAGE is more than an academic exercise; it involves the academics but also the students and the general public as well. There is a theory side, a participatory side, and the observation of both. This initiative has the support of the universities, the government and industry. There are a lot of qualified people that are being paid to work on this project; it is not just a good idea being held afloat by a handful of volunteers, although I suspect there will be lots of volunteers involved, as well as those that want experience for future monetary gain. There should be high interest in this project since the content involves health care, which is probably the most popular political issue for Canadians.
Vancouver already has an established industry in electronic gaming. There is also a large number of people working in the film industry and the nickname of Hollywood North is well earned. Vancouver itself, along with the high concentration of creative people that have moved here, is probably the biggest asset for this project.
We already have some experience in new education forms such as high school classes where every student was provided with a laptop, which had wifi connection, and classroom projects were collaborative efforts; parents, teachers and students were so impressed with the results that there are plans to introduce this method into more Vancouver schools.
The SAGE project is hyper multi-media. It not only includes all aspects of online communication but a large off line component as well. It also involves that overlapping area where the mass population engages in collective dialogue. For example I heard about this project last week on CBC Radio where David Kaufman the project leader was being interviewed and was also taking telephone calls from the audience. The project brochure (at SAGE link) lists a multitude of well defined objectives, and seeing as how it is in that most disagreeable online format of pdf Adobe Acrobat it will most probably be printed out, and find its way into coffee shops around town, along with newspaper articles, and through word of mouth. Hyper multi-media is not restricted to a communication medium but is diffused throughout the culture.
Vancouver is engaged with experimenting in numerous forms of participatory democracy, and all of this comes together in a culture of creativity and creation.
It wasn't that long ago that I would get very discouraged by the fact that online worlds like Everquest and Ultima would have millions of users paying a monthly fee to engage in escapism, and yet it was damn hard to find more than a handful that would cooperate in an intellectual discussion in a web conference. It will be interesting to see if this latest Vancouver experiment will provide more than mere stimulation. If it can happen anywhere it will be here.
18 Jun 2004 @ 18:55, by ming. Internet
Richard MacManus wrote a couple of articles about synchronicity and the web: Statis and Synchronicity and A Theory of Synchronicity for the Web. Synchronicity is a term made famous by the psychiatrist Carl Jung. He defined synchronicity as an "occurrence of a meaningful coincidence in time". Further, it is "an acausal connecting principle". Which is to say that a connection occurs through the sharing of a common meaning, not because one event caused the other. Jung went so far as to boldly state that "synchronicity could thus be added as a fourth principle to the triad of space, time, and causality".
Synchronicity has come to mean a variety of things. Laurence Boldt claims that synchronicity reflects the "underlying interconnectedness of all things within the Universe" [my emphasis]. An attractive theory for those of us addicted to Web culture! Stephen J. Davis states that synchronicity is "a very personal and subjective observation of this inter-connected universe of which we are but a small part". Another keyword that pops up in writings about synchronicity is "flow" - which of course reminds me of the Web's Information Flow. When used to describe synchronicity, it's all about the "flow of life". For example, this quote:
"When we are in the flow we experience more synchronous events, more pleasure and less pain. The flow of coincidences is our path to higher ground." So, yes, we need more synchronicity and more serendipity. He doesn't really say how that actually might work, but nevertheless it is an important subject.
We could use a synchronicity engine, really. Some tools that increase synchronicity.
Randomness is one way of going about it, even though it isn't enough in itself. If you look at some random, unexpected content frequently, you're likely to run into something unexpected that really fits for you. Random links used to be popular, but probably give you too much junk most of the time.
Collaborative Filtering might suggest new things to you that you didn't know about, but that fit your interest areas. E.g. Amazon will suggest a book to you that you maybe didn't know about, which has been bought by other people who've bought similar books as you. That's useful of course, but it is rarely what we would call synchronicity.
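A bare-bones version of that "people who bought similar books" idea could look like this. It is only a sketch of the general technique, certainly not how Amazon actually does it; the `recommend` function and the shape of the `purchases` data are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(purchases: Dict[str, Set[str]], customer: str, top_n: int = 3) -> List[str]:
    """Suggest books the customer doesn't have, weighted by how much other buyers overlap with them."""
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == customer:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # nothing in common, so their taste tells us nothing
        for book in theirs - mine:
            scores[book] += overlap
    return [book for book, _ in scores.most_common(top_n)]
```

The result is always something adjacent to what you already have, which is probably why it feels useful rather than synchronistic.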
Blogging and the reading of many news feeds tends to increase synchronicity. You only look at a small sub-section of the world, as you read blog feeds you've already picked as being somehow interesting. You don't control what people write about, and you scan whatever it happens to be. And sometimes themes form unexpectedly. Several people write about the same things at the same time. Which might appear mysteriously meaningful and timely. OK, sometimes it is merely because they happened to read the same article and comment on it. The blog world is a bit inbred, as many people comment on the same things, and mainly scan each other's feeds and standard news sources for input.
Sometimes the most stimulating posts are either when somebody picks some unnoticed or old item or when they write about their own life, without referring to any news item. Looking around for unnoticed or new snippets of information is likely to increase synchronicity, as the item might appear timely and relevant for a bunch of other people, but also unexpected.
I like using semi-random content on some sites I've done. Quotes, web links, pictures, etc. The combinations of what pops up often seems meaningful to people. Like the quote was selected just for them.
It is like the old creativity technique of blindly finding two words in the dictionary, and then pretending that they relate to a particular situation or problem at hand, and looking for the meaningful connection between them. It is very often there, and it is often useful. That's a way of generating synchronicity.
There needs to be a wide-range freedom of motion for synchronicity to be more likely. If I only change between 3 quotes on my webpage, none of them will seem very synchronistic to most people. But if I have a few hundred, and they're good quotes in the first place, many people will find them strangely relevant.
Synchronicity is also increased the more different items I practically can manage to be shown. Again, if I see only one quote per day, chances are fewer that it will be really meaningful than if I could stand paying attention to 100. But I maybe can't. There's a sweet spot somewhere, where you're presented with enough diversity, but not so much that it becomes a blur.
If I go to a party with 10 people, and it turns out that two of us are wearing the same shirt, that's a coincidence I'll notice, even if it is not very meaningful. If we talk, and find out we were wearing the shirt for the same unlikely reason, then it begins being meaningful. But if there were 1000 people, and one of them was wearing the same shirt as me, that would just be statistics at work.
As to the net, the question is how to provide me with an increased number of coincidental fits, in a number that is great enough to be useful, and small enough to be remarkable.
There's probably some strange way of calculating the generative diversity in a volume of information, blog postings or whatever. And then maybe the synchronicity potential. You know, the information has to be sufficiently relevant to me in the first place, for me to bother paying attention to it. But sufficiently diverse and unexpected to supply me with new fits that I couldn't have guessed on my own.
In any stream of data one can measure the amount of information, at least theoretically. If I tell you 000000000000010000, then the information is in the part that is different. The 1 is the interesting part. The rest can easily be compressed into a very small space.
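The standard way to put a number on that is Shannon entropy, roughly "how many bits of surprise per symbol". The sketch below is a simplification that only looks at symbol frequencies, with zlib thrown in as a practical stand-in for "can be compressed into a very small space". The function name is made up for the example.

```python
import math
import zlib
from collections import Counter


def entropy_bits_per_symbol(stream: str) -> float:
    """Shannon entropy of the symbol frequencies in a string, in bits per symbol."""
    counts = Counter(stream)
    total = len(stream)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


mostly_zeros = "000000000000010000"
fifty_fifty = "01" * 9  # same length, but zeros and ones equally common

print(entropy_bits_per_symbol(mostly_zeros))  # well under one bit per symbol
print(entropy_bits_per_symbol(fifty_fifty))   # one bit per symbol

# A long run of sameness with one surprise in it shrinks to almost nothing.
boring = b"0" * 10000 + b"1" + b"0" * 4000
print(len(boring), len(zlib.compress(boring)))
```

Frequency counts alone miss structure (the alternating string above is perfectly predictable even though it scores a full bit per symbol), which is part of why measuring "how much is really new" in a stream of blog postings is harder than it sounds.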
Same with the stream of postings in blog world, theoretically. How much of it is really people talking about the same things, and saying very similar things about them? How much of it is really new? How much of it is information? How much of it is knowledge being transferred, i.e. you actually get something you can do something with?
Synchronicity is often that you send out a signal you weren't aware of, and you get a response. If you're aware of it, it is something else. If I search for something on google, and I find it, it isn't terribly surprising any longer, and it isn't synchronicity. But it might be when I get an answer to something I didn't quite know I was asking.
I vaguely hear somebody mention a book at another table in a restaurant. I walk into a bookstore five minutes later, and there it is on the shelf, and when I open it, I realize it is very interesting and relevant to me. That's a synchronicity.
Aha, that gives some inkling of how we technologically can help it happen. Something needs to capture way more channels of information about you than you normally bother paying conscious attention to. At least not at the same time. What people have been saying around you recently; what clothes you're wearing; what's on your bookshelf; all the people you know; all the subjects you're interested in; all the projects you're working on. And something needs to be matching all these items with other people's items, and items in your surroundings, as a background process.
There's no reason you shouldn't be able to have access to sufficiently extensive and automatic information sharing that you can walk out on the street and something says "Beep! That person walking on the other side of the street is out to buy a washer. You have one for sale. Why don't you talk with him?"
We're simply talking about some kind of location-aware device that knows who's close by, in the real world, or in an online setting. And then some way of representing a large number of needs and wants and what's available. That's the harder part. Expressing a lot of fairly fuzzy human resources and resource requirements in a finite enough way that they can be automatically matched. Even if they might not have been deliberately voiced.
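The easy half of that can be sketched in a few lines: given who is nearby and what everybody wants and offers, find the fits. This toy assumes the hard part is already solved, i.e. that needs and offers have been boiled down to exact matching tags, which is precisely what real, fuzzy human wants are not. `Person`, `nearby_matches` and the metre-grid coordinates are all invented for the example.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    x: float  # position in metres on some local grid (hypothetical)
    y: float
    wants: FrozenSet[str]
    offers: FrozenSet[str]


def nearby_matches(me: Person, others: Iterable[Person], radius_m: float = 50.0) -> List[Tuple[str, str, str]]:
    """List (name, item, reason) for everyone within radius_m whose wants or offers fit mine."""
    matches = []
    for other in others:
        if math.hypot(other.x - me.x, other.y - me.y) > radius_m:
            continue  # too far away to be worth a beep
        for item in sorted(me.offers & other.wants):
            matches.append((other.name, item, "wants what you offer"))
        for item in sorted(me.wants & other.offers):
            matches.append((other.name, item, "offers what you want"))
    return matches
```

Run as a background process against whoever is around, this is the washer example: the person across the street shows up with "washer" and "wants what you offer".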
In principle the objective is simple. You'd carry a lot of informational receptors in your space. They will link up with matching reciprocal receptors that are available in your environment. If done right, it is a technology-assisted way of being in the flow all the time.
What most people want is out there, and probably close by. What most people offer is needed somewhere, probably close by.
We could very well get used to having things matched up effortlessly, rather than having to spend a lot of energy looking for things that aren't there. And a lot of things would just be working, at lightning speed.
It can take several frustrating hours to look for a suitable plane flight that is cheap and actually available. There's no good reason you shouldn't get the information that you eventually end up with, but right away, on the first try. It can take hours looking for the right product for some purpose. It can be a good deal of work selling some item, as you need to locate good places, and there are several of them, and you aren't in any way guaranteed to find the people who really want your item. All of that kind of thing could simply be an automatic underlying substrate of connectivity, that connects those things that fit, and lets you know about it, and which doesn't waste your time with all the things that don't fit.
The Synchronicity Engine. We need it soon.