Improving Automatic Translation Software Using Human Text-Corrections

Pyramid showing comparative depths of intermediary representation: interlingual machine translation at the peak, followed by transfer-based, then direct translation. (Image via Wikipedia)

A few years ago (2003-2004) I made a detailed proposal to several Greek I.T. companies. The proposal was to «improve the performance of automatic translation software» through additional «bootstrapping code» in Prolog, which would learn from the mistakes of Automatic Translation by automatically generalizing over them, applying ILP (Inductive Logic Programming) to NLP (Natural Language Processing).

Fed with human corrections, the proposed software would continually compare them with the mistakes of automatic translation, learning (through A.I. techniques such as ILP) how to correct those mistakes, so that the final output of Automatic Translation would improve over time.

The human text-corrections could be stored in a so-called «Translation Memory System» (such as Trados), while the main NLP/ILP program could be a plugin or extension for already-existing Automatic Translation Software (such as SysTran).
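The learning loop described above can be illustrated with a very rough sketch. It is written in Python rather than Prolog, and it replaces ILP with plain frequency counting over word-level diffs, so it only hints at the idea: all names (`extract_rules`, `learn`, `apply_rules`, the toy `memory` list) are hypothetical, and the list of pairs merely stands in for a real translation memory.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_rules(machine, corrected):
    """Yield (wrong_phrase, right_phrase) pairs where the human replaced words."""
    m, c = machine.split(), corrected.split()
    matcher = SequenceMatcher(a=m, b=c, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            yield " ".join(m[i1:i2]), " ".join(c[j1:j2])


def learn(memory, min_support=2):
    """Keep only corrections seen at least `min_support` times."""
    counts = Counter()
    for machine, corrected in memory:
        counts.update(extract_rules(machine, corrected))
    rules = {}
    for (wrong, right), n in counts.most_common():
        if n >= min_support:
            rules.setdefault(wrong, right)  # most frequent correction wins
    return rules


def apply_rules(rules, text):
    """Rewrite new machine output, preferring the longest matching phrase."""
    words = text.split()
    max_len = max((len(w.split()) for w in rules), default=0)
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in rules:
                out.append(rules[phrase])
                i += n
                break
        else:
            out.append(words[i])  # no rule matched at this position
            i += 1
    return " ".join(out)


# Toy stand-in for a translation memory: (machine output, human correction).
memory = [
    ("he made a photo of the sea", "he took a photo of the sea"),
    ("she made a photo yesterday", "she took a photo yesterday"),
    ("the house is big", "the home is big"),
]
rules = learn(memory)
fixed = apply_rules(rules, "they made a photo")
```

A rule learned this way has no notion of context (it would also turn «made a cake» into «took a cake»), which is exactly the kind of over-generalization that the proposed ILP component would have to learn to restrict.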

Unfortunately, not a single Greek I.T. company was willing to risk (my proposed salary of) 10-15 thousand euros, spread over a period of 6 to 12 months, to try out this simple and (in my humble opinion) powerful software idea. Only one Greek company (software importers and software developers themselves) told me they might be interested, but that… they would need to see a working software demo of the idea first! Well, I replied that the proposed initial phase of 6 to 12 months (for which I requested a decent salary) was intended precisely to cover this development phase, arriving at an «alpha-phase software» implementing the idea. If I could support myself without their help during those few months, I wouldn’t have asked for their assistance in the first place! Instead, I would probably try to sell the «already working demo», or start (once again in life) an I.T. company to distribute the product. But I couldn’t live without any income during the proposed «initial phase». As a result, they immediately lost all interest in the idea; they wanted to do business with zero risks, zero investment and… zero innovation!
Ah well, since they were a Greek company, I was not in the least bit surprised, always expecting the worst in Greece:

  • During the last 3 years, more than 5000 colleagues of mine, Greek I.T. professionals, researchers and scientists have left Greece, emigrating to… saner countries, seeking high-tech jobs; they left because of the extreme narrow-mindedness, imbecility, corruption and complacency that predominate over here:

  • Greece is an entire country of -largely- useless, narrow-minded people, most of them living in their own Closed Little World, envying and demeaning each other. Of course, there ARE noteworthy exceptions -typically among those who live (or have lived) elsewhere.

  • Nevertheless, my proposed «automatic learning software» would be profitable in Greek-language automatic translation, since (1) Greek is a relatively rare language that most A.I./NLP developers do not understand, and (2) there is simply zero competition (for high-tech Greek-language software innovations) in a country overwhelmed by such unbelievable amounts of stupidity, backward-thinking mentality and sheer technological incompetence. Any Greek company willing to invest in this simple idea would quickly become successful in the Greek-language translation market, since my own translation skills in Greek (as well as English) are adequate for the project. The success of the project depends on two types of skill, ideally found in the same person: (1) all the necessary software development skills, and (2) perfect knowledge of at least two languages, for the first language-pair to work properly.

In fact, I wouldn’t even dream of implementing this innovation (in «only 6 to 12 months») for languages I don’t understand, such as German: my idea is guaranteed to produce good results only for Greek-to-English translations (and vice versa), which is why (to my utmost… disgust) I approached Greek companies (rather than foreign ones) in the first place. Later on, the e-mails I sent to companies all over the world were either ignored or answered with polite replies like «no thanks, we’re not interested in Greek at the moment».

  • Well, if you happen to be an I.T. company or a Greek investor of… rare open-mindedness, just send me an e-mail, to omadeon@hotmail.com, and I’ll be glad to explain it further.

  • After the initial alpha-phase, my estimate is another short period (of about six months) to produce a final, fully debugged, nicely packaged shop-ready software product, probably marketed with a nice title like «Automatic Translation Intelligent Companion» (etc).
  • However, when your profits per month begin to significantly exceed the cost of my salary, I’d be pleased to get a small percentage of your profits instead of any salary (a negotiable, modest percentage, somewhere between 15 and 25% of your net income from the proposed product), offering you unlimited support as well.

Ted Nelson (the creator of HyperText) criticizes today’s Web, proposing humanistic innovations

I became a fan of Ted Nelson’s visions and his «Xanadu project» through his book «Computer Lib/Dream Machines» many years ago. Here is the book’s front cover:

cl-cover7a.jpg
Ted Nelson is the man who coined the term «hypertext»:

Whether the World Wide Web was my idea is a matter of controversy; but no one questions that I coined the words «hypertext,» «hypermedia,» «micropayment» and «dildonics,» among others. (Ted Nelson in «MY LIFE AND WORK, VERY BRIEF»)

Ted Nelson is universally acknowledged as the father of HyperText. However, he still maintains a highly critical stance towards modern implementations of his original vision, such as the HTML language:

«HTML is precisely what we were trying to PREVENT— ever-breaking links, links going outward only, quotes you can’t follow to their origins, no version management, no rights management». Ted Nelson in a Wikipedia quote

Nelson at Keio University, Japan, 1999 (image: Belinda Barnet).
tn_image02.jpg
Ted Nelson in 2002
«…the objective is to create a unified quotable world– that is, a pool of transquotable documents on the Internet, and that means including documents whose content is sold as well as free. Thus a key aspect of the plan is to build a microsale system, so that sold content may be offered under transcopyright by authors and publishers.» (Ted Nelson in «Transcopyright»)

Here is what Ted Nelson said to a BBC interviewer a few years ago (Red emphasis is mine):

«The World Wide Web is not what we were trying to create. The links only go one way. There’s no permanent publishing. There is no way you can write a marginal note that other people can see on what’s in front of you. There is no way that you can quote freely…» (*)

  • (*) Now, this statement was made in 2001, and it is partly outdated: Today (2007) Web 2.0 «social tagging» and annotations are possible, and also shareable. However, most of Ted Nelson’s criticisms are still quite valid.

My own attempts (in the late eighties) to implement Nelson’s visions made me accidentally stumble upon similarities between his ideas and today’s Semantic Web, of which I became aware only very recently. Adhering closely to Nelson’s principles, I ended up with a data-structure for hypertext implementation that is surprisingly similar to the foundations of the Semantic Web:

  • It was a rare, sunny British Sunday, in the summer of 1988, when I devoured Ted Nelson’s book from beginning to end while enjoying the sunshine, sitting on a park bench in Oxford (UK), where I had just started working in I.T. for a local company. On the evening of that same day, I hacked some hypertext code in a Prolog interpreter, running on a heavy-duty late-eighties Amstrad «laptop», with floppies but no hard disk.
  • My first Hypertext program (of 1988) was very crude by today’s standards: It was just a simple text-editor, where (by pressing certain keys) hyper-links could be inserted, edited, or deleted. The phrases chosen as links appeared in bold. By pressing Space they were replaced by other documents, implemented as clauses of a Prolog Database stored in RAM (since floppies were too slow and there was no hard disk!)
  • A couple of years later a new program emerged, a hypertext editor with multiple windows, implemented in PDC Prolog (as well as Assembly Language for speed): «Hyperion» was a DOS program, but it still works: You can download Hyperion-1 here. If you can read Greek (or tolerate… machine-translation pidgin English) there is a Greek article about Hyperion and HyperText here.
  • «Hyperion-1» was finally released a few months later, through a computer magazine’s companion disk. It had multiple windows with scroll-bars, user-defined colours and multiple types of hyperlinks embedded in each highlighted phrase. In fact, the data-structure of Hyperion closely resembled today’s «RDF triples» (in the Semantic Web), with three arguments in each hyperlink definition: (1) Source phrase / HyperKey, (2) Target-document and (3) Relation between them (user-defined, optionally using graphics or user-defined transformations of the Target-document)…
  • Today (in 2007) the Semantic Web addresses some (but not all) of the fundamental issues that Ted Nelson and his associates have been raising for years, ever since HTTP (Hypertext Transfer Protocol) was invented. See e.g. «Where the World Wide Web Went Wrong».
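The three-argument hyperlink definition described above can be sketched as follows. This is not Hyperion’s actual code (which was written in Prolog and Assembly); it is a minimal Python illustration, and every name in it (`HyperLink`, `follow`, `backlinks`, the sample links and relation labels) is an assumption made up for the example.

```python
from typing import NamedTuple


class HyperLink(NamedTuple):
    """One hyperlink definition, shaped like a (subject, object, predicate) triple."""
    source: str    # highlighted phrase (the "HyperKey")
    target: str    # name of the target document
    relation: str  # user-defined relation between the two


# Hypothetical sample data, only for illustration.
links = [
    HyperLink("Xanadu", "xanadu_overview", "explains"),
    HyperLink("Xanadu", "nelson_biography", "created_by"),
    HyperLink("hypertext", "xanadu_overview", "example_of"),
]


def follow(links, phrase, relation=None):
    """Target documents reachable from a phrase, optionally by relation type."""
    return [
        link.target
        for link in links
        if link.source == phrase and (relation is None or link.relation == relation)
    ]


def backlinks(links, target):
    """Phrases that point at a document, i.e. the link followed 'backwards'."""
    return [link.source for link in links if link.target == target]


all_targets = follow(links, "Xanadu")
by_author = follow(links, "Xanadu", relation="created_by")
pointing_in = backlinks(links, "xanadu_overview")
```

Because each link is stored as a separate record rather than embedded in the source document, it can be queried in either direction, which touches on Nelson’s complaint that Web links «only go one way».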

(to be continued)

MONDRAGÓN CORPORACIÓN COOPERATIVA (MCC): an international People’s Cooperative Corporation

Autonomous Basque Country. (Basque region, image via Wikipedia)

I stumbled upon this remarkable Alternative Company fairly recently, in a Green politics newsletter: «MONDRAGÓN CORPORACIÓN COOPERATIVA» (MCC) is a competitive, rapidly expanding big corporation, the seventh largest in Spain and the largest in the Basque region, where it began. It has now expanded to a multinational corporation employing 82 thousand people. About half of MCC’s «employees» are partners and co-owners of the company itself, empowered with democratic rights and profit-sharing, as well as direct participation in the company’s decisions through the General Assembly, where the principle of «one person = one vote» is used.

UPDATE: In view of the current international crisis (in October 2008), this post has been brought back to this blog’s front page. Moreover, here are some new hot links of relevance to Cooperativism (and the current Crisis): Continue reading

This blog’s Banner and Avatar, fully documented!

This blog’s banner is documented in a new web-page (click here to see it).

My «avatar» is a tiny detail of the blog’s banner (the corner on the right-hand side): a small part of an oil painting called «The Astral Visitor», digitally modified to emanate an aura of White Light, with an additional source of White Light (or Energy) located in the region of the heart (the «Emotional Chakra»). Music symbols have also been added at the bottom of each side:

omadeon-128.jpg
  • This post was inspired by a friend, the Greek blogger HarisHeiz (crazycows.wordpress.com), who posted detailed explanations of his own avatar and blog-banner here, while his friend Manos documented his own avatar here (quite humorously). Of course, my banner and avatar are far from perfect; so, feel free to criticize or contribute your own ideas about them!

A minimal tribute to the «impiteous Tata Sisters»

Swearing in a cartoon (image via Wikipedia)

The existence of extremely popular Greek bloggers who are also… extremely filthy, tacky, full of gossip, slander and all kinds of excrement, has caused the appearance of antidote-blogs, often misconstrued as «hate-blogs». Usually, they are not hate-blogs: On the contrary, they are heroic activists against the filth of certain other blogs (usually not readily detectable)! :)

E.g. there exists an unusual blog, full of vitality, honesty and strong language (wherever necessary): the «Impiteous Tata Sisters blog» (of «minimal situationism»). I have had a good laugh reading what the Tata Sisters come up with from time to time, agreeing with them most of the time! Unfortunately, it’s a Greek-language blog. However, it’s worth reading if you can understand Greek and if your mind is open; you can also try automatic translations of this blog’s posts into English (quite ghastly in translation errors sometimes, but sometimes even funnier, as «Automatic Translation Babel-Pidgin English»). I have decided to include this militant, straight-talking «Tata Sisters Blog» in my blogroll from now on. I also created a convenient mini-banner for it, for easy reference (and free promotion through my own blog):

tata_smaller.jpg

Pocket-size Version

tata.jpg

«Big Mac»(c) Dinner-Size Version


Warning: «We have a concern about some of the content on your blog. Please contact us A.S.A.P.»

This posting is addressed to WordPress Administrators, as well as to the international blogging community. Our experience in blogging (using the Greek language) has been a wonderful learning experience, in some ways. However, in other ways, it has also been a disappointing experience, as regards the human element (e.g. quite a few other Greek bloggers). Personally, I feel disillusioned to discover that a significant number of Greek bloggers have exhibited a dangerously over-enthusiastic misuse and abuse of all rights granted to them by blog-providers. Moderation, for example, is applied in excess quite frequently by certain Greek blogs, not in order to rid themselves of insults and slanderous comments, but for the malicious suppression of viewpoints they dislike, as well as of people they do not like for personal reasons. For example, some Greek candidates for the 2007 elections have used moderation to silence their critics or their personal enemies, in blog-discussions misleadingly presented as open, free, fair and democratic. Continue reading

SEV gave awards to… illegal cartels!!! (well, then let the blogosphere reward its own… stars, too)

According to a post by Zalmoxis (see it here), the Association of Greek Industrialists (SEV) gave business awards to five companies that had played a leading part in illegal practices (and had actually been prosecuted for forming cartels, etc.), while in the meantime the Greek Blogosphere agonizes («like a pig trying to fart», as my grandmother used to say) over the «Blog Oscars 2007», which many have coveted but everybody… hates. Continue reading