The Höhere Graphische Bundes-Lehr- und Versuchsanstalt (HGBLuVA) ("Higher Federal Institution for Graphic Education and Research"), now commonly known as "die Graphische", is a vocational college in Vienna, Austria, for professions in visual communication and media technology. It was founded in 1888. == History == === Opening === Originally set up as a photographic research institute by the President of the Photographic Society, the graphic teaching and research institute (GLV) was created through the incorporation of the photographic school (a department for photographic reproduction processes connected to the Salzburg State Building School) and the Hörwarter general drawing school in Vienna. Since its foundation, it has made an important contribution to the establishment and development of the graphic professions. By a resolution of March 14, 1887, the City Council of Vienna made three floors of the municipal building at Westbahnstraße 25 in Vienna VII, the former Schottenfelder Realschule, available for the establishment of a teaching and research institute for photography and reproduction processes. The k. k. Lehr- und Versuchsanstalt für Photographie und Reproductionsverfahren was founded and directed (1888–1923) by Josef Maria Eder. Eder had previously worked at the Technologische Gewerbemuseum (Museum of Applied Technology), for which he established a Section for Photography and Reproduction Techniques, and at the Vienna State Trade School, where, recently qualified as a university lecturer, he began teaching chemistry and physics in 1881. The institute opened on March 1, 1888 with 108 students. In the next school year the number of students rose to 174. In 1890, Eder placed a Wothly solar camera (an early means of enlarging negatives) on the roof.
In the context of the history of vocational schools and the applied arts, pioneering educational reforms in Austria from the 1870s created institutions like it outside the format of the classical university; the school was a special variation on the “state trade school” (“Staats-Gewerbeschule”). Eder based his institution on earlier foreign models such as the Conservatoire des arts et métiers in Paris (founded 1794), which housed a museum of history and technology and hosted evening lectures and demonstrations, with lectures in photography commencing in 1891. From 1897 onwards the name Graphische Lehr- und Versuchsanstalt came into use. In 1906, Emperor Franz Joseph granted the school the designation “Imperial and Royal” in its title, and the Republic of Austria confirmed this distinction when the Federal Chancellery approved the school's use of the national coat of arms. === The beginnings === The GLV was instituted on August 27, 1887 "by the highest resolution to approve the activation of this teaching and research institute in Vienna on March 1, 1888". The aim of the institute was the “training of specialist photographers, retouchers, collotype printers, photolithographers, etc., the instruction of artists, scholars and technicians who want to learn photography as an auxiliary science, furthermore the testing of equipment, chemicals and the implementation of independent scientific investigations in the areas of Photochemistry and Related Subjects”. The school consisted of two departments: the Institute for Photography and Reproduction Processes and the Research Institute. In 1891 the Board of Book Printers and Type Founders pointed out the urgent need to add a department for book printers to the school.
In 1897 an additional section for the book and illustration trade was opened, the school called "KK Graphische Lehr- und Versuchsanstalt" was then divided into four sections: Section I: Institute for Photography and Reproduction (corresponds to the former Institute for Photography and Reproduction Processes) Section II: College for the book and illustration trade Section III: Research institute for photochemistry and graphic printing processes (corresponds to the original research institute) Section IV: Collections: graphic collection, library and equipment collection The first original lithographs by famous artists such as Luigi Kasimir and Tina Blau are thanks to the special course for lithography and lithography introduced in 1905 and 'algraphy' - a planographic printing process from an aluminum plate instead of the stone used in lithography - was first taught in Austria in 1896 at the GLV. The specialty course for lithography and lithography existed until 1913/14, after which a specialist course for xylography (wood engraving and woodcuts) was offered. In 1908 the graphic arts department was set up on the top floor of the neighbouring house at Westbahnstraße 27 connected by a spiral staircase still in existence in the courtyard at the current location on Leyserstraße. === Women in the graphic teaching and research institute === From 1908 women were also officially admitted. For the period from 1888 to 1918/19, a total of 718 female students at the Graphische are recorded in the largely preserved class lists. Due to changes and new requirements in the job description, the proportion of women continued to grow, so that in some classes it exceeded two thirds. === The Graphics Department === In 1916, the school statute was changed: all-day lessons with photography internship in the 1st and 2nd years as well as training for disabled people were introduced and a drawing school was added. 
After the First World War, the school was renamed several times: in 1919 the name was "Deutsch-Österreichische Graphische Lehr- und Versuchsanstalt"; it was changed in 1920 to "Staatliche Graphische Lehr- und Versuchsanstalt" and in 1923 to "Graphic Education and Research Institute". === The school in the time of National Socialism === The "annexation of Austria by Germany" resulted in organisational restructuring: semesters were introduced and the GLV was made a subordinate level of a university of the graphic arts administered in Leipzig. In 1939 the school became a state graphic teaching and research institute. Up to this point, two thirds of all Austrian postage stamps had been designed and engraved in the Graphische. === Post-war period === In 1945 the period of study at the technical school was extended to four years. In 1948, “manual graphics” became “commercial graphics” followed by an honours year. In 1959, a department A was developed: a three-class specialist department for photography with a master class, and a department B: a specialist department for commercial graphics with four classes and an honours year. Through further school reforms, the university entrance qualification was acquired with the completion of the now five-year course and honours qualification. In 1967, due to a lack of space, the school moved from Westbahnstrasse to the new Carl Appel building in Leyserstrasse. === The new building, 1963 === On May 22, 1963, the foundation stone of the new campus was laid in the 14th district in the Breitenseer Strasse, Leyserstrasse and Spallartgasse area (Kommandogebäude Theodor Körner). In 1967 the move to the new building began and in 1968 the official opening coincided with the 80th anniversary of the school. In 1963/64 the first year of the five-year high school for reprography and printing technology began. There was also a four-year technical school.
With the advent of personal computers and their use in the graphics industry, change came first in typesetting and later in image processing, and in 1984 the advent of desktop publishing brought a revolution that permanently challenged the distinction between photographer, typesetter, layout artist and printer. In 1988, the Graphische celebrated its 100th anniversary. The rapid development of technology shaped school events in the 1980s, as did the rapid advance of offset printing, albeit at the expense of letterpress printing. In reproduction technology, scanner technology for the production of colour separations displaced reprography. === Renovation, 2006 === Due to renovation work on the building in Leyserstraße, the management and the photography, multimedia and graphics departments moved to an alternative location in Vienna's first district at Schellinggasse 13. After the work was completed, the school moved back in February 2008.
Inauthentic text
An inauthentic text is a computer-generated expository document meant to appear genuine but which is actually meaningless. Such texts are frequently created to be intermixed with genuine documents and thus manipulate the results of search engines, as with spam blogs. They are also carried along in email to fool spam filters by giving the spam the superficial characteristics of legitimate text. Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated press or Flarf poetry. They have also been used to challenge the veracity of a publication: MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. This led the students to claim that the bar for submissions was too low. With the amount of computer-generated text outpacing the ability of humans to curate it, some means of distinguishing between the two is needed. Yet automated approaches to determining absolutely whether a text is authentic face intrinsic challenges of semantics. Noam Chomsky coined the phrase "Colorless green ideas sleep furiously" as an example of a grammatically correct but semantically incoherent sentence; some will point out that in certain contexts one could give this sentence (or any phrase) meaning. The first group to use the expression in this sense was at Indiana University. Their work explains in detail an attempt to detect inauthentic texts and identifies pernicious problems of inauthentic texts in cyberspace. Their site provides a means of submitting text and assesses, based on supervised learning, whether a corpus is inauthentic. Many users have submitted incorrect types of data and have correspondingly commented on the scores. This application is meant for a specific kind of data; therefore, submitting, say, an email will not return a meaningful score.
Hardware trojan
A hardware trojan (HT) is a malicious modification of the circuitry of an integrated circuit. A hardware trojan is completely characterized by its physical representation and its behavior. The payload of an HT is the entire activity that the trojan executes when it is triggered. In general, trojans try to bypass or disable the security fence of a system: for example, leaking confidential information by radio emission. HTs could also disable, damage or destroy the entire chip or components of it. Hardware trojans may be introduced as hidden "front-doors" that are inserted while designing a computer chip, by using a pre-made application-specific integrated circuit (ASIC) semiconductor intellectual property core (IP core) that has been purchased from a non-reputable source, or inserted internally by a rogue employee acting either on their own or on behalf of rogue special interest groups or state-sponsored spying and espionage. One paper published by IEEE in 2015 explains how a hardware design containing a trojan could leak a cryptographic key over an antenna or network connection, provided that the correct "easter egg" trigger is applied to activate the data leak. In high-security governmental IT departments, hardware trojans are a well-known problem when buying hardware such as KVM switches, keyboards, mice, network cards, or other network equipment. This is especially the case when purchasing such equipment from non-reputable sources that could have placed hardware trojans to leak keyboard passwords or provide remote unauthorized entry. == Background == In a diverse global economy, outsourcing of production tasks is a common way to lower a product's cost. Embedded hardware devices are not always produced by the firms that design and/or sell them, nor in the same country where they will be used.
Outsourced manufacturing can raise doubt about the evidence for the integrity of the manufactured product (i.e., one's certainty that the end-product has no design modifications compared to its original design). Anyone with access to the manufacturing process could, in theory, introduce some change to the final product. For complex products, small changes with large effects can be difficult to detect. The threat of a serious, malicious design alteration can be especially relevant to government agencies. Resolving doubt about hardware integrity is one way to reduce technology vulnerabilities in the military, finance, energy and political sectors of an economy. Since fabrication of integrated circuits in untrustworthy factories is common, advanced detection techniques have emerged to discover when an adversary has hidden additional components in, or otherwise sabotaged, the circuit's function. == Characterization of hardware trojans == An HT can be characterized in several ways, for example by its physical representation, its activation phase and its action phase. Alternative methods characterize the HT by trigger, payload and stealth. === Physical characteristics === One physical characteristic of a trojan is its type, which can be either functional or parametric. A trojan is functional if the adversary adds or deletes any transistors or gates in the original chip design. The other kind of trojan, the parametric trojan, modifies the original circuitry, e.g. by thinning wires, weakening flip-flops or transistors, subjecting the chip to radiation, or using focused ion beams (FIB) to reduce the reliability of a chip. The size of a trojan is its physical extension or the number of components it is made of. Because a trojan can consist of many components, the designer can distribute the parts of the malicious logic across the chip. The additional logic can occupy the chip wherever it is needed to modify, add, or remove a function.
Malicious components can be scattered across the chip (loose distribution) or consist of only a few components (tight distribution), in which case the malicious logic occupies only a small area of the chip layout. In some cases, high-effort adversaries may regenerate the layout so that the placement of the components of the IC is altered. In rare cases the chip dimension is altered. These changes are structural alterations. === Activation characteristics === The typical trojan is condition-based: it is triggered by sensors, internal logic states, a particular input pattern or an internal counter value. Condition-based trojans are detectable with power traces to some degree when inactive. This is due to the leakage currents generated by the trigger or counter circuit that activates the trojan. Hardware trojans can be triggered in different ways. A trojan can be internally activated, which means it monitors one or more signals inside the IC. The malicious circuitry could wait for countdown logic that an attacker added to the chip, so that the trojan awakes after a specific time span. The opposite is an externally activated trojan. There can be malicious logic inside a chip that uses an antenna or other sensors the adversary can reach from outside the chip. For example, a trojan could be inside the control system of a cruise missile. The owner of the missile does not know that the enemy will be able to switch off the rockets by radio. An always-on trojan can be as simple as a thinned wire. A chip that is modified in this way produces errors or fails every time the wire is used intensely. Always-on circuits are hard to detect with power traces. In this context combinational trojans and sequential trojans are distinguished. A combinational trojan monitors internal signals until a specific condition happens.
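The internally activated triggers described above can be illustrated with a small behavioural model in Python. This is only a sketch under stated assumptions: the class names, the three-bit signal tuples and the trigger pattern are hypothetical and stand in for real circuit signals; no actual hardware design is modelled.

```python
from dataclasses import dataclass


@dataclass
class CombinationalTrigger:
    """Fires when the monitored signals equal one specific (rare) pattern."""
    pattern: tuple  # hypothetical trigger condition chosen by the adversary

    def step(self, signals: tuple) -> bool:
        return signals == self.pattern


@dataclass
class CounterTrigger:
    """Countdown logic: fires once a fixed number of clock cycles has elapsed."""
    cycles_left: int

    def step(self, signals: tuple) -> bool:
        if self.cycles_left > 0:
            self.cycles_left -= 1
        return self.cycles_left == 0


def first_activation(trigger, trace):
    """Return the index of the first cycle in which the trigger fires, or None."""
    for cycle, signals in enumerate(trace):
        if trigger.step(signals):
            return cycle
    return None


# Hypothetical trace of three internal signals over four clock cycles.
trace = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 0)]
print(first_activation(CombinationalTrigger(pattern=(1, 1, 1)), trace))  # 2
print(first_activation(CounterTrigger(cycles_left=4), trace))  # 3
```

In this model the combinational trigger stays silent until the exact pattern appears, which is why a rarely occurring pattern makes the trojan hard to provoke during testing.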
A sequential trojan is also an internally activated condition-based circuit, but it monitors the internal signals and searches for sequences rather than for a specific state or condition as combinational trojans do. ==== Cryptographic key extraction ==== Extracting secret keys by means of a hardware trojan without the trojan being detected requires that the trojan itself use a random signal or some cryptographic implementation. To avoid storing a cryptographic key in the trojan itself, a physical unclonable function can be used. Physical unclonable functions are small in size and can have an identical layout while the cryptographic properties are different. === Action characteristics === An HT could modify the chip's function or could change the chip's parametric properties (e.g. by provoking a process delay). Confidential information can also be transmitted to the adversary (transmission of key information). === Peripheral device hardware trojans === A relatively new threat vector to networks and network endpoints is an HT appearing as a physical peripheral device that is designed to interact with the network endpoint using the approved peripheral device's communication protocol. An example is a USB keyboard that hides all malicious processing cycles from the target network endpoint to which it is attached by communicating with that endpoint over unintended USB channels. Once sensitive data is exfiltrated from the target network endpoint to the HT, the HT can process the data and decide what to do with it: store the data in memory for later physical retrieval of the HT, or possibly exfiltrate the data to the internet using wireless or using the compromised network endpoint as a pivot. == Potential of threat == A common trojan is passive for most of the time an altered device is in use.
If a trojan is activated, the device functionality can be changed, the device can be destroyed or disabled, the device can leak confidential information, or the HT may tear down the security and safety of the device. Trojans are stealthy: to avoid detection, the precondition for activation is a very rare event. Traditional testing techniques are not sufficient. A manufacturing fault happens at a random position, while malicious changes are well placed to avoid detection. == Detection == === Physical inspection === First, the molding coat is cut to reveal the circuitry. Then, the engineer repeatedly scans the surface while grinding the layers of the chip. There are several operations to scan the circuitry. Typical visual inspection methods are: scanning optical microscopy (SOM), scanning electron microscopy (SEM), pico-second imaging circuit analysis (PICA), voltage contrast imaging (VCI), light induced voltage alteration (LIVA) or charge induced voltage alteration (CIVA). The floor plan of the chip then has to be compared with the image of the actual chip. This is still quite challenging to do. To detect trojan hardware that includes differing (crypto) keys, an image diff can be taken to reveal the differing structure on the chip. The only known hardware Trojan using unique crypto keys but having the same structure is. This property enhances the undetectability of the trojan. === Functional testing === This detection method stimulates the input ports of a chip and monitors the output
AMiner (database)
AMiner (formerly ArnetMiner) is a free online service used to index, search, and mine big scientific data. == Overview == AMiner (ArnetMiner) is designed to search and perform data mining operations against academic publications on the Internet, using social network analysis to identify connections between researchers, conferences, and publications. This allows it to provide services such as expert finding, geographic search, trend analysis, reviewer recommendation, association search, course search, academic performance evaluation, and topic modeling. AMiner was created as a research project in social influence analysis, social network ranking, and social network extraction. A number of peer-reviewed papers have been published arising from the development of the system. It has been in operation for more than three years, and has indexed 130,000,000 researchers and more than 265 million publications. The research was funded by the Chinese National High-tech R&D Program and the National Science Foundation of China. AMiner is commonly used in academia to identify relationships between and draw statistical correlations about research and researchers. It has attracted more than 10 million independent IP accesses from 220 countries and regions. The product has been used in Elsevier's SciVerse platform, and academic conferences such as SIGKDD, ICDM, PKDD, WSDM. == Operation == AMiner automatically extracts the researcher profile from the web. It collects and identifies the relevant pages, then uses a unified approach to extract data from the identified documents. It also extracts publications from online digital libraries using heuristic rules. It integrates the extracted researchers’ profiles and the extracted publications. It employs the researcher name as the identifier. A probabilistic framework has been proposed to deal with the name ambiguity problem in the integration. The integrated data is stored into a researcher network knowledge base (RNKB). 
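The integration step can be illustrated with a minimal Python sketch. The record layouts, the example names and the `integrate` helper are hypothetical and are not AMiner's actual code or data; the sketch only shows why using the researcher name as the identifier gives rise to the name ambiguity problem.

```python
from collections import defaultdict

# Hypothetical extracted records; the fields are illustrative assumptions.
profiles = [
    {"name": "Wei Wang", "affiliation": "University A"},
    {"name": "Wei Wang", "affiliation": "University B"},
    {"name": "Ana Silva", "affiliation": "University C"},
]
publications = [
    {"title": "Mining Graphs", "authors": ["Wei Wang", "Ana Silva"]},
    {"title": "Topic Models", "authors": ["Wei Wang"]},
]


def integrate(profiles, publications):
    """Join profiles and publications using the researcher name as the key."""
    kb = defaultdict(lambda: {"profiles": [], "publications": []})
    for profile in profiles:
        kb[profile["name"]]["profiles"].append(profile)
    for pub in publications:
        for author in pub["authors"]:
            kb[author]["publications"].append(pub["title"])
    return dict(kb)


kb = integrate(profiles, publications)
# A name that maps to more than one profile cannot be resolved by name alone.
ambiguous = [name for name, entry in kb.items() if len(entry["profiles"]) > 1]
print(ambiguous)  # ['Wei Wang']
```

Here both publications by "Wei Wang" end up under one key even though two different people share the name, which is the situation a disambiguation framework has to untangle.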
The principal other products in the area are Google Scholar, Elsevier's Scirus, and the open source project CiteSeer. == History == AMiner was initiated and created by Professor Jie Tang of Tsinghua University, China. It was first launched in March 2006. The following is a list of updates over the years: March 2006, Version 0.1, Functions include researcher profiling, expert search, conference search, and publication search. The system was developed in Perl; August 2006, Version 1.0, The system was re-implemented in Java; July 2007, Version 2.0, New functions include researcher interest mining, association search, survey paper finding (unavailable now); April 2008, Version 3.0, New functions include query understanding, new GUI, and search log analysis; November 2008, Version 4.0, New functions include graph search, topic modeling, NSF/NSFC funding information extraction; April 2009, Version 5.0, New functions include profile editing, open API service, Bole search, course search (unavailable now); December 2009, Version 6.0, New functions include academic performance evaluation, user feedback, conference analysis; May 2010, Version 7.0, New functions include name disambiguation, paper-reviewer recommendation, ArnetPage creation; March 2012, Version II, renamed AMiner, with all code rewritten and the GUI redesigned. New functions include: geographic search, ArnetAPP platform. June 2014, Version II, renamed AMiner, with all code rewritten and the GUI redesigned. New functions include: geographic search, ArnetAPP platform. December 2015, a completely new version went online. May 2017, the professional version went online.
April 2018, New functions include Trend Analysis and a deep-learning-based Name Disambiguation. == Resources == AMiner has published several datasets for academic research purposes, including Open Academic Graph, DBLP+citation (a dataset augmenting the DBLP data from the Digital Bibliography & Library Project with citations), Name Disambiguation, and Social Tie Analysis.
Blend4Web
Blend4Web is a free and open source framework for creating and displaying interactive 3D computer graphics in web browsers. == Overview == The Blend4Web framework leverages Blender to edit 3D scenes. Content rendering relies on WebGL, Web Audio, WebVR, and other web standards, without the use of plug-ins. It is dual-licensed: the framework is distributed under the free and open source GPLv3 and under a non-free license, with the source code hosted on GitHub. A 3D scene can be prepared in Blender and then exported as a pair of JSON and binary files to load in a web application. It can also be exported as a single, self-contained HTML file, in which exported data, the web player GUI, and the engine itself are packed. The HTML option is considered to be the simplest way. The resulting file, which has a minimum size of 1 MB, can be embedded in a web page using a standard iframe HTML element. Blend4Web-powered web applications can be deployed on social networking websites such as Facebook. The Blend4Web toolchain consists of JavaScript libraries, the Blender add-on, and a set of tools for tweaking 3D scene parameters, debugging, and optimization. Developed by Moscow-based company Triumph in 2010, Blend4Web was publicly released on March 28, 2014. At the end of 2017, the project founders Yuri and Alex Kovelenov quit Triumph to start the development of a new WebGL framework, Verge3D. In October 2019, an "Absolutely new Blend4Web" was announced, which was planned to make developing 3D apps easier and to add a new marketplace where people can offer their 3D models. == Features == The framework has a number of components typically found in game engines, including a positional audio system, physics engine (a fork of Bullet ported to JavaScript), animation system, and an abstraction layer for game logic programming. Up to 8 different types of animations can be assigned to a single object, including skeletal and per-vertex animation.
The speed and the direction of animation (forward/backward play), as well as particle system parameters (size, initial velocity, and count), can be changed through the API. Among other supported features are: scene data dynamic loading and unloading, subsurface scattering simulation, and image-based lighting. Some out-of-box options exist for rendering extended outdoor environments, including foliage-wind interaction, water, atmosphere, and sunlight simulation. One example demonstrating these effects is "The Farm" tech demo, which also features multiple animated NPCs and the ability to walk, interact with objects and drive a vehicle in first-person mode. Being based on the cross-browser WebGL API, Blend4Web runs in the majority of web browsers, including mobile ones. There are some caveats for browsers with experimental WebGL support, such as Internet Explorer. There are also applications developed to run on Tizen-powered devices such as the Samsung Gear S2 smartwatch. Other features include: draw call batching, hidden surface determination, threaded physics simulation and ocean simulation. In version 14.09, Blend4Web introduced the possibility of adding interactivity to 3D scenes using a visual programming tool. The tool is reminiscent of the BGE's logic editor as it uses logic blocks that are placed inside Blender. It plays back animation tracks authored by an artist when the user interacts with predefined 3D objects. Since version 15.03, Blend4Web has supported attaching HTML elements (such as information windows) to 3D objects ("annotations") and copying objects in run time ("instancing"). The following post-processing effects are supported: glow, bloom, depth of field, crepuscular rays, motion blur, and screen space ambient occlusion. == Virtual reality and augmented reality == Virtual reality devices have been supported since the end of 2015. Specifically, Oculus Rift head-mounted display works over experimental WebVR API. 
The software also now includes preliminary support for gamepads, based on the Gamepad API. In 2017, the option to author augmented reality content was added. The system is based on the open-source tracking library ARToolKit and uses the WebRTC protocols. Starting from version 17.08, finger tracking is supported through the Leap Motion device. == Blender integration == The Blender add-on is written in Python and C and can be compiled for the Linux x86/x64, OS X x64, and MS Windows x86/x64 platforms. A Blend4Web-specific profile can be activated in the add-on settings. When switching to this profile, the Blender interface changes so that it only reveals settings relevant to Blend4Web. Blend4Web supports a set of Blender-specific features such as the node material editor (a tool for visual shader programming) and the particle system. There is basic support for Blender's non-linear animation (NLA) editor for creating simple scenarios. Blend4Web is based on Blender's real-time GLSL rendering engine, which users are recommended to use in order to enable WYSIWYG editing. == Notable uses == NASA developed an interactive web application called Experience Curiosity to celebrate the 3rd anniversary of the Curiosity rover landing on Mars. This Blend4Web-based app makes it possible to operate the rover, control its cameras and the robotic arm, and reproduce some of the prominent events of the Mars Science Laboratory mission. The application got presented at the beginning of the WebGL section at SIGGRAPH 2015. Experience Curiosity was ported to Verge3D for Blender in 2018 with several performance improvements and bug fixes. A General Motors authorized dealer in the United Arab Emirates has placed a functional Chevrolet Camaro 3D configurator on its website. Greenpeace created interactive 3D infographics to back Greenpeace's Detox campaign in Russia. Tallink featured an interactive 3D presentation of its MS Megastar vessel to allow visitors to browse details of the ship.
Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Recent advances in NLP techniques have allowed for significantly improved performance compared to previous years. An example is the extraction of corporate mergers from newswire reports, as denoted by the formal relation $\operatorname{MergerBetween}(\mathrm{company}_1, \mathrm{company}_2, \mathrm{date})$, from an online news sentence such as: "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP), which has solved the problem of modelling human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between IR and NLP.
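The merger example given earlier can be made concrete with a short Python sketch. It assumes a single hand-written regular expression that only covers sentences shaped like the example; the `MergerBetween` dataclass, the pattern and the `extract_merger` helper are hypothetical illustrations, not a standard IE API.

```python
import re
from dataclasses import dataclass


@dataclass
class MergerBetween:
    company1: str
    company2: str
    date: str


# Hypothetical pattern: "<date>, [<place> based] <company1> announced their
# acquisition of <company2>". Real systems need far more robust methods.
PATTERN = re.compile(
    r"(?P<date>\w+), (?:[\w ]+ based )?(?P<c1>.+?) "
    r"announced their acquisition of (?P<c2>.+)$"
)


def extract_merger(sentence: str):
    """Return a MergerBetween record, or None if the sentence does not match."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return MergerBetween(match["c1"], match["c2"], match["date"])


sentence = "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp."
print(extract_merger(sentence))
# MergerBetween(company1='Foo Inc.', company2='Bar Corp.', date='Yesterday')
```

The extracted date is still the relative expression "Yesterday"; turning it into a calendar date would need the publication date of the report, which this sketch does not model.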
In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differing in the details. As an example, consider a group of newswire articles on Latin American terrorism with each article presumed to be based upon one or more terrorist acts. We also define for any given IE task a template, which is a (or a set of) case frame(s) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terrorist act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template. == History == Information extraction dates back to the late 1970s in the early days of NLP. An early commercial system from the mid-1980s was JASPER, built for Reuters by the Carnegie Group Inc with the aim of providing real-time financial news to financial traders. Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains: MUC-1 (1987), MUC-2 (1989): Naval operations messages. MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries. MUC-5 (1993): Joint ventures and microelectronics domain. MUC-6 (1995): News articles on management changes. MUC-7 (1998): Satellite launch reports. Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), which wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism. == Present significance == The present significance of IE pertains to the growing amount of information available in unstructured form.
Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents and advocates that more of the content be made available as a web of data. Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. == Tasks and subtasks == Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal is to create text that is more easily machine-readable so that its sentences can be processed. Typical IE tasks and subtasks include: Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks. Knowledge Base Population: Fill a database of facts given a set of documents. Typically the database is in the form of triplets, (entity 1, relation, entity 2), e.g. (Barack Obama, Spouse, Michelle Obama). Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity.
A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would mean detecting that the phrase "M. Smith" refers to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or "might be") the specific person whom that sentence is talking about. Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith". Relationship extraction: identification of relations between entities, such as: PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.") PERSON located in LOCATION (extracted from the sentence "Bill is in France.") Semi-structured information extraction, which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as: Table extraction: finding and extracting tables from documents. Table information extraction: extracting information in a structured manner from the tables. This task is more complex than table extraction, as table extraction is only the first step, while understanding the roles of the cells, rows, columns, linking the information inside the table and understanding the information presented in the table are additional tasks necessary for table information extraction.
Comments extraction: extracting comments from the actual content of articles in order to restore the link between each sentence and its author. Language and vocabulary analysis Terminology extraction: finding the relevant terms for a given corpus Audio extraction Template-based music extraction: finding relevant characteristics in an audio signal taken from a given repertoire; for instance, time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece. Note that this list is not exhaustive, that the exact meaning of IE activities is not commonly accepted, and that many approaches combine multiple sub-tasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly interesting topic in research, and information extracted from multimedia documents can now be expressed in a high-level structure, as is done for text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources. == World Wide Web applications == IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people
KKday
KKday is an online travel e-commerce platform focused on connecting independent travelers with authentic, curated local experiences, tours, activities, and attraction tickets. == History == KKday was founded in 2014 in Taipei, Taiwan, by CEO Ming Chen, who previously started and led both Star Travel and Ezfly to IPO. In March 2016, the company raised US$4.5 million in a Series A round led by AppWorks Ventures with participation by 91Capital. The raise allowed KKday to open offices and expand into Hong Kong, Japan, South Korea and Singapore by 2016. Its official launch in Singapore in October 2016 was accompanied by promotional campaigns to attract regional users, and by the end of 2016 KKday offered over 6,000 travel experiences across 53 countries and 174 cities. Expansion into Malaysia, Thailand, Vietnam and the Philippines continued throughout 2017 and into 2018, with the company opening offices in Indonesia and mainland China. KKday rapidly expanded its inventory, reaching over 10,000 experiences in more than 500 cities across 80 countries by 2018, with key markets in Taiwan, Hong Kong, and South Korea. In February 2018, KKday raised US$10.5 million in a funding round led by Japanese travel giant H.I.S., allowing integration with larger travel networks and further global growth. According to Forbes, by the end of 2018 the company operated in 11 countries and regions, employed around 400 staff, and recorded over 4 million weekly website views with more than 1 million app downloads. A combination of a trade dispute between Japan and South Korea and the COVID-19 pandemic in 2020 led KKday to pivot quickly toward domestic staycations and local experiences while initially raising US$70 million in its Series C, which was later extended to US$95 million. The Series C funds were partially used to accelerate and expand Rezio.
Launched in 2019, Rezio is KKday's B2B SaaS booking management platform for travel providers, allowing them to track inventory, manage reservations and sell tickets. FineDayClub was launched in 2020 by KKday as a personalized luxury subscription travel service catering to high-end clients. KKday’s CFO, Jenny Tsai, moved to lead the new venture. KKday adapted to changing travel patterns during the COVID-19 pandemic by reducing user acquisition costs by two thirds and focusing on domestic travel experiences to drive bookings and revenue. KKday was particularly successful in Vietnam, where bookings increased by 2,000% through 2022 and the company's travel operator platform, Rezio, onboarded over 1,200 operators. In 2021, KKday acquired Activity Japan, a domestically focused travel company founded by Kimiharu Obuchi in 2014. The acquisition, a key factor in KKday’s rapid expansion in the Japanese market, was facilitated by H.I.S., a common early investor in both platforms. In 2023, KKday signed a partnership with Rail Europe to create an all-in-one platform for 150 rail lines across 33 European countries, with the intent of increasing ridership across Europe. In late 2024, KKday completed its Series D at US$70 million, bringing the total amount of capital raised to over US$250 million. The funds are earmarked for continued global expansion, artificial intelligence integration and enhanced partnerships, similar to the partnership with Tabelog, which now allows users to book reservations at 42,000 restaurants in Japan through the platform. == Platform == KKday is an e-commerce online travel agency operating in 92 countries with over 350,000 travel experiences available for booking. The company started with a focus on authentic local travel experiences in the Asia-Pacific market and has since expanded to a more global focus.
KKday connects travelers with travel services and experiences such as attraction tickets, theme parks, cultural experiences, and seasonal events. KKday has positioned itself as an all-in-one travel super app with booking for hotels, rental cars, flights, SIM cards, rail passes, dining and tickets. === Rezio === Rezio is a cloud-based SaaS booking management platform developed by KKday specifically for tour operators, activity providers, and attractions in the travel industry. It serves as an all-in-one system designed to help these businesses digitize their operations, particularly those previously relying on offline processes. Features include a mobile app for on-the-go order management, customer information checks, and voucher scanning, as well as channel management, analytics for customer data, and integrations with multiple OTAs and payment providers. Unlike KKday, which is an OTA marketplace for consumer exposure (with commissions), Rezio focuses on backend operations for suppliers, allowing brand independence, operational efficiency, and direct customer relationships while optionally connecting to OTAs like KKday. Rezio supports over 5,000 merchants, 30,000 experiences, and 10 million travelers worldwide, with a strong presence in Asia. One of the brand's successful deployments was at the Nikko Toshogu Shrine, where Rezio was introduced to reduce long lines and wait times caused by over-tourism. The shrine used the inventory management features to enable online booking and cashless payments on site. === FineDayClub === FineDayClub is a membership-based travel concierge service launched in late 2020 by KKday. It is aimed at families and organizations seeking customized travel experiences, and it offers one-on-one advisory services. === ActivityJapan === ActivityJapan is a comprehensive Japanese online travel site that specializes in authentic Japanese travel experiences. It was purchased by KKday in 2021 but continues to operate independently.