Structure mining is a type of data mining in which a semi-structured data source is scanned and elements of its structure are discovered and highlighted. A semi-structured data source is one that does not use the traditional database structure of tables but does carry semantic markup, such as tags and markers, that separates one piece of information from another. Structure mining can be applied to databases, websites and many other forms of computer information. It helps users understand how pieces interact with one another or find information stored under certain tags. It also can be used to predict what an item is, based on rules written by the user.
There are many different types of data mining, and most are concerned with mining a traditionally structured source, meaning any source that uses the tables and nodes typical of most databases. Structure mining, by contrast, works only on semi-structured data. These data come from websites or simple databases that have a structure, but not one that conforms to traditional database rules. To be mined properly, the data need tags or markers that set each item apart.
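As a minimal sketch of what discovering structure can mean in practice, the Python example below walks a small XML document and counts every tag path it finds. The sample document, the function name tag_paths and the choice of XML are assumptions made for illustration; real semi-structured sources may use other tagging schemes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: tagged, but not arranged in database tables.
SAMPLE = """<library>
  <book><title>A</title><index/></book>
  <book><title>B</title></book>
</library>"""


def tag_paths(xml_text):
    """Count every root-to-element tag path found in an XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(element, prefix):
        # Build the path for this element, record it, then descend.
        path = f"{prefix}/{element.tag}"
        counts[path] += 1
        for child in element:
            walk(child, path)

    walk(root, "")
    return counts


if __name__ == "__main__":
    for path, count in sorted(tag_paths(SAMPLE).items()):
        print(count, path)
```

The resulting counts show which tags exist and how often each occurs, for instance that only one of the two books in the sample carries an index element.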
By reading the semi-structured data set, structure mining can discover how the parts of the structure interact. For example, each website has a navigational model, and this model determines how the pages link to one another. By mining the structure, the user can discover how this navigation works, which can help in creating a similar navigation scheme.
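The navigation example can be sketched roughly as follows, using only Python's standard html.parser module. The page names and HTML snippets are invented for illustration, and the sketch assumes that links are written exactly as the page names appear; a real site would need URL normalization.

```python
from html.parser import HTMLParser

# Hypothetical three-page site, keyed by page name.
PAGES = {
    "index.html": '<a href="about.html">About</a> <a href="shop.html">Shop</a>',
    "about.html": '<a href="index.html">Home</a>',
    "shop.html": '<a href="index.html">Home</a> <a href="https://example.com">Ext</a>',
}


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def navigation_graph(pages):
    """Map each page to the sorted list of internal pages it links to."""
    graph = {}
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        # Keep only links that point at another page of the same site.
        graph[name] = sorted({link for link in collector.links if link in pages})
    return graph


if __name__ == "__main__":
    for page, targets in navigation_graph(PAGES).items():
        print(page, "->", targets)
```

The mapping produced here is one simple representation of a navigational model: each page and the internal pages reachable from it in one click.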
Structure mining also can be used to classify items by writing rules into the mining program. For example, given a book data set, the user can write a rule that any book without an index should be returned as fiction and any book with an index should be returned as non-fiction. Most fiction books lack an index, so this rule will usually predict correctly what each item is, though not in every case. This assists users working with a semi-structured set that has an organizational method, but not one that fits what they are looking for.
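The book rule can be written in a few lines. The sketch below assumes a hypothetical XML catalog in which a book that has an index carries an index element; the catalog contents and the function name classify_books are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the only structural clue used is the <index/> tag.
CATALOG = """<catalog>
  <book><title>Field Guide</title><index/></book>
  <book><title>Sea Story</title></book>
  <book><title>Tax Handbook</title><index/></book>
</catalog>"""


def classify_books(xml_text):
    """Apply the rule: index present -> non-fiction, index absent -> fiction."""
    root = ET.fromstring(xml_text)
    result = {}
    for book in root.iter("book"):
        title = book.findtext("title", default="(untitled)")
        has_index = book.find("index") is not None
        result[title] = "non-fiction" if has_index else "fiction"
    return result


if __name__ == "__main__":
    for title, genre in classify_books(CATALOG).items():
        print(f"{title}: {genre}")
```

The rule is a prediction rather than a fact about each book, so a fiction title that happens to include an index would be mislabeled.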
After working out the structure of one semi-structured source, the user will typically compare it with another. A user who has a business website can mine another business website for its navigation and links and see how the two sites are similar or different. By comparing the mined information, the user may find ways to make the structure of their own site more efficient.
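One possible way to make such a comparison concrete is to treat each site's navigation as a set of links and measure the overlap. The sketch below uses Jaccard similarity, which is a choice made for this example rather than a method named in the text, and it assumes the pages of both sites have already been given common labels such as "home" and "shop". Both site dictionaries are invented.

```python
# Hypothetical navigation maps: page label -> labels of pages it links to.
MY_SITE = {"home": ["about", "shop"], "shop": ["home"]}
OTHER_SITE = {"home": ["about", "shop", "contact"], "shop": ["home", "cart"]}


def edges(site):
    """Flatten a navigation map into a set of (source, target) links."""
    return {(src, dst) for src, targets in site.items() for dst in targets}


def compare_sites(site_a, site_b):
    """Report shared and unique links plus their Jaccard similarity."""
    a, b = edges(site_a), edges(site_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Two empty sites are treated as identical.
        "similarity": len(a & b) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    report = compare_sites(MY_SITE, OTHER_SITE)
    for key, value in report.items():
        print(key, value)
```

Links that appear only in the other site, such as a contact page reachable from the home page, are candidates for changes the user might consider.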