MY PROPOSAL WILL IMPROVE BING

The application model that orders search engine results according to categories.
Ph.D. Boris Pevzner *****@***.com
Phone ***-***-***-****
(‘Google’ is used as an example only)

It is known that the quality of a keyword search is significantly affected by the algorithms that order the output websites. As a rule, a user considers (opens links to) only the sites on the first page of the search results. Systems that use categorization let the user navigate the possible choices more accurately, but modern search engines have no categories. The main function of the application is to assign category codes to the output sites (Flws). The algorithm is based on the principles of machine learning and data mining.

Our application is an autonomous, independent object. It is connected to Google (or another search engine) through two standard points: input (keywords) and output (result links). The developers of the application will not work with Google code. It should be noted that the categories are not search parameters; they are used to make it easier to select the required website. The technology based on this algorithm may be used for marketing, for filtering data from a textual news flow, and for evaluating Internet users’ interests.

Creating the learning set of each category

We selected some generic categories (computer, medicine, agriculture, technology, linguistics, science, education and so on). By means of Google searches we got a set of websites for each category (the first two pages, ** websites). The text zone of each website is then extracted with BeautifulSoup (a Python library for pulling data out of HTML and XML files), and repeated words and stop words are removed, which gives the ‘clear’ website (Flws). After that, all ‘clear’ sites are merged into a single file, Flunit. This file corresponds to one category.
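The cleaning and merging steps described above could be sketched roughly as follows. This is only an illustration, not the proposal's actual code: it uses Python's standard `html.parser` in place of BeautifulSoup so that it runs without extra packages, and the names `STOP_WORDS`, `TextExtractor`, `clean_site` and `build_flunit`, the tiny stop-word list and the English-only word pattern are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption; the proposal does not give one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_site(html):
    """Return the 'clear' site (Flws): distinct non-stop words, in page order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Simplification: only lowercase ASCII letter runs are treated as words.
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    seen = set()
    clear = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            clear.append(word)
    return clear


def build_flunit(pages):
    """Merge the 'clear' sites of one category into a single word list (Flunit)."""
    flunit = []
    for html in pages:
        flunit.extend(clean_site(html))
    return flunit
```

Because repeated words are removed inside each site before merging, a word's count in the merged list equals the number of sites of the category that contain it.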
Flunit is transformed into a category vector Vcat. It is the learning set of the category.

Algorithm of categorization

The algorithm is based on the principles of machine learning and data mining.

The main objects are the category vectors (Vcat) and the vectors of the sites to which a code will be assigned (Vws). Vcat consists of all distinct words of Flunit. The dimension of Vcat, that is, its number of coordinates, is the number of distinct words in Flunit. The value of each coordinate is the frequency of the corresponding word in Flunit. A website from the output results (Flws) is transformed into a website vector Vws. Vws consists of all the distinct words of Flunit (Vcat), so the dimension of Vws equals the dimension of Vcat. Each word that occurs in the website is assigned its frequency in the site; the remaining words of Vcat, which do not occur in the website, have frequency *.

Our software calculates the cosine of the angle between Vcat, on the one hand, and Vws, on the other, for all categories and all output websites. The cosine of the angle between two vectors is calculated in this way:

cos A = (x1·x2 + y1·y2 + z1·z2) / (sqrt(x1^2 + y1^2 + z1^2) · sqrt(x2^2 + y2^2 + z2^2)),

where xi, yi, zi are the coordinates of vector i. The formula is written here for three coordinates; the same pattern extends to all coordinates of Vcat and Vws.

For each category vector Vcat this gives a sequence of pairs, one per output website: the website vector Vws and its cosine. The software sorts these pairs in order of decreasing cosine.

If the cosine is within *-*.*, the website is assigned the code of the category.

To check the cosine as a proximity measure we made a small test. We took a category vector Vcat, googled the query ‘data processing’ and got the first two pages of results. We selected one website from the first page (website*) and one from the second page (website*). Our program created two vectors, Vwb* and Vwb*. Cosine(A*) between Vcat and Vwb* is **.*****; cosine(A*) is **.**.
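As a rough illustration of the vector comparison described above, the following Python sketch builds Vcat and Vws as word-frequency mappings, computes the cosine and sorts the sites by decreasing cosine. The function names and the `threshold` parameter are hypothetical; in particular, the proposal's actual cosine range is not known here, so the caller has to supply a value. Words that do not occur in a site are assumed to have frequency 0.

```python
import math
from collections import Counter


def category_vector(flunit_words):
    """Vcat: frequency of each distinct word of Flunit."""
    return Counter(flunit_words)


def website_vector(site_words, vcat):
    """Vws over the words of Vcat; words absent from the site get 0 (assumed)."""
    counts = Counter(site_words)
    return {word: counts.get(word, 0) for word in vcat}


def cosine(vcat, vws):
    """Cosine of the angle between Vcat and Vws (0.0 if either is all zeros)."""
    dot = sum(vcat[word] * vws[word] for word in vcat)
    norm_cat = math.sqrt(sum(v * v for v in vcat.values()))
    norm_ws = math.sqrt(sum(v * v for v in vws.values()))
    if norm_cat == 0 or norm_ws == 0:
        return 0.0
    return dot / (norm_cat * norm_ws)


def rank_sites(vcat, sites, threshold):
    """Sort sites by decreasing cosine and flag those that reach `threshold`.

    `sites` maps a site name to its list of words; `threshold` is a
    hypothetical lower bound chosen by the caller.
    """
    scored = [
        (name, cosine(vcat, website_vector(words, vcat)))
        for name, words in sites.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, cos, cos >= threshold) for name, cos in scored]
```

Each returned tuple holds the site name, its cosine against the category, and whether the site would receive the category code under the chosen threshold.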
The program assigned the code of the category ‘computer’ to Vwb* and did not assign it to Vwb*.

Number   Word of category vector   Frequency in Vcat    Frequency in Vws*    Frequency in Vws*
                                   (category vector)    (website* vector)    (website* vector)
**       W**                       **                   **                   **
**       W*                        **                   **                   **
**       W**                       **                   **                   **
**       W**                       **                   **                   **
**       W*                        **                   **                   **
**       W**                       **                   **                   **
**       W**                       **                   **                   **
***                                **                   **                   **
cosine                                                  ****.******          *.***

The search results can be shown on the screen in the following form:

Category      Number of output sites belonging to the category
computer      N*
medicine      N*
agriculture   N*

The user selects the needed category (or categories) and opens the links that correspond to the selected category.

Conclusion

- The developed technology makes it easier for a user to work with Google. The user can open the required website in a more targeted way and can quite easily reach results far down the output (from the second page on).
- The technology may be used for marketing, for filtering data from a textual news flow, and for evaluating Internet users’ interests.
- The technology may be used in library catalogues.
- The main requirement for the software is high speed of data processing.