Recently I have been thinking about search engine ranking and have a few immature ideas, which I would like to share here for discussion.

Of the data available, I think Google's PR (PageRank) value once came closest to reflecting a page's weight, although it has not been updated for a year. The Baidu weight published by third-party sites such as Aizhan and CHINAZ is, to some extent, estimated from the traffic a site receives from Baidu search results. Although this weight does reflect the domain, the way it is derived makes it not very accurate, so I won't dwell on it.
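PageRank itself is a published algorithm, so the idea behind the PR value can be sketched. Below is a minimal power-iteration version of the classic formulation, assuming a damping factor of 0.85 and a small made-up link graph. It is not Google's production system, and the function name `pagerank` is only for illustration.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Classic PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump share that every page receives.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Pages with no outlinks spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page splits its rank equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Made-up example graph: "c" receives the most links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

The score depends only on the link graph, which is why it was a convenient public proxy for page weight while it was still being updated.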

In page ranking, two parts are particularly important: the page's weight score in the search engine, and the positions on the page where the important words of the segmented query appear (the title and the most effective area, which I think is the beginning of the page). These two parts basically determine the page's ranking score, and the page's weight score takes the larger share (I tend to think it is the largest factor, far larger than the others). Words of other parts of speech in the query, such as adjectives and prepositions, earn only a small bonus when the whole query appears as a complete match.
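To make this concrete, here is a hypothetical scoring function in which page weight dominates, term position adds a smaller amount, and a complete match of the whole query adds a small bonus. The names `ranking_score`, `WEIGHT_FACTOR`, `POSITION_BONUS` and `EXACT_MATCH_BONUS`, and all the numbers, are assumptions for illustration only, not values used by any search engine.

```python
# Hypothetical coefficients; no search engine publishes values like these.
WEIGHT_FACTOR = 10.0       # page weight counts the most
POSITION_BONUS = {"title": 3.0, "beginning": 2.0, "body": 1.0}
EXACT_MATCH_BONUS = 0.5    # small extra when the whole query matches verbatim


def ranking_score(page_weight, term_positions, exact_match):
    """Toy ranking score (hypothetical).

    page_weight: the page's weight score, assumed to lie in [0, 1].
    term_positions: for each important query word (noun, key string), the
        best region where it appears: "title", "beginning" or "body".
    exact_match: True if the full query, adjectives and prepositions
        included, appears verbatim on the page.
    """
    score = WEIGHT_FACTOR * page_weight
    score += sum(POSITION_BONUS[pos] for pos in term_positions)
    if exact_match:
        score += EXACT_MATCH_BONUS
    return score


# A high-weight page with body-only matches vs. a low-weight page
# that matches in the title and also matches the full query.
strong = ranking_score(0.9, ["body", "body"], exact_match=False)
weak = ranking_score(0.2, ["title", "title"], exact_match=True)
print(strong > weak)
```

With these made-up numbers, the high-weight page still outranks the low-weight page despite the weaker term positions, which is the effect described above.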

In a search engine (here I take a typical search engine design as the reference), pages are crawled, their content is extracted, and an inverted index is built on the segmented words. At query time the search keywords are segmented, the important words such as nouns and key strings are extracted, and their index entries are intersected. This yields a collection of pages with basic relevance to the query, which then enters the page-ranking stage.
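The retrieval step can be sketched with an inverted index and posting-set intersection. This is a minimal sketch under two assumptions: whitespace splitting stands in for real word segmentation (which Chinese text requires), and `MINOR_WORDS` is a hypothetical list of words skipped during the intersection.

```python
from collections import defaultdict

# Hypothetical low-importance words (articles, prepositions, adjectives)
# that are not used for the index intersection.
MINOR_WORDS = {"a", "an", "the", "of", "to", "and", "in", "cheap", "best"}


def segment(text):
    """Stand-in for real word segmentation: lowercase whitespace split."""
    return text.lower().split()


def build_inverted_index(docs):
    """docs maps doc_id -> page text; returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in segment(text):
            index[term].add(doc_id)
    return index


def candidate_pages(index, query):
    """Intersect the posting sets of the query's important terms."""
    terms = [t for t in segment(query) if t not in MINOR_WORDS]
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    # Starting from the smallest set is a common optimization.
    postings.sort(key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result


docs = {
    1: "cheap flights to Beijing",
    2: "Beijing hotel guide",
    3: "flights and hotels in Shanghai",
    4: "best Beijing flights of the year",
}
index = build_inverted_index(docs)
print(sorted(candidate_pages(index, "cheap flights to Beijing")))
```

Only "flights" and "beijing" drive the intersection here; the skipped words would matter later, at the ranking stage.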

The page weight score is also related to the age of the domain (a site factor; I don't think this is only because older links have existed longer). In Baidu's algorithm, the domain's own weight and its influence on the weight of its pages are much larger than in Google's.
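One way to picture this claim is a blend of page-level and domain-level weight with an engine-specific share. Everything here is hypothetical: `domain_weight`, `effective_page_weight` and the `DOMAIN_SHARE` values only encode the impression that the domain counts for more on Baidu than on Google, and are not real parameters of either engine.

```python
# Hypothetical share of domain weight in a page's effective weight.
DOMAIN_SHARE = {"baidu": 0.6, "google": 0.3}


def domain_weight(age_years, link_authority, max_age=10.0):
    """Toy domain weight in [0, 1].

    Age is counted as its own factor, separate from link authority
    (assumed to be in [0, 1]), and is capped at max_age years.
    """
    age_factor = min(age_years, max_age) / max_age
    return 0.5 * age_factor + 0.5 * link_authority


def effective_page_weight(page_weight, dom_weight, engine):
    """Blend page-level and domain-level weight with an engine-specific share."""
    share = DOMAIN_SHARE[engine]
    return (1.0 - share) * page_weight + share * dom_weight


# A weak new page hosted on an old, well-linked domain.
dom = domain_weight(age_years=8, link_authority=0.9)
baidu = effective_page_weight(0.2, dom, "baidu")
google = effective_page_weight(0.2, dom, "google")
print(round(baidu, 3), round(google, 3))
```

Under these assumptions the same weak page benefits more from its strong domain on the engine with the larger domain share.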

When you consider a search engine's ranking algorithm, you can basically conclude that most pages have no chance of being displayed, because weight has a high priority in the algorithm.

The weight score of a page differs significantly between search engines, but overall it is closely related to links. Links that pass weight fall into two kinds: relevant links (the linking site's content, the anchor text of the outbound link and the target site are highly correlated) and high-trust links (links from .gov, .edu and other high-PR sites). For specifics, see the Hilltop and TrustRank algorithms. The idea behind Hilltop is very interesting: one patented version of the algorithm describes recomputing link relevance over the first set of search results, to rank more accurately among the pages with the highest weight for the query terms.

After these steps, pages enter the final anti-spam module, which removes pages whose cheating exceeds a certain degree (placing them in the sandbox or deducting from their score), and then the final ranking is produced.
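TrustRank was introduced in the paper "Combating Web Spam with TrustRank", so it ties the high-trust links to the anti-spam step. Below is a minimal sketch of its core idea, a PageRank whose random jump goes only to hand-picked trusted seed pages. The graph, the seed set, the function name `trustrank` and the 20-iteration count are assumptions for illustration; nothing here reflects how Baidu or Google actually apply it.

```python
def trustrank(links, seeds, damping=0.85, iterations=20):
    """TrustRank-style scores: PageRank biased toward trusted seed pages.

    links maps each page to the pages it links to; seeds is a set of pages
    judged trustworthy by hand. Trust that reaches pages without outlinks
    is simply dropped in this sketch.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    seeds = {s for s in seeds if s in pages}
    if not seeds:
        raise ValueError("need at least one seed page present in the graph")
    # Random jumps land only on trusted seeds.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * jump[p] for p in pages}
        # Each page splits its trust equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * trust[src] / len(targets)
                for dst in targets:
                    new_trust[dst] += share
        trust = new_trust
    return trust


# Made-up graph: two spam pages link only to each other and to a shop.
web = {
    "gov_site": ["news", "school"],
    "school": ["news"],
    "news": ["shop"],
    "spam1": ["spam2", "shop"],
    "spam2": ["spam1"],
}
scores = trustrank(web, seeds={"gov_site"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Pages that no trusted page links to, directly or indirectly, end up with zero trust even when they link to each other, which is what makes the idea useful against link farms.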

