Posted by: GzLi (笑梨), Board: DataMining
Subject: [Digest] A question about web mining (ideas, methods, ...)
Site: Nanjing University Lily BBS (南京大学小百合站) (Sat May 10 18:20:39 2003)

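cczhu (congcongzhu) wrote on Wed May  7 05:37:45 2003:

The problem is this: given a site, search the Internet for mirrors of that site.

What ideas or algorithms are there for this?

I think this probably falls under web/data mining. I don't know whether anyone has done similar work.

Where should I start? Thanks!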

NAOMIELIE (雁来红) wrote on Wed May  7 14:27:06 2003:

I am not sure what the motivation for this research is.

Here are some possibly related topics:

1. Crawlers:

I think Hector Garcia-Molina and his PhD student Cho (now at UCLA) have been working on this topic for a long time: how to find fresh sites to crawl (crawling a stable site is just a waste of resources).


2. Distributed web caches:

This research addresses how to find an already existing copy of something that can be used at low cost.
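
To make the crawler point (item 1) concrete, here is a minimal sketch of ranking sites for recrawling by how often they were seen to change. It is not the method from Cho and Garcia-Molina's papers, only a naive heuristic; `SiteHistory`, `change_rate`, and `recrawl_order` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteHistory:
    url: str
    visits: int   # how many times the crawler revisited the site
    changes: int  # visits on which the content differed from the previous visit


def change_rate(history: SiteHistory) -> float:
    """Fraction of visits on which a change was seen (0.0 if never visited)."""
    return history.changes / history.visits if history.visits else 0.0


def recrawl_order(histories):
    """Sort sites so that those most often seen changing come first.

    Assumption: past change frequency is a usable hint for future changes,
    so stable sites sink to the end of the crawl queue.
    """
    return sorted(histories, key=change_rate, reverse=True)
```

A crawler with a fixed budget could then visit only the first few entries of `recrawl_order(...)` in each round, which is one way to avoid spending resources on stable sites.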


Work on replication in federated databases may also be related, but I know nothing about it.

You may need to read WWW papers from recent years; I feel that this is not a new problem.

Anyway, you need to describe the application scenario first...


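[ Quoting cczhu: ]

: The problem is this: given a site, search the Internet for mirrors of that site.
: What ideas or algorithms are there for this?
: I think this probably falls under web/data mining. I don't know whether anyone has done similar work.
: Where should I start? Thanks!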



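cczhu (congcongzhu) wrote on Wed May  7 14:32:46 2003:

Thanks!

I think someone once mentioned to me an approach similar to a distributed web cache,
but the Internet is so large that I am not sure how well it would work.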


[ Quoting NAOMIELIE: ]

: I am not sure what the motivation for this research is.
: Here are some possibly related topics:
: 1. Crawlers:
: 2. Distributed web caches:


NAOMIELIE (雁来红) wrote on Wed May  7 14:45:50 2003:

Two existing P2P-based web cache systems:

Squirrel (PODC 2002), by Rice University and Microsoft

BuddyWeb (IEEE Networking P2P workshop), by Fudan and NUS


Surveys of web caching:

Greg Barish and Katia Obraczka:
World Wide Web Caching: Trends and Techniques.
IEEE Communications Magazine, Internet Technology Series, May 2000

Jia Wang:
A Survey of Web Caching Schemes for the Internet.
ACM Computer Communication Review, 29(5), pp. 36-56, 1999


I think Squirrel is designed for Internet scale, though currently it only works in LAN environments.
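
As a rough illustration of how a P2P web cache can decide which peer is responsible for a URL: as I understand the Squirrel paper, the URL is hashed to a key and the request goes to the peer whose ID is numerically closest to that key (its "home node"). The sketch below shows only that mapping, with no routing or caching. `key_of`, `home_node`, and the 160-bit ring width are assumptions made for this example, not Squirrel's actual code or parameters.

```python
import hashlib

# Assumption: IDs live on a ring as wide as a SHA-1 digest (160 bits).
RING_SIZE = 1 << 160


def key_of(text: str) -> int:
    """Hash a string (a URL or a peer name) to an integer key on the ring."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def home_node(url: str, node_ids):
    """Return the node ID numerically closest to the URL's key on the ring."""
    key = key_of(url)

    def ring_distance(node_id: int) -> int:
        # Distance measured around the ring, in whichever direction is shorter.
        d = abs(node_id - key)
        return min(d, RING_SIZE - d)

    return min(node_ids, key=ring_distance)
```

Every peer that knows the same set of node IDs picks the same home node for a given URL, so a cached copy can be found without a central directory. A real system routes toward that node in several hops instead of scanning the full list.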


[ Quoting cczhu: ]

: Thanks!
: I think someone once mentioned to me an approach similar to a distributed web cache,
: but the Internet is so large that I am not sure how well it would work.
:
: [ Quoting NAOMIELIE: ]


