hoder.org

September 14, 2008

craigslist mysql schematics

Filed under: mysql — admin @ 6:49 pm

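Craigslist's database architecture

More than 10 million people use the site every month, and it serves over 3 billion page views a month. (Nearly 1 billion new posts a month??) The number of pages on the site is growing nearly a hundredfold each year. Yet Craigslist still has only 18 employees (probably a few more by now). Tim O'Reilly interviewed Eric Scheide of Craigslist, and the resulting article, "Database War Stories #5: craigslist", gives us a look at Craigslist's database architecture and data volumes.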

The database software is MySQL. To get the most out of MySQL, every database runs on a 64-bit Linux server with 14 local disks (72 × 14 ≈ 1 TB?) and 16 GB of RAM.

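Different services use differently configured database clusters.
Forums
1 master, 1 slave. The slave is used mostly for backups. MyISAM tables. The indexes total 17 GB. The largest table has close to 42 million rows.
Classifieds
1 master, 12 slaves. Each slave serves a different purpose. Current data, including indexes, totals 114 GB, and the largest table has 56 million rows (its data is archived regularly). Uses MyISAM. How large is the classifieds volume? The claim that "Craigslist adds nearly 1 billion new posts a month" looks exaggerated: Eric Scheide said that the previous day alone brought in more than 330,000 posts, and at that rate new posts come to roughly 10 million a month (330,000 × 30 ≈ 9.9 million).
Archive database
1 master, 1 slave. Holds all posts older than 3 months. Its structure is similar to the classifieds database but larger: 238 GB of data, with the largest table at 96 million rows. Merge tables are used heavily to simplify management.
Search database
4 clusters on 16 servers. Active posts are partitioned by region/category and indexed with MyISAM full-text indexes, each holding only a subset of the data. This indexing scheme is holding up for now, but probably will not in a few years.
Authdb
1 master, 1 slave. Very small.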
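
The back-of-the-envelope figures quoted above (14 local disks, more than 330,000 posts in one day) can be checked with a few lines of shell. This is only a sketch: the function names are made up for illustration, the "72" is assumed to mean 72 GB per disk, and a month is assumed to be 30 days.

```shell
#!/bin/sh
# Rough capacity and volume estimates from the figures in the interview.

# Total raw disk space. ASSUMPTION: the first argument is GB per disk.
disk_total_gb() {
    echo $(( $1 * $2 ))
}

# Posts per month from a daily count. ASSUMPTION: a 30-day month.
monthly_posts() {
    echo $(( $1 * 30 ))
}

echo "disks: $(disk_total_gb 72 14) GB"
echo "posts: $(monthly_posts 330000) per month"
```

Under those assumptions the disks add up to 1008 GB (about 1 TB), and the daily rate extrapolates to 9,900,000 posts a month.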

Craigslist currently ranks 30th on Alexa. The figures above only reflect the situation at the time of the interview (April 28, 2006); Craigslist's data volume is still growing at 200% a year.

In both hardware and software, the data solution Craigslist uses is low-cost. A good MySQL DBA is a key factor for a Web 2.0 project.

 

http://www.diybl.com/course/7_databases/mysql/myxl/2007614/52429.html

How to import Access and Excel data into MySQL

 

http://info.codepub.com/2008/08/info-21401.html

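Building a database access class that supports master/slave read-write splitting

A simple PHP + MySQL implementation of Chinese word segmentation for full-text indexing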

http://info.codepub.com/2008/07/info-20442.html

 

http://unix-cd.com/vc/www/26/2008-07/10189.html

September 13, 2008

memcached

Filed under: Uncategorized — admin @ 12:45 pm

http://www.example.net.cn/archives/example/index.html

http://hi.baidu.com/xproduct/blog/item/850c19f44c1eb76cddc47480.html

 http://www.yesadmin.com/647/139291/index.html

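MySQL master/slave database synchronization

MySQL database disk optimization

  • Squid is a proxy server for Linux that caches Internet data.
  • Deadlock: a situation in which two or more processes, competing for resources while they run, end up waiting on one another; without outside intervention none of them can make progress. The system is then said to be deadlocked, and the processes that wait on each other forever are called deadlocked processes.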

    September 11, 2008

    Stop PHP nobody Spammers

    Filed under: PHP, Uncategorized, email, freebsd — admin @ 5:14 pm

    Stop PHP nobody Spammers
    http://www.webhostgear.com/232.html
    Update: May 25, 2005:
    - Added Logrotation details
    - Added Sample Log Output

    PHP and Apache have a history of being unable to track which users send mail through the PHP mail function as the nobody user. This lets leaky formmail scripts and malicious users send spam from your server without you knowing who is sending it or from where.

    Watching your exim_mainlog doesn’t exactly help: you see the email going out, but you can’t tell which user or script is sending it. This is a quick and dirty way to get around the nobody spam problem on your Linux server.

    If you check your php.ini file you’ll notice that your mail program is set to /usr/sbin/sendmail, and 99.99% of PHP scripts just use PHP’s built-in mail() function, so everything goes through /usr/sbin/sendmail =)

    Requirements:
    We assume you’re using Apache 1.3x, PHP 4.3x and Exim. This may work on other systems, but we’ve only tested it on a Cpanel/WHM Red Hat Enterprise system.

    Time:
    10 Minutes, Root access required.

    Step 1)
    Log in to your server and su - to root.

    Step 2)
    Turn off exim while we do this so it doesn’t freak out.
    /etc/init.d/exim stop

    Article provided by WebHostGear.com

    Step 3)
    Back up your original /usr/sbin/sendmail file. On systems using the Exim MTA, the sendmail file is basically just a pointer to Exim itself.
    mv /usr/sbin/sendmail /usr/sbin/sendmail.hidden 

    Step 4)
    Create the spam monitoring script for the new sendmail.
    pico /usr/sbin/sendmail

    Paste in the following:

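    #!/usr/local/bin/perl

    # Wrapper that logs who invoked sendmail, then passes the message on
    # to the real binary (moved to sendmail.hidden in Step 3).
    # use strict;   # left disabled: the script relies on undeclared globals such as $arg
    use Env;        # exposes environment variables as Perl variables
    my $date = `date`;
    chomp $date;
    open (INFO, ">>/var/log/spam_log") || die "Failed to open file ::$!";
    my $uid = $>;                 # effective UID of the calling process
    my @info = getpwuid($uid);    # passwd entry for that UID
    if ($REMOTE_ADDR) {
            # Web request environment present: log client IP, script and server name
            print INFO "$date - $REMOTE_ADDR ran $SCRIPT_NAME at $SERVER_NAME \n";
    }
    else {
            # Otherwise log the working directory and the user's passwd fields
            print INFO "$date - $PWD -  @info\n";
    }
    my $mailprog = '/usr/sbin/sendmail.hidden';
    foreach (@ARGV) {
            $arg = "$arg" . " $_";
    }

    # Hand the message on stdin to the real sendmail with the original arguments
    open (MAIL, "|$mailprog $arg") || die "cannot open $mailprog: $!\n";
    while (<STDIN>) {
            print MAIL;
    }
    close (INFO);
    close (MAIL);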

    Step 5)
    Change the new sendmail permissions
    chmod +x /usr/sbin/sendmail

    Step 6)
    Create a new log file to keep a history of all mail going out of the server using web scripts
    touch /var/log/spam_log
    chmod 0777 /var/log/spam_log

     

     

    Step 7)
    Start Exim up again.
    /etc/init.d/exim start 

    Step 8)
    Monitor your spam_log file for spam. To test it, try any formmail or other script that uses a mail function, such as a message board or a contact script.
    tail -f /var/log/spam_log

    Sample Log Output

    Mon Apr 11 07:12:21 EDT 2005 - /home/username/public_html/directory/subdirectory -  nobody x 99 99   Nobody / /sbin/nologin
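
Once the log has filled up, entries in the format shown above can be grouped by the directory (or client) that triggered them. The following is a minimal sketch, not part of the original article: the function name `top_senders` is hypothetical, and it assumes the fields are separated by " - " exactly as in the sample line.

```shell
#!/bin/sh
# Count spam_log entries per source, busiest first.
# The source is the second " - "-separated field: the working directory,
# or "IP ran SCRIPT at SERVER" for entries logged from a web request.
top_senders() {
    awk -F' - ' 'NF >= 2 { count[$2]++ }
                 END { for (k in count) print count[k], k }' "$1" |
        sort -rn
}

# Usage: top_senders /var/log/spam_log | head
```

A directory that dominates the top of this list is a good first place to look for an abused form script.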

    Log Rotation Details
    Your spam_log file isn’t rotated by default, so it can grow very large quickly. Keep an eye on it and consider adding it to your logrotate configuration.

    pico /etc/logrotate.conf

    FIND:
    # no packages own wtmp -- we'll rotate them here
    /var/log/wtmp {
        monthly
        create 0664 root utmp
        rotate 1
    }

    ADD BELOW:

    # SPAM LOG rotation
    /var/log/spam_log {
        monthly
        create 0777 root root
        rotate 1
    }
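
For keeping an eye on the file between rotations, a small size check can help. This is a sketch under stated assumptions: the function name `log_too_big` and the 100 MB threshold in the usage line are illustrative and do not come from the original article.

```shell
#!/bin/sh
# Succeed (exit status 0) when FILE exists and is larger than LIMIT bytes.
log_too_big() {
    file=$1
    limit=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc prints on some systems
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Usage: log_too_big /var/log/spam_log 104857600 && echo "spam_log exceeds 100 MB"
```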

    Notes:
    You may also want to chattr +i /usr/sbin/sendmail so it doesn’t get overwritten.

    service restart

    Filed under: freebsd — admin @ 4:58 pm

    location:

    /etc/init.d/mysql start|stop

    exim

    Turn off exim while we do this so it doesn’t freak out.
    /etc/init.d/exim stop

    ?

    How about httpd?

    September 2, 2008

    /usr/local/bin/p7zip

    Filed under: Uncategorized — admin @ 6:41 pm

    /usr/local/bin/p7zip

     
