
Some useful programs

SpamCheck

A number of services monitor spammers, and inform clients whether a host attempting to connect to them is a known spammer or not. These real-time blackhole lists need to respond to queries extremely quickly, and process a very high load. Thousands, maybe millions, of hosts query them repeatedly to find out whether an IP address attempting a connection is or is not a known spammer.

The nature of the problem requires that the response be fast, and ideally it should be cacheable. Furthermore, the load should be distributed across many servers, ideally ones located around the world. Although this could conceivably be done using a web server, SOAP, UDP, a custom protocol, or some other mechanism, this service is in fact cleverly implemented using DNS and DNS alone.

To find out if a certain IP address is a known spammer, reverse the bytes of the address, add the domain of the blackhole service, and look it up. If the address is found, it’s a spammer. If it isn’t, it’s not. For instance, if you want to ask sbl.spamhaus.org if 207.87.34.17 is a spammer, you would look up the hostname 17.34.87.207.sbl.spamhaus.org. (Note that despite the numeric component, this is an ASCII hostname string, not a dotted quad IP address.)

If the DNS query succeeds (and, more specifically, if it returns the address 127.0.0.2), then the host is known to be a spammer. If the lookup fails—that is, it throws an
UnknownHostException—it isn’t. Example code below implements this check.

package Chapter2;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class SpamCheck {
    public static final String BLACKHOLE = "sbl.spamhaus.org";

    public static void main(String[] args) {
        String[] spamList = { "207.34.56.23", "125.12.32.4", "130.130.130.130" };
        for (String spam : spamList) {
            if (isSpam(spam)) {
                System.out.println(spam + " is a known spammer.");
            } else {
                System.out.println(spam + " appears legitimate.");
            }
        }
    }

    public static boolean isSpam(String ipAddress) {
        try {
            InetAddress address = InetAddress.getByName(ipAddress);
            byte[] quad = address.getAddress();
            String query = BLACKHOLE;
            // Prepend each octet in reverse order: a.b.c.d becomes d.c.b.a.sbl.spamhaus.org
            for (byte octet : quad) {
                int unsignedByte = octet < 0 ? octet + 256 : octet;
                query = unsignedByte + "." + query;
            }
            // If the lookup succeeds, the address is on the list
            InetAddress.getByName(query);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}

Source code: https://github.com/chandan-g-bhagat/network-programming/blob/main/Chapter2/SpamCheck.java

If you use this technique, be careful to stay on top of changes to blackhole list policies and addresses. For obvious reasons, blackhole servers are frequent targets of DDoS and other attacks, so take care that if the blackhole server changes its address or simply stops responding to queries, you don’t begin blocking all traffic.

Further note that different blackhole lists can follow slightly different protocols. For example, a few lists return 127.0.0.1 for spamming IPs instead of 127.0.0.2.
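Because the meaning of the response address varies from list to list, it is safer to inspect the address the lookup returns rather than treating any successful lookup as a positive match. The sketch below illustrates this; the `StrictSpamCheck` name and the set of listed response codes are illustrative, so consult the documentation of whichever list you query:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

public class StrictSpamCheck {
    // Illustrative set of response codes; each blackhole list documents its own.
    private static final Set<String> LISTED_CODES = Set.of("127.0.0.1", "127.0.0.2");

    // True only if the response address is one the list defines as "listed"
    public static boolean isListedResponse(InetAddress response) {
        return LISTED_CODES.contains(response.getHostAddress());
    }

    public static boolean isSpam(String dottedQuad, String blackhole) {
        try {
            byte[] quad = InetAddress.getByName(dottedQuad).getAddress();
            StringBuilder query = new StringBuilder();
            // Reverse the octets: a.b.c.d becomes d.c.b.a.<blackhole>
            for (byte octet : quad) {
                query.insert(0, (octet & 0xFF) + ".");
            }
            query.append(blackhole);
            InetAddress response = InetAddress.getByName(query.toString());
            return isListedResponse(response); // check the code, not just success
        } catch (UnknownHostException ex) {
            return false; // not on the list (or the list is unreachable)
        }
    }
}
```

Checking the returned address also guards against wildcard DNS responses, which would otherwise make every lookup appear to "succeed."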

Processing Web Server Log Files

Web server logs track the hosts that access a website. By default, the log records the IP addresses of the hosts that connect to the server. However, you can often get more information from the names of those hosts than from their IP addresses. Most web servers have an option to store hostnames instead of IP addresses, but this can hurt performance because the server needs to make a DNS request for each hit. It is much more efficient to log the IP addresses and convert them to hostnames later, when the server isn’t busy, or even on another machine entirely.

The Weblog program below reads a web server logfile and prints each line with the IP address converted to a hostname. Most web servers have standardized on the common logfile format. A typical line in the common logfile format looks like this:

205.160.186.76 unknown - [17/Jun/2013:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP 1.0" 200 50

This line indicates that a web browser at IP address 205.160.186.76 requested the file /bgs/greenbg.gif from this web server at 10:53 P.M. (and 58 seconds) on June 17, 2013. The file was found (response code 200) and 50 bytes of data were successfully transferred to the browser.

The first field is the IP address or, if DNS resolution is turned on, the hostname from which the connection was made. This is followed by a space. Therefore, for our purposes, parsing the logfile is easy: everything before the first space is the IP address, and everything after it does not need to be changed.

The dotted quad format IP address is converted into a hostname using the usual methods of java.net.InetAddress.

package Chapter2;

import java.io.*;
import java.net.*;

public class Weblog {

    public static void main(String[] args) {
        try (
                FileInputStream fin = new FileInputStream(args[0]);
                Reader in = new InputStreamReader(fin);
                BufferedReader bin = new BufferedReader(in)) {
            for (String entry = bin.readLine(); entry != null; entry = bin.readLine()) {
                // separate out the IP address
                int index = entry.indexOf(' ');
                String ip = entry.substring(0, index);
                String theRest = entry.substring(index);
                // Ask DNS for the hostname and print it out
                try {
                    InetAddress address = InetAddress.getByName(ip);
                    System.out.println(address.getHostName() + theRest);
                } catch (UnknownHostException ex) {
                    System.err.println(entry);
                }
            }
        } catch (IOException ex) {
            System.err.println("Exception: " + ex);
        }
    }
}

Source code: https://github.com/chandan-g-bhagat/network-programming/blob/main/Chapter2/Weblog.java

The name of the file to be processed is passed to Weblog as the first argument on the command line. A FileInputStream fin is opened from this file and an InputStreamReader is chained to fin. This InputStreamReader is buffered by chaining it to an instance of the BufferedReader class. The file is processed line by line in a for loop.

Each pass through the loop places one line in the String variable entry. entry is then split into two substrings: ip, which contains everything before the first space, and theRest, which is everything from the first space to the end of the string. The position of the first space is determined by entry.indexOf(' '). The substring ip is converted to an InetAddress object using getByName(). getHostName() then looks up the hostname. Finally, the hostname and everything else on the line (theRest) are printed on System.out. Output can be sent to a new file through the standard means for redirecting output.

Weblog is more efficient than you might expect. Most web browsers generate multiple logfile entries per page served, because there’s an entry in the log not just for the page itself but for each graphic on the page. And many visitors request multiple pages while visiting a site. DNS lookups are expensive and it simply doesn’t make sense to look up each site every time it appears in the logfile. The InetAddress class caches requested addresses. If the same address is requested again, it can be retrieved from the cache much more quickly than from DNS.
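The lifetime of that cache can be tuned before the first lookup via the standard networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties. A minimal sketch (the 300-second value is just an example):

```java
import java.security.Security;

public class DnsCacheConfig {
    // Must run before the first InetAddress lookup; the cache is JVM-wide.
    static void configure() {
        // Cache successful lookups for 300 seconds ("-1" means cache forever).
        Security.setProperty("networkaddress.cache.ttl", "300");
        // Cache failed lookups for 10 seconds ("0" disables negative caching).
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }

    public static void main(String[] args) {
        configure();
    }
}
```

With no security manager installed, most JDKs cache successful lookups for only around 30 seconds by default, so a long-running log processor may benefit from a larger value.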

Nonetheless, this program could certainly be faster. In my initial tests, it took more than a second per log entry. (Exact numbers depend on the speed of your network connection, the speed of the local and remote DNS servers, and network congestion when the program is run.) The program spends a huge amount of time sitting and waiting for DNS requests to return. Of course, this is exactly the problem multithreading is designed to solve. One main thread can read the logfile and pass off individual entries to other threads for processing.

A thread pool is absolutely necessary here. Over the space of a few days, even low-volume web servers can generate a logfile with hundreds of thousands of lines. Trying to process such a logfile by spawning a new thread for each entry would rapidly bring even the strongest virtual machine to its knees, especially because the main thread can read logfile entries much faster than individual threads can resolve domain names and die. Consequently, reusing threads is essential. The number of threads is stored in a tunable parameter, NUM_THREADS, so that it can be adjusted to fit the VM and network stack. (Launching too many simultaneous DNS requests can also cause problems.)
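NUM_THREADS in the code below is fixed at compile time. One way to make the pool size tunable without recompiling is to read it from a system property; the weblog.threads property name here is just an assumed example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TunableThreads {
    // Read the pool size from a hypothetical -Dweblog.threads=N flag, defaulting to 4
    static int numberOfThreads() {
        return Integer.getInteger("weblog.threads", 4);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads());
        // ... submit lookup tasks here, as PooledWeblog does ...
        executor.shutdown();
    }
}
```

The program could then be launched with, for example, java -Dweblog.threads=8 TunableThreads to widen the pool on a fast network stack.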

This program is now divided into two classes. The first class, LookupTask, shown below, is a Callable that parses a logfile entry, looks up a single address, and
replaces that address with the corresponding hostname. This doesn’t seem like a lot of work and CPU-wise, it isn’t. However, because it involves a network connection, and possibly a hierarchical series of network connections between many different DNS servers, it has a lot of downtime that can be put to better use by other threads.

package Chapter2;

import java.net.*;
import java.util.concurrent.Callable;

public class LookupTask implements Callable<String> {
    private String line;

    public LookupTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            String theRest = line.substring(index);
            String hostname = InetAddress.getByName(address).getHostName();
            return hostname + theRest; // theRest already begins with a space
        } catch (Exception ex) {
            return line;
        }
    }
}

The second class, PooledWeblog, shown below, contains the main() method that reads the file and creates one LookupTask per line. Each task is submitted to an executor that can run multiple (though not all) tasks in parallel and in sequence.

The Future that is returned from the submit() method is stored in a queue, along with the original line (in case something goes wrong in the asynchronous thread). A loop reads values out of the queue and prints them. This maintains the original order of the logfile.

package Chapter2;

import java.io.*;
import java.util.*;
import java.util.concurrent.*;

// Requires Java 7 for try-with-resources and multi-catch
public class PooledWeblog {

    private final static int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        Queue<LogEntry> results = new LinkedList<LogEntry>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), "UTF-8"))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                LookupTask task = new LookupTask(entry);
                Future<String> future = executor.submit(task);
                LogEntry result = new LogEntry(entry, future);
                results.add(result);
            }
        }

        // Start printing the results. This blocks each time a result isn't ready.
        for (LogEntry result : results) {
            try {
                System.out.println(result.future.get());
            } catch (InterruptedException | ExecutionException ex) {
                System.out.println(result.original);
            }
        }
        executor.shutdown();
    }

    private static class LogEntry {
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}