On Port Scanning

In a fit of boredom over the long holiday break I decided to dust off my Python skills and write a few toy programs, including a very simple port scanning application. A port scanner is a program used to probe a host for open ports. They’re often used by network admins to verify security policies, but they can also be used by bad actors to perform recon.
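
To make this concrete, here is a minimal sketch of a TCP connect scan in Python using only the standard socket module. This isn't my exact program (mine is a ping scanner, described further down), and the host and port list are placeholders; only scan machines you own or have permission to probe.

    import socket

    def scan_port(host, port, timeout=1.0):
        """Return True if a TCP connection to host:port succeeds (port is open)."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:            # refused, timed out, unreachable, etc.
            return False

    if __name__ == "__main__":
        host = "127.0.0.1"         # placeholder: only scan hosts you're allowed to probe
        for port in (22, 80, 443):
            state = "open" if scan_port(host, port) else "closed/filtered"
            print(f"{host}:{port} is {state}")

socket.create_connection() wraps the connect call and raises OSError on any failure, which keeps the open/closed check compact.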

What is a Port?

No, these are not the physical ports on the back of your machine. In this context, think of a port as a (virtual) point on a computer where information is exchanged between multiple programs, devices, and the internet. To ensure consistency across different devices, ports are assigned port numbers. When someone runs a port scan, it’s like they’re knocking on your door to see if anybody answers. This reveals which port(s) are open and listening, and it also reveals the presence of security devices.

Port scanning can provide information such as:

  1. Services that are running
  2. Users who own services
  3. Whether anonymous logins are allowed
  4. Which network services require authentication

Port numbers range from 0 through 65,535 and are divided into ranges based on usage. Ports 0-1,023 are “well-known” ports typically reserved for core internet services (but some have specialized purposes). These ports are assigned by the Internet Assigned Numbers Authority (IANA) and are held by widely used services such as web servers and Structured Query Language (SQL) databases.

Ports 1,024-49,151 are “registered ports,” and they are registered by software companies. Ports 49,152-65,535 are considered dynamic and private ports and can be used by anyone. There are also ports that, if open, indicate that the system may be infected, owing to their popularity with certain Trojans and viruses.

Some of the most popular and most frequently used ports include:

  • Port 20 (tcp) – File Transfer Protocol (FTP) for data transfer
  • Port 22 (tcp) – Secure Shell (SSH) protocol for secure logins, file transfers (SFTP), and port forwarding
  • Port 23 (tcp) – Telnet protocol for unencrypted text communications
  • Port 53 (udp) – Domain Name System (DNS) translates domain names into IP addresses
  • Port 80 (tcp) – World Wide Web HTTP
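
As an aside, Python can look up the conventional name for a well-known port via the standard library's socket.getservbyport(), which consults the local services database (e.g. /etc/services):

    import socket

    # Map a few well-known ports to their conventional service names.
    for port, proto in [(20, "tcp"), (22, "tcp"), (23, "tcp"), (53, "udp"), (80, "tcp")]:
        try:
            name = socket.getservbyport(port, proto)
        except OSError:
            name = "unknown"
        print(f"{port}/{proto}: {name}")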

Types of Scans

The port scanner that I wrote is a very simple ping scanner, but there are other types of scans. Here is a list:

  1. Ping scans: The simplest scanning technique. Ping scans send ICMP echo requests to various hosts in an attempt to get a response. Pings can be blocked or disabled by a firewall.
  2. Vanilla scan: Another basic technique; attempts to connect to all 65,536 ports. For each port it sends a SYN flag (a connect request), and when it receives a SYN-ACK response (acknowledgment of connection), it responds with an ACK flag. This SYN, SYN-ACK, ACK exchange comprises a full TCP handshake and is easily detected, because full connections are always logged by firewalls.
  3. SYN scan: AKA a half-open scan, this scan sends a SYN flag to the target and waits for a SYN-ACK response. If one arrives, the scanner never sends the final ACK, so the TCP connection is never completed. The interaction therefore goes unlogged, but the sender still learns whether the port is open. This is a technique attackers use to find vulnerabilities.
  4. XMAS and FIN scans: Christmas tree scans (XMAS scans) and FIN scans are discreet attack methods. XMAS scans get their name from the set of flags turned on within a packet which, when viewed in Wireshark, seem to blink like Christmas lights. An XMAS scan sends a set of flags that, if responded to, can disclose information about the firewall and open ports. In a FIN scan, an unsolicited FIN flag (normally used to end an established session) is sent to a port. The system’s response to this seemingly random flag may reveal the state of the port or information about the firewall. For example, a closed port that receives an unsolicited FIN packet will respond with a RST (an instantaneous abort) packet, but an open port will ignore it.
  5. FTP bounce scan: This type of scan allows the sender to disguise their location by bouncing a packet off an FTP server.
  6. Sweep scan: This is a preliminary port scanning technique in which pings are sent to the same port across several computers on a network to identify which are active. This does not reveal anything about the port’s state, but it does tell the sender which systems are in use. (A minimal ping-sweep sketch in Python follows this list.)
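
Since my toy scanner is a ping scanner, here is roughly what a ping sweep looks like in Python. This is just a sketch under some assumptions: it shells out to the system ping command using Linux-style flags (-c for count, -W for timeout), and the subnet is a placeholder; sweep only networks you administer.

    import subprocess

    def is_alive(host, timeout_s=1):
        """Ping a host once and report whether it answered (Linux 'ping' flags)."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    if __name__ == "__main__":
        subnet = "192.168.1."      # placeholder: sweep only networks you administer
        alive = [subnet + str(i) for i in range(1, 11) if is_alive(subnet + str(i))]
        print("responding hosts:", alive)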

On Concurrency Control

I’ve been doing a fair amount of self-education into how to design and build data-intensive systems, and one of the key concepts that keeps cropping up is that of concurrency control. The topic of concurrency is super-important in the context of data management and processing, so here’s a little deep dive. 

In its simplest form, concurrent computing (as opposed to sequential computing or parallel computing) is a scheduling technique in which multiple processes execute during overlapping time periods, though not necessarily at the same instant (that would be parallelism). If these processes access the same data, there is a possibility that one process may alter something and interfere with or disrupt the other. In the world of transactional databases, this could mean that two transactions access the same data at the same time and affect the consistency and correctness of the overall system.

How, you may ask? There are a few common scenarios:

  1. The Lost Update: A transaction writes an initial value that other transactions need in order to operate correctly. However, a second transaction writes a second value on top of the first. When other concurrent transactions go in to read the first value, they read the wrong one and end up with incorrect results. (A toy demonstration in Python follows this list.)
  2. The Dirty Read: A transaction writes a value but is later aborted. The value should disappear upon abort, but without concurrency control, other transactions may come in and read the value that should have been rolled back (a “dirty read”).
  3. The Incorrect Summary: Suppose one transaction is calculating a summary over all of the values in some data set. If a second transaction updates some of that data mid-calculation, the result can be incorrect depending on whether the updated values had already been included in the summary.
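
The lost-update problem is easy to reproduce with plain threads. This sketch has two threads incrementing a shared counter with a deliberate read-then-write gap and no locking; the final count will often come up short because updates get silently overwritten (you may need a few runs to catch it, since thread scheduling varies):

    import threading

    counter = 0                    # shared data with no concurrency control

    def increment(times):
        global counter
        for _ in range(times):
            value = counter        # read
            counter = value + 1    # write; another thread may have written in between

    threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Often prints less than 200000: some updates were lost.
    print(counter)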

As you can imagine, this is Really Bad in the context of financial systems, healthcare, and other critical environments.

Enter concurrency control. The idea is that systems must be designed with rules and methods in place to maintain the consistency of components operating concurrently, and thus the consistency and correctness of the whole system. I’ll save the technical specifics of race conditions, deadlocks, and resource starvation for another post. For now, just know that there are a few approaches.

Optimistic Concurrency

Optimistic concurrency is a strategy where we defer checking whether a transaction meets the isolation and integrity rules until the very end, without blocking any of its read/write operations. It is optimistic, after all! It assumes that the rules are being met!

However, if there is a violation (i.e. the record is dirty), then the transaction is aborted and restarted. This obviously incurs additional overhead, but if there aren’t a ton of transactions being aborted, then being optimistic is usually a good strategy.

(Being optimistic in general is a good strategy but I digress…)
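Here’s a toy sketch of the optimistic pattern, assuming a simple version counter (the VersionedValue class is made up for illustration): read without blocking, do the work, then validate the version at commit time and retry on conflict.

    import threading

    class VersionedValue:
        """Toy optimistic scheme: work without blocking, validate the version at commit."""

        def __init__(self, value):
            self.value = value
            self.version = 0
            self._commit_lock = threading.Lock()  # held only for the brief commit check

        def read(self):
            return self.value, self.version

        def try_commit(self, new_value, read_version):
            with self._commit_lock:
                if self.version != read_version:  # someone committed since our read
                    return False                  # conflict: abort and let the caller retry
                self.value = new_value
                self.version += 1
                return True

    def add_one(shared):
        while True:                               # retry loop on conflict
            value, version = shared.read()        # no blocking during the "transaction"
            if shared.try_commit(value + 1, version):
                return

Note that the short internal lock protects only the commit check itself, not the whole transaction; that’s what keeps the scheme non-blocking.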

Pessimistic Concurrency

You can probably guess how this one works. Like the overly dramatic, angsty Daria Morgendorffer from the late-90s MTV animated sitcom Daria, pessimistic concurrency assumes (just assumes!) there will be an integrity violation. In this case, the entire operation will be blocked until the possibility of the violation disappears. This approach has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks.
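
A minimal pessimistic sketch, again with plain Python threads: take a lock up front and hold it for the whole operation, so no other thread can sneak in between the check and the update. (With multiple locks, acquisition order is where the deadlock care comes in.)

    import threading

    balance = 100
    balance_lock = threading.Lock()   # pessimistic: acquire before touching the data

    def withdraw(amount):
        global balance
        with balance_lock:            # everyone else blocks for the whole operation
            if balance >= amount:     # nothing can change balance between check and update
                balance -= amount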

Semi-Optimistic Concurrency

Can’t decide? You’re in luck! With semi-optimistic concurrency, operations are blocked only in situations where it is deemed they might violate some of the rules; in other situations they are not blocked, but rule checking is still done at the end, as in the optimistic scenario.

So which is best? It depends. Different concurrency models provide different performance depending on the transaction mix, the degree of parallelism, and a host of other factors. A general rule of thumb is to use the optimistic model in environments with low expected contention for data (e.g. ingest pipelines, high-volume systems, multi-tier distributed architectures) and stick to pessimistic concurrency in situations where conflicts happen frequently.