socat/doc/socat-addresschain.html
2009-04-03 11:30:01 +02:00

320 lines
13 KiB
HTML

<!-- source: socat-exec.html -->
<!-- Copyright Gerhard Rieger 2009 -->
<html><head>
<title>Socat address chains</title>
<link rel="stylesheet" type="text/css" href="dest-unreach.css">
</head>
<body>
<h1>Socat address chains</h1>
<a name="introduction"/>
<h2>Introduction</h2>
<p>Socat version 2 can concatenate multiple modules and transfer data between
them bidirectionally.
<p>
<a name="example1"/>
<h2>Example 1: OpenSSL via HTTP proxy</h2>
<span class="frame"><span class="shell">
socat - "OPENSSL,verify=0 | PROXY:secure.domain.com:443 | TCP:proxy.domain.com:8080"
</span></span>
<p>This command does the following: socat connects to proxy.domain.com on port
8080 and sends a proxy CONNECT request for secure.domain.com port 443; this is
similar to the proxy address available in version 1. Once the proxy server
acknowledges successful
connection to the target (SSL) server, socat starts SSL negotiation and then
transfers data between its stdio and the SSL server.
</p>
<a name="basics"/>
<h2>Address chain basics</h2>
<p>socat version 1 was able to open two addresses and transfer data between
them. "Addresses" could be just sockets or other file descriptors, or could
be a little more complex like proxy client or OpenSSL server and client. It
was, though desirable, practically not possible to combine complex address
types, or to use other socket types than the predefined ones (usually TCP)
with complex addresses.
</p>
<p>socat version 2 has been designed to overcome these limitations. First, the
complex address types are now separated from the underlying file descriptor
types. Second, complex addresses that are now called <em>inter addresses</em>
can be concatenated to an <em>address chain</em>; however, an <em>endpoint
address</em> that just provides file descriptors must be the last component
of an address chain.
</p>
<p>The socat invocation takes two address chains, opens them, and transfers
data between them.
</p>
<p>An address chain consists of zero or more inter addresses and one endpoint
address, all separated by the pipe character '|'. When starting socat from
the command line these characters and the optional spaces must be protected
from the shell; it is recommended to put each address chain under double
quotes.
</p>
<p>The (bidirectional) inter addresses that are available with a socat
implementation can be listed with the following command:
</p>
<span class="frame"><span class="shell">
socat -h |egrep 'b ..b groups='</span></span>
<p>A full socat 2.0.0-b3 program provides the following inter addresses:
</p>
<table border=1>
<tr><th>name</th><th>description</th></tr>
<tr><td>NOP</td><td>transfers data unmodified</td></tr>
<tr><td>OPENSSL-CLIENT</td><td>performs OpenSSL client negotiation, then
encrypts/decrypts data</td></tr>
<tr><td>OPENSSL-SERVER</td><td>performs OpenSSL server negotiation, then
encrypts/decrypts data </td></tr>
<tr><td>PROXY</td><td>performs proxy CONNECT client negotiation, then
transfers data unmodified</td></tr>
<tr><td>SOCKS4</td><td>performs socks 4 client negotiation, then
transfers data unmodified</td></tr>
<tr><td>SOCKS4A</td><td>performs socks 4a client negotiation, then
transfers data unmodified</td></tr>
<tr><td>SOCKS5</td><td>performs socks 5 TCP client negotiation, then
transfers data unmodified</td></tr>
<tr><td>TEST</td><td>appends &gt; to forward, and &lt; to reversely
transferred blocks</td></tr>
<tr><td>EXEC</td><td>invokes a program
(see <a href="socat-exec.html">socat-exec.html</a>), then transfers data unmodified</td></tr>
<tr><td>SYSTEM</td><td>invokes the shell (see <a href="socat-exec.html">socat-exec.html</a>), then transfers data unmodified</td></tr>
</table>
<a name="reverse"/>
<h2>Reverse address use</h2>
<p>Inter addresses have two interfaces. In most cases one of
these can be seen as a <em>data</em> interface, where arbitrary data
traffic may occur, and the other as <em>protocol</em> interface where the
transferred data has to follow some rules like socks and HTTP protocol, or
valid encryption.
</p>
<p>Bidirectional inter addresses are usually implemented such that their data
interface is on the "left" side, and the protocol interface on the "right"
side.
</p>
<p>It may be convenient to build an address chain where one or more inter
addresses work in the reverse direction, so their protocol side is connected
to left neighbor in the chain using the protocol, and the data side is
connected to the right neighbor for raw data transfer. socat allows to use
inter addresses in <em>reverse</em> direction by preceding their keyword with
&circ;.
</p>
<a name="example2"/>
<h2>Example 2:</h2>
<p>Endpoint addresses that fork should usually build the first socat address
chain, without inter addresses. For creating an SSL to TCP gateway that
handles multiple connections the following command line does the job:
</p>
<span class="frame"><span class="shell">
socat TCP-LISTEN:443,reuseaddr,fork "^OPENSSL-SERVER,cert=server.pem | TCP:somehost:80"
</span></span>
<p>Without the reverse usage of the SSL server address, socat would "speak"
clear text with the clients that connected to its left address, and SSL to
somehost.
</p>
<a name="unidirectional"/>
<h2>Unidirectional data transfer</h2>
<p>Like in socat version 1, it is possible to specify unidirectional transfers
with version 2. Use socat options <a href="socat.html#OPTION_u">-u</a> or
<a href="socat.html#OPTION_U">-U</a>.
</p>
<p>Unidirectional transfer must be supported by the involved inter addresses;
e.g., SSL requires a bidirectional channel for negotiation of encryption
parameters etc.
</p>
<p>It is possible to mix uni- and bidirectional transfers within one address
chain: Think of a simple file transfer over SSL.
</p>
<p>The socat help function can tell us which address types support which kinds
of transfer:</p>
<span class="frame"><span class="shell">
socat -h |egrep 'openssl-server'</span></span>
<p>gives the following output:
</p>
<p><pre> openssl-server rwb b groups=CHILD,RETRY,OPENSSL
openssl-server:&lt;port&gt; rwb groups=FD,SOCKET,LISTEN,CHILD,RETRY,RANGE,IP4,IP6,TCP,OPENSSL</pre>
</p>
<p>The <tt>rwb &nbsp; b</tt> flags mean that this address type can handle readonly,
writeonly, and bidirectional transfers on its left (data) side, but only
bidirectional on its right (protocol) side.
</p>
<p>The second line describes the (version 1) endpoint form: no right side
traffic kinds are specified because this address type establishes its protocol
communication itself.
</p>
<a name="dual"/>
<h2>Dual inter addresses</h2>
<p>In socat version 1 it was already possible to combine two unidirectional
addresses to one bidrectional address. This idea has been extended in version
2: Two unidirectional inter addresses can be combined to one bidirectional
transfer unit.
</p>
<p><em>Note: in version 1, the dual specification was like
</em><tt>righttoleft!!lefttoright</tt><em>. In version 2, it is:
</em><tt>lefttoright%righttoleft</tt><em>. This is the only major incompatibility
between versions 1 and 2.</em>
</p>
<p>With the few already available inter address types, this feature has no
practical use except with <a href="socat-exec.html">exec and system</a> type
addresses. However, the general function shall be described using the
hypothetical inter address types <tt>gzip</tt> and <tt>gunzip</tt>.
</p>
<p>Let us design these inter address types: <tt>gzip</tt> is a module that
reads arbitrary data on its left ("data") side, compresses it, and writes the
compressed data to its right (protocol side) neighbor.
<!-- Data that arrives onits right side is uncompressed and passed to the
left neighbor. -->
</p>
<p><tt>gunzip</tt> reads gzip compressed data on its left side and writes the
raw uncompressed data on its right side.
</p>
<p>socat can combine these to provide a bidirectional compress/decompress
function:<br>
<tt>gzip%gunzip</tt>
</p>
<p>Data coming from the left is passed through gzip and sent to the right;
data coming from the right is passed through gunzip and sent to the left.
</p>
<p>When the reverse functionality is desired this arrangement does the job:<br>
<tt>gunzip%gzip</tt>
</p>
<a name="fork"/>
<h2>fork</h2>
<p>socat provides the <tt>fork</tt> address option for uses like network
servers where multiple clients can connect and are handled in parallel in
different socat sub processes.
</p>
<p>When the sub processes should work independently (share no socat file
descriptors) the fork option must be applied to the last component of the
first address chain. For better readability it is advisable to have only the
"left" endpoint address in the left chain and put all intermediate addresses
into the right chain.
</p>
<a name="understanding"/>
<h2>Understanding chain implementation</h2>
<p>The idea of concatenated modules in socat is not new. But a few attempts to
completely rewrite and enhance the socat transfer engine
were never completed. At last, it was decided to choose an approach that
requires only moderate changes to socats transfer engine and the existing
address types.
</p>
<p>Think of several socat1 like processes somehow combined - with an abstract
operator || :
</p>
<span class="frame"><span class="shell">
socat - openssl || socat - proxy:secure.domain.com || socat - tcp:proxy.domain.com:8080
</span></span>
<p>The solution was to put all these into one process but have each socat engine
run in its own thread. The transfer between the engines goes over socket
pairs, so the engines see file descriptors as usual. The main work then was
to implement the functionality for opening address chains which includes
parsing, creating socket pairs and threads, combining the addresses, taking
care of unidirectional, dual, and reverse addresses etc.
</p>
<p>Here is the socat version 2 command line of example 1:<br>
<tt>socat - "OPENSSL,verify=0 | PROXY:secure.domain.com:443 | TCP:proxy.domain.com:8080"</tt>
<p>A schematic representation of how this is realized in socat:<br>
<tt>STDIO - engine[thread 0] - OPENSSL - socket pair - (FD) - engine[thread 1]
- PROXY - socket pair - (FD) - engine[thread 2] - TCP</tt>
</p>
<p>where FD means a trivial address similar to the FD (file descriptor) address
type.
</p>
<p>For debugging address chains it proved useful to write down two lines and to note the actual file descriptor numbers:</p>
<pre> STDIO ^ OPENSSL | ^ PROXY | ^ TCP
0,1 ^ 6 | 7 ^ 4 | 5 ^ 3</pre>
<p>The symbol <b>&circ;</b> means a socat transfer engine.
</p>
<p>Now the implementation of the reverse address feature should be easier to
understand. While a forward address is put to the right side of its
engine, a reverse address is just put to the left side. Example 2 can be
explained so:
</p>
<p>Example 2 command line:<br>
<tt>socat TCP-LISTEN:443,reuseaddr,fork "^OPENSSL-SERVER,cert=server.pem |
TCP:somehost:80"</tt>
</p>
<p>Schematic representation:<br>
<tt>TCP-LISTEN - engine[thread 0] - (FD) - socket pair - OPENSSL-SERVER -
engine[thread 1] - TCP</tt>
</p>
<p>Debug schema:<br>
<pre>
TCP-L ^ | SSL-SERV ^ TCP
3 ^ 5 | 6 ^ 4</pre>
<a name="commtypes"/>
<h2>Communication types</h2>
<p>For communication between the address modules of consecutive transfer
engines socat provides pairs (or quadruples) of file descriptors. You may
think about these as two normal UNIX pipes (fifos), one for left-to-right and
the other for right-to-left data transfer.
</p>
<p>There are a few requirements that these file descriptors should fulfill,
however they are different depending on the libraries used by the inter
address modules (e.g. libopenssl) or by external programs that are involved
(see <a href="socat-exec.html">socat-exec.html</a>).
</p>
<p>The factors to consider for these file dscriptors are:
</p>
<ul>
<li>Half close: when a module terminates communication on its write channel,
its read channel should still stay open.</li>
<li>Half close method: A module might half close a connection
using <tt>close()</tt> or <tt>shutdown()</tt> methods.</li>
<li>Buffering: The output buffering behaviour of some modules can be
influenced by the type of file descriptor</li>
<li>INET: Some external programs require a TCP/IPv4 file descriptor</li>
</ul>
<p>This table lists the available communication types and their
properties:</p>
<table border=1>
<tr><th>comm.type</th><th>half close with close()</th><th>allows shutdown</th><th>avoids buffering</th><th>TCP/IPv4</th></tr>
<tr><td>socketpairs</td><td>OK</td><td>OK</td><td>no</td><td>no</td></tr>
<tr><td>socketpair</td><td>no</td><td>OK</td><td>no</td><td>no</td></tr>
<tr><td>pipes</td><td>OK</td><td>no</td><td>no</td><td>no</td></tr>
<tr><td>ptys</td><td>OK</td><td>no</td><td>yes</td><td>no</td></tr>
<tr><td>tcp</td><td>no</td><td>yes</td><td>no</td><td>yes</td></tr>
</table>
<p>The default is socketpairs.
</p>
<p>The overall communication type can be chosen using the <a href="socat.html#option_c"><tt>-c</tt></a> socat
option. With socat 2.0.0-b3 it is not possible to use different communication
types in one process (exception: right side of exec/system modules)
</p>
<small>Copyright: Gerhard Rieger 2009</small><br>
<small>License: <a href="http://www.fsf.org/licensing/licenses/fdl.html">GNU Free Documentation License (FDL)</a></small>
</p>
</body>
</html>