The non-blocking IO system was designed to solve three problems:
1. Deadlock that can occur when both sides of a connection are blocked
writing because all buffers between them are full.
2. The inability to use a write deadline with a tls.Conn without killing
the connection.
3. The need to efficiently check whether a connection has been closed
before writing. This reduces the cases where the application doesn't
know whether a query that does an INSERT/UPDATE/DELETE was actually
sent to the server.
However, the nbconn package is extraordinarily complex, has been a
source of very tricky bugs, and has OS-specific code paths. It also does
not work at all with underlying net.Conn implementations that lack
platform-specific non-blocking IO syscall support and do not properly
implement deadlines. In particular, this is the case with
golang.org/x/crypto/ssh.
I believe the deadlock problem can be solved with a combination of a
goroutine for CopyFrom, as v4 used, and a watchdog for regular queries
that uses time.AfterFunc.
The write deadline problem actually should be ignorable. We check for
context cancellation before sending a query and the actual Write should
be almost instant as long as the underlying connection is not blocked.
(We should only have to wait until it is accepted by the OS, not until
it is fully sent.)
Efficiently checking if a connection has been closed is probably the
hardest to solve without non-blocking reads. However, the existing code
only solves part of the problem. It can detect a closed or broken
connection the OS knows about, but it won't actually detect other types
of broken connections such as a network interruption. This is currently
implemented in CheckConn and called automatically when checking a
connection out of the pool that has been idle for over one second. I
think that changing CheckConn to a read with a very short deadline and
changing the pool to do an actual Ping would be an acceptable solution.
Remove nbconn and non-blocking code. This does not leave the system in
an entirely working state. In particular, CopyFrom is broken, deadlocks
can occur for extremely large queries or batches, and PgConn.CheckConn
is now a `select 1` ping. These will be resolved in subsequent commits.
The test was relying on sending a message so large that the write
blocked. However, it appears that on Windows, TCP connections over
localhost have a very large or infinite-sized buffer. Change the test
to simply set the deadline to the current time before triggering the
write.
The first 5 fake non-blocking reads are limited to 1 byte. This should
ensure that there is a measurement of a read where bytes are already
waiting in Go's or the OS's read buffer.
The reason for a high max wait time was to ensure that reads aren't
cancelled when there is data waiting for them in Go's or the OS's
receive buffer. Unfortunately, there is no way to know ahead of time
how long this should take.
This new code uses 2x the fastest successful read time as the max read
time. This allows the code to adapt to whatever host it is running on.
https://github.com/jackc/pgx/issues/1481
Previously, a batch with 10 unique parameterized statements executed
100 times would entail 11 network round trips: 1 for each prepare /
describe and 1 for executing them all. Now pipeline mode is used to
prepare / describe all statements in a single network round trip, so
the same batch takes only 2 round trips.
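
The arithmetic can be written out as a small function. This is only a
model of the counts given above. batchRoundTrips is a hypothetical
name, and it assumes none of the statements are already prepared.

```go
package main

import "fmt"

// batchRoundTrips estimates the network round trips for a batch.
// uniqueStatements is the number of distinct statements that still need
// a prepare / describe step. How many times each one is executed does
// not matter, because the whole batch executes in one round trip.
func batchRoundTrips(uniqueStatements int, pipelinedPrepare bool) int {
	const execute = 1
	switch {
	case uniqueStatements == 0:
		return execute
	case pipelinedPrepare:
		return 1 + execute // all prepares share one round trip
	default:
		return uniqueStatements + execute // one round trip per prepare
	}
}

func main() {
	fmt.Println(batchRoundTrips(10, false)) // prints "11"
	fmt.Println(batchRoundTrips(10, true))  // prints "2"
}
```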
While this test always worked on my machine, it was flaky in CI. To be
fair, the test can't guarantee the condition it is testing. Work around
this by trying many times before admitting failure.
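
The workaround amounts to a retry loop along these lines. This is a
generic sketch. succeedsWithin is a hypothetical helper, not the
code in the test.

```go
package main

import "fmt"

// succeedsWithin calls attempt up to maxAttempts times and reports
// whether any call succeeded. It stops at the first success.
func succeedsWithin(maxAttempts int, attempt func() bool) bool {
	for i := 0; i < maxAttempts; i++ {
		if attempt() {
			return true
		}
	}
	return false
}

func main() {
	calls := 0
	ok := succeedsWithin(100, func() bool {
		calls++
		return calls == 3 // stand-in for a timing-dependent condition
	})
	fmt.Println(ok, calls) // prints "true 3"
}
```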
This eliminates an edge case that can cause a deadlock and is a
prerequisite to cheaply testing connection liveness and to recovering a
connection after a timeout.
https://github.com/jackc/pgconn/issues/27
Squashed commit of the following:
commit 0d7b0dddea1575e9fd72592665badb8cbdd581cc
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 25 13:15:05 2022 -0500
Add test for non-blocking IO preventing deadlock
commit 79d68d23d38bb03ddb8bf13cb45792430eaf959a
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 18 18:23:24 2022 -0500
Release CopyFrom buf when done
commit 95a43139c7b0b7557898c4480e5b3e42417ee3c0
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 18 18:22:32 2022 -0500
Avoid allocations with non-blocking write
commit 6b63ceee076794bc4380495a55dd414dbbd08a43
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 18 17:46:49 2022 -0500
Simplify iobufpool usage
commit 60ecdda02e5a24c894df4f58d31c485b90de5d5b
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 18 11:51:59 2022 -0500
Add true non-blocking IO
commit 7dd26a34a182d4aacaed3bf8c09f9cc48a7b6156
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 20:28:23 2022 -0500
Fix block when reading more than buffered
commit afa702213f1b6d24c976406448301b2be53b7f70
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 20:10:23 2022 -0500
More TLS support
commit 51655bf8f40321d5f89bc3c02dd55fba0ac6aa49
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 17:46:00 2022 -0500
Steps toward TLS
commit 2b80beb1ed75f0f58db8188b87753dbc26b62098
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 13:06:29 2022 -0500
Litle more TLS support
commit 765b2c6e7b034ff6ffab3974579fd6ee7add593b
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 12:29:30 2022 -0500
Add testing of TLS
commit 5b64432afbed9224f9512cc46624c88e7ebec625
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 09:48:19 2022 -0500
Introduce testVariants in prep for TLS
commit ecebd7b103d4a9125c61e83f3651b950658b0b84
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 09:32:14 2022 -0500
Handle and test read of previously buffered data
commit 09c64d8cf3ca5be1a31bef46bf78fa5cb9fae831
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 09:04:48 2022 -0500
Rename nbbconn to nbconn
commit 73398bc67a7b7bd1aa044fb9b0546f4198ef92d2
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 08:59:53 2022 -0500
Remove backup files
commit f1df39a29d23ae4e5175b92c69697f2bf9b4e112
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 08:58:05 2022 -0500
Initial passing tests
commit ea3cdab234343fc9761d9b7966c5346179cd1b01
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat Jun 4 08:38:57 2022 -0500
Fix connect timeout
commit ca22396789d120ff556f9704f4470268fbc8c0d8
Author: Jack Christensen <jack@jackchristensen.com>
Date: Thu Jun 2 19:32:55 2022 -0500
wip
commit 2e7b46d5d7454daf0859dd48f8a8e190995164c5
Author: Jack Christensen <jack@jackchristensen.com>
Date: Mon May 30 08:32:43 2022 -0500
Update comments
commit 7d04dc5caa80cb147929b6f65bab60a27baaff89
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat May 28 19:43:23 2022 -0500
Fix broken test
commit bf1edc77d70465b4097a59c08c581033d2033ac6
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat May 28 19:40:33 2022 -0500
fixed putting wrong size bufs
commit 1f7a855b2e4d1e14f85ac5f5683e2b93db0a4bd9
Author: Jack Christensen <jack@jackchristensen.com>
Date: Sat May 28 18:13:47 2022 -0500
initial not quite working non-blocking conn
Use an internal buffer in pgproto3.Frontend and pgproto3.Backend instead
of directly writing to the underlying net.Conn. This will allow tracing
messages and will simplify pipeline mode.