I am working on an application that reads a lot of hstore values, and
have discovered that scanning it is fairly slow. I'm working on some
improvements, but first I wanted a better benchmark. This adds more
realistic data, and extends it to cover the three code paths: database/sql,
and pgx.Rows.Scan with both the text and binary protocols.
Remove unneeded string to []byte to string conversion, which saves 2
allocs and should make Hstore text scanning slightly faster.
The Hstore.Scan() function takes a string as input, converts it to
[]byte, and calls scanPlanTextAnyToHstoreScanner.Scan(). That
function converts []byte back to string and calls parseHstore. This
refactors scanPlanTextAnyToHstoreScanner.Scan into
scanPlanTextAnyToHstoreScanner.scanString so the database/sql Scan
function can call it directly, bypassing this conversion.
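The shape of the refactor, as a minimal sketch (the names and the stub parser are illustrative, not the actual pgx internals):

```go
package main

import "fmt"

// parseHstore stands in for the real text parser; parsing is omitted
// because the point here is the call path, not the grammar.
func parseHstore(s string) (map[string]*string, error) {
	return map[string]*string{}, nil
}

// Old path: database/sql hands Scan a string, which was converted to
// []byte and then back to string before parsing (two extra allocations).
func scanViaBytes(s string) (map[string]*string, error) {
	b := []byte(s)                // alloc 1
	return parseHstore(string(b)) // alloc 2
}

// New path: a string-based entry point that the database/sql Scan can
// call directly, skipping both conversions.
func scanString(s string) (map[string]*string, error) {
	return parseHstore(s)
}

func main() {
	m, _ := scanString(`"a"=>"1"`)
	fmt.Println(len(m))
}
```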
The added Benchmark shows this saves 2 allocs for longer strings, and
saves about 5% CPU overall on my M1 Pro. benchstat output:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
              │  orig.txt   │              new.txt               │
              │   sec/op    │   sec/op     vs base               │
HstoreScan-10   1.334µ ± 2%   1.257µ ± 2%  -5.77% (p=0.000 n=10)

              │  orig.txt   │              new.txt               │
              │    B/op     │    B/op      vs base               │
HstoreScan-10   2.094Ki ± 0%   1.969Ki ± 0%  -5.97% (p=0.000 n=10)

              │  orig.txt   │              new.txt               │
              │  allocs/op  │  allocs/op   vs base               │
HstoreScan-10   36.00 ± 0%   34.00 ± 0%  -5.56% (p=0.000 n=10)
Postgres on Mac OS X has a bug in how it parses hstore text values
that causes it to misinterpret some Unicode values as spaces. This
causes values sent by pgx to be misinterpreted. To avoid this, always
quote hstore values, which is how Postgres serializes them itself.
The test change fails on Mac OS X without this fix.
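The quoting itself is simple; a minimal sketch of the idea (the helper name is illustrative, not the exact pgx function):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// appendQuotedHstore always wraps an hstore key or value in double
// quotes, escaping '"' and '\'. This matches how Postgres itself
// serializes hstore output, and sidesteps the Mac OS X space-detection
// bug because no value is ever sent unquoted.
func appendQuotedHstore(buf []byte, s string) []byte {
	buf = append(buf, '"')
	for _, r := range s {
		if r == '"' || r == '\\' {
			buf = append(buf, '\\')
		}
		buf = utf8.AppendRune(buf, r)
	}
	return append(buf, '"')
}

func main() {
	fmt.Println(string(appendQuotedHstore(nil, `say "hi"`)))
}
```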
While I suspect this should not be performance critical for any
application, I added a quick benchmark to test the performance of the
encoding. This change actually makes encoding slightly faster on my
M1 Pro. The output from the benchstat program on this benchmark is:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                          │  orig.txt    │          new-quotes.txt             │
                          │   sec/op     │   sec/op     vs base                │
HstoreSerialize/text-10     207.1n  ± 0%   142.3n ± 1%  -31.31% (p=0.000 n=10)
HstoreSerialize/binary-10   100.10n ± 0%   99.64n ± 1%   -0.45% (p=0.013 n=10)
geomean                     144.0n         119.1n       -17.31%
I have also attempted to fix the Postgres bug, but it will take a
long time for this fix to get upstream:
https://www.postgresql.org/message-id/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
Tests should time out in a reasonable amount of time if something is stuck. In
particular this is important when testing deadlock conditions such as
can occur with the copy protocol if both the client and the server are
blocked writing until the other side does a read.
Before this change, the Hstore text protocol did not quote keys or
values containing non-space whitespace ("\r\n\v\t"). This caused
inserts with these values to fail with errors like:
ERROR: Syntax error near "r" at position 17 (SQLSTATE XX000)
The previous version also quoted curly braces ("{}"), but they don't
seem to require quoting.
It is possible that it would be easier to just always quote the
values, which is what Postgres does when encoding its text protocol,
but this is a smaller change.
The parseHstore function did not check the return value from
p.Consume() after a ', ' sequence. It expects a doublequote '"' that
starts the next key, but would accept any character. This means it
accepted invalid input such as:
"key1"=>"b", ,key2"=>"value"
Add a unit test that covers this case.
Fix a couple of the nearby error strings while looking at this.
Found by looking at staticcheck warnings:
pgtype/hstore.go:434:6: this value of end is never used (SA4006)
pgtype/hstore.go:434:6: this value of r is never used (SA4006)
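A minimal sketch of the fix (parser names are illustrative): after consuming the ", " separator between pairs, the next byte must be the '"' that opens the next key, so the result of the consume step has to be checked rather than discarded.

```go
package main

import (
	"errors"
	"fmt"
)

type parser struct {
	s   string
	pos int
}

// consume returns the next byte and whether one was available.
func (p *parser) consume() (byte, bool) {
	if p.pos >= len(p.s) {
		return 0, false
	}
	b := p.s[p.pos]
	p.pos++
	return b, true
}

// expectKeyQuote checks the consumed byte instead of accepting anything,
// which is what rejects input like `"key1"=>"b", ,key2"=>"value"`.
func expectKeyQuote(p *parser) error {
	b, ok := p.consume()
	if !ok {
		return errors.New("unexpected end of string")
	}
	if b != '"' {
		return fmt.Errorf("expected %q to start next key, got %q", '"', b)
	}
	return nil
}

func main() {
	fmt.Println(expectKeyQuote(&parser{s: `,key2"=>"value"`}) != nil)
}
```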
The existing test registers pgtype.Hstore in the text map, then uses
query modes that exercise the binary protocol, so it never touched the
text parsing code. Add a version of the test that uses pgtype.Hstore
as the input and output argument in all query modes, and that tests it
without registering the codec.
When running queries with the hstore type registered, and with simple
mode queries, the scan implementation does not correctly distinguish
between NULL and empty. Fix the implementation and add a test to
verify this.
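The distinction being fixed can be sketched as follows (this is a sketch, not the pgx implementation): a nil source means SQL NULL and must scan to a nil map, while an empty but non-nil source is a valid hstore with zero pairs.

```go
package main

import (
	"errors"
	"fmt"
)

func scanHstoreText(src []byte) (map[string]*string, error) {
	if src == nil {
		return nil, nil // SQL NULL -> nil map
	}
	if len(src) == 0 {
		return map[string]*string{}, nil // empty hstore -> empty, non-nil map
	}
	return nil, errors.New("parsing of non-empty hstore omitted in this sketch")
}

func main() {
	null, _ := scanHstoreText(nil)
	empty, _ := scanHstoreText([]byte{})
	fmt.Println(null == nil, empty != nil && len(empty) == 0)
}
```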
Table types have system / hidden columns like tableoid, cmax, xmax, etc.
These are not included when sending or receiving composite types.
https://github.com/jackc/pgx/issues/1576
When using `scany` I encountered the following case. This seems to fix it.
Looks like null `jsonb` columns cause the problem. If you create a table like below you can see that the following code fails. Is this expected?
```sql
CREATE TABLE test (
a int4 NULL,
b int4 NULL,
c jsonb NULL
);
INSERT INTO test (a, b, c) VALUES (1, null, null);
```
```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/georgysavva/scany/v2/pgxscan"
	"github.com/jackc/pgx/v5"
)

func main() {
	var rows []map[string]interface{}
	// connection string; the original report used a test-server helper here
	conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	// this will fail with: can't scan into dest[0]: cannot scan NULL into *interface {}
	err = pgxscan.Select(context.Background(), conn, &rows, `SELECT c from test`)
	// this works
	// err = pgxscan.Select(context.Background(), conn, &rows, `SELECT a,b from test`)
	if err != nil {
		panic(err)
	}
	log.Printf("%+v", rows)
}
```
It should pass a FlatArray[T] to the next step instead of an
anySliceArrayReflect. By using an anySliceArrayReflect, an encode of
[]github.com/google/uuid.UUID followed by []string into a PostgreSQL
uuid[] would crash. This was caused by an EncodePlan cache collision
where the second encoding used part of the cached plan of the first.
In proper usage a cache collision shouldn't be able to occur. If this
assertion proves incorrect it will be necessary to add an optional
interface to ScanPlan and EncodePlan that marks the plan as ineligible
for caching. But I have been unable to construct a failing case, and
given that ScanPlans have been cached for quite some time now without
incident I do not think it is possible. This issue only occurred due to
the bug in *wrapSliceEncodePlan[T].Encode.
https://github.com/jackc/pgx/issues/1502
This significantly reduces memory allocations in paths that repeatedly
encode the same type of values such as CopyFrom.
https://github.com/jackc/pgx/issues/1481
This seems a bit of a hack. It fixes the problems demonstrated in my previous commit.
Maybe there's a cleaner way?
Associated: https://github.com/jackc/pgx/issues/1426
Demonstrate the problem with tests for negative decimal values, e.g. -0.01.
This causes errors when encoding to JSON:
"json: error calling MarshalJSON for type pgtype.Numeric"
It also causes scan failures of sql.NullFloat64:
"converting driver.Value type string ("0.-1") to a float64"
As reported here: https://github.com/jackc/pgx/issues/1426
This improves handling of unregistered types. In general, they should
"just work". But there are performance benefits gained and some edge
cases avoided by registering types. Updated documentation to mention
this.
https://github.com/jackc/pgx/issues/1445
Previously, when the minimum condition failed, the error message would
incorrectly be: "is greater than maximum"
Also add encoding/json import into the .erb template as the import was
missing after running rake generate.
Previously, an infinite value was returned as a string. Other types
that can be infinite such as Timestamptz return a
pgtype.InfinityModifier. This change brings them into alignment.
A `driver.Valuer()` results in a `string` that the `Codec` for the
PostgreSQL type doesn't know how to handle. That string is scanned into
whatever the default type for that `Codec` is. That new value is
encoded. If the new value is the same type as the original type, then an
infinite loop occurred. Check that the types are different.
https://github.com/jackc/pgx/issues/1331
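The guard can be sketched as follows (illustrative names, with a deliberately pathological Valuer to exercise it):

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"reflect"
)

// valueForEncoding unwraps a driver.Valuer, but refuses to continue if
// the produced value has the same type as the original: re-planning
// with the same type would recurse forever.
func valueForEncoding(v driver.Valuer) (driver.Value, error) {
	next, err := v.Value()
	if err != nil {
		return nil, err
	}
	if reflect.TypeOf(next) == reflect.TypeOf(v) {
		return nil, fmt.Errorf("driver.Valuer on %T returned its own type; refusing to loop", v)
	}
	return next, nil
}

// loopy is a pathological Valuer that returns itself.
type loopy struct{}

func (l loopy) Value() (driver.Value, error) { return l, nil }

func main() {
	_, err := valueForEncoding(loopy{})
	fmt.Println(err != nil)
}
```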
This is tricky because driver.Valuer returns any, so we cannot plan for
it directly the way we can for fmt.Stringer, which always returns a
string.
Because of this, driver.Valuer was always handled as the last option.
But with pgx v5 now having the ability to find underlying types like a
string and supporting fmt.Stringer, driver.Valuer was often not getting
called because something else was found first.
This change tries driver.Valuer immediately after the initial PlanScan
for the Codec. So a type that directly implements a pgx interface will
still be used, but driver.Valuer is now preferred over all the attempts
to handle renamed types, pointer dereferencing, etc.
fixes https://github.com/jackc/pgx/issues/1319
fixes https://github.com/jackc/pgx/issues/1311
e.g. 2000-01-01 instead of 2000-1-1. PostgreSQL accepted it without
zeroes but our text decoder didn't. This caused a problem when we needed
to take a value and encode to text so something else could parse it as
if it had come from the PostgreSQL server in text format. e.g.
database/sql compatibility.
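The problem stated in Go stdlib terms (a sketch, not pgx's actual decoder): a decoder hard-coded to the zero-padded layout rejects "2000-1-1", while falling back to a lenient layout accepts both forms.

```go
package main

import (
	"fmt"
	"time"
)

// parseDateLenient accepts both zero-padded and unpadded dates, the way
// PostgreSQL's own input parsing does.
func parseDateLenient(s string) (time.Time, error) {
	if t, err := time.Parse("2006-01-02", s); err == nil {
		return t, nil
	}
	return time.Parse("2006-1-2", s)
}

func main() {
	a, _ := parseDateLenient("2000-01-01")
	b, _ := parseDateLenient("2000-1-1")
	fmt.Println(a.Equal(b))
}
```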
The preferred format may not be possible for certain arguments. For
example, the preferred format for numeric is binary. But if
shopspring/decimal is being used without jackc/pgx-shopspring-decimal
then it will use the database/sql/driver.Valuer interface. This will
return a string. That string should be sent in the text format.
A similar case occurs when encoding a []string into a non-text
PostgreSQL array such as uuid[].
Reverted almost all of 976b1e0.
Still may consider a way to get DecodeValue to be strongly typed but
that feature isn't worth the complications of generics. Especially in
that applying this style to ArrayCodec would make Conn.LoadType
impossible for arrays.
Added ValueRoundTripTest to pgxtest
Removed pgtype/testutil
pgtype tests now run with all (applicable) query modes. This gives
better coverage than before and revealed several bugs which are also
fixed in this commit.
Otherwise, wrapper types get cached. Wrapper types are expected to fail
most of the time. These failures should not be cached. In addition,
wrappers wrap multiple different types so it doesn't make sense to cache
results of a wrapper.
PostgreSQL doesn't define float8range out of the box though it can
easily be created by the user. However, it is still convenient to treat
a numrange as a float8range.