Commit Graph

19 Commits (63422c7d6cfe092af402f48e16729acd1e3bae1c)

Author SHA1 Message Date
Evan Jones dc94db6b3d pgtype.Hstore: add a round-trip test for binary and text codecs
This ensures the output of Encode can pass through Scan and produce
the same input. This found two two minor problems with the text
codec. These are not bugs: These situations do not happen when using
pgx with Postgres. However, I think it is worth fixing to ensure the
code is internally consistent.

The problems with the text codec are:

* It did not correctly distinguish between nil and empty. This is not
  a problem with Postgres, since NULL values are marked separately,
  but the binary codec distinguishes between them, so it seems like
  the text codec should as well.
* It did not output spaces between keys. Postgres produces output in
  this format, and the parser now only strictly parses the Postgres
  format. This is not a bug, but seems like a good idea.
2023-06-29 17:25:47 -05:00
Evan Jones bc8b1ca320 remove the single backing string optimization
This is a bit slower than using this optimization, but ensures this
version does not change garbage collection behavior. This does still
using a single []string for all the *string value pointers because
that is what text parsing already does. This makes the two behave
similarly.

benchstat results of master versus this version:

                               │  orig.txt   │   new-binary-no-share-string.txt    │
                               │   sec/op    │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10   82.11µ ± 1%   81.71µ ± 2%        ~ (p=0.280 n=10)
HstoreScan/text-10               83.30µ ± 1%   82.45µ ± 1%   -1.02% (p=0.000 n=10)
HstoreScan/binary-10             15.99µ ± 2%   10.12µ ± 1%  -36.67% (p=0.000 n=10)
geomean                          47.82µ        40.86µ       -14.56%

                               │   orig.txt   │   new-binary-no-share-string.txt    │
                               │     B/op     │     B/op      vs base               │
HstoreScan/databasesql.Scan-10   56.23Ki ± 0%   56.23Ki ± 0%       ~ (p=0.128 n=10)
HstoreScan/text-10               65.12Ki ± 0%   65.12Ki ± 0%       ~ (p=0.541 n=10)
HstoreScan/binary-10             21.09Ki ± 0%   19.87Ki ± 0%  -5.75% (p=0.000 n=10)
geomean                          42.58Ki        41.75Ki       -1.95%

                               │  orig.txt  │    new-binary-no-share-string.txt    │
                               │ allocs/op  │ allocs/op   vs base                  │
HstoreScan/databasesql.Scan-10   744.0 ± 0%   744.0 ± 0%        ~ (p=1.000 n=10) ¹
HstoreScan/text-10               743.0 ± 0%   743.0 ± 0%        ~ (p=1.000 n=10) ¹
HstoreScan/binary-10             464.0 ± 0%   316.0 ± 0%  -31.90% (p=0.000 n=10)
geomean                          635.4        559.0       -12.02%
¹ all samples are equal

benchstat results of the version with one string and this version:

                               │ new-binary-share-everything.txt │    new-binary-no-share-string.txt    │
                               │             sec/op              │    sec/op     vs base                │
HstoreScan/databasesql.Scan-10                       81.80µ ± 1%    81.71µ ± 2%        ~ (p=1.000 n=10)
HstoreScan/text-10                                   82.77µ ± 1%    82.45µ ± 1%        ~ (p=0.063 n=10)
HstoreScan/binary-10                                 7.330µ ± 2%   10.124µ ± 1%  +38.13% (p=0.000 n=10)
geomean                                              36.75µ         40.86µ       +11.18%

                               │ new-binary-share-everything.txt │   new-binary-no-share-string.txt    │
                               │              B/op               │     B/op      vs base               │
HstoreScan/databasesql.Scan-10                      56.23Ki ± 0%   56.23Ki ± 0%       ~ (p=0.232 n=10)
HstoreScan/text-10                                  65.12Ki ± 0%   65.12Ki ± 0%       ~ (p=0.218 n=10)
HstoreScan/binary-10                                20.73Ki ± 0%   19.87Ki ± 0%  -4.11% (p=0.000 n=10)
geomean                                             42.34Ki        41.75Ki       -1.39%

                               │ new-binary-share-everything.txt │     new-binary-no-share-string.txt     │
                               │            allocs/op            │  allocs/op   vs base                   │
HstoreScan/databasesql.Scan-10                        744.0 ± 0%    744.0 ± 0%         ~ (p=1.000 n=10) ¹
HstoreScan/text-10                                    743.0 ± 0%    743.0 ± 0%         ~ (p=1.000 n=10) ¹
HstoreScan/binary-10                                  41.00 ± 0%   316.00 ± 0%  +670.73% (p=0.000 n=10)
geomean                                               283.0         559.0        +97.53%
¹ all samples are equal
2023-06-16 15:31:37 -05:00
Evan Jones 2de94187f5 hstore: Make binary parsing 2X faster
* use []string for value string pointers: one allocation instead of
  one per value.
* use one string for all key/value pairs, instead of one for each.

After this change, one Hstore will share two allocations: one string
and one []string. The disadvantage is that it cannot be deallocated
until all key/value pairs are unused. This means if an application
takes a single key or value from the Hstore and holds on to it, its
memory footprint will increase. I would guess this is an unlikely
problem, but it is possible.

The benchstat results from my M1 Max are below.

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                               │   orig.txt   │               new.txt               │
                               │    sec/op    │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10    82.11µ ± 1%   82.66µ ± 2%        ~ (p=0.436 n=10)
HstoreScan/text-10                83.30µ ± 1%   84.24µ ± 3%        ~ (p=0.165 n=10)
HstoreScan/binary-10             15.987µ ± 2%   7.459µ ± 6%  -53.35% (p=0.000 n=10)
geomean                           47.82µ        37.31µ       -21.98%

                               │   orig.txt   │               new.txt               │
                               │     B/op     │     B/op      vs base               │
HstoreScan/databasesql.Scan-10   56.23Ki ± 0%   56.23Ki ± 0%       ~ (p=0.324 n=10)
HstoreScan/text-10               65.12Ki ± 0%   65.12Ki ± 0%       ~ (p=0.675 n=10)
HstoreScan/binary-10             21.09Ki ± 0%   20.73Ki ± 0%  -1.70% (p=0.000 n=10)
geomean                          42.58Ki        42.34Ki       -0.57%

                               │  orig.txt   │               new.txt                │
                               │  allocs/op  │ allocs/op   vs base                  │
HstoreScan/databasesql.Scan-10    744.0 ± 0%   744.0 ± 0%        ~ (p=1.000 n=10) ¹
HstoreScan/text-10                743.0 ± 0%   743.0 ± 0%        ~ (p=1.000 n=10) ¹
HstoreScan/binary-10             464.00 ± 0%   41.00 ± 0%  -91.16% (p=0.000 n=10)
geomean                           635.4        283.0       -55.46%
¹ all samples are equal
2023-06-16 15:31:37 -05:00
Evan Jones 07670dddca do not share the original input string
This allows the original input string to be garbage collected, so it
should not change the memory footprint. This is a slower than the
version that shares a string, but only a small amount. It is still
faster than binary parsing (until that is optimized).

benchstat difference of original versus this version:

                               │  orig.txt   │     new-do-not-share-string.txt     │
                               │   sec/op    │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10   82.11µ ± 1%   14.24µ ± 2%  -82.66% (p=0.000 n=10)
HstoreScan/text-10               83.30µ ± 1%   14.97µ ± 1%  -82.03% (p=0.000 n=10)
HstoreScan/binary-10             15.99µ ± 2%   15.80µ ± 0%   -1.16% (p=0.024 n=10)
geomean                          47.82µ        14.99µ       -68.66%

                               │   orig.txt   │     new-do-not-share-string.txt      │
                               │     B/op     │     B/op      vs base                │
HstoreScan/databasesql.Scan-10   56.23Ki ± 0%   20.11Ki ± 0%  -64.24% (p=0.000 n=10)
HstoreScan/text-10               65.12Ki ± 0%   29.00Ki ± 0%  -55.47% (p=0.000 n=10)
HstoreScan/binary-10             21.09Ki ± 0%   21.09Ki ± 0%        ~ (p=0.722 n=10)
geomean                          42.58Ki        23.08Ki       -45.80%

                               │  orig.txt  │     new-do-not-share-string.txt      │
                               │ allocs/op  │ allocs/op   vs base                  │
HstoreScan/databasesql.Scan-10   744.0 ± 0%   340.0 ± 0%  -54.30% (p=0.000 n=10)
HstoreScan/text-10               743.0 ± 0%   339.0 ± 0%  -54.37% (p=0.000 n=10)
HstoreScan/binary-10             464.0 ± 0%   464.0 ± 0%        ~ (p=1.000 n=10) ¹
geomean                          635.4        376.8       -40.70%
¹ all samples are equal

benchstat difference of the shared string versus not:

                               │ new-share-string.txt │     new-do-not-share-string.txt     │
                               │        sec/op        │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10            10.57µ ± 2%   14.24µ ± 2%  +34.69% (p=0.000 n=10)
HstoreScan/text-10                        11.60µ ± 2%   14.97µ ± 1%  +29.03% (p=0.000 n=10)
HstoreScan/binary-10                      15.87µ ± 2%   15.80µ ± 0%        ~ (p=0.280 n=10)
geomean                                   12.48µ        14.99µ       +20.07%

                               │ new-share-string.txt │     new-do-not-share-string.txt      │
                               │         B/op         │     B/op      vs base                │
HstoreScan/databasesql.Scan-10           11.68Ki ± 0%   20.11Ki ± 0%  +72.17% (p=0.000 n=10)
HstoreScan/text-10                       20.58Ki ± 0%   29.00Ki ± 0%  +40.93% (p=0.000 n=10)
HstoreScan/binary-10                     21.08Ki ± 0%   21.09Ki ± 0%        ~ (p=0.427 n=10)
geomean                                  17.17Ki        23.08Ki       +34.39%

                               │ new-share-string.txt │      new-do-not-share-string.txt       │
                               │      allocs/op       │  allocs/op   vs base                   │
HstoreScan/databasesql.Scan-10             44.00 ± 0%   340.00 ± 0%  +672.73% (p=0.000 n=10)
HstoreScan/text-10                         44.00 ± 0%   339.00 ± 0%  +670.45% (p=0.000 n=10)
HstoreScan/binary-10                       464.0 ± 0%    464.0 ± 0%         ~ (p=1.000 n=10) ¹
geomean                                    96.49         376.8       +290.47%
2023-06-16 15:30:54 -05:00
Evan Jones d48d36dc02 pgtype/hstore: Make text parsing about 6X faster
I am working on an application that uses hstore types, and we found
that returning the values is slow, particularly when using the text
protocol, such as when using database/sql. This improves parsing to
be about 6X faster (currently faster than binary). The changes are:

* referencing the original string instead of copying into new strings
  (very large win)
* using string.IndexByte to scan double quoted strings: it has
  architecture-specific assembly implementations, and most of the
  time is spent in key/value strings.
* estimating the number of key/value pairs to allocate the correct
  size of the slice and map up front. This reduces the number of
  allocations and bytes allocated by a factor of 2, and was a small
  CPU win.
* parsing directly into the Hstore, rather than copying into it.

This parser is stricter than the old one. It only accepts hstore
strings serialized by Postgres. The old one was already stricter
than Postgres's own parser, but previously accepted any whitespace
character after a comma. This one only accepts space. Example:

  "k1"=>"v1",\t"k2"=>"v2"

Postgres only ever uses ", " as the separator. See hstore_out:
https://github.com/postgres/postgres/blob/master/contrib/hstore/hstore_io.c

The result of using benchstat to compare the benchmark on my M1 Pro
with the following command line in below. The new text parser is now
faster than the binary parser. I will improve the binary parser in a
separate change.

for i in $(seq 10); do go test ./pgtype -run=none -bench=BenchmarkHstoreScan -benchtime=1s >> new.txt; done

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                               │  orig.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10   82.11µ ± 1%   10.51µ ± 0%  -87.20% (p=0.000 n=10)
HstoreScan/text-10               83.30µ ± 1%   11.49µ ± 1%  -86.20% (p=0.000 n=10)
HstoreScan/binary-10             15.99µ ± 2%   15.77µ ± 1%   -1.35% (p=0.007 n=10)
geomean                          47.82µ        12.40µ       -74.08%

                               │   orig.txt   │               new.txt                │
                               │     B/op     │     B/op      vs base                │
HstoreScan/databasesql.Scan-10   56.23Ki ± 0%   11.68Ki ± 0%  -79.23% (p=0.000 n=10)
HstoreScan/text-10               65.12Ki ± 0%   20.58Ki ± 0%  -68.40% (p=0.000 n=10)
HstoreScan/binary-10             21.09Ki ± 0%   21.09Ki ± 0%        ~ (p=0.378 n=10)
geomean                          42.58Ki        17.18Ki       -59.66%

                               │  orig.txt   │               new.txt                │
                               │  allocs/op  │ allocs/op   vs base                  │
HstoreScan/databasesql.Scan-10   744.00 ± 0%   44.00 ± 0%  -94.09% (p=0.000 n=10)
HstoreScan/text-10               743.00 ± 0%   44.00 ± 0%  -94.08% (p=0.000 n=10)
HstoreScan/binary-10              464.0 ± 0%   464.0 ± 0%        ~ (p=1.000 n=10) ¹
geomean                           635.4        96.49       -84.81%
¹ all samples are equal
2023-06-16 15:30:54 -05:00
Evan Jones 1b68b5970e pgtype/hstore: Save 2 allocs in database/sql Scan implementation
Remove unneeded string to []byte to string conversion, which saves 2
allocs and should make Hstore text scanning slightly faster.

The Hstore.Scan() function takes a string as input, converts it to
[]byte, and calls scanPlanTextAnyToHstoreScanner.Scan(). That
function converts []byte back to string and calls parseHstore. This
refactors scanPlanTextAnyToHstoreScanner.Scan into
scanPlanTextAnyToHstoreScanner.scanString so the database/sql Scan
function can call it directly, bypassing this conversion.

The added Benchmark shows this saves 2 allocs for longer strings, and
saves about 5% CPU overall on my M1 Pro. benchstat output:

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
              │  orig.txt   │              new.txt               │
              │   sec/op    │   sec/op     vs base               │
HstoreScan-10   1.334µ ± 2%   1.257µ ± 2%  -5.77% (p=0.000 n=10)

              │   orig.txt   │               new.txt               │
              │     B/op     │     B/op      vs base               │
HstoreScan-10   2.094Ki ± 0%   1.969Ki ± 0%  -5.97% (p=0.000 n=10)

              │  orig.txt  │              new.txt              │
              │ allocs/op  │ allocs/op   vs base               │
HstoreScan-10   36.00 ± 0%   34.00 ± 0%  -5.56% (p=0.000 n=10)
2023-06-07 15:35:22 -05:00
Evan Jones ee04d4a74d pgtype/hstore: Avoid Postgres Mac OS X parsing bug
Postgres on Mac OS X has a bug in how it parses hstore text values
that causes it to misinterpret some Unicode values as spaces. This
causes values sent by pgx to be misinterpreted. To avoid this, always
quote hstore values, which is how Postgres serializes them itself.
The test change fails on Mac OS X without this fix.

While I suspect this should not be performance critical for any
application, I added a quick benchmark to test the performance of the
encoding. This change actually makes encoding slightly faster on my
M1 Pro. The output from the benchstat program on this banchmark is:

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                          │   orig.txt   │           new-quotes.txt            │
                          │    sec/op    │   sec/op     vs base                │
HstoreSerialize/text-10      207.1n ± 0%   142.3n ± 1%  -31.31% (p=0.000 n=10)
HstoreSerialize/binary-10   100.10n ± 0%   99.64n ± 1%   -0.45% (p=0.013 n=10)
geomean                      144.0n        119.1n       -17.31%

I have also attempted to fix the Postgres bug, but it will take a
long time for this fix to get upstream:

https://www.postgresql.org/message-id/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
2023-06-07 15:29:25 -05:00
Evan Jones eab316e200 pgtype.Hstore: Fix quoting of whitespace; Add test
Before this change, the Hstore text protocol did not quote keys or
values containing non-space whitespace ("\r\n\v\t"). This causes
inserts with these values to fail with errors like:

    ERROR: Syntax error near "r" at position 17 (SQLSTATE XX000)

The previous version also quoted curly braces ("{}"), but they don't
seem to require quoting.

It is possible that it would be easier to just always quote the
values, which is what Postgres does when encoding its text protocol,
but this is a smaller change.
2023-05-16 07:02:55 -05:00
Evan Jones 8ceef73b84 pgtype.parseHstore: Reject invalid input; Fix error messages
The parseHstore function did not check the return value from
p.Consume() after a ', ' sequence. It expects a doublequote '"' that
starts the next key, but would accept any character. This means it
accepted invalid input such as:

    "key1"=>"b", ,key2"=>"value"

Add a unit test that covers this case
Fix a couple of the nearby error strings while looking at this.

Found by looking at staticcheck warnings:

    pgtype/hstore.go:434:6: this value of end is never used (SA4006)
    pgtype/hstore.go:434:6: this value of r is never used (SA4006)
2023-05-15 18:10:20 -05:00
Evan Jones d8b38b28be pgtype/hstore.go: Remove unused quoteHstore{Element,Replacer}
These are unused. The code uses quoteArrayElement instead.
2023-05-13 10:03:22 -05:00
Evan Jones 2a86501e86 Fix hstore NULL versus empty
When running queries with the hstore type registered, and with simple
mode queries, the scan implementation does not correctly distinguish
between NULL and empty. Fix the implementation and add a test to
verify this.
2023-05-13 09:34:30 -05:00
Jack Christensen f14fb3d692 Replace interface{} with any 2022-04-09 09:12:55 -05:00
Jack Christensen d13f651810 Finish importing pgio as internal package 2022-02-21 14:35:20 -06:00
Jack Christensen 9c538cd4a9 Remove actualTarget argument 2022-02-21 09:30:01 -06:00
Jack Christensen 1f2f239d09 Renamed pgtype.ConnInfo to pgtype.Map 2022-02-21 09:13:09 -06:00
Jack Christensen 5ed95dcd1c Expose wrap functions on ConnInfo
- Remove rarely used ScanPlan.Scan arguments
- Plus other refactorings and fixes that fell out of this change.
- Plus rows Scan now handles checking for changed type.
2022-01-22 17:50:19 -06:00
Jack Christensen a6863a7dd2 Convert Hstore to Codec 2022-01-15 17:47:37 -06:00
Jack Christensen fcc9dcc960 Convert text to Codec
This also entailed updating and deleting types that depended on Text.
2022-01-08 13:13:26 -06:00
Jack Christensen 44214b7854 Import to pgx main repo in pgtype subdir 2021-12-04 13:07:54 -06:00