Hi Team,
I deployed MemSQL on Azure Kubernetes. After several queries ran successfully, I noticed that memory was not being released, and the last query failed.
Logs:
node 2020-09-02 14:09:35.786 WARN: Failed to free 131072 bytes of memory at address 0x7fafeade1000 to kernel. Error 12: Cannot allocate memory
node 2020-09-02 14:09:35.786 WARN: Failed to free 131072 bytes of memory at address 0x7fafeb5c1000 to kernel. Error 12: Cannot allocate memory
node 2020-09-02 14:09:35.786 WARN: Failed to free 131072 bytes of memory at address 0x7fafebd81000 to kernel. Error 12: Cannot allocate memory
node 2020-09-02 14:09:35.786 WARN: Failed to free 131072 bytes of memory at address 0x7fafec5a1000 to kernel. Error 12: Cannot allocate memory
node 2020-09-02 14:09:35.786 WARN: Failed to free 131072 bytes of memory at address 0x7fafecda1000 to kernel. Error 12: Cannot allocate memory
node 2020-09-02 14:44:22.576 WARN: [33 messages supressed Partition metadata is out of sync on table `testdb__validation_6`.`testtable`. Execution will continue but codegen will be slowed
node 2020-09-02 14:44:26.729 WARN: Failed to allocate 4096 bytes of memory from the operating system (Error 12: Cannot allocate memory). This is usually due to a misconfigured operating system or virtualization technology. See https://docs.memsql.com/troubleshooting/latest/memory-errors.
node 2020-09-02 14:44:26.729 ERROR: Failure to allocate IFN thunk page
node 2020-09-02 14:44:26.729 ERROR: Nonfatal buffer manager memory allocation failure.
node 2020-09-02 14:44:26.729 ERROR: Threads_cached : 139
node 2020-09-02 14:44:26.729 ERROR: Threads_connected : 172
node 2020-09-02 14:44:26.729 ERROR: Threads_created : 199
node 2020-09-02 14:44:26.729 ERROR: Threads_running : 1
node 2020-09-02 14:44:26.729 ERROR: Threads_background : 1
node 2020-09-02 14:44:26.729 ERROR: Threads_idle : 112
node 2020-09-02 14:44:26.729 ERROR: Ready_queue : 0
node 2020-09-02 14:44:26.729 ERROR: Idle_queue : 0
node 2020-09-02 14:44:26.729 ERROR: Context_switches : 1506
node 2020-09-02 14:44:26.729 ERROR: Context_switch_misses : 0
node 2020-09-02 14:44:26.729 ERROR: Columnstore_ingest_management_estimated_segments_to_flush : 0
node 2020-09-02 14:44:26.729 ERROR: Columnstore_ingest_management_estimated_memory : 0.000 MB
node 2020-09-02 14:44:26.729 ERROR: Threads_waiting_for_disk_space : 0
node 2020-09-02 14:44:26.729 ERROR: Total_server_memory : 27076.9 (+107.6) MB
node 2020-09-02 14:44:26.729 ERROR: Total_io_pool_memory : 7.9 MB
node 2020-09-02 14:44:26.729 ERROR: Free_io_pool_memory : 0.0 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_thread_stacks : 200.000 MB
node 2020-09-02 14:44:26.730 ERROR: Malloc_active_memory : 619.462 (+52.228) MB
node 2020-09-02 14:44:26.730 ERROR: Malloc_transaction_cached_memory : 267.883 MB
node 2020-09-02 14:44:26.730 ERROR: Linux_resident_memory : 22581.028 (+184.524) MB
node 2020-09-02 14:44:26.730 ERROR: Linux_resident_shared_memory : 111.500 (+5.000) MB
node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_memory : 25756.8 MB
node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_cached_memory : 23202.1 (-386.9) MB
node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_unrecycled_memory : 7.9 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_skiplist_tower : 127.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable : 46.250 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_table_primary : 100.625 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_deleted_version : 106.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_internal_key_node : 37.625 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_hash_buckets : 185.034 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_table_metadata_cache : 1.250 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_code_generator : 404.271 (+404.271) MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_unit_images : 125.823 (+55.414) MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_unit_ifn_thunks : 1.715 (+0.008) MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_object_code_images : 31.072 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_compiled_unit_sections : 19.571 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_databases_list_entry : 1.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_plan_cache : 3.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_warnings : 14.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_replication : 1.125 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_sharding_partitions : 0.125 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_mmap_file : 80.000 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_protocol_packet : 21.375 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_large_incremental : 17.521 (+17.521) MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_table_autostats : 129.601 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_system_tasks : 0.125 (+0.125) MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_table_memory : 732.134 MB
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_16 : allocs:139991 alloc_MB:2.1 buffer_MB:3.4 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_24 : allocs:37920 alloc_MB:0.9 buffer_MB:1.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_32 : allocs:14096 alloc_MB:0.4 buffer_MB:2.2 cached_buffer_MB:1.5
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_40 : allocs:1301 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_48 : allocs:1758 alloc_MB:0.1 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_56 : allocs:273 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_64 : allocs:70 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_72 : allocs:2881 alloc_MB:0.2 buffer_MB:0.4 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_80 : allocs:820 alloc_MB:0.1 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_88 : allocs:210 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_104 : allocs:229 alloc_MB:0.0 buffer_MB:0.4 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_128 : allocs:2474 alloc_MB:0.3 buffer_MB:2.8 cached_buffer_MB:1.1
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_160 : allocs:1999 alloc_MB:0.3 buffer_MB:0.4 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_200 : allocs:58 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_248 : allocs:24 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_312 : allocs:1047 alloc_MB:0.3 buffer_MB:7.8 cached_buffer_MB:1.9
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_384 : allocs:7 alloc_MB:0.0 buffer_MB:2.1 cached_buffer_MB:1.9
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_480 : allocs:122 alloc_MB:0.1 buffer_MB:11.0 cached_buffer_MB:1.9
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_600 : allocs:20 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_752 : allocs:5 alloc_MB:0.0 buffer_MB:2.0 cached_buffer_MB:1.9
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_936 : allocs:12 alloc_MB:0.0 buffer_MB:0.6 cached_buffer_MB:0.5
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_1168 : allocs:6 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_1480 : allocs:6 alloc_MB:0.0 buffer_MB:1.5 cached_buffer_MB:1.4
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_1832 : allocs:6 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_2288 : allocs:0 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.1
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_2832 : allocs:28 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.730 ERROR: Alloc_variable_bucket_3528 : allocs:6 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_4504 : allocs:9 alloc_MB:0.0 buffer_MB:0.8 cached_buffer_MB:0.6
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_5680 : allocs:15 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_6224 : allocs:3 alloc_MB:0.0 buffer_MB:0.6 cached_buffer_MB:0.4
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_7264 : allocs:0 alloc_MB:0.0 buffer_MB:0.4 cached_buffer_MB:0.4
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_9344 : allocs:122 alloc_MB:1.1 buffer_MB:1.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_65472 : allocs:1 alloc_MB:0.1 buffer_MB:0.1 cached_buffer_MB:0.0
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_bucket_130960 : allocs:25 alloc_MB:3.1 buffer_MB:5.0 cached_buffer_MB:1.9
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_cached_buffers : 15.4 (+0.1) MB
node 2020-09-02 14:44:26.731 ERROR: Alloc_variable_allocated : 9.4 MB
node 2020-09-02 14:44:26.731 ERROR: GCed_versions_last_sweep : 0
node 2020-09-02 14:44:26.731 ERROR: Average_garbage_collection_duration : 14 ms
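To make the counters in the dump above easier to compare, here is a minimal Python sketch that pulls the `... MB` status lines into a dictionary. The regular expression and the function name `parse_memory_counters` are my own, written against the line format shown in this log only; they are not part of any MemSQL tooling.

```python
import re

# Matches status lines such as
#   node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_memory : 25756.8 MB
# An optional delta like "(+107.6)" before "MB" is skipped.
# Assumption: this pattern is derived only from the log format pasted above.
COUNTER_RE = re.compile(
    r"ERROR:\s+(\w+)\s+:\s+([\d.]+)(?:\s+\([+-][\d.]+\))?\s+MB\s*$"
)


def parse_memory_counters(log_text):
    """Return {counter_name: value_in_MB} for every '... MB' status line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = float(match.group(2))
    return counters


# A few lines copied verbatim from the dump above.
sample = """\
node 2020-09-02 14:44:26.729 ERROR: Total_server_memory : 27076.9 (+107.6) MB
node 2020-09-02 14:44:26.730 ERROR: Linux_resident_memory : 22581.028 (+184.524) MB
node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_memory : 25756.8 MB
node 2020-09-02 14:44:26.730 ERROR: Buffer_manager_cached_memory : 23202.1 (-386.9) MB
"""

counters = parse_memory_counters(sample)
cached_share = (
    counters["Buffer_manager_cached_memory"] / counters["Total_server_memory"]
)
print(f"cached share of total server memory: {cached_share:.1%}")
```

On the four sample lines, most of `Total_server_memory` is reported as `Buffer_manager_cached_memory`, which seems to match what I observe: the memory is held by the node rather than returned to the operating system.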
K8S agentpools:
- memsql: 8 cores and 32 GB (master)
- memsqlleaf: 32 cores and 128 GB (leaf)
‘MemsqlCluster’ K8S Deployment object:
kind: MemsqlCluster
metadata:
  name: memsql-cluster
spec:
  license: xyz
  adminHashedPassword: "xyz"
  nodeImage:
    repository: memsql/node
    tag: centos-7.1.7-27ea2acf75
  redundancyLevel: 2
  monitoringSpec:
    memsqlPusherSpec:
      enable: true
      mode: Cluster
      organizationName: cv
      kafkaBootstrapServer: kafka.monitoring.svc.cluster.local:9092
      kafkaProtocol: plaintext
      pusherSecretName: pushersecret
  serviceSpec:
    objectMetaOverrides:
      labels:
        custom: label
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  aggregatorSpec:
    count: 1
    height: 0.8
    storageGB: 32
    storageClass: managed-premium
    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label
  leafSpec:
    count: 2
    height: 3.6
    storageGB: 1024
    storageClass: managed-premium
    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label
  schedulingDetails:
    master:
      nodeSelector:
        agentpool: memsql
    aggregator:
      nodeSelector:
        agentpool: memsql
    leaf-ag1:
      nodeSelector:
        agentpool: memsqlleaf
    leaf-ag2:
      nodeSelector:
        agentpool: memsqlleaf
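For context on the `height` values, here is a rough sketch of how I understand they translate into per-pod resources. It assumes that a height of 1.0 corresponds to 8 vCPU cores and 32 GB of RAM, which is my reading of the operator documentation and should be checked against the operator version in use. The constant and function names are hypothetical.

```python
# Assumption: 1.0 height = 8 vCPU cores and 32 GB RAM (my reading of the
# operator docs; verify for the operator version actually deployed).
CORES_PER_HEIGHT = 8
MEMORY_GB_PER_HEIGHT = 32


def resources_for_height(height):
    """Return (cores, memory_gb) implied by a given height."""
    return height * CORES_PER_HEIGHT, height * MEMORY_GB_PER_HEIGHT


# role -> (height from the spec, cores of the agentpool VM, GB of the agentpool VM)
specs = {
    "aggregator": (0.8, 8, 32),
    "leaf": (3.6, 32, 128),
}

for role, (height, node_cores, node_gb) in specs.items():
    cores, memory_gb = resources_for_height(height)
    print(
        f"{role}: height {height} -> {cores:.1f} cores, {memory_gb:.1f} GB "
        f"(VM has {node_cores} cores, {node_gb} GB)"
    )
```

Under that assumption, each aggregator pod would get about 6.4 cores and 25.6 GB on the 8-core / 32 GB pool, and each leaf pod about 28.8 cores and 115.2 GB on the 32-core / 128 GB pool.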
Please advise,
Chen